Wow! First, thanks for this extensive feedback! That's very helpful, and appreciated.

I'm going to reply to a few points. On a general note, you can see headlines for the main topics we're currently planning to work on in the Mbed TLS roadmap at https://developer.trustedfirmware.org/w/mbed-tls/roadmap/ . Please note that everything I write is based on current plans, which may change through the TrustedFirmware planning process or if real life shows that a current plan is not doable.

Over time, we're transitioning the API for the crypto part of the library from the current mbedtls_xxx functions to psa_xxx, which have a somewhat different philosophy: less exposure of internals, more protection against misuse, no reliance on malloc. Mbed TLS 3.0 will start the transition.

On 14/04/2020 21:10, Torsten Schuetze via mbed-tls wrote:
1. I really missed an Initialize, Update, Finalize (IUF) interface for
   CCM.

   For GCM, we have mbedtls_gcm_init(), mbedtls_gcm_setkey(),
   mbedtls_gcm_starts(), mbedtls_gcm_update() iterated,
   mbedtls_gcm_finish(), mbedtls_gcm_free() or the comfort functions
   mbedtls_gcm_crypt_and_tag() and mbedtls_gcm_auth_decrypt(). For
   CCM, only mbedtls_ccm_init(), mbedtls_ccm_setkey(),
   mbedtls_ccm_encrypt_and_tag() or mbedtls_ccm_auth_decrypt() and
   mbedtls_ccm_free(). With this interface it was only possible to
   encrypt and tag 128 kByte on my target system, while with GCM I
   could encrypt much larger files.

   see Github issue #662 and my comment there

In Mbed TLS 3.0, mbedtls_ccm_xxx() will not be a public interface anymore. We are planning to add support for multipart CCM in the psa_aead_xxx() interface (the prototypes are already in psa/crypto.h but their implementation is planned for some time in the next few months). We are not currently planning to add support for multipart CCM through mbedtls_cipher_xxx(), which in Mbed TLS 3 will be legacy functions. However, we would probably accept such support if it was contributed externally.

2. The next step, of course, is to integrate this into the higher
   mbedtls_cipher layer.

   Regarding higher, abstract layers: I often didn't understand which
   interface I was supposed to use. In general, I like to use the
   lowest available interface, for example, #include
   "mbedtls/sha512.h" when I want to use sha512. However, if I need
   HMAC-SHA-512 or HKDF-HMAC-SHA-512 then I have to use the interface
   in md.h. For hash functions this is fine. Almost all hash functions
   are supported via md.h. (I missed SHA-512/256 which is sometimes
   preferable to SHA-256 on 64bit systems).

   But with cipher.h, I can only access Chacha20Poly1305 and AES-GCM,
   not AES-CCM.

In Mbed TLS 3, there will generally be a single public layer. Exposing lower layers helps with code size on resource-constrained devices, but it also has downsides, including locking down the APIs.

4. That I couldn't configure AES-256 only, i.e. without AES-128 and
   AES-192, was to be expected (and the code overhead is not that
   much). But in modern modes of operations nobody needs AES
   decryption, only the forward direction. Sometimes modern
   publications as Schwabe/Stoffelen "All the AES you need on
   Cortex-M3 and M4" provide only the forward direction.

   So, it would be fine if one could configure an AES (ECB) encryption
   only without decryption.

   Of course, this is only possible if we don't use CBC mode, etc.
   This wouldn't only save the AES decryption code but also the rather
   large T-tables for decryption.

For information, there is a branch of Mbed TLS called "baremetal", forked from Mbed TLS 2.16, which you can find on GitHub: https://github.com/ARMmbed/mbedtls/blob/baremetal . This branch is optimized for small code size, sometimes at the expense of speed and often at the expense of features. It has build options MBEDTLS_AES_ONLY_ENCRYPT and MBEDTLS_AES_ONLY_128_BIT_KEY_LENGTH. However, I would not recommend using it in production because Arm (who still maintains this branch even after Mbed TLS itself has moved to TrustedFirmware) does not make any promise of stability: a feature that you rely on may be removed without notice.

I mention this branch because eventually, we do plan to port the improvements that don't sacrifice features to the Mbed TLS development branch. I can't give a timeline for this however.

With the current Mbed TLS, if you don't use CBC, I think you can save some code in aes.o by defining MBEDTLS_AES_DECRYPT_ALT and MBEDTLS_AES_SETKEY_DEC_ALT and providing functions mbedtls_aes_setkey_dec() and mbedtls_internal_aes_decrypt() that do nothing.
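
A minimal sketch of such stubs, assuming the Mbed TLS 2.x prototypes for these two functions. To keep the snippet compilable on its own, it forward-declares the context type instead of including mbedtls/aes.h, and STUB_ERR_AES_DECRYPT_UNAVAILABLE is a made-up placeholder, not a library constant. Returning an error instead of silently succeeding is a choice made for this sketch, so that an accidental call to the decryption path gets noticed.

```c
#include <stddef.h>

/* In a real build, define MBEDTLS_AES_DECRYPT_ALT and
 * MBEDTLS_AES_SETKEY_DEC_ALT in the configuration file and include
 * "mbedtls/aes.h" instead of this stand-in declaration. */
typedef struct mbedtls_aes_context mbedtls_aes_context;

/* Hypothetical error value for this sketch; not a library constant. */
#define STUB_ERR_AES_DECRYPT_UNAVAILABLE (-1)

/* Replacement for the decryption key schedule: does no work. */
int mbedtls_aes_setkey_dec(mbedtls_aes_context *ctx,
                           const unsigned char *key,
                           unsigned int keybits)
{
    (void) ctx;
    (void) key;
    (void) keybits;
    return STUB_ERR_AES_DECRYPT_UNAVAILABLE;
}

/* Replacement for single-block decryption: does no work. */
int mbedtls_internal_aes_decrypt(mbedtls_aes_context *ctx,
                                 const unsigned char input[16],
                                 unsigned char output[16])
{
    (void) ctx;
    (void) input;
    (void) output;
    return STUB_ERR_AES_DECRYPT_UNAVAILABLE;
}
```

With stubs like these, the linker no longer needs the built-in decryption routines; whether the decryption tables are also dropped depends on the library version and build options, so check the size of aes.o after the change.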

5. Regarding AES or better the AES context-type definition

[snip]

6. In general, the contexts of mbedTLS are rather full of
   implementation specific details. Most extreme is mbedtls_ecp_group
   in ecp.h. Wouldn't it be clearer if one separates the standard
   things (domain parameters in this case) from implementation
   specific details?

As a general design principle, context types in Mbed TLS 3 will be opaque. This will let us, for example, redesign mbedtls_aes_context and mbedtls_cipher_context.

9. Regarding ECC examples: I found it very difficult that there isn't
   a single example with known test vectors as in the relevant crypto
   standards, i.e. FIPS 186-4 and ANSI X9.62-2005, with raw public
   keys. What I mean are (defined) curves, public key value Q=(Qx,Qy)
   and known signature values r and s. In the example ecdsa.c you
   generate your own key pair and read/write the signature in
   serialized form. In the example programs/pkey/pk_sign.c and
   pk_verify.c you use a higher interface pk.h and keys in PEM format.

   So, it took me a while for a program to verify (all) known answer
   tests in the standards (old standards as ANSI X9.62 1998 have more
   detailed known answer tests). One needs this interface with raw
   public keys for example for CAVP tests, see The FIPS 186-4 Elliptic
   Curve Digital Signature Algorithm Validation System (ECDSA2VS).

11. In the moment, there is no single known answer tests for ECDSA
    (which could be activated with #define MBEDTLS_SELF_TEST). I
    wouldn't say that you need an example for every curve and hash
    combination, as it is done in ECDSA2VS CAVP, but one example for
    one of the NIST curves and one for Curve25519 and - if I have a
    wish free - one for Brainpool would be fine. And this would solve
    #9 above.

I don't get the point here. ECDSA is randomized, so you can't have a known answer test. The test suite does have known answer tests for deterministic ECDSA.

10. While debugging mbedtls_ecdsa_verify() in my example program, I
    found out, that the ECDSA, ECC and MPI operations are very, let's
    say, nested. So, IMHO there is a lot of function call overhead and
    special cases. It would be interesting to see what's the
    performance impact of a clean, straight-forward
    mbedtls_ecdsa_verify without restartable code, etc. to the current
    one.

As far as I remember, the refactoring done to add the restartable code had no measurable impact on performance. What does have a significant impact on performance is that the bignum module uses malloc all the time. We would like to completely rewrite bignum operations at some point during the 3.x series, not only for performance but also because its design makes it hard not to leak information through side channels.

12. Just a minor issue: I only needed ECDSA signature verification,
    therefore I only included MBEDTLS_ASN1_PARSE_C. But it is not
    possible to compile without MBEDTLS_ASN1_WRITE_C needed for ECDSA
    signature generation.

I'm not sure if I've written it down anywhere, but I'd like to remove the dependency of ECDSA on ASN1 altogether. Parsing and writing a SEQUENCE of two INTEGERs can be done with ad hoc code. Likewise for what little ASN1 the RSA module uses. And then asn1*.o can move out of libmbedcrypto and libpsacrypto, and into libmbedx509 where it belongs.
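
To give an idea of how small such ad hoc code can be, here is a sketch of a strict DER parser for an ECDSA signature (a SEQUENCE of two non-negative INTEGERs) that writes r and s as fixed-size big-endian buffers. This is not code from the library; the names ecdsa_sig_from_der, der_get_len and der_get_uint are invented for this example, and lengths above 255 bytes are deliberately not supported.

```c
#include <stddef.h>
#include <string.h>

/* Read a DER length: short form, or long form with one length byte (0x81). */
static int der_get_len(const unsigned char **p, const unsigned char *end,
                       size_t *len)
{
    if (end - *p < 1)
        return -1;
    if ((**p & 0x80) == 0) {
        *len = *(*p)++;
    } else if (**p == 0x81) {
        if (end - *p < 2)
            return -1;
        *len = (*p)[1];
        if (*len < 0x80)
            return -1;              /* DER requires the shortest length form */
        *p += 2;
    } else {
        return -1;                  /* longer lengths not needed here */
    }
    return ((size_t) (end - *p) < *len) ? -1 : 0;
}

/* Read one non-negative INTEGER into out[0..out_len-1], big-endian,
 * left-padded with zeros. */
static int der_get_uint(const unsigned char **p, const unsigned char *end,
                        unsigned char *out, size_t out_len)
{
    size_t len;
    if (end - *p < 1 || *(*p)++ != 0x02)
        return -1;
    if (der_get_len(p, end, &len) != 0 || len == 0)
        return -1;
    if ((*p)[0] & 0x80)
        return -1;                  /* negative value */
    if (len > 1 && (*p)[0] == 0x00) {
        if (((*p)[1] & 0x80) == 0)
            return -1;              /* superfluous leading zero */
        (*p)++;
        len--;
    }
    if (len > out_len)
        return -1;
    memset(out, 0, out_len - len);
    memcpy(out + out_len - len, *p, len);
    *p += len;
    return 0;
}

/* Parse SEQUENCE { r INTEGER, s INTEGER }; returns 0 on success. */
int ecdsa_sig_from_der(const unsigned char *sig, size_t sig_len,
                       unsigned char *r, unsigned char *s, size_t n_len)
{
    const unsigned char *p = sig, *end = sig + sig_len;
    size_t seq_len;
    if (end - p < 1 || *p++ != 0x30)
        return -1;
    if (der_get_len(&p, end, &seq_len) != 0)
        return -1;
    if ((size_t) (end - p) != seq_len)
        return -1;                  /* trailing data after the SEQUENCE */
    if (der_get_uint(&p, end, r, n_len) != 0)
        return -1;
    if (der_get_uint(&p, end, s, n_len) != 0)
        return -1;
    return (p == end) ? 0 : -1;
}
```

The one-byte long form is there because a P-521 signature can exceed 127 bytes of content. The caller would pass n_len as the byte length of the curve order.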

Having only signature verification would be useful, indeed. That may happen with the bignum rewrite I mentioned above, if signature verification ends up using some faster non-constant-time code (this is also relevant for #13).

14. Design question: In the moment, both GCM and CCM use their own
    implementation of CTR encryption which is very simple. But then we
    have mbedtls_aes_crypt_ctr() in aes.h which is very simple, too.
    Let's assume at one day we have a performance optimized CTR
    encryption (for example from Schwabe & Stoffelen) with all fancy
    stuff like counter-mode caching etc. Then this would have to be
    replaced at three places at minimum. Why isn't the code at this
    point more modularized? Is this a dedicated design decision?

Having a single implementation of CTR is on the PSA roadmap because if there's a hardware accelerator that does it, we want to use it everywhere it's relevant.

In Mbed TLS (or more precisely in its ancestor PolarSSL, if not _its_ ancestor XySSL), there was a conscious design decision to make each .c file as independent from the others as possible, which explains why camellia_ctr is completely independent from aes_ctr. But I don't know why ccm and gcm reimplement ctr.
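
As an illustration of what a shared CTR routine could look like, here is a sketch parameterized over a block-encryption callback, so that several modes (or a hardware driver) could call the same code. None of these names exist in the library, and toy_block is a stand-in with no security value, present only so that the example is self-contained.

```c
#include <stddef.h>

/* Hypothetical callback type: encrypt one 16-byte block. */
typedef void (*block_encrypt_fn)(void *ctx,
                                 const unsigned char in[16],
                                 unsigned char out[16]);

/* Generic CTR: XOR the input with E(counter), E(counter+1), ...
 * The whole 16-byte counter is incremented as a big-endian integer.
 * Leftover keystream from a final partial block is discarded, so only
 * the last call of a message may have a length that is not a multiple
 * of 16. */
void ctr_crypt(block_encrypt_fn encrypt, void *ctx,
               unsigned char counter[16],
               const unsigned char *input, unsigned char *output,
               size_t len)
{
    unsigned char keystream[16];
    size_t i, n;

    while (len > 0) {
        encrypt(ctx, counter, keystream);

        /* Big-endian increment with carry propagation. */
        for (i = 16; i > 0; i--)
            if (++counter[i - 1] != 0)
                break;

        n = (len < 16) ? len : 16;
        for (i = 0; i < n; i++)
            output[i] = input[i] ^ keystream[i];

        input += n;
        output += n;
        len -= n;
    }
}

/* Toy "block cipher" for demonstration only: XOR with a one-byte key. */
void toy_block(void *ctx, const unsigned char in[16], unsigned char out[16])
{
    unsigned char k = *(const unsigned char *) ctx;
    size_t i;
    for (i = 0; i < 16; i++)
        out[i] = in[i] ^ k;
}
```

A routine shared with GCM and CCM would need more than this: GCM increments only the last 32 bits of the counter block, and CCM reserves a counter field whose width depends on the nonce length, so the counter width would have to become a parameter.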

    Why do I find at so many places

    for( i = 0; i < 16; i++ )                                              
        y[i] ^= b[i];

    instead of a fast 128-bit XOR macro with 32bit aligned data?

As a programmer who doesn't write compilers, I think this is the right way to xor 16 bytes, and it's the compiler's job to optimize it to word or vector operations if possible. Admittedly this does mean the compiler has to know that the data is well-aligned, which can be hard to guarantee and easy to forget.
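
To make the trade-off concrete, here is a sketch of both variants (the function names are made up for this example). The byte loop is correct for any alignment. The word-wise version goes through memcpy so that it stays well-defined for unaligned buffers as well; a hand-written macro that casts the byte pointers to uint32_t pointers would not have that guarantee. Optimizing compilers often generate the same wide loads and XORs for both, but that depends on the compiler, the flags and the target.

```c
#include <stdint.h>
#include <string.h>

/* Portable byte-wise XOR of two 16-byte blocks: y ^= b. */
void xor16_bytes(unsigned char y[16], const unsigned char b[16])
{
    int i;
    for (i = 0; i < 16; i++)
        y[i] ^= b[i];
}

/* Word-wise XOR that makes no alignment assumption: the data is copied
 * into local 32-bit words, XORed, and copied back. */
void xor16_words(unsigned char y[16], const unsigned char b[16])
{
    uint32_t yw[4], bw[4];
    int i;
    memcpy(yw, y, 16);
    memcpy(bw, b, 16);
    for (i = 0; i < 4; i++)
        yw[i] ^= bw[i];
    memcpy(y, yw, 16);
}
```

Since XOR acts on each bit independently, the result is the same on little-endian and big-endian targets.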


So, that's it for the moment. I hope I could give some hints for the
further development of mbedTLS. Feel free to discuss any of the above
points. It's clear to me that we cannot have both: clear and simple to
understand code and performance records.


Right. The same tension exists between maintainable code and minimal code size: minimal code size comes from letting the application developer #ifdef out everything that they don't care about, but this is a nightmare to test. It's one of the topics we're thinking about for Mbed TLS 3 and beyond.

In general, Mbed TLS is primarily targeted at embedded systems, and is likely to prioritize 1. security (including side channel resistance) and 2. code size. This doesn't mean that we don't care about performance, just that it isn't our top priority. That being said, we also do have some code that's optimized for performance (without compromising security) and not code size: the library already includes X25519 from Project Everest (https://project-everest.github.io/) (turn it on with MBEDTLS_ECDH_VARIANT_EVEREST_ENABLED). This is code that's formally proven not only for functional correctness, but also for side channel resistance; the implementation has aggressive inlining which makes it very fast, but obviously also large in terms of code size. Hopefully other algorithms will follow soon.

Ciao,

Torsten

Once again, thanks for the detailed feedback, and I hope we can improve Mbed TLS for everyone!


--
Gilles Peskine
Mbed TLS developer

