Some thoughts towards mbed TLS 3.0 - mbed-tls

15 Apr 2020


Hi all,
this will be a long mail. Sorry for that.
In the past weeks I've been using mbedTLS 2.16.5 for implementing
crypto on an ARM Cortex M4 (STM32F479). This was my first experience
with mbedTLS, but I have some (almost 20 years) experience with
applied and high-assurance crypto. So maybe the following thoughts fit
into the discussion of plans for version 3.0 of Mbed TLS.
In the end, I achieved everything that was required for my project with
mbedTLS, but some things surprised me and others took a while to figure
out. I've numbered the points below for easier reference. None of this
is meant to embarrass anyone; these are just my personal thoughts.
1. I really missed an Initialize, Update, Finalize (IUF) interface for
   CCM.
For GCM, we have mbedtls_gcm_init(), mbedtls_gcm_setkey(),
   mbedtls_gcm_starts(), mbedtls_gcm_update() iterated,
   mbedtls_gcm_finish(), mbedtls_gcm_free() or the comfort functions
   mbedtls_gcm_crypt_and_tag() and mbedtls_gcm_auth_decrypt(). For
   CCM, only mbedtls_ccm_init(), mbedtls_ccm_setkey(),
   mbedtls_ccm_encrypt_and_tag() or mbedtls_ccm_auth_decrypt() and
   mbedtls_ccm_free(). With this interface it was only possible to
   encrypt and tag 128 kByte on my target system, while with GCM I
   could encrypt much larger files.
See GitHub issue #662 and my comment there.
2. The next step, of course, is to integrate this into the higher
   mbedtls_cipher layer.
Regarding higher, abstract layers: I often didn't understand which
   interface I was supposed to use. In general, I like to use the
   lowest available interface, for example, #include
   "mbedtls/sha512.h" when I want to use sha512. However, if I need
   HMAC-SHA-512 or HKDF-HMAC-SHA-512 then I have to use the interface
   in md.h. For hash functions this is fine. Almost all hash functions
   are supported via md.h. (I missed SHA-512/256 which is sometimes
   preferable to SHA-256 on 64bit systems).
But with cipher.h, I can only access Chacha20Poly1305 and AES-GCM,
   not AES-CCM.
3. For certification and evaluation purposes I need some test vectors
   for each crypto function on target. While I know about the
   comprehensive self-test program I'm now talking about built-in
   functions like mbedtls_sha512_self_test(), etc to be enabled with
   #define MBEDTLS_SELF_TEST.
These self-tests are very different in coverage. For SHA-384 and
   SHA-512 they are fine, for HMAC-SHA-384 and HMAC-SHA-512 I couldn't
   find any as well as for HKDF-HMAC-SHA-256 (in RFC 5869) or
   HKDF-HMAC-SHA-384/512 (official test vectors difficult to find).
   AES-CTR and AES-XTS are only tested with key length 128 bit, not with
   256 bit. AES-CCM is not tested with 256 bit and even for 128 bit,
   the test vector from the standard NIST SP 800-38C with long
   additional data is not used.
   The builtin self-test for GCM is the best I've seen with mbedtls:
   all three key lengths are tested as well as the IUF-interface and
   the comfort function. Bravo!
4. That I couldn't configure AES-256 only, i.e. without AES-128 and
   AES-192, was to be expected (and the code overhead is not that
   much). But in modern modes of operation nobody needs AES
   decryption, only the forward direction. Some recent publications,
   such as Schwabe/Stoffelen "All the AES you need on Cortex-M3 and
   M4", provide only the forward direction.
So, it would be fine if one could configure an AES (ECB) encryption
   only without decryption.
Of course, this is only possible if we don't use CBC mode, etc.
   This wouldn't only save the AES decryption code but also the rather
   large T-tables for decryption.
5. Regarding AES or better the AES context-type definition
typedef struct mbedtls_aes_context
{
    int nr;                     /*!< The number of rounds. */
    uint32_t *rk;               /*!< AES round keys. */
    uint32_t buf[68];           /*!< Unaligned data buffer. This buffer can
                                     hold 32 extra Bytes, which can be used for
                                     one of the following purposes:
                                     <ul><li>Alignment if VIA padlock is
                                             used.</li>
                                     <li>Simplifying key expansion in the 256-bit
                                         case by generating an extra round key.
                                         </li></ul> */
}
mbedtls_aes_context;
I really don't understand why we need additional 2176 bit in EVERY
   AES context. I would understand 128 bit (one block size) or even 512
   bit (for example for CTR optimization which is not used!). But 2176
   bit in every AES context? The VIA padlock is not very common, I
   suppose. But even if it were, this doesn't justify such memory
   overhead.
How wasteful this is, one can see in the next type definition
/**
 * \brief The AES XTS context-type definition.
 */
typedef struct mbedtls_aes_xts_context
{
    mbedtls_aes_context crypt; /*!< The AES context to use for AES block
                                        encryption or decryption. */
    mbedtls_aes_context tweak; /*!< The AES context used for tweak
                                        computation. */
} mbedtls_aes_xts_context;
The tweak context is for the encryption of exactly 128 bit, not
  more.
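
For reference, the sizes in point 5 can be checked with a few lines of C.
This is only a sketch: the struct names sketch_aes_context and
sketch_aes_xts_context and the helper functions are hypothetical mirrors of
the definitions quoted above, so that the code compiles without Mbed TLS.
The only outside fact used is that AES-256 has 14 rounds and therefore 15
round keys of four 32-bit words each.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the quoted mbedtls_aes_context, for size arithmetic only. */
typedef struct {
    int nr;
    uint32_t *rk;
    uint32_t buf[68];
} sketch_aes_context;

/* Hypothetical mirror of the quoted mbedtls_aes_xts_context. */
typedef struct {
    sketch_aes_context crypt;
    sketch_aes_context tweak;
} sketch_aes_xts_context;

/* Size of the buf member of one AES context, in bits. */
static size_t sketch_buf_bits(void)
{
    return sizeof(((sketch_aes_context *)0)->buf) * 8u;
}

/* AES-256: 14 rounds, so 15 round keys of 4 words of 32 bits. */
static size_t sketch_aes256_round_key_bits(void)
{
    return (14u + 1u) * 4u * 32u;
}

/* Bytes of buf left over once a full AES-256 key schedule is stored. */
static size_t sketch_spare_bytes(void)
{
    return (sketch_buf_bits() - sketch_aes256_round_key_bits()) / 8u;
}

/* Total bits of buf storage inside one XTS context (crypt plus tweak). */
static size_t sketch_xts_buf_bits(void)
{
    return (sizeof(((sketch_aes_xts_context *)0)->crypt.buf) +
            sizeof(((sketch_aes_xts_context *)0)->tweak.buf)) * 8u;
}
```

Under these assumptions the buffer is 2176 bits in total, of which 1920 bits
hold an AES-256 key schedule and 32 bytes are spare, matching the "32 extra
Bytes" in the comment. An XTS context carries the buffer twice, i.e. 4352
bits.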
6. In general, the contexts of mbedTLS are rather full of
   implementation specific details. Most extreme is mbedtls_ecp_group
   in ecp.h. Wouldn't it be clearer if one separates the standard
   things (domain parameters in this case) from implementation
   specific details?
7. While at Elliptic Curve Cryptography: I assume that some of you
   know that projective coordinates as outer interface to ECC are
   dangerous, see David Naccache, Nigel P. Smart, Jacques Stern:
   Projective Coordinates Leak, Eurocrypt 2004, pp. 257–267.
   Therefore, the usual interface in ECC standards is either affine
   points or compressed affine points (Okay, with the modern curves
   Curve25519 and Curve448 it's X only.).
Now with
/**
 * \brief           The ECP point structure, in Jacobian coordinates.
 *
 * \note            All functions expect and return points satisfying
 *                  the following condition: <code>Z == 0</code> or
 *                  <code>Z == 1</code>. Other values of \p Z are
 *                  used only by internal functions.
 *                  The point is zero, or "at infinity", if <code>Z == 0</code>.
 *                  Otherwise, \p X and \p Y are its standard (affine)
 *                  coordinates.
 */
typedef struct mbedtls_ecp_point
{
    mbedtls_mpi X;          /*!< The X coordinate of the ECP point. */
    mbedtls_mpi Y;          /*!< The Y coordinate of the ECP point. */
    mbedtls_mpi Z;          /*!< The Z coordinate of the ECP point. */
}
mbedtls_ecp_point;
you have Jacobian coordinates, i.e. projective coordinates, as outer
  interface. In the comment, it is noted that only the affine part is
  used, but can this be assured? In all circumstances?
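
To make the concern concrete: many Jacobian triples (X, Y, Z) stand for the
same affine point x = X/Z^2, y = Y/Z^3, and which Z comes out depends on how
the point was computed. The sketch below shows the normalization on a toy
curve. Everything in it is an assumption for illustration: the curve
y^2 = x^3 + 2x + 3 over GF(97), the toy_* names, and single-word arithmetic
instead of mbedtls_mpi.

```c
#include <stdint.h>

/* Toy field prime; real curves use multi-word primes. */
#define TOY_P 97u

static uint32_t toy_mul(uint32_t a, uint32_t b)
{
    return ((a % TOY_P) * (b % TOY_P)) % TOY_P;
}

/* Square-and-multiply exponentiation modulo TOY_P. */
static uint32_t toy_pow(uint32_t base, uint32_t exp)
{
    uint32_t result = 1;
    while (exp != 0) {
        if (exp & 1u)
            result = toy_mul(result, base);
        base = toy_mul(base, base);
        exp >>= 1;
    }
    return result;
}

/* Inverse via Fermat's little theorem: z^(p-2) mod p, for z != 0 mod p. */
static uint32_t toy_inv(uint32_t z)
{
    return toy_pow(z, TOY_P - 2u);
}

typedef struct { uint32_t X, Y, Z; } toy_jacobian_point;
typedef struct { uint32_t x, y; } toy_affine_point;

/* Normalize to affine coordinates: x = X / Z^2, y = Y / Z^3.
 * Returns 0 for the point at infinity (Z == 0), 1 otherwise. */
static int toy_to_affine(const toy_jacobian_point *P, toy_affine_point *out)
{
    if (P->Z % TOY_P == 0)
        return 0;
    uint32_t zi  = toy_inv(P->Z);
    uint32_t zi2 = toy_mul(zi, zi);
    out->x = toy_mul(P->X, zi2);
    out->y = toy_mul(P->Y, toy_mul(zi2, zi));
    return 1;
}
```

For example, (3, 6, 1) and (75, 71, 5) both normalize to the affine point
(3, 6) on this toy curve. An interface that hands out only (x, y), or
guarantees Z == 1 on every exit path, does not expose the intermediate Z.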
8. In my personal opinion the definition
/**
 * \brief    The ECP key-pair structure.
 *
 * A generic key-pair that may be used for ECDSA and fixed ECDH, for example.
 *
 * \note    Members are deliberately in the same order as in the
 *          ::mbedtls_ecdsa_context structure.
 */
typedef struct mbedtls_ecp_keypair
{
    mbedtls_ecp_group grp;      /*!<  Elliptic curve and base point     */
    mbedtls_mpi d;              /*!<  our secret value                  */
    mbedtls_ecp_point Q;        /*!<  our public value                  */
}
mbedtls_ecp_keypair;
is dangerous. Why not differentiate between private and public key
   and domain parameters? How often does it happen by accident with
   this structure that you give the private key (unneeded and
   dangerous) together with the public key to ECDSA signature
   verification? Obviously this was known (and perhaps it happened) to
   the authors of programs\ecdsa.c with the following comment
/*
 * Transfer public information to verifying context
 *
 * We could use the same context for verification and signatures, but we
 * chose to use a new one in order to make it clear that the verifying
 * context only needs the public key (Q), and not the private key (d).
 */
What is sometimes useful is to have the public key at hand when you
   have performed a private key operation (as countermeasure against
   fault attacks, verify after signing). But for ECC the verification
   procedure is often too expensive (in contrast to cheap RSA verify).
9. Regarding ECC examples: I found it very difficult that there isn't
   a single example with known test vectors as in the relevant crypto
   standards, i.e. FIPS 186-4 and ANSI X9.62-2005, with raw public
   keys. What I mean are (defined) curves, public key value Q=(Qx,Qy)
   and known signature values r and s. In the example ecdsa.c you
   generate your own key pair and read/write the signature in
   serialized form. In the example programs/pkey/pk_sign.c and
   pk_verify.c you use a higher interface pk.h and keys in PEM format.
So, it took me a while to write a program that verifies (all) known
   answer tests in the standards (old standards such as ANSI X9.62
   1998 have more detailed known answer tests). One needs this
   interface with raw public keys for example for CAVP tests, see The
   FIPS 186-4 Elliptic Curve Digital Signature Algorithm Validation
   System (ECDSA2VS).
10. While debugging mbedtls_ecdsa_verify() in my example program, I
    found out that the ECDSA, ECC and MPI operations are very, let's
    say, nested. So, IMHO there is a lot of function call overhead and
    special cases. It would be interesting to see how a clean,
    straightforward mbedtls_ecdsa_verify without restartable code,
    etc. performs compared to the current one.
11. At the moment, there is not a single known answer test for ECDSA
    (which could be activated with #define MBEDTLS_SELF_TEST). I
    wouldn't say that you need an example for every curve and hash
    combination, as it is done in ECDSA2VS CAVP, but one example for
    one of the NIST curves and one for Curve25519 and, if I may make a
    wish, one for Brainpool would be fine. And this would solve
    #9 above.
12. Just a minor issue: I only needed ECDSA signature verification,
    therefore I only included MBEDTLS_ASN1_PARSE_C. But it is not
    possible to compile without MBEDTLS_ASN1_WRITE_C needed for ECDSA
    signature generation.
13. Feature request: Since it was irrelevant for my task (only
    verification, no generation) I didn't have a detailed look at your
    ECC side-channel countermeasures. But obviously you use the same
    protected code for scalar multiplication in verify and sign,
    right? Wouldn't it be possible to use Shamir's trick in
    verification with fast unprotected multi-scalar multiplication? At
    the moment, mbedtls_ecdsa_verify is a factor 4-5 slower than
    mbedtls_ecdsa_sign, while OpenSSL's verify is faster than sign.
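
To illustrate what I mean by Shamir's trick: ECDSA verification needs
u1*G + u2*Q, and instead of two separate scalar multiplications one can walk
both scalars bit by bit and share the doublings. The sketch below shows the
idea in the multiplicative group modulo a small number, where doubling
becomes squaring and point addition becomes multiplication. This is an
analogy only, not Mbed TLS code; the modulus and all toy_* names are
assumptions. It branches on the scalar bits, which is acceptable only
because all inputs of a verification are public.

```c
#include <stdint.h>

/* Toy modulus, small enough that products fit into 64 bits. */
#define TOY_MOD 1000003u

static uint64_t toy_mul_mod(uint64_t a, uint64_t b)
{
    return ((a % TOY_MOD) * (b % TOY_MOD)) % TOY_MOD;
}

/* Plain left-to-right square-and-multiply: g^e mod TOY_MOD. */
static uint64_t toy_pow_mod(uint64_t g, uint32_t e)
{
    uint64_t acc = 1;
    for (int i = 31; i >= 0; i--) {
        acc = toy_mul_mod(acc, acc);
        if ((e >> i) & 1u)
            acc = toy_mul_mod(acc, g);
    }
    return acc;
}

/* Shamir's trick: g^a * h^b mod TOY_MOD with one shared chain of
 * squarings and a precomputed product g*h for positions where both
 * scalars have a 1 bit. Variable time, for public inputs only. */
static uint64_t toy_shamir_pow_mod(uint64_t g, uint32_t a,
                                   uint64_t h, uint32_t b)
{
    const uint64_t gh = toy_mul_mod(g, h);
    uint64_t acc = 1;
    for (int i = 31; i >= 0; i--) {
        unsigned abit = (a >> i) & 1u;
        unsigned bbit = (b >> i) & 1u;
        acc = toy_mul_mod(acc, acc);
        if (abit && bbit)
            acc = toy_mul_mod(acc, gh);
        else if (abit)
            acc = toy_mul_mod(acc, g);
        else if (bbit)
            acc = toy_mul_mod(acc, h);
    }
    return acc;
}
```

The result equals toy_mul_mod(toy_pow_mod(g, a), toy_pow_mod(h, b)), but
with 32 squarings instead of 64 and at most one multiplication per bit
position.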
14. Design question: At the moment, both GCM and CCM use their own
    implementation of CTR encryption which is very simple. But then we
    have mbedtls_aes_crypt_ctr() in aes.h which is very simple, too.
    Let's assume that one day we have a performance optimized CTR
    encryption (for example from Schwabe & Stoffelen) with all fancy
    stuff like counter-mode caching etc. Then this would have to be
    replaced in at least three places. Why isn't the code more
    modularized at this point? Is this a deliberate design decision?
    Why do I find in so many places
for( i = 0; i < 16; i++ )
    y[i] ^= b[i];
instead of a fast 128-bit XOR macro with 32bit aligned data?
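
A rough sketch of what I have in mind for point 14: one word-wise 128-bit XOR
helper and one CTR routine that takes the block cipher as a function pointer,
so that an optimized version would have to be swapped in only once. All names
here (xor_block_128, block_encrypt_fn, ctr_crypt, toy_block_encrypt) are
hypothetical and not part of Mbed TLS, and the toy "cipher" is just an XOR
with the key so that the example compiles on its own. The counter is
incremented as a 128-bit big-endian integer; GCM and CCM increment only part
of the block, so a shared helper would need that as a parameter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR 16 bytes of src into dst, four 32-bit words at a time.
 * memcpy keeps this valid for unaligned buffers. */
static void xor_block_128(uint8_t dst[16], const uint8_t src[16])
{
    uint32_t d[4], s[4];
    memcpy(d, dst, 16);
    memcpy(s, src, 16);
    for (int i = 0; i < 4; i++)
        d[i] ^= s[i];
    memcpy(dst, d, 16);
}

/* Hypothetical callback type: encrypt one 16-byte block. */
typedef void (*block_encrypt_fn)(const void *key_ctx,
                                 const uint8_t in[16], uint8_t out[16]);

/* Generic CTR mode over any block_encrypt_fn; works in place (in == out).
 * The counter block is updated so that a later call continues the stream
 * at the next block boundary. */
static void ctr_crypt(block_encrypt_fn encrypt, const void *key_ctx,
                      uint8_t counter[16],
                      const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t keystream[16];
    while (len > 0) {
        size_t n = len < 16 ? len : 16;
        encrypt(key_ctx, counter, keystream);
        if (n == 16) {
            memmove(out, in, 16);
            xor_block_128(out, keystream);
        } else {
            for (size_t i = 0; i < n; i++)
                out[i] = in[i] ^ keystream[i];
        }
        /* Increment the counter block as a big-endian integer. */
        for (int i = 15; i >= 0; i--)
            if (++counter[i] != 0)
                break;
        in += n;
        out += n;
        len -= n;
    }
}

/* Toy stand-in for AES, NOT a cipher: XORs the block with a 16-byte key. */
static void toy_block_encrypt(const void *key_ctx,
                              const uint8_t in[16], uint8_t out[16])
{
    memcpy(out, in, 16);
    xor_block_128(out, (const uint8_t *)key_ctx);
}
```

With a real AES encrypt function behind block_encrypt_fn, GCM, CCM and plain
CTR could in principle share this one loop; whether the indirect call costs
too much on a Cortex-M would have to be measured.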
So, that's it for the moment. I hope I could give some hints for the
further development of mbedTLS. Feel free to discuss any of the above
points. It's clear to me that we cannot have both: clear and simple to
understand code and performance records.
Ciao,
Torsten