On 15/04/2020 22:00, Gilles Peskine wrote:
Wow! First, thanks for this extensive feedback! That's very helpful, and appreciated. ...
Thank you for your reply.
Over time, we're transitioning the API for the crypto part of the library from the current mbedtls_xxx functions to psa_xxx, which have a somewhat different philosophy: less exposure of internals, more protection against misuse, no reliance on malloc. Mbed TLS 3.0 will start the transition.
On 14/04/2020 21:10, Torsten Schuetze via mbed-tls wrote
- I really missed an Initialize, Update, Finalize (IUF) interface for
CCM.
see Github issue #662 and my comment there
In Mbed TLS 3.0, mbedtls_ccm_xxx() will not be a public interface anymore. We are planning to add support for multipart CCM in the psa_aead_xxx() interface (the prototypes are already in psa/crypto.h but their implementation is planned for some time in the next few months).
Ah, okay, I saw the interface but missed the implementation. This explains all.
We are not currently planning to add support for multipart CCM through mbedtls_cipher_xxx(), which in Mbed TLS 3 will be legacy functions. However, we would probably accept such support if it was contributed externally.
Don't know if Corona takes soo long that I'll have time for that ;-(. We'll see.
- The next step, of course, is to integrate this into the higher
mbedtls_cipher layer.
Regarding higher, abstract layers: I often didn't understand which interface I was supposed to use.
In Mbed TLS 3, there will generally be a single public layer. Exposing lower layers helps with code size on resource-constrained devices, but it also has downsides, including locking down the APIs.
I don't mind if there are two accessible layers. In many crypto libs you have an expert layer and a simple layer. Hope this public layer doesn't get too abstract.
- That I couldn't configure AES-256 only, i.e. without AES-128 and
AES-192, was to be expected (and the code overhead is not that much).
For information, there is a branch of Mbed TLS called "baremetal", forked from Mbed TLS 2.16, which you can find on GitHub: https://github.com/ARMmbed/mbedtls/blob/baremetal . This branch is optimized for small code size, sometimes at the expense of speed and often at the expense of features. It hasbuild options MBEDTLS_AES_ONLY_ENCRYPT and MBEDTLS_AES_ONLY_128_BIT_KEY_LENGTH. However I would not recommend using it in production because Arm (who still maintain this branch even after Mbed TLS itself has moved to TrustedFirmware) does not make any promise of stability. A feature that you rely on may be removed without notice.
Okay, thanks for this information. In the moment, I can live with the current situation. With a smaller Cortex M3 I would probably switch.
I mention this branch because eventually, we do plan to port the improvements that don't sacrifice features to the Mbed TLS development branch. I can't give a timeline for this however.
With the current Mbed TLS, if you don't use CBC, I think you can save some code in aes.o by defining MBEDTLS_AES_DECRYPT_ALT and MBEDTLS_AES_SETKEY_DEC_ALT and providing functions mbedtls_aes_setkey_dec() and mbedtls_internal_aes_decrypt() that do nothing.
This is a clever trick. I'll remember that for the next application.
- Regarding AES or better the AES context-type definition
[snip]
- In general, the contexts of mbedTLS are rather full of
implementation specific details. Most extreme is mbedtls_ecp_group in ecp.h. Wouldn't it be clearer if one separates the standard things (domain parameters in this case) from implementation specific details?
As a general design principle, context types in Mbed TLS 3 will be opaque. This will let us, for example, redesign mbedtls_aes_context and mbedtls_cipher_context.
Hmm, this just means that you access it with dedicated functions only and that the internals can change at any time. Empirical knowledge shows that some people will nevertheless use the internal structure (I remember TLS session tickets which shall be opaque, too. But many implementations rely on that internal structure)
- Regarding ECC examples: I found it very difficult that there isn't
a single example with known test vectors as in the relevant crypto standards, i.e. FIPS 186-4 and ANSI X9.62-2005, with raw public keys. What I mean are (defined) curves, public key value Q=(Qx,Qy) and known signature values r and s. In the example ecdsa.c you generate your own key pair and read/write the signature in serialized form. In the example programs/pkey/pk_sign.c and pk_verify.c you use a higher interface pk.h and keys in PEM format.
So, it took me a while for a program to verify (all) known answer tests in the standards (old standards as ANSI X9.62 1998 have more detailed known answer tests). One needs this interface with raw public keys for example for CAVP tests, see The FIPS 186-4 Elliptic Curve Digital Signature Algorithm Validation System (ECDSA2VS). 11. In the moment, there is no single known answer tests for ECDSA (which could be activated with #define MBEDTLS_SELF_TEST). I wouldn't say that you need an example for every curve and hash combination, as it is done in ECDSA2VS CAVP, but one example for one of the NIST curves and one for Curve25519 and - if I have a wish free - one for Brainpool would be fine. And this would solve #9 above.
I don't get the point here. ECDSA is randomized, so you can't have a known answer test. The test suite does have known answer tests for deterministic ECDSA.
This remark is also useful for your thread Running Mbed TLS unit tests on embedded targets
Okay, my fault. I didn't explain it correctly.
In this context, I was only talking about ECDSA signature verification. And this is, of course, deterministic. Specifically, I wanted to use the CAVP test from SigVer.rsp in 186-3ecdsatestvectors.zip. My strategy with NIST CAVP tests is, that I take the simple response files (text) and built just with search and replace a structure that can be compiled within a program. So, no file system access, no perl or python (at least not at the target system). In the ECDSA case, this looks like
struct { const char *msg; /* message as hex string */ const char *Qx; /* public key, x-coordinate as hex string */ const char *Qy; /* public key, y-coordinate as hex string */ const char *r; /* signature r as hex string */ const char *s; /* signature s as hex string */ int Result; /* 1 on expected fail or 0 on expected success */ } ecdsa_verify_test[1] = { {"9dd789ea25c04745d57a381f22de01fb0abd3c72dbdefd44e43213c189583eef85ba662" "044da3de2dd8670e6325154480155bbeebb702c75781ac32e13941860cb576fe37a05b7" "57da5b5b418f6dd7c30b042e40f4395a342ae4dce05634c33625e2bc524345481f7e253" "d9551266823771b251705b4a85166022a37ac28f1bd", "cb908b1fd516a57b8ee1e14383579b33cb154fece20c5035e2b3765195d1951d75bd78f" "b23e00fef37d7d064fd9af144", "cd99c46b5857401ddcff2cf7cf822121faf1cbad9a011bed8c551f6f59b2c360f79bfbe" "32adbcaa09583bdfdf7c374bb", "33f64fb65cd6a8918523f23aea0bbcf56bba1daca7aff817c8791dc92428d605ac629de" "2e847d43cee55ba9e4a0e83ba", "4428bb478a43ac73ecd6de51ddf7c28ff3c2441625a081714337dd44fea8011bae71959" "a10947b6ea33f77e128d3c6ae", 0}, }; /* example 2 [P-384,SHA-384] from SigVer.rsp 186-3ecdsatestvectors.zip */
And for this, I couldn't find any example with mbedTLS. It isn't that difficult, but as always with ECC you have several interfaces to consider.
mbedtls_mpi_init(&r); mbedtls_mpi_init(&s); mbedtls_ecp_point_init(&Q); mbedtls_ecdsa_init(&ctx); mbedtls_ecp_group_init(&ctx.grp);
ret = mbedtls_ecp_group_load(&ctx.grp, ECPARAMS); /* load domain parameters for named curve */
if (verbosity != 0) mbedtls_printf( "-------------------- ECDSA verify testing -------------------\r\n");
/* hash message */ (void)hexstring2bytestring((unsigned char *)ecdsa_verify_test[0].msg, (unsigned int)strlen(ecdsa_verify_test[0].msg), byte_string); if ((ret = mbedtls_sha512_ret((unsigned char *)byte_string, strlen(ecdsa_verify_test[0].msg) / 2, sha512sum, is384)) != 0) ref_test_error = 1;
memcpy(sha384sum, sha512sum, 48); /* sha384sum contains SHA-384(msg) */ (void)string2hexstring(sha384sum_hex, sha512sum, 48); /* load public key */ ret = mbedtls_ecp_point_read_string(&Q, 16, ecdsa_verify_test[0].Qx, ecdsa_verify_test[0].Qy); ret = mbedtls_ecp_check_pubkey(&ctx.grp, &Q); /* check public key */ ret = mbedtls_mpi_read_string(&r, 16, ecdsa_verify_test[0].r); ret = mbedtls_mpi_read_string(&s, 16, ecdsa_verify_test[0].s); /* set signature value */ ret = mbedtls_ecdsa_verify(&ctx.grp, sha384sum, sizeof(sha384sum), &Q, &r, &s); /* verify signature */
Most crypto standards have KATs in this raw form. With the structure above (it even works for very long strings, at least with gcc), you just copy and paste. And the code compiles. The mbedTLS unit test examples in tests/suites/test_suite_ecdsa.data or test_suite_ecdsa.function are similar, but not self explaining and (data) cannot be compiled on their own.
In this way, I implemented all relevant CAVP tests for me. And they fit onto the target (Cortex M4 with 384 kB RAM). On a smaller M3 smart card processor, you often have problems with code size and/or RAM.
To be honest, I didn't look into the mbedTLS unit tests very deeply. I just implemented the necessary CAVP for me.
Regarding (deterministic) ECDSA and examples. Of course, classical ECDSA is randomized. Often, there is at least one example in the standards with fixed ephemeral key k and all intermediate values. This would be a KAT for me. I know that it is dangerous to fix an ephemeral key (Sony play station desaster), but for testing it is sometimes necessary.
Obviously, you or mbedTLS prefers deterministic ECDSA over classical ECDSA. But with deterministic ECDSA you have other problems as fault attacks. We had a discussion about this at the NIST ECC Workshop in 2015 or 16, I think. Shortly thereafter, there were the first attacks against EdDSA etc. So, if you can't handle random numbers/ephemeral keys you shouldn't do crypto. All countermeasures against full power analysis nowadays use random masking.
(I confess, I'm a little bit biased. I do RNG evaluation most of the time)
- While debugging mbedtls_ecdsa_verify() in my example program, I
found out, that the ECDSA, ECC and MPI operations are very, let's say, nested. So, IMHO there is a lot of function call overhead and special cases. It would be interesting to see what's the performance impact of a clean, straight-forward mbedtls_ecdsa_verify without restartable code, etc. to the current one.
As far as I remember, the refactoring done to add the restartable code had no measurable impact on performance. What does have a significant impact on performance is that the bignum module uses malloc all the time. We would like to completely rewrite bignum operations at some point during the 3.x series, not only for performance but also because its design makes it hard not to leak information through side channels.
The mbedTLS mpi design is inspired by GMP, right? Some years ago, when microprocessors were smaller and/or malloc was strictly forbidden within deeply embedded devices (automotive) I liked the MIRACL mpi design very much. Or the good old Fortran way (one huge work array, index calculation by hand). Just joking.
- Just a minor issue: I only needed ECDSA signature verification,
therefore I only included MBEDTLS_ASN1_PARSE_C. But it is not possible to compile without MBEDTLS_ASN1_WRITE_C needed for ECDSA signature generation.
I'm not sure if I've written it down anywhere, but I'd like to remove the dependency of ECDSA on ASN1 altogether. Parsing and writing a SEQUENCE of two INTEGERs can be done with ad hoc code. Likewise for what little ASN1 the RSA module uses. And then asn1*.o can move out of libmbedcrypto and libpsacrypto, and into libmbedx509 where it belongs.
I completely agree with you. ASN.1 isn't crypto (and there was a time when every two or three weeks, there was a bug in ASN.1 parsing). But I don't know where to draw the line between raw public keys and PEM/DER/PKCS#8, for example.
Why do I find at so many places
for( i = 0; i < 16; i++ ) y[i] ^= b[i];
instead of a fast 128-bit XOR macro with 32bit aligned data?
As a programmer who doesn't write compilers, I think this is the right way to xor 16 bytes, and it's the compiler's job to optimize it to word or vector operations if possible. Admittedly this does mean the compiler has to know that the data is well-aligned, which can be hard to guarantee and easy to forget.
Okay, then I'm qualified to write compilers now ;-) May be I was working on crypto hardware with fixed registers for too long. But if you look at Microsoft's SymCrypt you'll find exactly this in SymCryptXorBytes() and they use it a lot in their crypto. I liked their comment: If your data is not aligned, then you don't care for performance.
Right. Also maintainable code and minimal code size, because minimal code size comes from letting the application developer #ifdef out everything that they don't care about, but this is a nightmare to test. It's one of the topics we're thinking about for Mbed TLS 3 and beyond.
In general, Mbed TLS is primarily targeted at embedded systems, and is likely to privilege 1. security (including side channel resistance) and 2. code size. This doesn't mean that we don't care about performance, just that it isn't our top priority.
I hope I haven't been misunderstood. We choose mbedTLS in the first place because of its modular structure. I wouldn't have had a chance to port all the crypto of OpenSSL, for example, to my SoC. There is a fine balance between performance and simplicity/maintainability/portability.
Once again, thanks for the detailed feedback, and I hope we can improve Mbed TLS for everyone!
-- Gilles Peskine Mbed TLS developer
Thank you for your detailed answer. I thing I'll closely follow the Mbed TLS development.
Torsten