After recently updating mbedtls I noticed a considerable slowdown (over 70% on my cortex-m7 board) in the sha256 implementation, and after some digging I found the offending commit: https://github.com/Mbed-TLS/mbedtls/commit/76749aea784cfec245390d0d6f0ab0a2d...
I understand the motivation behind the commit, but I think it may not be relevant to all use cases. So my question is whether an option in mbedtls_config.h to disable the clearing of internal buffers would be a reasonable improvement. Or would that be considered too much of a foot gun? Regards, Joel Petersson
Hi Joel, thanks a lot for pointing that out to us. We would probably prefer not to add a new configuration option for this. I have created issue 6220 to investigate how we could address this performance regression. We briefly discussed possible solutions (see the solution hints there). Would you be able to experiment with the first proposal to see if it makes a difference on your board?
Thanks, Ronald.
-----Original Message----- From: joel.petersson--- via mbed-tls mbed-tls@lists.trustedfirmware.org Sent: 22 August 2022 13:23 To: mbed-tls@lists.trustedfirmware.org Subject: [mbed-tls] Adding option to disable the zeroisation of internal buffers
Hi Ronald! Thank you for the suggestions. I have run some tests and have some numbers for you.
Hashing 100kb with sha256 on cortex-m7 board:
1. Current performance on master branch: ~5.2 million cycles
2. Removing call to mbedtls_platform_zeroize in mbedtls_internal_sha256_process_c: ~2.7 million cycles
3. Only calling mbedtls_platform_zeroize on the last block (by passing another argument to mbedtls_internal_sha256_process_c): ~2.9 million cycles
Turns out I overestimated the slowdown in my original post, as it is closer to 50% - but with your suggestion we are back to almost the same numbers as before. // Joel
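For readers following the thread, the third variant (wiping the working buffer only on the last block) could look roughly like the sketch below. This is not the Mbed TLS code: toy_process_block, toy_hash, secure_wipe and the mixing step are hypothetical stand-ins, and the scratch buffer is passed in by the caller only so that the effect can be observed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for mbedtls_platform_zeroize(): calling memset
 * through a volatile function pointer discourages the compiler from
 * dropping the wipe as a dead store. */
static void *(*const volatile wipe_fn)(void *, int, size_t) = memset;

static void secure_wipe(void *buf, size_t len)
{
    if (len > 0) {
        wipe_fn(buf, 0, len);
    }
}

/* Toy compression step (NOT SHA-256). The extra wipe_scratch argument
 * mirrors the idea of passing another argument to the block function. */
static void toy_process_block(uint32_t state[4], const unsigned char block[64],
                              uint32_t scratch[16], int wipe_scratch)
{
    for (size_t i = 0; i < 16; i++) {
        /* Load one little-endian 32-bit word into the working buffer. */
        scratch[i] = (uint32_t)block[4 * i]
                   | ((uint32_t)block[4 * i + 1] << 8)
                   | ((uint32_t)block[4 * i + 2] << 16)
                   | ((uint32_t)block[4 * i + 3] << 24);
        state[i % 4] = (state[i % 4] ^ scratch[i]) * 0x9E3779B1u;
    }
    if (wipe_scratch) {
        secure_wipe(scratch, 16 * sizeof(uint32_t));
    }
}

/* Process nblocks 64-byte blocks, wiping the scratch buffer only once,
 * after the final block. */
static void toy_hash(uint32_t state[4], const unsigned char *data,
                     size_t nblocks, uint32_t scratch[16])
{
    for (size_t b = 0; b < nblocks; b++) {
        toy_process_block(state, data + 64 * b, scratch, b + 1 == nblocks);
    }
}
```

The trade-off in this sketch is that intermediate working data stays in the buffer between blocks and is overwritten by the next block rather than cleared, so the cost of the wipe is paid once per message instead of once per block.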
Hi Joel
Can I check which version of Mbed TLS you were running your tests on?
The original commit you pointed to (76749aea78) is quite old, and the current code issues a single mbedtls_platform_zeroize() call (since 4cb56f83cb).
(I just want to make sure we are looking at the same code)
Thanks
Tom
________________________________
From: joel.petersson--- via mbed-tls mbed-tls@lists.trustedfirmware.org Sent: 23 August 2022 09:53 To: mbed-tls@lists.trustedfirmware.org Subject: [mbed-tls] Re: Adding option to disable the zeroisation of internal buffers
-- mbed-tls mailing list -- mbed-tls@lists.trustedfirmware.org To unsubscribe send an email to mbed-tls-leave@lists.trustedfirmware.org
Hi again, Just wanted to add one thing: The performance pre 4cb56f83cb was even worse than what I am seeing on 869298b, so it did help to only call mbedtls_platform_zeroize once. // Joel
Hi Joel, thanks for the figures, this is helpful.
Ronald.
-----Original Message----- From: joel.petersson--- via mbed-tls mbed-tls@lists.trustedfirmware.org Sent: 23 August 2022 10:53 To: mbed-tls@lists.trustedfirmware.org Subject: [mbed-tls] Re: Adding option to disable the zeroisation of internal buffers
Hello,
To answer the general question: you can customize how the clearing of buffers is done by enabling MBEDTLS_PLATFORM_ZEROIZE_ALT and providing your own implementation of mbedtls_platform_zeroize(). Technically, you can make it a no-op.
Now, whether you _should_ make it a no-op is a different matter. Zeroization is a second line of defense: if there are no bugs anywhere, zeroization is useless, since you can't observe its results. So it's a matter of how much you trust all the code that's running in the same memory space: Mbed TLS, the platform's standard library, other libraries, your application, the operating system, the hardware... Experience shows that as much as we try, bugs (including unintended indirect data leaks via side channels) do happen. Most security certifications require clearing secrets in some form.
The one case where I would personally suggest making mbedtls_platform_zeroize() a no-op is if you're using the crypto part of the library in a crypto service running in its own dedicated memory space, on a platform that takes care of zeroization. Specifically, if the platform's free() function zeroizes memory (which takes care of heap data) and the crypto service wipes its stack (in some platform-specific way) after responding to each request. Under these assumptions, the only places where mbedtls_platform_zeroize() has an effect are those where the memory is reused by crypto code, and in those few cases I'm reasonably confident that secrets won't leak. I'm less confident with TLS code (which has far more complex dynamic memory management and variable-size buffers), so keep zeroization active there.
(And yes, it would be nice if Mbed TLS had a more flexible interface, so that you could for example make mbedtls_platform_zeroize() a no-op only when it's called before mbedtls_free() if your platform's free() zeroizes. Unfortunately, implementing that would be a huge task.)
Also, I don't know if it would make much of a difference in practice, but if your platform has a zeroization function in its standard library (e.g. memset_s() or explicit_bzero()), it might allow the compiler to optimize slightly better (for example, it might figure out that zeroizing twice is redundant, which it can't with our portable implementation because it works very hard to hide what it's doing so that the compiler doesn't completely optimize it out).
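As an illustration of the two points above (overriding the function, and delegating to a platform primitive), here is a minimal sketch. HAVE_EXPLICIT_BZERO is a hypothetical macro that your build system would define if explicit_bzero() is available; it is not an Mbed TLS option. In a real build, the MBEDTLS_PLATFORM_ZEROIZE_ALT define belongs in your Mbed TLS configuration file and the prototype comes from mbedtls/platform_util.h; both are inlined here only to keep the sketch self-contained.

```c
/* Normally set in mbedtls_config.h (or your custom configuration file). */
#define MBEDTLS_PLATFORM_ZEROIZE_ALT

#include <stddef.h>
#include <string.h>

/* Application-provided replacement for the library's zeroization routine.
 * The library calls it wherever it would have called its own version. */
void mbedtls_platform_zeroize(void *buf, size_t len)
{
    if (len == 0) {
        return;
    }
#if defined(HAVE_EXPLICIT_BZERO) /* hypothetical build-system macro */
    /* Platform primitive that is specified not to be optimized away. */
    explicit_bzero(buf, len);
#else
    /* Fallback: writes through a volatile pointer must be performed. */
    volatile unsigned char *p = buf;
    while (len--) {
        *p++ = 0;
    }
#endif
}
```

Whether the platform primitive is measurably faster than the portable implementation is something to check on the target; the sketch only shows where such a function would plug in.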
Best regards,
mbed-tls@lists.trustedfirmware.org