Hi,
I am not too familiar with TF-M, so please forgive me if this is a silly question.
The protected storage APIs appear to require the use of on-die flash to store a non-volatile counter that is used for rollback protection. This is severely limiting in terms of the number of writes, because you get only as many writes as the endurance of the flash on the MCU allows (for example, the Nordic Cortex-M33 devices have a rated write endurance of 10k cycles per page, and I don't think there is any wear levelling in TF-M). For example, if a device were configured to write to the protected storage on boot, one could pretty easily exhaust this flash in a few hours by continuously power cycling it. Even if the 10k figure is a very conservative rating, it seems pretty likely that the counter flash will fail long before the counter reaches UINT32_MAX.
My question is: what happens to the security and functionality of the protected storage if the internal NV flash write fails silently? I don't know much about the semiconductor physics at play here, but presumably the counter could get stuck at a constant value, or read back as a random value.
I had a quick look but there don't appear to be any checks in the code to ensure that a value was actually written correctly to the NV counters flash in case of silent corruption - it seems to just assume that any error would be detectable by some return code from the flash write driver. I was looking for some check like:
if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR;
But I wasn't able to find it. So assuming it isn't actually there, if the counter fails to a constant (which is not UINT32_MAX) then presumably the rollback protection would be broken for all writes after that point (and maybe some before, depending on the constant). If it fails to a random number, then it would be broken in a more "random" way, i.e. it would randomly work or not work depending on the value of the counter, until all UINT32_MAX numbers have been randomly selected as the counter value.
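A minimal sketch of the kind of read-back check described above, run against a simulated flash cell rather than a real driver. All names here (`sim_flash`, `nv_counter_write_verified`, the `NV_*` status codes) are hypothetical and are not TF-M or CMSIS identifiers; the "worn" cell models a write that reports success but leaves the old value in place.

```c
#include <stdint.h>

/* Hypothetical status codes, for illustration only. */
enum nv_status {
    NV_OK = 0,
    NV_DRIVER_ERROR = -1,
    NV_FLASH_WORN_OUT_ERROR = -2
};

/* Simulated flash cell. When 'worn' is nonzero, writes are silently lost. */
struct sim_flash {
    uint32_t cell;
    int worn;
};

/* Simulated driver write: always reports success, even if nothing changed. */
static int sim_flash_write(struct sim_flash *f, uint32_t value)
{
    if (!f->worn) {
        f->cell = value;
    }
    return 0;
}

static uint32_t sim_flash_read(const struct sim_flash *f)
{
    return f->cell;
}

/* Write the counter, then read it back and compare before trusting it. */
enum nv_status nv_counter_write_verified(struct sim_flash *f,
                                         uint32_t value_to_write)
{
    if (sim_flash_write(f, value_to_write) != 0) {
        return NV_DRIVER_ERROR;
    }
    uint32_t value_read_back = sim_flash_read(f);
    if (value_to_write != value_read_back) {
        return NV_FLASH_WORN_OUT_ERROR;
    }
    return NV_OK;
}
```

With a healthy cell the call returns `NV_OK`; with `worn` set, the driver still reports success but the comparison fails, so the caller can refuse to continue instead of silently accepting a stale counter.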
Also, AEAD ciphers like AES-GCM fail catastrophically on nonce reuse, and the protected storage does use AEAD (though I can't quite work out yet which cipher is used). If these non-volatile counters are used to generate the nonce, then the encryption of the device could potentially be broken just by rebooting the device until the flash is worn out: if the flash fails to a constant value, the nonce will be reused.
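To illustrate why nonce reuse is so damaging for counter-mode constructions such as GCM, here is a small sketch using a toy keystream generator. The generator is a stand-in and is NOT AES or any real cipher; all function names are made up for this example. The point it shows is generic: if the same key and nonce produce the same keystream twice, XORing the two ciphertexts cancels the keystream and yields the XOR of the two plaintexts. (In real GCM, nonce reuse is generally understood to also endanger the authentication key, which this sketch does not model.)

```c
#include <stddef.h>
#include <stdint.h>

/* Toy keystream byte derived from key, nonce and position.
 * A stand-in for a counter-mode block cipher; not secure. */
static uint8_t toy_keystream_byte(uint32_t key, uint32_t nonce, uint32_t index)
{
    uint32_t x = key ^ (nonce * 0x9E3779B9u) ^ (index * 0x85EBCA6Bu);
    x ^= x >> 16;
    x *= 0x7FEB352Du;
    x ^= x >> 15;
    return (uint8_t)x;
}

/* Encrypt or decrypt by XORing the input with the keystream. */
void toy_stream_xor(uint32_t key, uint32_t nonce,
                    const uint8_t *in, uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        out[i] = in[i] ^ toy_keystream_byte(key, nonce, (uint32_t)i);
    }
}

/* XOR two equal-length buffers. Applied to two ciphertexts that were
 * produced with the same key and nonce, the result is p1 XOR p2. */
void xor_buffers(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        out[i] = a[i] ^ b[i];
    }
}
```

An attacker who knows (or can guess) one of the two plaintexts can therefore recover the other without ever learning the key.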
Could someone please help me clear up if my understanding here is correct? As is, I am struggling a bit to understand how to use the protected storage API in a secure way with this constraint, because if an attacker has any way to repeatedly cause a flash write it is basically game over. Any help would be greatly appreciated.
Thanks, Jeremy
Hi Jeremy,
Thanks for bringing up this question.
Yes, the flash area for the non-volatile counters is fixed and there is no wear levelling for it. But I wonder whether the PS NV counter area wears faster than other areas, such as the areas where the ITS assets are stored and where the image itself resides. The PS NV counter area should only be written three times (three PS NV counters are reserved) in one PS asset creation cycle, i.e. when updating the PS object table. The area storing the PS/ITS assets is very likely to be accessed more than three times (as a file system is used) in one PS asset creation. Is it the PS NV counter area that wore out first on your device?
In PS, we only ensure that an NV counter change across a reset is detected: the PS object table authentication will fail in that case. But no check like "if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR" is used. As you described, the implementation depends on the return value from the CMSIS flash driver to detect any error. If SUCCESS is returned, the value is assumed to have actually been written. If flash wear-out happens and the flash driver still returns SUCCESS, then the PS partition cannot detect it, and a lower NV counter value can be used silently. I think the right place to add the check you mentioned is in the flash driver implementation, as it is common behaviour for the flash.
PSA_ALG_GCM is used for the PS table encryption. The NV counter, together with the table data, is used as the additional data in the encryption. The nonce is not the NV counter; it is a separate value that is increased by 1 each time.
Regards, Sherry Zhang
From: Jeremy Herbert via TF-M tf-m@lists.trustedfirmware.org Sent: Thursday, September 15, 2022 3:39 PM To: tf-m@lists.trustedfirmware.org Subject: [TF-M] Failure mode of protected storage with faulty NV counters?
[...]
Hi Sherry,
Thanks for your reply. (I also received an off-list reply from Nordic about this - thank you Sebastian). I have not yet encountered a write endurance failure, but I am developing a device which will be deployed for a long time. Using the nrf53 example, writing encrypted sensor data to external EEPROM/FRAM/MRAM once per hour would cause the device to fail in the field (in terms of the rated internal flash endurance) after only 1 year or so.
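The "1 year or so" estimate above follows from simple arithmetic, sketched here. The function name is made up for this example, and it assumes (as a simplification) that each protected storage write costs exactly one erase of the counter page; if one write costs several erases, the lifetime shrinks proportionally.

```c
#include <stdint.h>

/* Hours until the rated erase endurance of one flash page is used up.
 * Assumes a constant, positive erase rate for that page. */
double hours_until_rated_endurance(uint32_t rated_cycles,
                                   double page_erases_per_hour)
{
    return (double)rated_cycles / page_erases_per_hour;
}
```

With a 10k-cycle rating and one erase per hour this gives 10,000 hours, or roughly 1.14 years. With one erase per second (a device being power cycled in a loop) it gives about 2.8 hours, in line with the "few hours" estimate in the original question.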
It does seem that indeed the rollback protection will fail insecure well before the actual limit of the NV counter. I am not sure I entirely agree about putting the check in the flash driver - I think it would be better placed in the nv_counters code, because as far as I can see, the CMSIS flash driver API does not care whether an erase/write is reliable or not (or at least, it seems to me that this is left unspecified), and this functionality is critical to the NV counters functionality as it will otherwise fail insecure as far as rollback protection is concerned.
I also realised that this could have implications for voltage glitching resistance: if one were to voltage glitch during the flash erase/write, the counter value would not be incremented correctly (and I imagine flash writing is easier to glitch than most things).
Also, can I suggest that there are some clear warnings added to the PS documentation about write endurance? Basically, that you are limited by the write endurance of the internal flash, even if you are using external devices. My initial naive thought when seeing the functionality was that the write endurance was limited by the endurance of the external storage (I was planning on using an external FRAM which has much higher endurance), but this is not correct. Since TF-M is supposed to solve a lot of cryptographic problems for us mere mortals, I can definitely see people enabling it and then just using it to write sensor data to an external flash for example. If there was a readback check enabled, it would at least fail securely but no longer work.
As far as the nonce: I don't believe incrementing a variable is a secure method (unless it was a non-volatile variable), as the attacker could just reboot the device to start the counter again? The IV of 0 would be used again (or whatever the reset value was), meaning that the security is now completely broken with GCM. It's concerning if this is indeed how it is implemented in TF-M?
This is actually the fundamental problem I am trying to solve at the moment - how to realistically use AES-GCM on an MCU with a non-ephemeral key. As far as I can recall the NIST guidelines do not recommend a random IV as the 96 bit IV is considered a little bit small as far as the birthday paradox is concerned. At the moment it seems pretty close to impossible for my skill level ;)
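For a rough sense of the birthday concern with random 96-bit IVs, the usual approximation for the probability of at least one collision among n random b-bit values is about n^2 / 2^(b+1), valid while that probability is small. The helper below (a made-up name, working in log2 to avoid overflow) evaluates it. As far as I understand NIST SP 800-38D, it caps the number of encryptions with randomly generated IVs at 2^32 per key, which corresponds to a collision probability of about 2^-33 under this approximation.

```c
/* log2 of the approximate collision probability n^2 / 2^(bits + 1)
 * for n = 2^log2_n uniformly random nonces of 'nonce_bits' bits.
 * Only meaningful while the result is well below 0. */
int log2_nonce_collision_probability(int log2_n, int nonce_bits)
{
    return 2 * log2_n - (nonce_bits + 1);
}
```

For example, `log2_nonce_collision_probability(32, 96)` evaluates to -33.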
Thanks, Jeremy
On Thu, 15 Sept 2022 at 19:40, Sherry Zhang Sherry.Zhang2@arm.com wrote:
[...]
Hi Jeremy,
About which flash should limit the usage of the PS service: this is a really good perspective from which to assess the storage service. I thought it over again and I think it is hard to completely separate PS from the internal flash (as we need a trusted area for implementing the rollback protection). Also, the NV counter area is used not only by PS but also by other components like BL2. MCUboot writes the BL2 NV counter once on each boot. That can also wear out the NV counter flash area. Do you have any suggestions or proposals for making PS storage live only on the external flash?
I checked with someone who is working on the CMSIS drivers and got confirmation that the CMSIS Flash driver API is intended to only check the status of the operation, without verifying the written data. But I wonder whether the write error can be detected by the flash device itself, with an internal ECC check, when flash wear-out or data corruption happens. It may depend on the specific flash device. So I propose adding the check in the NV counter layer and making it optional. Users can decide whether to enable the check based on their specific flash device. What do you think?
About your proposal of adding the warnings about endurance, this is a good suggestion. We will add that later. Thanks for the feedback!
The nonce value is stored in flash together with the PS object table. After a reboot, it is read back from flash and used as the start value (it does not start from 0 on each reboot). See the code here: https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/secure_fw/partitions/protected_storage/ps_object_table.c#n903
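The behaviour described above (the nonce is persisted with the object table and resumed after reboot) can be sketched roughly as follows. This is an illustration only and does not reproduce the TF-M code at the link: the struct layouts, function names, the big-endian increment and the 12-byte IV length are all assumptions made for the example, and the non-volatile storage is simulated by a plain struct.

```c
#include <stdint.h>
#include <string.h>

#define IV_LEN 12u /* assumed 96-bit nonce */

/* Simulated non-volatile copy of the table header (hypothetical layout). */
struct nv_table {
    uint8_t iv[IV_LEN];
};

/* RAM state of the storage service (hypothetical). */
struct ps_ctx {
    uint8_t iv[IV_LEN];
};

/* Increment a big-endian counter in place, propagating carries. */
static void iv_increment(uint8_t iv[IV_LEN])
{
    for (int i = (int)IV_LEN - 1; i >= 0; i--) {
        if (++iv[i] != 0u) {
            break; /* no carry into the next byte */
        }
    }
}

/* On boot: resume from the stored nonce instead of starting at zero. */
void ps_init(struct ps_ctx *ctx, const struct nv_table *nv)
{
    memcpy(ctx->iv, nv->iv, IV_LEN);
}

/* One table update: take the next nonce and persist it with the table. */
void ps_table_update(struct ps_ctx *ctx, struct nv_table *nv)
{
    iv_increment(ctx->iv);
    /* ... the table would be encrypted with ctx->iv here ... */
    memcpy(nv->iv, ctx->iv, IV_LEN);
}
```

In this model a reboot is just a fresh `ps_ctx` initialised from the same `nv_table`, so the next update continues from the last persisted value. The guarantee is only as strong as the integrity of the stored copy.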
Regards, Sherry Zhang
From: Jeremy Herbert jeremy.006@gmail.com Sent: Friday, September 16, 2022 7:48 AM To: Sherry Zhang Sherry.Zhang2@arm.com Cc: tf-m@lists.trustedfirmware.org; nd nd@arm.com Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
[...]
Hi Sherry,
I have actually come up against this problem a bunch over the past few years. Here are my thoughts.
It seems like there are two major problems:
1. Nonce reuse in AES-GCM breaks the encryption to the point of it being near useless
2. A random nonce is not recommended for AES-GCM as the nonce size is a bit small, but having a secure non-volatile counter on an MCU requires specific hardware support as far as having high endurance non-volatile memory (see ATECC608 for example) - and basically nobody has this in their MCUs (though I know STM32s sometimes come with internal EEPROM which can take ~100k writes per word). Secure NFC tags like the DESFire EV3 are often rated for 1 million+ writes per word, and with wear leveling you can basically make this number beyond the useful life of the device.
This results in it being impossible to implement external encrypted flash with rollback protection if the device ever loses power or is reset (or at least as far as I can see):
1. If you don't use a non-volatile counter, resetting the device will cause nonce reuse => AES-GCM broken.
2. If you do use a non-volatile counter stored on internal flash, it seems that eventually the area of flash will fail to all bits cleared (ie 0) - given typical numbers, you are looking at a maximum of 50k erases *per page*, not per byte (though most vendors guarantee way less than this ie nordic is 10k). And given in TF-M all of the different counters are probably stored in the same page, that means that writing to any counter uses up one of your cycles. You could potentially wear level across pages, but with 1K/4K page sizes that is a lot of flash to consume just for a counter.
3. If you store the nonce for AES-GCM in external flash, and then read it back and increment for the next operation, the attacker can just roll back the flash => nonce reuse => AES-GCM broken.
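The third case above (nonce kept in external flash, attacker rolls the flash back) can be modelled in a few lines. This is a simulation with made-up names, where the external flash is a plain struct that the attacker can copy and restore at will, and the nonce is a simple 64-bit integer for brevity.

```c
#include <stdint.h>

/* Simulated external flash: readable and writable by an attacker. */
struct ext_flash {
    uint64_t stored_nonce;
};

/* Device-side write: read the last nonce from external flash, increment
 * it, use it for one encryption, and store it back.
 * Returns the nonce that was used. */
uint64_t device_encrypt_and_store(struct ext_flash *flash)
{
    uint64_t nonce = flash->stored_nonce + 1u;
    /* ... an AEAD encryption with 'nonce' would happen here ... */
    flash->stored_nonce = nonce;
    return nonce;
}
```

If the attacker snapshots the struct, lets the device perform one write, restores the snapshot and triggers another write, both writes use the same nonce. Without a trusted, monotonic value inside the device to compare against, the device has no way to notice.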
Also it appears that you *must* always enable rollback protection in TF-M, otherwise you just need to dump the external flash, reset the device so it does another write, dump the flash again and you have caused nonce reuse => encryption is broken.
The solutions I can propose:
1. I realise this is probably unrealistic, but it is the best solution nonetheless: require silicon vendors to add a small amount of some special non-volatile memory on their chip like FRAM (maybe 64 bits?) to get a "Security Lvl. 99 (TM): Extreme" certification that they can put on the marketing pages for their device
2. Use an external device like the ATECC608 which has an encrypted, authenticated link and high endurance non-volatile storage for counters (except normal people like me have a 2+ year lead time on this and similar parts)
3. Don't use AES-GCM. There is a much newer, nonce-reuse resistant variant of AES-GCM called AES-GCM-SIV which has only a small performance penalty (I have seen ~30% quoted). I had planned to do it a while ago, but I have just spent the last few days or so hacking this cipher into mbedtls, and it isn't too far from AES-GCM in terms of functionality: https://github.com/Mbed-TLS/mbedtls/pull/6294 - I hope it can eventually be merged once I tidy it up. While this doesn't solve the problem of the rollback counter failing due to being stored in internal flash (you need option 1 or 2 for that), if the counter fails to a constant, at least you just lose the rollback protection rather than losing all security.
If you have any other suggestions as to how I can solve this problem or something that I am missing, I am all ears!
Thanks, Jeremy
On Fri, 16 Sept 2022 at 17:41, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
About which flash should be the limitation of the PS service usage, this is really a good perspective to assess the storage service. I thought over it again and I think it is hard to completely separate the PS away from the internal flash(as we need a trusted area for implementing the rollback protection). Also, the NV counter area is not only used by PS but also by other components like BL2. MCUboot writes the BL2 NV counter once each time booting up. That can also make the NV area flash wear out. Do you have any suggestions/proposal on making PS storage only lives on the external flash?
I checked with someone who is working on the CMSIS drivers and got confirmation that the CMSIS Flash driver API is intended to only check the status of the operation without verification of the written data. But I wonder if the write error can be detected by the flash device itself with ECC check internally when flash wear out happen or flash data corrupt happen. It maybe depend on the specific flash device. So, I propose to add the check in the NV counter layer and leave that check as an optional choice. User can decide whether enable that check or not based on their specific flash device. How do you think about?
About your proposal of adding the warnings about endurance, this is a good suggestion. We will add that later. Thanks for the feedback!
The nonce value is stored into flash together with the ps object table. After reboot, it is read out from flash and the read out value is used as the start value after the reboot(not starts from 0 each time reboot). See code at here https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/secure_fw/partitions/protected_storage/ps_object_table.c#n903 .
Regards,
Sherry Zhang
*From:* Jeremy Herbert jeremy.006@gmail.com *Sent:* Friday, September 16, 2022 7:48 AM *To:* Sherry Zhang Sherry.Zhang2@arm.com *Cc:* tf-m@lists.trustedfirmware.org; nd nd@arm.com *Subject:* Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Thanks for your reply. (I also received an off-list reply from Nordic about this - thank you Sebastian). I have not yet encountered a write endurance failure, but I am developing a device which will be deployed for a long time. Using the nrf53 example, writing encrypted sensor data to external EEPROM/FRAM/MRAM once per hour would cause the device to fail in the field (in terms of the rated internal flash endurance) after only 1 year or so.
It does seem that indeed the rollback protection will fail insecure well before the actual limit of the NV counter. I am not sure I entirely agree about putting the check in the flash driver - I think it would be better placed in the nv_counters code, because as far as I can see, the CMSIS flash driver API does not care whether an erase/write is reliable or not (or at least, it seems to me that this is left unspecified), and this functionality is critical to the NV counters functionality as it will otherwise fail insecure as far as rollback protection is concerned.
I also realised that this could also have implications as far as voltage glitching resistance - if one was to voltage glitch at the flash erase/write, the counter value would not be incremented correctly (and I imagine flash writing is easier to glitch than most things).
Also, can I suggest that there are some clear warnings added to the PS documentation about write endurance? Basically, that you are limited by the write endurance of the internal flash, even if you are using external devices. My initial naive thought when seeing the functionality was that the write endurance was limited by the endurance of the external storage (I was planning on using an external FRAM which has much higher endurance), but this is not correct. Since TF-M is supposed to solve a lot of cryptographic problems for us mere mortals, I can definitely see people enabling it and then just using it to write sensor data to an external flash for example. If there was a readback check enabled, it would at least fail securely but no longer work.
As far as the nonce: I don't believe incrementing a variable is a secure method (unless it was a non-volatile variable), as the attacker could just reboot the device to start the counter again? The IV of 0 would be used again (or whatever the reset value was), meaning that the security is now completely broken with GCM. It's concerning if this is indeed how it is implemented in TF-M?
This is actually the fundamental problem I am trying to solve at the moment - how to realistically use AES-GCM on an MCU with a non-ephemeral key. As far as I can recall the NIST guidelines do not recommend a random IV as the 96 bit IV is considered a little bit small as far as the birthday paradox is concerned. At the moment it seems pretty close to impossible for my skill level ;)
Thanks,
Jeremy
On Thu, 15 Sept 2022 at 19:40, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
Thanks for bringing up this question.
Yes, the flash area for the non-volatile counters is fixed and there is no wear levelling for it. But I wonder whether the PS NV counter area wears faster than other areas, such as those where the ITS assets are stored or where the image itself lies. The PS NV counter area should only be written three times (three PS NV counters are reserved) in one PS asset creation cycle, i.e. when updating the PS object table. The area storing the PS/ITS assets is very likely to be accessed more than three times (as a file system is used) in one PS asset creation. Was it the PS NV counter area that wore out first on your device?
In PS, we only ensure that an NV counter change across a reset is detected: the PS object table authentication will fail in that case. But no check like “if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR” is used. As you described, the implementation depends on the return value from the CMSIS flash driver to detect any error. If SUCCESS is returned, the value is assumed to have been written to flash. If flash wear-out happens and the flash driver still returns SUCCESS, the PS partition cannot detect it, and a lower NV counter value can be used silently. I think the right place to add the check you mentioned is in the flash driver implementation, as it is common behavior for flash.
PSA_ALG_GCM is used for the PS table encryption. The NV counter, together with the table data, is used as the additional data in the encryption. A value that is increased by 1 each time, rather than the NV counter, is used as the nonce.
Regards,
Sherry Zhang
*From:* Jeremy Herbert via TF-M tf-m@lists.trustedfirmware.org *Sent:* Thursday, September 15, 2022 3:39 PM *To:* tf-m@lists.trustedfirmware.org *Subject:* [TF-M] Failure mode of protected storage with faulty NV counters?
Hi,
I am not too familiar with TF-M, so please forgive me if this is a silly question.
The protected storage APIs appear to require the use of on-die flash to store a non-volatile counter that is used for rollback protection. This is severely limiting in terms of the number of writes, because basically you get as many writes as the endurance of the flash on the MCU (for example, the nordic cortex M33 devices have a rated write endurance of 10k cycles per page, and I don't think there is any wear levelling in TF-M). For example, assuming that a device was configured to write to the protected storage on boot, one could pretty easily exhaust this flash in a few hours by continuously power cycling it. Even if the 10k writes is a very conservative rating, it seems pretty likely that the counter flash will fail before UINT32_MAX.
My question is: what happens to the security and functionality of the protected storage if the internal NV flash write fails silently? I don't know much about the semiconductor physics at play here, but presumably it could fail to make the counter a constant number, or fail to a random number.
I had a quick look but there don't appear to be any checks in the code to ensure that a value was actually written correctly to the NV counters flash in case of silent corruption - it seems to just assume that any error would be detectable by some return code from the flash write driver. I was looking for some check like:
if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR;
But I wasn't able to find it. So assuming it isn't actually there, if the counter fails to a constant (which is not UINT32_MAX) then presumably the rollback protection would be broken for all writes after that point (and maybe some before depending on the constant). If it fails to a random number, then it would be broken in a more "random" way - ie it would randomly work/not work depending on the value of the counter, until all UINT32_MAX numbers are randomly selected as the counter value.
Also, given that typical AEAD ciphers like AES-GCM typically fail catastrophically with nonce reuse and the protected storage is indeed AEAD (though I can't quite work out yet which cipher is used), if these non-volatile counters are used to generate a nonce then potentially the encryption of the device could be broken just by rebooting the device until the flash is worn out, and then the nonce will be reused if the flash fails to a constant value.
Could someone please help me clear up if my understanding here is correct? As is, I am struggling a bit to understand how to use the protected storage API in a secure way with this constraint, because if an attacker has any way to repeatedly cause a flash write it is basically game over. Any help would be greatly appreciated.
Thanks,
Jeremy
Hi Jeremy,
I totally agree with your analysis of the necessity of the NV counter in the PS service. I think another reason why rollback protection is necessary is to prevent rollback of the PS data itself. For example, suppose a PS asset with a specific (uid, client_id) pair is created and later updated to another value. Without rollback protection, the old data may be recovered and used by an attacker.
So we need a trusted non-volatile memory to store the NV counter. Which memory device should be used (FRAM or other flash devices, for example) is beyond our scope. Users can make that decision based on their own requirements.
For your option 3, if the algorithm is compatible with the PSA Crypto interface, users can select it as the PS algorithm via the PS_CRYPTO_AEAD_ALG build flag. Even if nonce-reuse protection is not needed, I think rollback protection is still necessary.
I created this patch, https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16797 , to remind users that the lifecycle of the PS service may also depend on the device that stores the NV counter.
Regards, Sherry Zhang
From: Jeremy Herbert jeremy.006@gmail.com Sent: Sunday, September 18, 2022 10:15 AM To: Sherry Zhang Sherry.Zhang2@arm.com Cc: tf-m@lists.trustedfirmware.org; nd nd@arm.com Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I have actually come up against this problem a bunch over the past few years. Here are my thoughts.
It seems like there are two major problems:
1. Nonce reuse in AES-GCM breaks the encryption to the point of it being near useless.
2. A random nonce is not recommended for AES-GCM as the nonce size is a bit small, but having a secure non-volatile counter on an MCU requires specific hardware support as far as having high endurance non-volatile memory (see ATECC608 for example) - and basically nobody has this in their MCUs (though I know STM32s sometimes come with internal EEPROM which can take ~100k writes per word). Secure NFC tags like the DESFire EV3 are often rated for 1 million+ writes per word, and with wear leveling you can basically make this number beyond the useful life of the device.
This results in it being impossible to implement external encrypted flash with rollback protection if the device ever loses power or is reset (or at least as far as I can see):
1. If you don't use a non-volatile counter, resetting the device will cause nonce reuse => AES-GCM broken.
2. If you do use a non-volatile counter stored on internal flash, it seems that eventually the area of flash will fail to all bits cleared (ie 0) - given typical numbers, you are looking at a maximum of 50k erases *per page*, not per byte (though most vendors guarantee way less than this ie nordic is 10k). And given in TF-M all of the different counters are probably stored in the same page, that means that writing to any counter uses up one of your cycles. You could potentially wear level across pages, but with 1K/4K page sizes that is a lot of flash to consume just for a counter.
3. If you store the nonce for AES-GCM in external flash, and then read it back and increment for the next operation, the attacker can just roll back the flash => nonce reuse => AES-GCM broken.
Also it appears that you *must* always enable rollback protection in TF-M, otherwise you just need to dump the external flash, reset the device so it does another write, dump the flash again and you have caused nonce reuse => encryption is broken.
The solutions I can propose:
1. I realise this is probably unrealistic, but it is the best solution nonetheless: require silicon vendors to add a small amount of some special non-volatile memory on their chip like FRAM (maybe 64 bits?) to get a "Security Lvl. 99 (TM): Extreme" certification that they can put on the marketing pages for their device.
2. Use an external device like the ATECC608 which has an encrypted, authenticated link and high endurance non-volatile storage for counters (except normal people like me have a 2+ year lead time on this and similar parts).
3. Don't use AES-GCM. There is a much newer, nonce-reuse resistant variant of AES-GCM called AES-GCM-SIV which has only a small performance penalty (I have seen ~30% quoted). I had planned to do it a while ago, but I have just spent the last few days or so hacking this cipher into mbedtls, and it isn't too far from AES-GCM in terms of functionality: https://github.com/Mbed-TLS/mbedtls/pull/6294 - I hope it can eventually be merged once I tidy it up. While this doesn't solve the problem of the rollback counter failing due to being stored in internal flash (you need option 1 or 2 for that), if the counter fails to a constant, at least you just lose the rollback protection rather than losing all security.
If you have any other suggestions as to how I can solve this problem or something that I am missing, I am all ears!
Thanks, Jeremy
On Fri, 16 Sept 2022 at 17:41, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
Regarding which flash limits the usable life of the PS service: this is a really good perspective from which to assess the storage service. I thought it over again, and I think it is hard to completely separate PS from the internal flash (as we need a trusted area to implement the rollback protection). Also, the NV counter area is used not only by PS but also by other components such as BL2. MCUboot writes the BL2 NV counter once on each boot, which can also wear out the NV counter flash area. Do you have any suggestions or proposals for making PS storage live only on the external flash?
I checked with someone who is working on the CMSIS drivers and got confirmation that the CMSIS Flash driver API is intended only to report the status of the operation, without verifying the written data. But I wonder whether the flash device itself can detect the write error with an internal ECC check when flash wear-out or data corruption happens. That may depend on the specific flash device. So I propose adding the check in the NV counter layer as an optional feature. Users can decide whether to enable it based on their specific flash device. What do you think?
Adding warnings about endurance is a good suggestion. We will add them later. Thanks for the feedback!
The nonce value is stored in flash together with the PS object table. After a reboot, it is read back from flash and used as the start value (it does not start from 0 on each reboot). See the code here: https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/secure_fw/partitions/protected_storage/ps_object_table.c#n903
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Friday, September 16, 2022 7:48 AM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Thanks for your reply. (I also received an off-list reply from Nordic about this - thank you Sebastian). I have not yet encountered a write endurance failure, but I am developing a device which will be deployed for a long time. Using the nrf53 example, writing encrypted sensor data to external EEPROM/FRAM/MRAM once per hour would cause the device to fail in the field (in terms of the rated internal flash endurance) after only 1 year or so.
It does seem that indeed the rollback protection will fail insecure well before the actual limit of the NV counter. I am not sure I entirely agree about putting the check in the flash driver - I think it would be better placed in the nv_counters code, because as far as I can see, the CMSIS flash driver API does not care whether an erase/write is reliable or not (or at least, it seems to me that this is left unspecified), and this functionality is critical to the NV counters functionality as it will otherwise fail insecure as far as rollback protection is concerned.
I also realised that this could also have implications as far as voltage glitching resistance - if one was to voltage glitch at the flash erase/write, the counter value would not be incremented correctly (and I imagine flash writing is easier to glitch than most things).
Also, can I suggest that there are some clear warnings added to the PS documentation about write endurance? Basically, that you are limited by the write endurance of the internal flash, even if you are using external devices. My initial naive thought when seeing the functionality was that the write endurance was limited by the endurance of the external storage (I was planning on using an external FRAM which has much higher endurance), but this is not correct. Since TF-M is supposed to solve a lot of cryptographic problems for us mere mortals, I can definitely see people enabling it and then just using it to write sensor data to an external flash for example. If there was a readback check enabled, it would at least fail securely but no longer work.
As far as the nonce: I don't believe incrementing a variable is a secure method (unless it was a non-volatile variable), as the attacker could just reboot the device to start the counter again? The IV of 0 would be used again (or whatever the reset value was), meaning that the security is now completely broken with GCM. It's concerning if this is indeed how it is implemented in TF-M?
This is actually the fundamental problem I am trying to solve at the moment - how to realistically use AES-GCM on an MCU with a non-ephemeral key. As far as I can recall the NIST guidelines do not recommend a random IV as the 96 bit IV is considered a little bit small as far as the birthday paradox is concerned. At the moment it seems pretty close to impossible for my skill level ;)
Thanks, Jeremy
Hi Sherry,
Can I suggest that the following be added instead? I'm not sure that is clear enough to explain the pitfalls to non-experts like myself.
"If this flag is enabled, the lifecycle of the PS service depends on the minimum write endurance of both the device that stores the assets and the device that stores the NV counters - typically the NV counters can only be stored to on-die storage (such as internal flash) for security reasons. As an example, if the NV counter is stored inside the internal flash of a microcontroller with a write endurance of 10k writes, and the PS assets are stored in an external flash with a write endurance of 100k writes, the useful number of writes to the external flash is constrained by the endurance of the internal flash, after which point the rollback protection will fail as the internal flash can no longer correctly store the NV counter."
Also, this doesn't mention that the AES-GCM encryption can be trivially broken if the rollback protection is not enabled. To quote SP 800-38D section 8:
*The probability that the authenticated encryption function ever will be invoked with the same IV and the same key on two (or more) distinct sets of input data shall be no greater than 2^-32*. Compliance with this requirement is crucial to the security of GCM. Across all instances of the authenticated encryption function with a given key, if even one IV is ever repeated, then the implementation may be vulnerable to the forgery attacks that are described in Ref [5] and summarized in Appendix A. *In practice, this requirement is almost as important as the secrecy of the key.*
I understand it may not be up to you and that there are already silicon vendors with shipped products, but if the scope of TF-M is advertised to end developers as taking care of security from the beginning, it does seem like it should probably be considered in-scope that the PS encryption is not trivially defeated using the default implementation.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 12:50, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
I totally agree with your analysis of the necessity of the NV counter in the PS service. I think another reason why rollback protection is necessary is to prevent rollback of the PS data itself. For example, suppose a PS asset with a specific (uid, client_id) pair is created and later updated to another value. Without rollback protection, the old data may be recovered and used by an attacker.
So we need a trusted non-volatile memory to store the NV counter. Which memory device should be used (FRAM or other flash devices, for example) is beyond our scope. Users can make that decision based on their own requirements.
For your option 3, if the algorithm is compatible with the PSA Crypto interface, users can select it as the PS algorithm via the *PS_CRYPTO_AEAD_ALG* build flag. Even if nonce-reuse protection is not needed, I think rollback protection is still necessary.
I created this patch, https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16797 , to remind users that the lifecycle of the PS service may also depend on the device that stores the NV counter.
Regards,
Sherry Zhang
*From:* Jeremy Herbert jeremy.006@gmail.com *Sent:* Sunday, September 18, 2022 10:15 AM *To:* Sherry Zhang Sherry.Zhang2@arm.com *Cc:* tf-m@lists.trustedfirmware.org; nd nd@arm.com *Subject:* Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I have actually come up against this problem a bunch over the past few years. Here are my thoughts.
It seems like there are two major problems:
- Nonce reuse in AES-GCM breaks the encryption to the point of it being
near useless
- A random nonce is not recommended for AES-GCM as the nonce size is a
bit small, but having a secure non-volatile counter on an MCU requires specific hardware support as far as having high endurance non-volatile memory (see ATECC608 for example) - and basically nobody has this in their MCUs (though I know STM32s sometimes come with internal EEPROM which can take ~100k writes per word). Secure NFC tags like the DESFire EV3 are often rated for 1million+ writes per word, and with wear leveling you can basically make this number beyond the useful life of the device.
This results in it being impossible to implement external encrypted flash with rollback protection if the device ever loses power or is reset (or at least as far as I can see):
- If you don't use a non-volatile counter, resetting the device will
cause nonce reuse => AES-GCM broken.
- If you do use a non-volatile counter stored on internal flash, it seems
that eventually the area of flash will fail to all bits cleared (ie 0) - given typical numbers, you are looking at a maximum of 50k erases *per page*, not per byte (though most vendors guarantee way less than this ie nordic is 10k). And given in TF-M all of the different counters are probably stored in the same page, that means that writing to any counter uses up one of your cycles. You could potentially wear level across pages, but with 1K/4K page sizes that is a lot of flash to consume just for a counter.
- If you store the nonce for AES-GCM in external flash, and then read it
back and increment for the next operation, the attacker can just roll back the flash => nonce reuse => AES-GCM broken.
Also it appears that you *must* always enable rollback protection in TF-M, otherwise you just need to dump the external flash, reset the device so it does another write, dump the flash again and you have caused nonce reuse => encryption is broken.
The solutions I can propose:
- I realise this is probably unrealistic, but it is the best solution
nonetheless: require silicon vendors to add a small amount of some special non-volatile memory on their chip like FRAM (maybe 64 bits?) to get a "Security Lvl. 99 (TM): Extreme" certification that they can put on the marketing pages for their device
- Use an external device like the ATECC608 which has an encrypted,
authenticated link and high endurance non-volatile storage for counters (except normal people like me have a 2+ year lead time on this and similar parts)
- Don't use AES-GCM. There is a much newer, nonce-reuse resistant variant
of AES-GCM called AES-GCM-SIV which has only a small performance penalty (I have seen ~30% quoted). I had planned to do it a while ago, but I have just spent the last few days or so hacking this cipher into mbedtls, and it isn't too far from AES-GCM in terms of functionality: https://github.com/Mbed-TLS/mbedtls/pull/6294 - I hope it can eventually be merged once I tidy it up. While this doesn't solve the problem of the rollback counter failing due to being stored in internal flash (you need option 1 or 2 for that), if the counter fails to a constant, at least you just lose the rollback protection rather than losing all security.
If you have any other suggestions as to how I can solve this problem or something that I am missing, I am all ears!
Thanks,
Jeremy
On Fri, 16 Sept 2022 at 17:41, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
About which flash should be the limitation of the PS service usage, this is really a good perspective to assess the storage service. I thought over it again and I think it is hard to completely separate the PS away from the internal flash(as we need a trusted area for implementing the rollback protection). Also, the NV counter area is not only used by PS but also by other components like BL2. MCUboot writes the BL2 NV counter once each time booting up. That can also make the NV area flash wear out. Do you have any suggestions/proposal on making PS storage only lives on the external flash?
I checked with someone who is working on the CMSIS drivers and got confirmation that the CMSIS Flash driver API is intended to only check the status of the operation without verification of the written data. But I wonder if the write error can be detected by the flash device itself with ECC check internally when flash wear out happen or flash data corrupt happen. It maybe depend on the specific flash device. So, I propose to add the check in the NV counter layer and leave that check as an optional choice. User can decide whether enable that check or not based on their specific flash device. How do you think about?
About your proposal of adding the warnings about endurance, this is a good suggestion. We will add that later. Thanks for the feedback!
The nonce value is stored into flash together with the ps object table. After reboot, it is read out from flash and the read out value is used as the start value after the reboot(not starts from 0 each time reboot). See code at here https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/secure_fw/partitions/protected_storage/ps_object_table.c#n903 .
Regards,
Sherry Zhang
*From:* Jeremy Herbert jeremy.006@gmail.com *Sent:* Friday, September 16, 2022 7:48 AM *To:* Sherry Zhang Sherry.Zhang2@arm.com *Cc:* tf-m@lists.trustedfirmware.org; nd nd@arm.com *Subject:* Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Thanks for your reply. (I also received an off-list reply from Nordic about this - thank you Sebastian). I have not yet encountered a write endurance failure, but I am developing a device which will be deployed for a long time. Using the nrf53 example, writing encrypted sensor data to external EEPROM/FRAM/MRAM once per hour would cause the device to fail in the field (in terms of the rated internal flash endurance) after only 1 year or so.
It does seem that indeed the rollback protection will fail insecure well before the actual limit of the NV counter. I am not sure I entirely agree about putting the check in the flash driver - I think it would be better placed in the nv_counters code, because as far as I can see, the CMSIS flash driver API does not care whether an erase/write is reliable or not (or at least, it seems to me that this is left unspecified), and this functionality is critical to the NV counters functionality as it will otherwise fail insecure as far as rollback protection is concerned.
I also realised that this could also have implications as far as voltage glitching resistance - if one was to voltage glitch at the flash erase/write, the counter value would not be incremented correctly (and I imagine flash writing is easier to glitch than most things).
Also, can I suggest that there are some clear warnings added to the PS documentation about write endurance? Basically, that you are limited by the write endurance of the internal flash, even if you are using external devices. My initial naive thought when seeing the functionality was that the write endurance was limited by the endurance of the external storage (I was planning on using an external FRAM which has much higher endurance), but this is not correct. Since TF-M is supposed to solve a lot of cryptographic problems for us mere mortals, I can definitely see people enabling it and then just using it to write sensor data to an external flash for example. If there was a readback check enabled, it would at least fail securely but no longer work.
As far as the nonce: I don't believe incrementing a variable is a secure method (unless it was a non-volatile variable), as the attacker could just reboot the device to start the counter again? The IV of 0 would be used again (or whatever the reset value was), meaning that the security is now completely broken with GCM. It's concerning if this is indeed how it is implemented in TF-M?
This is actually the fundamental problem I am trying to solve at the moment - how to realistically use AES-GCM on an MCU with a non-ephemeral key. As far as I can recall the NIST guidelines do not recommend a random IV as the 96 bit IV is considered a little bit small as far as the birthday paradox is concerned. At the moment it seems pretty close to impossible for my skill level ;)
Thanks,
Jeremy
On Thu, 15 Sept 2022 at 19:40, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
Thanks for bringing up this question.
Yes, the flash area for the non-volatile counters is fixed and there is no wear levelling for it. But I wonder whether the PS NV counter area wears faster than other areas, such as the areas where the ITS assets are stored or where the image itself lies. The PS NV counter area should only be written three times (three PS NV counters are reserved) in one PS asset creation cycle, i.e. when updating the PS object table. The area storing the PS/ITS assets is very likely to be accessed more than three times (as a file system is used) in one PS asset creation. Is it the PS NV counter area that wore out first on your device?
In PS, we only ensure that an NV counter change between resets is detected: the PS object table authentication will fail in that case. But no check like “if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR” is used. As you described, the implementation depends on the return value from the CMSIS flash driver to detect any error. If SUCCESS is returned, the value is assumed to have actually been written to flash. If flash wear-out happens and the flash driver still returns SUCCESS, the PS partition cannot detect it, and a lower NV counter value can be used silently. I think the right place to add the check you mentioned is in the flash driver implementation, as it is common behaviour for any flash.
PSA_ALG_GCM is used for the PS table encryption. The NV counter, together with the table data, is used as the additional data in the encryption. The nonce is not the NV counter but a separate value that is increased by 1 each time.
Regards,
Sherry Zhang
From: Jeremy Herbert via TF-M tf-m@lists.trustedfirmware.org Sent: Thursday, September 15, 2022 3:39 PM To: tf-m@lists.trustedfirmware.org Subject: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi,
I am not too familiar with TF-M, so please forgive me if this is a silly question.
The protected storage APIs appear to require the use of on-die flash to store a non-volatile counter that is used for rollback protection. This is severely limiting in terms of the number of writes, because basically you get as many writes as the endurance of the flash on the MCU (for example, the nordic cortex M33 devices have a rated write endurance of 10k cycles per page, and I don't think there is any wear levelling in TF-M). For example, assuming that a device was configured to write to the protected storage on boot, one could pretty easily exhaust this flash in a few hours by continuously power cycling it. Even if the 10k writes is a very conservative rating, it seems pretty likely that the counter flash will fail before UINT32_MAX.
My question is: what happens to the security and functionality of the protected storage if the internal NV flash write fails silently? I don't know much about the semiconductor physics at play here, but presumably it could fail to make the counter a constant number, or fail to a random number.
I had a quick look but there don't appear to be any checks in the code to ensure that a value was actually written correctly to the NV counters flash in case of silent corruption - it seems to just assume that any error would be detectable by some return code from the flash write driver. I was looking for some check like:
if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR;
But I wasn't able to find it. So assuming it isn't actually there, if the counter fails to a constant (which is not UINT32_MAX) then presumably the rollback protection would be broken for all writes after that point (and maybe some before depending on the constant). If it fails to a random number, then it would be broken in a more "random" way - ie it would randomly work/not work depending on the value of the counter, until all UINT32_MAX numbers are randomly selected as the counter value.
Also, given that typical AEAD ciphers like AES-GCM typically fail catastrophically with nonce reuse and the protected storage is indeed AEAD (though I can't quite work out yet which cipher is used), if these non-volatile counters are used to generate a nonce then potentially the encryption of the device could be broken just by rebooting the device until the flash is worn out, and then the nonce will be reused if the flash fails to a constant value.
Could someone please help me clear up if my understanding here is correct? As is, I am struggling a bit to understand how to use the protected storage API in a secure way with this constraint, because if an attacker has any way to repeatedly cause a flash write it is basically game over. Any help would be greatly appreciated.
Thanks,
Jeremy
Hi Jeremy, Have you registered for access to Gerrit? If so, I can add you as a reviewer of this patch so that you can comment on it directly. Your description is very detailed. Thanks for that.
For the nonce reuse issue, I think what is missing is that the template implementation of the NV counter layer should check whether the NV counter was programmed successfully. There is a patch (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16723) to fix that. Thanks for pointing this out. By the way, this is a template implementation. Users can provide their own implementation of the NV counter APIs defined in tfm_plat_nv_counters.h by setting PLATFORM_DEFAULT_NV_COUNTERS to OFF.
Thanks!
Regards, Sherry Zhang
From: Jeremy Herbert jeremy.006@gmail.com Sent: Wednesday, September 21, 2022 11:13 AM To: Sherry Zhang Sherry.Zhang2@arm.com Cc: tf-m@lists.trustedfirmware.org; nd nd@arm.com Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Can I suggest that the following be added instead? I'm not sure that is clear enough to explain the pitfalls to non-experts like myself.
"If this flag is enabled, the lifecycle of the PS service depends on the minimum write endurance of both the device that stores the assets and the device that stores the NV counters - typically the NV counters can only be stored to on-die storage (such as internal flash) for security reasons. As an example, if the NV counter is stored inside the internal flash of a microcontroller with a write endurance of 10k writes, and the PS assets are stored in an external flash with a write endurance of 100k writes, the useful number of writes to the external flash is constrained by the endurance of the internal flash, after which point the rollback protection will fail as the internal flash can no longer correctly store the NV counter."
Also, this doesn't mention that the AES-GCM encryption can be trivially broken if the rollback protection is not enabled. To quote SP 800-38D section 8:
"The probability that the authenticated encryption function ever will be invoked with the same IV and the same key on two (or more) distinct sets of input data shall be no greater than 2^-32. Compliance with this requirement is crucial to the security of GCM. Across all instances of the authenticated encryption function with a given key, if even one IV is ever repeated, then the implementation may be vulnerable to the forgery attacks that are described in Ref [5] and summarized in Appendix A. In practice, this requirement is almost as important as the secrecy of the key."
I understand it may not be up to you and that there are already silicon vendors with shipped products, but if the scope of TF-M is advertised to end developers as taking care of security from the beginning, it does seem like it should probably be considered in-scope that the PS encryption is not trivially defeated using the default implementation.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 12:50, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
I totally agree with your analysis of the necessity of the NV counter in the PS service. I think another reason why rollback protection is necessary is to prevent rollback of the PS data itself. For example, suppose a PS asset with a specific (uid, client_id) pair is created and later updated to another value. Without rollback protection, the old data may be restored and used by an attacker. So we need a trusted non-volatile memory to store the NV counter. Which memory device should be used (FRAM or other flash devices, for example) is beyond our scope; users can decide based on their own requirements. For your option 3, if the algorithm is compatible with the PSA Crypto interface, users can select it as the PS algorithm via the PS_CRYPTO_AEAD_ALG build flag. Even if nonce-reuse protection is not needed, I think rollback protection is still necessary.
I created this patch (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16797) to remind users that the lifecycle of the PS service may also depend on the device that stores the NV counter.
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Sunday, September 18, 2022 10:15 AM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I have actually come up against this problem a bunch over the past few years. Here are my thoughts.
It seems like there are two major problems:
1. Nonce reuse in AES-GCM breaks the encryption to the point of it being near useless.
2. A random nonce is not recommended for AES-GCM as the nonce size is a bit small, but having a secure non-volatile counter on an MCU requires specific hardware support as far as having high endurance non-volatile memory (see ATECC608 for example) - and basically nobody has this in their MCUs (though I know STM32s sometimes come with internal EEPROM which can take ~100k writes per word). Secure NFC tags like the DESFire EV3 are often rated for 1 million+ writes per word, and with wear leveling you can basically make this number beyond the useful life of the device.
This results in it being impossible to implement external encrypted flash with rollback protection if the device ever loses power or is reset (or at least as far as I can see):
1. If you don't use a non-volatile counter, resetting the device will cause nonce reuse => AES-GCM broken.
2. If you do use a non-volatile counter stored on internal flash, it seems that eventually the area of flash will fail to all bits cleared (i.e. 0) - given typical numbers, you are looking at a maximum of 50k erases *per page*, not per byte (though most vendors guarantee far less than this, e.g. Nordic is 10k). And given that in TF-M all of the different counters are probably stored in the same page, writing to any counter uses up one of your cycles. You could potentially wear level across pages, but with 1K/4K page sizes that is a lot of flash to consume just for a counter.
3. If you store the nonce for AES-GCM in external flash, and then read it back and increment it for the next operation, the attacker can just roll back the flash => nonce reuse => AES-GCM broken.
Also it appears that you *must* always enable rollback protection in TF-M, otherwise you just need to dump the external flash, reset the device so it does another write, dump the flash again and you have caused nonce reuse => encryption is broken.
The solutions I can propose:
1. I realise this is probably unrealistic, but it is the best solution nonetheless: require silicon vendors to add a small amount of some special non-volatile memory on their chip like FRAM (maybe 64 bits?) to get a "Security Lvl. 99 (TM): Extreme" certification that they can put on the marketing pages for their device.
2. Use an external device like the ATECC608 which has an encrypted, authenticated link and high endurance non-volatile storage for counters (except normal people like me have a 2+ year lead time on this and similar parts).
3. Don't use AES-GCM. There is a much newer, nonce-reuse resistant variant of AES-GCM called AES-GCM-SIV which has only a small performance penalty (I have seen ~30% quoted). I had planned to do it a while ago, but I have just spent the last few days or so hacking this cipher into mbedtls, and it isn't too far from AES-GCM in terms of functionality: https://github.com/Mbed-TLS/mbedtls/pull/6294 - I hope it can eventually be merged once I tidy it up. While this doesn't solve the problem of the rollback counter failing due to being stored in internal flash (you need option 1 or 2 for that), if the counter fails to a constant, at least you just lose the rollback protection rather than losing all security.
If you have any other suggestions as to how I can solve this problem or something that I am missing, I am all ears!
Thanks, Jeremy
On Fri, 16 Sept 2022 at 17:41, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
About which flash should be the limiting factor for PS service usage: this is a really good perspective from which to assess the storage service. I thought it over again and I think it is hard to completely separate PS from the internal flash (as we need a trusted area for implementing the rollback protection). Also, the NV counter area is used not only by PS but also by other components like BL2. MCUboot writes the BL2 NV counter once on each boot, which can also wear out the NV counter flash area. Do you have any suggestions or proposals for making PS storage live only on the external flash?
I checked with someone who works on the CMSIS drivers and got confirmation that the CMSIS Flash driver API is intended to only check the status of the operation, without verifying the written data. But I wonder whether the flash device itself can detect the write error with an internal ECC check when flash wear-out or data corruption happens. It may depend on the specific flash device. So I propose to add the check in the NV counter layer and make it optional. Users can decide whether to enable it based on their specific flash device. What do you think?
About your proposal of adding the warnings about endurance, this is a good suggestion. We will add that later. Thanks for the feedback!
The nonce value is stored in flash together with the PS object table. After a reboot, it is read back from flash and used as the starting value (it does not start from 0 on each reboot). See the code here: https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/secure_fw/partitions/protected_storage/ps_object_table.c#n903
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Friday, September 16, 2022 7:48 AM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Thanks for your reply. (I also received an off-list reply from Nordic about this - thank you Sebastian). I have not yet encountered a write endurance failure, but I am developing a device which will be deployed for a long time. Using the nrf53 example, writing encrypted sensor data to external EEPROM/FRAM/MRAM once per hour would cause the device to fail in the field (in terms of the rated internal flash endurance) after only 1 year or so.
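The "1 year or so" figure can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and the page_cycles_per_write parameter is an assumption, since the thread does not settle how many erase cycles of the counter page one protected-storage update actually costs.

```c
#include <stdint.h>

/*
 * Years until the rated erase/write budget of the counter page is used up.
 * page_cycles_per_write is an assumed cost in page cycles per PS update.
 */
static double years_until_rated_wearout(uint32_t rated_cycles,
                                        double writes_per_day,
                                        double page_cycles_per_write)
{
    return (double)rated_cycles /
           (writes_per_day * page_cycles_per_write * 365.0);
}
```

With the 10k-cycle rating, one write per hour and one page cycle per write, years_until_rated_wearout(10000, 24.0, 1.0) gives about 1.14 years. If each update cost three page cycles instead, the same call with 3.0 gives about 0.38 years.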
Thanks, Jeremy
Hi,
In case it is not clear why a loss of rollback protection via the NV counter can lead to full loss of confidentiality of material stored in the whole of PS, let me try to outline a possible attack once rollback of the PS data stored in external flash is possible:
1. Make copy (A) of the PS data blocks from the external flash
2. Cause the application or system to update a data item in PS
3. Make another copy (B1) of the updated PS data blocks
4. Restore the original PS content (A) to the external flash
5. Cause the application or system to update the same data item in PS to a different value (from step 2)
6. Make another copy (B2) of the updated PS data blocks
At this point, the data item that was modified in steps 2 and 5 will have been stored in PS using the same nonce (because PS increments the data item nonce when it is updated, and the nonce for that data item was the same at the start of step 2 and step 5, because of the rollback of the PS data in step 4). So after the attacker locates the data item in B1 and B2, they can use this to defeat the AES encryption as per the literature on AES-GCM nonce reuse. The attacker can repeat steps 4-6 to obtain additional ciphertexts relating to the data item, reusing the nonce, to improve the ability to recover the AES key.
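The six steps can be simulated in a few lines. This is a toy sketch under loud assumptions: toy_keystream is a deliberately insecure stand-in for the AES-CTR keystream inside GCM (it only shares the property that the same key and nonce give the same stream), and struct ext_flash, ps_update and rollback_attack are hypothetical names, not TF-M code. The sketch shows only the confidentiality leak, where the XOR of the two captured ciphertexts equals the XOR of the two plaintexts.

```c
#include <stddef.h>
#include <stdint.h>

#define ITEM_LEN 8

/* Everything in external flash can be read and rewritten by the attacker. */
struct ext_flash {
    uint32_t nonce;                /* stored next to the data */
    uint8_t  ciphertext[ITEM_LEN];
};

/* Toy keystream, NOT AES and NOT secure: same (key, nonce) -> same stream. */
static void toy_keystream(uint32_t key, uint32_t nonce, uint8_t *out, size_t len)
{
    uint32_t s = key ^ (nonce * 2654435761u);
    if (s == 0) {
        s = 1;
    }
    for (size_t i = 0; i < len; i++) {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        out[i] = (uint8_t)s;
    }
}

/* Device side: increment the stored nonce, encrypt, write back. */
static void ps_update(struct ext_flash *f, uint32_t key, const uint8_t pt[ITEM_LEN])
{
    uint8_t ks[ITEM_LEN];

    f->nonce += 1;
    toy_keystream(key, f->nonce, ks, ITEM_LEN);
    for (size_t i = 0; i < ITEM_LEN; i++) {
        f->ciphertext[i] = pt[i] ^ ks[i];
    }
}

/*
 * Attacker side, steps 1-6. Returns 1 if B1 and B2 were written with the
 * same nonce; leak receives the XOR of the two captured ciphertexts.
 */
static int rollback_attack(uint32_t key,
                           const uint8_t pt1[ITEM_LEN],
                           const uint8_t pt2[ITEM_LEN],
                           uint8_t leak[ITEM_LEN])
{
    struct ext_flash flash = { .nonce = 7, .ciphertext = {0} };
    struct ext_flash a, b1, b2;

    a = flash;                   /* 1. copy A */
    ps_update(&flash, key, pt1); /* 2. device updates the item */
    b1 = flash;                  /* 3. copy B1 */
    flash = a;                   /* 4. restore A (rollback) */
    ps_update(&flash, key, pt2); /* 5. device updates the item again */
    b2 = flash;                  /* 6. copy B2 */

    for (size_t i = 0; i < ITEM_LEN; i++) {
        leak[i] = b1.ciphertext[i] ^ b2.ciphertext[i];
    }
    return b1.nonce == b2.nonce;
}
```

The attacker never touches the key inside rollback_attack beyond letting the device use it, yet leak ends up equal to pt1 XOR pt2 byte for byte, because both updates started from the same restored nonce.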
To defend against this scenario for a system in which the rollback-protection failure cannot be sufficiently mitigated (e.g. by wear levelling as suggested), a nonce-reuse-resistant AEAD algorithm might mitigate the risk of such an attack.
Regards, Andrew
From: Sherry Zhang via TF-M tf-m@lists.trustedfirmware.org Sent: 21 September 2022 05:46 To: Jeremy Herbert jeremy.006@gmail.com; tf-m@lists.trustedfirmware.org Cc: nd nd@arm.com Subject: [TF-M] Re: Failure mode of protected storage with faulty NV counters?
Hi Jeremy, Have you registered for access to Gerrit? If so, I can add you as a reviewer of this patch so that you can comment on it directly. Your description is very detailed. Thanks for that.
For the nonce reuse issue, I think what is missing is that the template implementation of the NV counter layer should check whether the NV counter was programmed successfully. There is a patch (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16723) to fix that. Thanks for pointing this out. By the way, this is a template implementation. Users can provide their own implementation of the NV counter APIs defined in tfm_plat_nv_counters.h by setting PLATFORM_DEFAULT_NV_COUNTERS to OFF.
Thanks!
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.commailto:jeremy.006@gmail.com> Sent: Wednesday, September 21, 2022 11:13 AM To: Sherry Zhang <Sherry.Zhang2@arm.commailto:Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.orgmailto:tf-m@lists.trustedfirmware.org; nd <nd@arm.commailto:nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Can I suggest that the following be added instead? I'm not sure that is clear enough to explain the pitfalls to non-experts like myself.
"If this flag is enabled, the lifecycle of the PS service depends on the minimum write endurance of both the device that stores the assets and the device that stores the NV counters - typically the NV counters can only be stored to on-die storage (such as internal flash) for security reasons. As an example, if the NV counter is stored inside the internal flash of a microcontroller with a write endurance of 10k writes, and the PS assets are stored in an external flash with a write endurance of 100k writes, the useful number of writes to the external flash is constrained by the endurance of the internal flash, after which point the rollback protection will fail as the internal flash can no longer correctly store the NV counter."
Also, this doesn't mention that the AES-GCM encryption can be trivially broken if the rollback protection is not enabled. To quote SP 800-38D section 8:
The probability that the authenticated encryption function ever will be invoked with the same IV and the same key on two (or more) distinct sets of input data shall be no greater than 2-32. Compliance with this requirement is crucial to the security of GCM. Across all instances of the authenticated encryption function with a given key, if even one IV is ever repeated, then the implementation may be vulnerable to the forgery attacks that are described in Ref [5] and summarized in Appendix A. In practice, this requirement is almost as important as the secrecy of the key.
I understand it may not be up to you and that there are already silicon vendors with shipped products, but if the scope of TF-M is advertised to end developers as taking care of security from the beginning, it does seem like it should probably be considered in-scope that the PS encryption is not trivially defeated using the default implementation.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 12:50, Sherry Zhang <Sherry.Zhang2@arm.commailto:Sherry.Zhang2@arm.com> wrote: Hi Jeremy, I totally agree with your analysis on the necessity of NV counter in ps service. I think another reason why rollback protection is necessary is to prevent the ps data itself rollback. For example, if the ps asset with a specific pair of (uid, client_id) has been once created, then later the asset is updated to another value. Without rollback protection, the old data may be recovered and used by an attacker. So, we need a trusted non-volatile memory to store the NV counter. As for which memory device should be used(like FRAM or other flash devices), it is beyond our scope. Users can make the decision based on their own specific requirements. For your option 3, if the algorithm is compatible with psa crypto interface, users can configure the ps algorithm via PS_CRYPTO_AEAD_ALG build flag with that algorithm. Even nonce reused protection is not needed, I think rollback protection is still necessary.
I created this patchhttps://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16797 to remind the users that the lifecycle of PS service may also depends on the device that stores NV counter.
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.commailto:jeremy.006@gmail.com> Sent: Sunday, September 18, 2022 10:15 AM To: Sherry Zhang <Sherry.Zhang2@arm.commailto:Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.orgmailto:tf-m@lists.trustedfirmware.org; nd <nd@arm.commailto:nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I have actually come up against this problem a bunch over the past few years. Here are my thoughts.
It seems like there are two major problems: 1. Nonce reuse in AES-GCM breaks the encryption to the point of it being near useless 2. A random nonce is not recommended for AES-GCM as the nonce size is a bit small, but having a secure non-volatile counter on an MCU requires specific hardware support as far as having high endurance non-volatile memory (see ATECC608 for example) - and basically nobody has this in their MCUs (though I know STM32s sometimes come with internal EEPROM which can take ~100k writes per word). Secure NFC tags like the DESFire EV3 are often rated for 1million+ writes per word, and with wear leveling you can basically make this number beyond the useful life of the device.
This results in it being impossible to implement external encrypted flash with rollback protection if the device ever loses power or is reset (or at least as far as I can see):
1. If you don't use a non-volatile counter, resetting the device will cause nonce reuse => AES-GCM broken. 2. If you do use a non-volatile counter stored on internal flash, it seems that eventually the area of flash will fail to all bits cleared (ie 0) - given typical numbers, you are looking at a maximum of 50k erases *per page*, not per byte (though most vendors guarantee way less than this ie nordic is 10k). And given in TF-M all of the different counters are probably stored in the same page, that means that writing to any counter uses up one of your cycles. You could potentially wear level across pages, but with 1K/4K page sizes that is a lot of flash to consume just for a counter. 3. If you store the nonce for AES-GCM in external flash, and then read it back and increment for the next operation, the attacker can just roll back the flash => nonce reuse => AES-GCM broken.
Also it appears that you *must* always enable rollback protection in TF-M, otherwise you just need to dump the external flash, reset the device so it does another write, dump the flash again and you have caused nonce reuse => encryption is broken.
The solutions I can propose: 1. I realise this is probably unrealistic, but it is the best solution nonetheless: require silicon vendors to add a small amount of some special non-volatile memory on their chip like FRAM (maybe 64 bits?) to get a "Security Lvl. 99 (TM): Extreme" certification that they can put on the marketing pages for their device 2. Use an external device like the ATECC608 which has an encrypted, authenticated link and high endurance non-volatile storage for counters (except normal people like me have a 2+ year lead time on this and similar parts) 3. Don't use AES-GCM. There is a much newer, nonce-reuse resistant variant of AES-GCM called AES-GCM-SIV which has only a small performance penalty (I have seen ~30% quoted). I had planned to do it a while ago, but I have just spent the last few days or so hacking this cipher into mbedtls, and it isn't too far from AES-GCM in terms of functionality: https://github.com/Mbed-TLS/mbedtls/pull/6294 - I hope it can eventually be merged once I tidy it up. While this doesn't solve the problem of the rollback counter failing due to being stored in internal flash (you need option 1 or 2 for that), if the counter fails to a constant, at least you just lose the rollback protection rather than losing all security.
If you have any other suggestions as to how I can solve this problem or something that I am missing, I am all ears!
Thanks, Jeremy
On Fri, 16 Sept 2022 at 17:41, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
Which flash limits the lifetime of the PS service is a really good perspective from which to assess the storage service. I thought it over again, and I think it is hard to completely separate PS from the internal flash (as we need a trusted area for implementing the rollback protection). Also, the NV counter area is used not only by PS but also by other components such as BL2. MCUboot writes the BL2 NV counter once on each boot, which can also wear out the NV counter flash area. Do you have any suggestions or proposals for making PS storage live only on the external flash?
I checked with someone who is working on the CMSIS drivers and got confirmation that the CMSIS Flash driver API is intended only to check the status of the operation, without verifying the written data. I wonder, though, whether the flash device itself can detect the write error through an internal ECC check when the flash wears out or the data is corrupted. That may depend on the specific flash device. So I propose adding the check in the NV counter layer and making it optional. Users can decide whether to enable the check based on their specific flash device. What do you think?
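As a rough illustration of what an optional read-back check in the NV counter layer could look like, here is a sketch in C. It is not the TF-M implementation and does not use the CMSIS Flash driver API: `flash_ops_t`, `nv_counter_store`, the status codes and the simulated worn-out cell are all hypothetical names introduced for this example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flash interface; a real port would wrap the platform driver.
 * Both callbacks return 0 on success. */
typedef struct {
    int (*program)(uint32_t addr, const void *data, uint32_t len);
    int (*read)(uint32_t addr, void *data, uint32_t len);
} flash_ops_t;

typedef enum {
    NV_OK = 0,
    NV_ERR_DRIVER,   /* the driver itself reported a failure */
    NV_ERR_READBACK  /* driver reported success, but the value did not stick */
} nv_status_t;

/* Store a counter value; when verify is true, read it back and compare. */
nv_status_t nv_counter_store(const flash_ops_t *flash, uint32_t addr,
                             uint32_t value, bool verify)
{
    uint32_t read_back = 0;

    if (flash->program(addr, &value, sizeof(value)) != 0) {
        return NV_ERR_DRIVER;
    }
    if (!verify) {
        return NV_OK; /* trusts the driver status only */
    }
    if (flash->read(addr, &read_back, sizeof(read_back)) != 0) {
        return NV_ERR_DRIVER;
    }
    return (read_back == value) ? NV_OK : NV_ERR_READBACK;
}

/* Simulated worn-out cell for illustration: programming "succeeds",
 * but the stored word stays stuck at 0. */
static const uint32_t stuck_cell = 0;

static int worn_program(uint32_t addr, const void *data, uint32_t len)
{
    (void)addr; (void)data; (void)len;
    return 0; /* silent failure: reports success, stores nothing */
}

static int worn_read(uint32_t addr, void *data, uint32_t len)
{
    (void)addr;
    if (len != sizeof(stuck_cell)) {
        return -1;
    }
    memcpy(data, &stuck_cell, len);
    return 0;
}

const flash_ops_t worn_flash = { worn_program, worn_read };
```

With `verify` set to false, `nv_counter_store(&worn_flash, 0, 5, false)` returns `NV_OK` even though nothing was stored; with `verify` set to true the same call returns `NV_ERR_READBACK`, so the caller can fail securely instead of continuing with a stale counter.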
About your proposal of adding the warnings about endurance, this is a good suggestion. We will add that later. Thanks for the feedback!
The nonce value is stored in flash together with the PS object table. After a reboot, it is read back from flash and used as the starting value (it does not start from 0 on each reboot). See the code here: https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/secure_fw/partitions/protected_storage/ps_object_table.c#n903
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Friday, September 16, 2022 7:48 AM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Thanks for your reply. (I also received an off-list reply from Nordic about this - thank you Sebastian). I have not yet encountered a write endurance failure, but I am developing a device which will be deployed for a long time. Using the nrf53 example, writing encrypted sensor data to external EEPROM/FRAM/MRAM once per hour would cause the device to fail in the field (in terms of the rated internal flash endurance) after only 1 year or so.
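The "1 year or so" figure can be checked with a small calculation. The sketch below is only back-of-the-envelope arithmetic under stated assumptions: the function name and the `erases_per_update` parameter are hypothetical, and how many page erases one PS update really costs depends on the NV counter implementation.

```c
#include <stdint.h>

/* Years until the rated erase budget of the NV counter page is used up.
 * erases_per_update is an assumed parameter: the number of page erases
 * that a single PS update causes on the counter page.
 * Returns a negative value if the workload causes no wear at all. */
double nv_page_lifetime_years(uint32_t rated_cycles,
                              double updates_per_hour,
                              double erases_per_update)
{
    const double hours_per_year = 24.0 * 365.0;
    double erases_per_year =
        updates_per_hour * erases_per_update * hours_per_year;

    if (erases_per_year <= 0.0) {
        return -1.0;
    }
    return (double)rated_cycles / erases_per_year;
}
```

For a 10k-cycle page, one update per hour and one erase per update, this gives roughly 1.14 years. If each update costs three erases, the figure drops to about 0.38 years, and at one erase per second the rated budget is gone in under three hours.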
It does seem that indeed the rollback protection will fail insecure well before the actual limit of the NV counter. I am not sure I entirely agree about putting the check in the flash driver - I think it would be better placed in the nv_counters code, because as far as I can see, the CMSIS flash driver API does not care whether an erase/write is reliable or not (or at least, it seems to me that this is left unspecified), and this functionality is critical to the NV counters functionality as it will otherwise fail insecure as far as rollback protection is concerned.
I also realised that this could also have implications as far as voltage glitching resistance - if one was to voltage glitch at the flash erase/write, the counter value would not be incremented correctly (and I imagine flash writing is easier to glitch than most things).
Also, can I suggest that there are some clear warnings added to the PS documentation about write endurance? Basically, that you are limited by the write endurance of the internal flash, even if you are using external devices. My initial naive thought when seeing the functionality was that the write endurance was limited by the endurance of the external storage (I was planning on using an external FRAM which has much higher endurance), but this is not correct. Since TF-M is supposed to solve a lot of cryptographic problems for us mere mortals, I can definitely see people enabling it and then just using it to write sensor data to an external flash for example. If there was a readback check enabled, it would at least fail securely but no longer work.
As far as the nonce: I don't believe incrementing a variable is a secure method (unless it was a non-volatile variable), as the attacker could just reboot the device to start the counter again? The IV of 0 would be used again (or whatever the reset value was), meaning that the security is now completely broken with GCM. It's concerning if this is indeed how it is implemented in TF-M?
This is actually the fundamental problem I am trying to solve at the moment - how to realistically use AES-GCM on an MCU with a non-ephemeral key. As far as I can recall the NIST guidelines do not recommend a random IV as the 96 bit IV is considered a little bit small as far as the birthday paradox is concerned. At the moment it seems pretty close to impossible for my skill level ;)
Thanks, Jeremy
On Thu, 15 Sept 2022 at 19:40, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
Thanks for bringing up this question.
Yes, the flash area for the non-volatile counters is fixed and there is no wear levelling for it. But I wonder whether the PS NV counter area wears faster than other areas, such as the areas where the ITS assets are stored or where the image itself resides. The PS NV counter area should only be written three times (three PS NV counters are reserved) in one PS asset creation cycle, i.e. when updating the PS object table. The area storing the PS/ITS assets is very likely to be accessed more than three times (as a file system is used) in one PS asset creation. Was it the PS NV counter area that wore out first on your device?
In PS, we only ensure that an NV counter change across a reset is detected: the PS object table authentication will fail in that case. But no check like “if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR” is used. As you described, the implementation depends on the return value from the CMSIS flash driver to detect any error. If SUCCESS is returned, the value is assumed to have actually been written. If the flash wears out and the flash driver still returns SUCCESS, the PS partition cannot detect it, and a lower NV counter value can be used silently. I think the right place to add the check you mentioned is in the flash driver implementation, as it is common behaviour for the flash.
PSA_ALG_GCM is used in the PS table encryption. NV counter, together with the table data, is used as the additional data in the encryption. A value which is increased by 1 each time is used as nonce in the encryption instead of the NV counter.
Regards, Sherry Zhang
From: Jeremy Herbert via TF-M <tf-m@lists.trustedfirmware.org> Sent: Thursday, September 15, 2022 3:39 PM To: tf-m@lists.trustedfirmware.org Subject: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi,
I am not too familiar with TF-M, so please forgive me if this is a silly question.
The protected storage APIs appear to require the use of on-die flash to store a non-volatile counter that is used for rollback protection. This is severely limiting in terms of the number of writes, because basically you get as many writes as the endurance of the flash on the MCU (for example, the nordic cortex M33 devices have a rated write endurance of 10k cycles per page, and I don't think there is any wear levelling in TF-M). For example, assuming that a device was configured to write to the protected storage on boot, one could pretty easily exhaust this flash in a few hours by continuously power cycling it. Even if the 10k writes is a very conservative rating, it seems pretty likely that the counter flash will fail before UINT32_MAX.
My question is: what happens to the security and functionality of the protected storage if the internal NV flash write fails silently? I don't know much about the semiconductor physics at play here, but presumably it could fail to make the counter a constant number, or fail to a random number.
I had a quick look but there don't appear to be any checks in the code to ensure that a value was actually written correctly to the NV counters flash in case of silent corruption - it seems to just assume that any error would be detectable by some return code from the flash write driver. I was looking for some check like:
if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR;
But I wasn't able to find it. So assuming it isn't actually there, if the counter fails to a constant (which is not UINT32_MAX) then presumably the rollback protection would be broken for all writes after that point (and maybe some before depending on the constant). If it fails to a random number, then it would be broken in a more "random" way - ie it would randomly work/not work depending on the value of the counter, until all UINT32_MAX numbers are randomly selected as the counter value.
Also, given that typical AEAD ciphers like AES-GCM typically fail catastrophically with nonce reuse and the protected storage is indeed AEAD (though I can't quite work out yet which cipher is used), if these non-volatile counters are used to generate a nonce then potentially the encryption of the device could be broken just by rebooting the device until the flash is worn out, and then the nonce will be reused if the flash fails to a constant value.
Could someone please help me clear up if my understanding here is correct? As is, I am struggling a bit to understand how to use the protected storage API in a secure way with this constraint, because if an attacker has any way to repeatedly cause a flash write it is basically game over. Any help would be greatly appreciated.
Thanks, Jeremy
Hi Jeremy,
Have you registered for access to Gerrit? If so, I can add you as a reviewer of this patch so that you can comment on it directly. Your description is very detailed. Thanks for that.
For the nonce reuse issue, I think what is missing is that the template implementation of the NV counter layer should check whether the NV counter was programmed successfully. There is a patch to fix that at https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16723 - thanks for pointing this out. By the way, this is a template implementation. Users can implement the NV counter APIs defined in tfm_plat_nv_counters.h themselves by setting PLATFORM_DEFAULT_NV_COUNTERS to OFF.
(Removed part of the initial email thread as it reached the 80 KB maximum size.)
Thanks!
Regards, Sherry Zhang
From: Jeremy Herbert jeremy.006@gmail.com Sent: Wednesday, September 21, 2022 11:13 AM To: Sherry Zhang Sherry.Zhang2@arm.com Cc: tf-m@lists.trustedfirmware.org; nd nd@arm.com Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Can I suggest that the following be added instead? I'm not sure that is clear enough to explain the pitfalls to non-experts like myself.
"If this flag is enabled, the lifecycle of the PS service depends on the minimum write endurance of both the device that stores the assets and the device that stores the NV counters - typically the NV counters can only be stored to on-die storage (such as internal flash) for security reasons. As an example, if the NV counter is stored inside the internal flash of a microcontroller with a write endurance of 10k writes, and the PS assets are stored in an external flash with a write endurance of 100k writes, the useful number of writes to the external flash is constrained by the endurance of the internal flash, after which point the rollback protection will fail as the internal flash can no longer correctly store the NV counter."
Also, this doesn't mention that the AES-GCM encryption can be trivially broken if the rollback protection is not enabled. To quote SP 800-38D section 8:
The probability that the authenticated encryption function ever will be invoked with the same IV and the same key on two (or more) distinct sets of input data shall be no greater than 2^-32. Compliance with this requirement is crucial to the security of GCM. Across all instances of the authenticated encryption function with a given key, if even one IV is ever repeated, then the implementation may be vulnerable to the forgery attacks that are described in Ref [5] and summarized in Appendix A. In practice, this requirement is almost as important as the secrecy of the key.
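To relate the 2^-32 requirement to random IVs, the usual birthday approximation can be evaluated directly. The sketch below is an illustration only: the function name is hypothetical, and the formula n(n-1)/2 / 2^bits is an approximation that holds while the result is much smaller than 1.

```c
/* Approximate probability that at least two of n uniformly random
 * iv_bits-bit IVs collide, using the birthday bound
 * n * (n - 1) / 2 / 2^iv_bits. Only meaningful while the result is
 * much smaller than 1. */
double iv_collision_probability(double n, int iv_bits)
{
    double p = n * (n - 1.0) / 2.0;

    /* Divide by 2^iv_bits one bit at a time (exact for doubles here). */
    for (int i = 0; i < iv_bits; i++) {
        p /= 2.0;
    }
    return p;
}
```

For 96-bit random IVs, 2^32 invocations under one key give a collision probability of about 2^-33, which is just inside the quoted limit; doubling the number of invocations roughly quadruples the probability and exceeds it.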
I understand it may not be up to you and that there are already silicon vendors with shipped products, but if the scope of TF-M is advertised to end developers as taking care of security from the beginning, it does seem like it should probably be considered in-scope that the PS encryption is not trivially defeated using the default implementation.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 12:50, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
I totally agree with your analysis of the necessity of the NV counter in the PS service. I think another reason why rollback protection is necessary is to prevent rollback of the PS data itself. For example, suppose a PS asset with a specific (uid, client_id) pair is created and later updated to another value. Without rollback protection, the old data may be restored and used by an attacker.
So we need a trusted non-volatile memory to store the NV counter. Which memory device should be used (FRAM or other flash devices, for example) is beyond our scope. Users can make that decision based on their own specific requirements.
For your option 3, if the algorithm is compatible with the PSA Crypto interface, users can select it as the PS algorithm via the PS_CRYPTO_AEAD_ALG build flag. Even if nonce reuse protection is not needed, I think rollback protection is still necessary.
I created this patch to remind users that the lifecycle of the PS service may also depend on the device that stores the NV counter: https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16797
Regards, Sherry Zhang
Hi Sherry,
I haven't registered, I will try to do it soon and let you know.
I guess my issue is more around the AES-GCM encryption for external flash - is it correct that if replay protection is disabled, an attacker can force the nonce used as the IV for encryption to be reused by rolling back the external flash image? If so, it seems like the AES-GCM encryption for PS is broken by design as you must not under any circumstances allow nonce reuse with the same key.
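The scenario in the question can be written down as a toy model in C. Everything here is a simplification for illustration and not how TF-M is implemented: `ext_flash_t`, `device_t` and `ps_write` are hypothetical, the external flash is assumed to be fully attacker-controlled, and the NV counter is assumed to be stored where the attacker cannot rewrite it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: external flash holds the last nonce used and a copy
 * of the rollback counter. The attacker can dump and restore it freely. */
typedef struct {
    uint32_t nonce;
    uint32_t counter_copy;
} ext_flash_t;

/* Hypothetical model of on-die state the attacker cannot rewrite. */
typedef struct {
    uint32_t nv_counter;
    bool rollback_protection;
} device_t;

/* One protected-storage write. Returns true and reports the nonce that was
 * used, or false if a stale flash image was detected and the write refused. */
bool ps_write(device_t *dev, ext_flash_t *flash, uint32_t *nonce_used)
{
    if (dev->rollback_protection && flash->counter_copy != dev->nv_counter) {
        return false; /* stale image: refuse before any nonce is consumed */
    }

    flash->nonce += 1u; /* continue from the value stored in flash */
    *nonce_used = flash->nonce;

    if (dev->rollback_protection) {
        dev->nv_counter += 1u;
        flash->counter_copy = dev->nv_counter;
    }
    return true;
}
```

In this model, dumping the flash, triggering a write, restoring the dump and triggering another write yields the same nonce twice when `rollback_protection` is false. With it set to true, the second write is refused because the restored counter copy no longer matches the on-die counter.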
Thanks, Jeremy
On Wed, 21 Sept 2022 at 14:57, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
Have you registered access to the Gerrit? I can add you as a reviewer of this patch if possible so that you can comment on this patch directly. Your description is very detailed. Thanks for that.
For nonce reuse issue, I think what is missed is that the template implementation of nv counter layer should check whether the NV counter is programmed successfully. There is a patch https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16723 for fixing that. Thanks for pointing this out. By the way, this is a template implementation. Users can implement their own NV counter APIs defined in tfm_plat_nv_counters.h setting *PLATFORM_DEFAULT_NV_COUNTERS* OFF.
(Removed part of the initial email thread as it reached the 80 KB maximum size.)
Thanks!
Regards,
Sherry Zhang
*From:* Jeremy Herbert jeremy.006@gmail.com *Sent:* Wednesday, September 21, 2022 11:13 AM *To:* Sherry Zhang Sherry.Zhang2@arm.com *Cc:* tf-m@lists.trustedfirmware.org; nd nd@arm.com *Subject:* Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Can I suggest that the following be added instead? I'm not sure that is clear enough to explain the pitfalls to non-experts like myself.
"If this flag is enabled, the lifecycle of the PS service depends on the minimum write endurance of both the device that stores the assets and the device that stores the NV counters - typically the NV counters can only be stored to on-die storage (such as internal flash) for security reasons. As an example, if the NV counter is stored inside the internal flash of a microcontroller with a write endurance of 10k writes, and the PS assets are stored in an external flash with a write endurance of 100k writes, the useful number of writes to the external flash is constrained by the endurance of the internal flash, after which point the rollback protection will fail as the internal flash can no longer correctly store the NV counter."
Also, this doesn't mention that the AES-GCM encryption can be trivially broken if the rollback protection is not enabled. To quote SP 800-38D section 8:
*The probability that the authenticated encryption function ever will be invoked with the same IV and the same key on two (or more) distinct sets of input data shall be no greater than 2-32*. Compliance with this requirement is crucial to the security of GCM. Across all instances of the authenticated encryption function with a given key, if even one IV is ever repeated, then the implementation may be vulnerable to the forgery attacks that are described in Ref [5] and summarized in Appendix A. *In practice, this requirement is almost as important as the secrecy of the key. *
I understand it may not be up to you and that there are already silicon vendors with shipped products, but if the scope of TF-M is advertised to end developers as taking care of security from the beginning, it does seem like it should probably be considered in-scope that the PS encryption is not trivially defeated using the default implementation.
Thanks,
Jeremy
On Wed, 21 Sept 2022 at 12:50, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
I totally agree with your analysis on the necessity of NV counter in ps service. I think another reason why rollback protection is necessary is to prevent the ps data itself rollback. For example, if the ps asset with a specific pair of (uid, client_id) has been once created, then later the asset is updated to another value. Without rollback protection, the old data may be recovered and used by an attacker.
So, we need a trusted non-volatile memory to store the NV counter. As for which memory device should be used(like FRAM or other flash devices), it is beyond our scope. Users can make the decision based on their own specific requirements.
For your option 3, if the algorithm is compatible with psa crypto interface, users can configure the ps algorithm via *PS_CRYPTO_AEAD_ALG* build flag with that algorithm. Even nonce reused protection is not needed, I think rollback protection is still necessary.
I created this patch https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16797 to remind the users that the lifecycle of PS service may also depends on the device that stores NV counter.
Regards,
Sherry Zhang
*From:* Jeremy Herbert jeremy.006@gmail.com *Sent:* Sunday, September 18, 2022 10:15 AM *To:* Sherry Zhang Sherry.Zhang2@arm.com *Cc:* tf-m@lists.trustedfirmware.org; nd nd@arm.com *Subject:* Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I have actually come up against this problem a bunch over the past few years. Here are my thoughts.
It seems like there are two major problems:
- Nonce reuse in AES-GCM breaks the encryption to the point of it being
near useless
- A random nonce is not recommended for AES-GCM as the nonce size is a
bit small, but having a secure non-volatile counter on an MCU requires specific hardware support as far as having high endurance non-volatile memory (see ATECC608 for example) - and basically nobody has this in their MCUs (though I know STM32s sometimes come with internal EEPROM which can take ~100k writes per word). Secure NFC tags like the DESFire EV3 are often rated for 1million+ writes per word, and with wear leveling you can basically make this number beyond the useful life of the device.
This results in it being impossible to implement external encrypted flash with rollback protection if the device ever loses power or is reset (or at least as far as I can see):
- If you don't use a non-volatile counter, resetting the device will
cause nonce reuse => AES-GCM broken.
- If you do use a non-volatile counter stored on internal flash, it seems
that eventually the area of flash will fail to all bits cleared (ie 0) - given typical numbers, you are looking at a maximum of 50k erases *per page*, not per byte (though most vendors guarantee way less than this ie nordic is 10k). And given in TF-M all of the different counters are probably stored in the same page, that means that writing to any counter uses up one of your cycles. You could potentially wear level across pages, but with 1K/4K page sizes that is a lot of flash to consume just for a counter.
- If you store the nonce for AES-GCM in external flash, and then read it
back and increment for the next operation, the attacker can just roll back the flash => nonce reuse => AES-GCM broken.
Also it appears that you *must* always enable rollback protection in TF-M, otherwise you just need to dump the external flash, reset the device so it does another write, dump the flash again and you have caused nonce reuse => encryption is broken.
The solutions I can propose:
- I realise this is probably unrealistic, but it is the best solution
nonetheless: require silicon vendors to add a small amount of some special non-volatile memory on their chip like FRAM (maybe 64 bits?) to get a "Security Lvl. 99 (TM): Extreme" certification that they can put on the marketing pages for their device
- Use an external device like the ATECC608 which has an encrypted,
authenticated link and high endurance non-volatile storage for counters (except normal people like me have a 2+ year lead time on this and similar parts)
- Don't use AES-GCM. There is a much newer, nonce-reuse resistant variant
of AES-GCM called AES-GCM-SIV which has only a small performance penalty (I have seen ~30% quoted). I had planned to do it a while ago, but I have just spent the last few days or so hacking this cipher into mbedtls, and it isn't too far from AES-GCM in terms of functionality: https://github.com/Mbed-TLS/mbedtls/pull/6294 - I hope it can eventually be merged once I tidy it up. While this doesn't solve the problem of the rollback counter failing due to being stored in internal flash (you need option 1 or 2 for that), if the counter fails to a constant, at least you just lose the rollback protection rather than losing all security.
If you have any other suggestions as to how I can solve this problem or something that I am missing, I am all ears!
Thanks,
Jeremy
On Fri, 16 Sept 2022 at 17:41, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
Regarding which flash limits the usage of the PS service: this is a really good perspective from which to assess the storage service. I thought it over again and I think it is hard to completely separate PS from the internal flash (as we need a trusted area for implementing the rollback protection). Also, the NV counter area is used not only by PS but also by other components like BL2. MCUboot writes the BL2 NV counter once on each boot. That can also wear out the NV counter flash area. Do you have any suggestions or proposals for making PS storage live only on the external flash?
I checked with someone who works on the CMSIS drivers and got confirmation that the CMSIS Flash driver API is intended only to report the status of the operation, without verifying the written data. I wonder, though, whether the flash device itself can detect the write error with an internal ECC check when flash wear-out or data corruption happens. It may depend on the specific flash device. So I propose adding the check in the NV counter layer and making it optional. Users can decide whether to enable the check based on their specific flash device. What do you think?
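A minimal sketch of what such an optional check in the NV counter layer could look like, in C. This is not the TF-M implementation: the simulated flash, the `NV_COUNTER_VERIFY_READBACK` option, the error codes, and every function name here are hypothetical, chosen only to show the idea of comparing the value read back against the value written when the driver reports status only.

```c
#include <stdint.h>

/* Hypothetical build option: define as 0 to skip the read-back check. */
#ifndef NV_COUNTER_VERIFY_READBACK
#define NV_COUNTER_VERIFY_READBACK 1
#endif

enum nv_err {
    NV_OK = 0,
    NV_ERR_DRIVER = -1, /* the flash driver itself reported a failure */
    NV_ERR_VERIFY = -2  /* driver said OK, but the stored value is wrong */
};

/* Simulated flash word, plus a switch that mimics a worn-out cell:
 * programming "succeeds" but the stored value no longer changes. */
static uint32_t sim_flash_word;
static int sim_flash_worn_out;

void sim_flash_set_worn_out(int worn)
{
    sim_flash_worn_out = worn;
}

static int sim_flash_program(uint32_t value)
{
    if (!sim_flash_worn_out) {
        sim_flash_word = value;
    }
    return 0; /* status only: says nothing about the data actually stored */
}

static uint32_t sim_flash_read(void)
{
    return sim_flash_word;
}

/* Write a counter value and, if enabled, confirm it by reading it back. */
enum nv_err nv_counter_write(uint32_t value)
{
    if (sim_flash_program(value) != 0) {
        return NV_ERR_DRIVER;
    }
#if NV_COUNTER_VERIFY_READBACK
    if (sim_flash_read() != value) {
        return NV_ERR_VERIFY;
    }
#endif
    return NV_OK;
}
```

With the check enabled, a silently failing write surfaces as `NV_ERR_VERIFY`, so the caller can refuse to continue (fail secure) instead of carrying on with a counter that never advanced.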
Your proposal to add warnings about endurance is a good suggestion. We will add that later. Thanks for the feedback!
The nonce value is stored in flash together with the PS object table. After a reboot, it is read back from flash and that value is used as the starting value (it does not start from 0 on each reboot). See the code here: https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/secure_fw/partitions/protected_storage/ps_object_table.c#n903
Regards,
Sherry Zhang
*From:* Jeremy Herbert jeremy.006@gmail.com *Sent:* Friday, September 16, 2022 7:48 AM *To:* Sherry Zhang Sherry.Zhang2@arm.com *Cc:* tf-m@lists.trustedfirmware.org; nd nd@arm.com *Subject:* Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Thanks for your reply. (I also received an off-list reply from Nordic about this - thank you Sebastian). I have not yet encountered a write endurance failure, but I am developing a device which will be deployed for a long time. Using the nrf53 example, writing encrypted sensor data to external EEPROM/FRAM/MRAM once per hour would cause the device to fail in the field (in terms of the rated internal flash endurance) after only 1 year or so.
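The "1 year or so" figure can be checked with a few lines of C. This is just back-of-the-envelope arithmetic under two assumptions: every protected storage write costs one erase cycle of the internal flash page holding the NV counter, and the page is rated for 10k cycles as mentioned for the Nordic parts. The function name is hypothetical.

```c
#include <stdint.h>

/* Years until the rated endurance of the counter page is used up,
 * assuming each protected storage write costs one erase cycle. */
double years_until_worn_out(uint32_t rated_cycles, double writes_per_hour)
{
    const double hours_per_year = 24.0 * 365.0; /* 8760 */
    return (double)rated_cycles / (writes_per_hour * hours_per_year);
}
```

For example, `years_until_worn_out(10000u, 1.0)` is 10000 / 8760, i.e. roughly 1.14 years, regardless of how much endurance the external EEPROM/FRAM/MRAM has.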
It does seem that indeed the rollback protection will fail insecure well before the actual limit of the NV counter. I am not sure I entirely agree about putting the check in the flash driver - I think it would be better placed in the nv_counters code, because as far as I can see, the CMSIS flash driver API does not care whether an erase/write is reliable or not (or at least, it seems to me that this is left unspecified), and this functionality is critical to the NV counters functionality as it will otherwise fail insecure as far as rollback protection is concerned.
I also realised that this could also have implications as far as voltage glitching resistance - if one was to voltage glitch at the flash erase/write, the counter value would not be incremented correctly (and I imagine flash writing is easier to glitch than most things).
Also, can I suggest that there are some clear warnings added to the PS documentation about write endurance? Basically, that you are limited by the write endurance of the internal flash, even if you are using external devices. My initial naive thought when seeing the functionality was that the write endurance was limited by the endurance of the external storage (I was planning on using an external FRAM which has much higher endurance), but this is not correct. Since TF-M is supposed to solve a lot of cryptographic problems for us mere mortals, I can definitely see people enabling it and then just using it to write sensor data to an external flash for example. If there was a readback check enabled, it would at least fail securely but no longer work.
As far as the nonce: I don't believe incrementing a variable is a secure method (unless it was a non-volatile variable), as the attacker could just reboot the device to start the counter again? The IV of 0 would be used again (or whatever the reset value was), meaning that the security is now completely broken with GCM. It's concerning if this is indeed how it is implemented in TF-M?
This is actually the fundamental problem I am trying to solve at the moment - how to realistically use AES-GCM on an MCU with a non-ephemeral key. As far as I can recall the NIST guidelines do not recommend a random IV as the 96 bit IV is considered a little bit small as far as the birthday paradox is concerned. At the moment it seems pretty close to impossible for my skill level ;)
Thanks,
Jeremy
On Thu, 15 Sept 2022 at 19:40, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
Thanks for bringing up this question.
Hi Jeremy,
"is it correct that if replay protection is disabled, an attacker can force the nonce used as the IV for encryption to be reused by rolling back the external flash image?"
Do you think adding an NV counter in non-volatile memory to prevent rollback is not good enough to solve this problem?
Regards, Sherry Zhang
From: Jeremy Herbert jeremy.006@gmail.com Sent: Wednesday, September 21, 2022 3:56 PM To: Sherry Zhang Sherry.Zhang2@arm.com Cc: tf-m@lists.trustedfirmware.org; nd nd@arm.com Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I haven't registered, I will try to do it soon and let you know.
I guess my issue is more around the AES-GCM encryption for external flash - is it correct that if replay protection is disabled, an attacker can force the nonce used as the IV for encryption to be reused by rolling back the external flash image? If so, it seems like the AES-GCM encryption for PS is broken by design as you must not under any circumstances allow nonce reuse with the same key.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 14:57, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
Have you registered for access to Gerrit? I can add you as a reviewer of this patch if possible so that you can comment on it directly. Your description is very detailed. Thanks for that.
For the nonce reuse issue, I think what is missing is that the template implementation of the NV counter layer should check whether the NV counter was programmed successfully. There is a patch (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16723) that fixes that. Thanks for pointing this out. By the way, this is a template implementation. Users can implement their own NV counter APIs, as defined in tfm_plat_nv_counters.h, by setting PLATFORM_DEFAULT_NV_COUNTERS to OFF.
(Removed part of the initial email thread as it reached the 80 KB maximum size.)
Thanks!
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Wednesday, September 21, 2022 11:13 AM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Can I suggest that the following be added instead? I'm not sure that is clear enough to explain the pitfalls to non-experts like myself.
"If this flag is enabled, the lifecycle of the PS service depends on the minimum write endurance of both the device that stores the assets and the device that stores the NV counters - typically the NV counters can only be stored to on-die storage (such as internal flash) for security reasons. As an example, if the NV counter is stored inside the internal flash of a microcontroller with a write endurance of 10k writes, and the PS assets are stored in an external flash with a write endurance of 100k writes, the useful number of writes to the external flash is constrained by the endurance of the internal flash, after which point the rollback protection will fail as the internal flash can no longer correctly store the NV counter."
Also, this doesn't mention that the AES-GCM encryption can be trivially broken if the rollback protection is not enabled. To quote SP 800-38D section 8:
"The probability that the authenticated encryption function ever will be invoked with the same IV and the same key on two (or more) distinct sets of input data shall be no greater than 2^-32. Compliance with this requirement is crucial to the security of GCM. Across all instances of the authenticated encryption function with a given key, if even one IV is ever repeated, then the implementation may be vulnerable to the forgery attacks that are described in Ref [5] and summarized in Appendix A. In practice, this requirement is almost as important as the secrecy of the key."
I understand it may not be up to you and that there are already silicon vendors with shipped products, but if the scope of TF-M is advertised to end developers as taking care of security from the beginning, it does seem like it should probably be considered in-scope that the PS encryption is not trivially defeated using the default implementation.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 12:50, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
I totally agree with your analysis of the necessity of the NV counter in the PS service. I think another reason why rollback protection is necessary is to prevent rollback of the PS data itself. For example, suppose a PS asset with a specific (uid, client_id) pair is created and later updated to another value. Without rollback protection, the old data may be restored and used by an attacker. So we need a trusted non-volatile memory to store the NV counter. Which memory device should be used (FRAM or other flash devices, for example) is beyond our scope. Users can make the decision based on their own specific requirements. For your option 3, if the algorithm is compatible with the PSA Crypto interface, users can select it as the PS algorithm via the PS_CRYPTO_AEAD_ALG build flag. Even if nonce reuse protection is not needed, I think rollback protection is still necessary.
I created this patch (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16797) to remind users that the lifecycle of the PS service may also depend on the device that stores the NV counter.
Regards, Sherry Zhang
Hi Sherry,
It is good enough in theory, but the problems are a) the flash wearout and b) if you disable the rollback protection counter on the internal flash, then you don't just lose rollback protection, you *also* lose the encryption on the external device because it becomes trivially vulnerable to nonce reuse.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 18:07, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
- is it correct that if replay protection is disabled, an attacker can
force the nonce used as the IV for encryption to be reused by rolling back the external flash image?
Do you think adding NV counter on a non-volatile memory to prevent rollback is not good enough to solve this problem?
Regards,
Sherry Zhang
*From:* Jeremy Herbert jeremy.006@gmail.com *Sent:* Wednesday, September 21, 2022 3:56 PM *To:* Sherry Zhang Sherry.Zhang2@arm.com *Cc:* tf-m@lists.trustedfirmware.org; nd nd@arm.com *Subject:* Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I haven't registered, I will try to do it soon and let you know.
I guess my issue is more around the AES-GCM encryption for external flash
- is it correct that if replay protection is disabled, an attacker can
force the nonce used as the IV for encryption to be reused by rolling back the external flash image? If so, it seems like the AES-GCM encryption for PS is broken by design as you must not under any circumstances allow nonce reuse with the same key.
Thanks,
Jeremy
On Wed, 21 Sept 2022 at 14:57, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
Have you registered access to the Gerrit? I can add you as a reviewer of this patch if possible so that you can comment on this patch directly. Your description is very detailed. Thanks for that.
For nonce reuse issue, I think what is missed is that the template implementation of nv counter layer should check whether the NV counter is programmed successfully. There is a patch https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16723 for fixing that. Thanks for pointing this out. By the way, this is a template implementation. Users can implement their own NV counter APIs defined in tfm_plat_nv_counters.h setting *PLATFORM_DEFAULT_NV_COUNTERS* OFF.
(Removed part of the initial email thread as it reached the 80 KB maximum size.)
Thanks!
Regards,
Sherry Zhang
*From:* Jeremy Herbert jeremy.006@gmail.com *Sent:* Wednesday, September 21, 2022 11:13 AM *To:* Sherry Zhang Sherry.Zhang2@arm.com *Cc:* tf-m@lists.trustedfirmware.org; nd nd@arm.com *Subject:* Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Can I suggest that the following be added instead? I'm not sure that is clear enough to explain the pitfalls to non-experts like myself.
"If this flag is enabled, the lifecycle of the PS service depends on the minimum write endurance of both the device that stores the assets and the device that stores the NV counters - typically the NV counters can only be stored to on-die storage (such as internal flash) for security reasons. As an example, if the NV counter is stored inside the internal flash of a microcontroller with a write endurance of 10k writes, and the PS assets are stored in an external flash with a write endurance of 100k writes, the useful number of writes to the external flash is constrained by the endurance of the internal flash, after which point the rollback protection will fail as the internal flash can no longer correctly store the NV counter."
Also, this doesn't mention that the AES-GCM encryption can be trivially broken if the rollback protection is not enabled. To quote SP 800-38D section 8:
*The probability that the authenticated encryption function ever will be invoked with the same IV and the same key on two (or more) distinct sets of input data shall be no greater than 2-32*. Compliance with this requirement is crucial to the security of GCM. Across all instances of the authenticated encryption function with a given key, if even one IV is ever repeated, then the implementation may be vulnerable to the forgery attacks that are described in Ref [5] and summarized in Appendix A. *In practice, this requirement is almost as important as the secrecy of the key. *
I understand it may not be up to you and that there are already silicon vendors with shipped products, but if the scope of TF-M is advertised to end developers as taking care of security from the beginning, it does seem like it should probably be considered in-scope that the PS encryption is not trivially defeated using the default implementation.
Thanks,
Jeremy
On Wed, 21 Sept 2022 at 12:50, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
I totally agree with your analysis on the necessity of NV counter in ps service. I think another reason why rollback protection is necessary is to prevent the ps data itself rollback. For example, if the ps asset with a specific pair of (uid, client_id) has been once created, then later the asset is updated to another value. Without rollback protection, the old data may be recovered and used by an attacker.
So, we need a trusted non-volatile memory to store the NV counter. As for which memory device should be used(like FRAM or other flash devices), it is beyond our scope. Users can make the decision based on their own specific requirements.
For your option 3, if the algorithm is compatible with psa crypto interface, users can configure the ps algorithm via *PS_CRYPTO_AEAD_ALG* build flag with that algorithm. Even nonce reused protection is not needed, I think rollback protection is still necessary.
I created this patch https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16797 to remind the users that the lifecycle of PS service may also depends on the device that stores NV counter.
Regards,
Sherry Zhang
*From:* Jeremy Herbert jeremy.006@gmail.com *Sent:* Sunday, September 18, 2022 10:15 AM *To:* Sherry Zhang Sherry.Zhang2@arm.com *Cc:* tf-m@lists.trustedfirmware.org; nd nd@arm.com *Subject:* Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I have actually come up against this problem a bunch over the past few years. Here are my thoughts.
It seems like there are two major problems:
- Nonce reuse in AES-GCM breaks the encryption to the point of it being
near useless
- A random nonce is not recommended for AES-GCM as the nonce size is a
bit small, but having a secure non-volatile counter on an MCU requires specific hardware support as far as having high endurance non-volatile memory (see ATECC608 for example) - and basically nobody has this in their MCUs (though I know STM32s sometimes come with internal EEPROM which can take ~100k writes per word). Secure NFC tags like the DESFire EV3 are often rated for 1million+ writes per word, and with wear leveling you can basically make this number beyond the useful life of the device.
This results in it being impossible to implement external encrypted flash with rollback protection if the device ever loses power or is reset (or at least as far as I can see):
- If you don't use a non-volatile counter, resetting the device will
cause nonce reuse => AES-GCM broken.
- If you do use a non-volatile counter stored on internal flash, it seems
that eventually the area of flash will fail to all bits cleared (ie 0) - given typical numbers, you are looking at a maximum of 50k erases *per page*, not per byte (though most vendors guarantee way less than this ie nordic is 10k). And given in TF-M all of the different counters are probably stored in the same page, that means that writing to any counter uses up one of your cycles. You could potentially wear level across pages, but with 1K/4K page sizes that is a lot of flash to consume just for a counter.
- If you store the nonce for AES-GCM in external flash, and then read it
back and increment for the next operation, the attacker can just roll back the flash => nonce reuse => AES-GCM broken.
Also it appears that you *must* always enable rollback protection in TF-M, otherwise you just need to dump the external flash, reset the device so it does another write, dump the flash again and you have caused nonce reuse => encryption is broken.
The solutions I can propose:
- I realise this is probably unrealistic, but it is the best solution
nonetheless: require silicon vendors to add a small amount of some special non-volatile memory on their chip like FRAM (maybe 64 bits?) to get a "Security Lvl. 99 (TM): Extreme" certification that they can put on the marketing pages for their device
- Use an external device like the ATECC608 which has an encrypted,
authenticated link and high endurance non-volatile storage for counters (except normal people like me have a 2+ year lead time on this and similar parts)
- Don't use AES-GCM. There is a much newer, nonce-reuse resistant variant
of AES-GCM called AES-GCM-SIV which has only a small performance penalty (I have seen ~30% quoted). I had planned to do it a while ago, but I have just spent the last few days or so hacking this cipher into mbedtls, and it isn't too far from AES-GCM in terms of functionality: https://github.com/Mbed-TLS/mbedtls/pull/6294 - I hope it can eventually be merged once I tidy it up. While this doesn't solve the problem of the rollback counter failing due to being stored in internal flash (you need option 1 or 2 for that), if the counter fails to a constant, at least you just lose the rollback protection rather than losing all security.
If you have any other suggestions as to how I can solve this problem or something that I am missing, I am all ears!
Thanks,
Jeremy
On Fri, 16 Sept 2022 at 17:41, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
About which flash should be the limitation of the PS service usage, this is really a good perspective to assess the storage service. I thought over it again and I think it is hard to completely separate the PS away from the internal flash(as we need a trusted area for implementing the rollback protection). Also, the NV counter area is not only used by PS but also by other components like BL2. MCUboot writes the BL2 NV counter once each time booting up. That can also make the NV area flash wear out. Do you have any suggestions/proposal on making PS storage only lives on the external flash?
I checked with someone who is working on the CMSIS drivers and got confirmation that the CMSIS Flash driver API is intended to only check the status of the operation without verification of the written data. But I wonder if the write error can be detected by the flash device itself with ECC check internally when flash wear out happen or flash data corrupt happen. It maybe depend on the specific flash device. So, I propose to add the check in the NV counter layer and leave that check as an optional choice. User can decide whether enable that check or not based on their specific flash device. How do you think about?
About your proposal of adding the warnings about endurance, this is a good suggestion. We will add that later. Thanks for the feedback!
The nonce value is stored into flash together with the ps object table. After reboot, it is read out from flash and the read out value is used as the start value after the reboot(not starts from 0 each time reboot). See code at here https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/secure_fw/partitions/protected_storage/ps_object_table.c#n903 .
Regards,
Sherry Zhang
*From:* Jeremy Herbert jeremy.006@gmail.com *Sent:* Friday, September 16, 2022 7:48 AM *To:* Sherry Zhang Sherry.Zhang2@arm.com *Cc:* tf-m@lists.trustedfirmware.org; nd nd@arm.com *Subject:* Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Thanks for your reply. (I also received an off-list reply from Nordic about this - thank you Sebastian). I have not yet encountered a write endurance failure, but I am developing a device which will be deployed for a long time. Using the nrf53 example, writing encrypted sensor data to external EEPROM/FRAM/MRAM once per hour would cause the device to fail in the field (in terms of the rated internal flash endurance) after only 1 year or so.
It does seem that indeed the rollback protection will fail insecure well before the actual limit of the NV counter. I am not sure I entirely agree about putting the check in the flash driver - I think it would be better placed in the nv_counters code, because as far as I can see, the CMSIS flash driver API does not care whether an erase/write is reliable or not (or at least, it seems to me that this is left unspecified), and this functionality is critical to the NV counters functionality as it will otherwise fail insecure as far as rollback protection is concerned.
I also realised that this could also have implications as far as voltage glitching resistance - if one was to voltage glitch at the flash erase/write, the counter value would not be incremented correctly (and I imagine flash writing is easier to glitch than most things).
Also, can I suggest that there are some clear warnings added to the PS documentation about write endurance? Basically, that you are limited by the write endurance of the internal flash, even if you are using external devices. My initial naive thought when seeing the functionality was that the write endurance was limited by the endurance of the external storage (I was planning on using an external FRAM which has much higher endurance), but this is not correct. Since TF-M is supposed to solve a lot of cryptographic problems for us mere mortals, I can definitely see people enabling it and then just using it to write sensor data to an external flash for example. If there was a readback check enabled, it would at least fail securely but no longer work.
As far as the nonce: I don't believe incrementing a variable is a secure method (unless it was a non-volatile variable), as the attacker could just reboot the device to start the counter again? The IV of 0 would be used again (or whatever the reset value was), meaning that the security is now completely broken with GCM. It's concerning if this is indeed how it is implemented in TF-M?
This is actually the fundamental problem I am trying to solve at the moment - how to realistically use AES-GCM on an MCU with a non-ephemeral key. As far as I can recall the NIST guidelines do not recommend a random IV as the 96 bit IV is considered a little bit small as far as the birthday paradox is concerned. At the moment it seems pretty close to impossible for my skill level ;)
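The birthday concern can be put into numbers with the usual approximation p ≈ n² / 2^(b+1) for n random nonces of b bits. A minimal sketch (the function name is hypothetical; the approximation only holds while p is much smaller than 1):

```c
/* Approximate probability that at least two of n_messages uniformly random
 * nonces of nonce_bits bits collide: p ~ n^2 / 2^(nonce_bits + 1). */
double nonce_collision_probability(double n_messages, unsigned nonce_bits)
{
    double p = n_messages * n_messages;

    for (unsigned i = 0; i < nonce_bits + 1u; i++) {
        p /= 2.0; /* divide by 2^(nonce_bits + 1), one bit at a time */
    }
    return p;
}
```

For 2^32 messages under one key with 96-bit random nonces this gives 2^-33, which is just inside the 2^-32 bound from SP 800-38D quoted elsewhere in this thread. A device that writes rarely stays far below that; the harder requirement is having a trustworthy random source on the MCU.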
Thanks,
Jeremy
On Thu, 15 Sept 2022 at 19:40, Sherry Zhang Sherry.Zhang2@arm.com wrote:
Hi Jeremy,
Thanks for bringing up this question.
Hi Jeremy,
"a) the flash wearout"
After this fix (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16723), flash wear-out should be detected when it happens. One possible later enhancement, as a rough idea, is adding support for wear leveling. How to select devices so that the PS service endurance is reasonable is up to users.
"b) if you disable rollback protection nonce on the internal flash,…"
In the PS integration guide, we can add a note that with GCM (the default configuration) rollback protection is essential. In the code, we can add a runtime check (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16798) to ensure that rollback protection is enabled.
"- I haven't registered, I will try to do it soon and let you know."
If there is any problem in the process, do not hesitate to contact us.
Regards, Sherry Zhang
From: Jeremy Herbert jeremy.006@gmail.com Sent: Wednesday, September 21, 2022 5:12 PM To: Sherry Zhang Sherry.Zhang2@arm.com Cc: tf-m@lists.trustedfirmware.org; nd nd@arm.com Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
It is good enough in theory, but the problem is a) the flash wearout and b) if you disable rollback protection nonce on the internal flash, then you don't just lose rollback protection but you *also* lose the encryption on the external device because it is trivially vulnerable to nonce reuse.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 18:07, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
"- is it correct that if replay protection is disabled, an attacker can force the nonce used as the IV for encryption to be reused by rolling back the external flash image?"
Do you think adding an NV counter in non-volatile memory to prevent rollback is not good enough to solve this problem?
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Wednesday, September 21, 2022 3:56 PM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I haven't registered, I will try to do it soon and let you know.
I guess my issue is more around the AES-GCM encryption for external flash - is it correct that if replay protection is disabled, an attacker can force the nonce used as the IV for encryption to be reused by rolling back the external flash image? If so, it seems like the AES-GCM encryption for PS is broken by design as you must not under any circumstances allow nonce reuse with the same key.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 14:57, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
Have you registered for access to Gerrit? If so, I can add you as a reviewer of this patch so that you can comment on it directly. Your description is very detailed. Thanks for that.
For the nonce reuse issue, I think what is missing is that the template implementation of the NV counter layer should check whether the NV counter was programmed successfully. There is a patch (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16723) that fixes that. Thanks for pointing this out. By the way, this is a template implementation. Users can implement the NV counter APIs defined in tfm_plat_nv_counters.h themselves by setting PLATFORM_DEFAULT_NV_COUNTERS to OFF.
(Removed part of the initial email thread as it reached the 80 KB maximum size.)
Thanks!
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Wednesday, September 21, 2022 11:13 AM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Can I suggest that the following be added instead? I'm not sure that is clear enough to explain the pitfalls to non-experts like myself.
"If this flag is enabled, the lifecycle of the PS service depends on the minimum write endurance of both the device that stores the assets and the device that stores the NV counters - typically the NV counters can only be stored to on-die storage (such as internal flash) for security reasons. As an example, if the NV counter is stored inside the internal flash of a microcontroller with a write endurance of 10k writes, and the PS assets are stored in an external flash with a write endurance of 100k writes, the useful number of writes to the external flash is constrained by the endurance of the internal flash, after which point the rollback protection will fail as the internal flash can no longer correctly store the NV counter."
Also, this doesn't mention that the AES-GCM encryption can be trivially broken if the rollback protection is not enabled. To quote SP 800-38D section 8:
"The probability that the authenticated encryption function ever will be invoked with the same IV and the same key on two (or more) distinct sets of input data shall be no greater than 2^-32. Compliance with this requirement is crucial to the security of GCM. Across all instances of the authenticated encryption function with a given key, if even one IV is ever repeated, then the implementation may be vulnerable to the forgery attacks that are described in Ref [5] and summarized in Appendix A. In practice, this requirement is almost as important as the secrecy of the key."
I understand it may not be up to you and that there are already silicon vendors with shipped products, but if the scope of TF-M is advertised to end developers as taking care of security from the beginning, it does seem like it should probably be considered in-scope that the PS encryption is not trivially defeated using the default implementation.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 12:50, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
I totally agree with your analysis of the necessity of the NV counter in the PS service. I think another reason why rollback protection is necessary is to prevent rollback of the PS data itself. For example, suppose a PS asset with a specific (uid, client_id) pair is created and later updated to another value. Without rollback protection, the old data may be recovered and used by an attacker. So we need a trusted non-volatile memory to store the NV counter. Which memory device should be used (FRAM or other flash devices, for example) is beyond our scope; users can decide based on their own requirements. For your option 3, if the algorithm is compatible with the PSA Crypto interface, users can select it as the PS algorithm via the PS_CRYPTO_AEAD_ALG build flag. Even if nonce-reuse protection is not needed, I think rollback protection is still necessary.
I created this patch (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16797) to remind users that the lifecycle of the PS service may also depend on the device that stores the NV counter.
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Sunday, September 18, 2022 10:15 AM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I have actually come up against this problem a bunch over the past few years. Here are my thoughts.
It seems like there are two major problems:
1. Nonce reuse in AES-GCM breaks the encryption to the point of it being near useless.
2. A random nonce is not recommended for AES-GCM as the nonce size is a bit small, but having a secure non-volatile counter on an MCU requires specific hardware support as far as having high endurance non-volatile memory (see ATECC608 for example) - and basically nobody has this in their MCUs (though I know STM32s sometimes come with internal EEPROM which can take ~100k writes per word). Secure NFC tags like the DESFire EV3 are often rated for 1 million+ writes per word, and with wear leveling you can basically make this number beyond the useful life of the device.
This results in it being impossible to implement external encrypted flash with rollback protection if the device ever loses power or is reset (or at least as far as I can see):
1. If you don't use a non-volatile counter, resetting the device will cause nonce reuse => AES-GCM broken.
2. If you do use a non-volatile counter stored on internal flash, it seems that eventually the area of flash will fail to all bits cleared (i.e. 0) - given typical numbers, you are looking at a maximum of 50k erases *per page*, not per byte (though most vendors guarantee way less than this, e.g. Nordic is 10k). And given that in TF-M all of the different counters are probably stored in the same page, writing to any counter uses up one of your cycles. You could potentially wear level across pages, but with 1K/4K page sizes that is a lot of flash to consume just for a counter.
3. If you store the nonce for AES-GCM in external flash, and then read it back and increment for the next operation, the attacker can just roll back the flash => nonce reuse => AES-GCM broken.
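One way to stretch the endurance of a single page is to log successive counter values into the erased words of that page and erase only when the page is full. The sketch below simulates this in RAM. It is an illustration, not the TF-M implementation: all names are hypothetical, and it assumes the flash part allows programming individual words of an erased page one at a time, which not every device permits.

```c
#include <stdint.h>

#define SLOTS_PER_PAGE 1024u        /* e.g. one 4 KiB page of 32-bit words */
#define ERASED_WORD    0xFFFFFFFFu  /* value of an erased flash word */

static uint32_t page[SLOTS_PER_PAGE]; /* simulated flash page */
static uint32_t erase_count;          /* page erases performed so far */

static void page_erase(void)
{
    for (uint32_t i = 0; i < SLOTS_PER_PAGE; i++) {
        page[i] = ERASED_WORD;
    }
    erase_count++;
}

/* Index of the first erased slot, or SLOTS_PER_PAGE if the page is full. */
static uint32_t first_free_slot(void)
{
    uint32_t i = 0;

    while (i < SLOTS_PER_PAGE && page[i] != ERASED_WORD) {
        i++;
    }
    return i;
}

void counter_log_init(void)
{
    erase_count = 0;
    page_erase();
    page[0] = 0; /* counter starts at zero */
}

/* The current counter value is the last programmed slot. */
uint32_t counter_log_read(void)
{
    uint32_t free_slot = first_free_slot();

    return (free_slot == 0) ? 0 : page[free_slot - 1];
}

/* Append the next value; erase only when no free slot is left. */
void counter_log_increment(void)
{
    uint32_t next = counter_log_read() + 1;
    uint32_t slot = first_free_slot();

    if (slot == SLOTS_PER_PAGE) {
        page_erase();
        slot = 0;
    }
    page[slot] = next;
}

uint32_t counter_log_erase_count(void)
{
    return erase_count;
}
```

In this simulation 2000 increments cost two erases instead of 2000. A real design would also need two pages, or a similar scheme, so that the counter survives power loss between the erase and the next write, and it cannot store the value 0xFFFFFFFF because that is indistinguishable from an erased word.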
Also it appears that you *must* always enable rollback protection in TF-M, otherwise you just need to dump the external flash, reset the device so it does another write, dump the flash again and you have caused nonce reuse => encryption is broken.
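Why a repeated nonce is fatal can be shown with a toy stream cipher. The sketch below does not implement AES-GCM: `toy_encrypt` uses an xorshift generator as a stand-in for the CTR-mode keystream, and all names are hypothetical. What it shares with GCM is that the keystream depends only on the key and the nonce, so two ciphertexts produced with the same pair XOR to the XOR of their plaintexts.

```c
#include <stddef.h>
#include <stdint.h>

/* One step of a toy xorshift generator. NOT a real cipher. */
static uint64_t toy_next(uint64_t *state)
{
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    return *state;
}

/* XOR the plaintext with a keystream derived only from (key, nonce). */
void toy_encrypt(uint64_t key, uint64_t nonce,
                 const uint8_t *pt, uint8_t *ct, size_t len)
{
    uint64_t state = key ^ (nonce * 0x9E3779B97F4A7C15ULL);

    if (state == 0) {
        state = 1; /* xorshift must not start from zero */
    }
    for (size_t i = 0; i < len; i++) {
        ct[i] = pt[i] ^ (uint8_t)(toy_next(&state) >> 32);
    }
}

/* Encrypt two messages under the same key and nonce. Returns 1 if
 * c1 XOR c2 equals p1 XOR p2, 0 otherwise, -1 if len exceeds the buffers. */
int nonce_reuse_leaks_xor(uint64_t key, uint64_t nonce,
                          const uint8_t *p1, const uint8_t *p2, size_t len)
{
    uint8_t c1[64];
    uint8_t c2[64];

    if (len > sizeof(c1)) {
        return -1;
    }
    toy_encrypt(key, nonce, p1, c1, len);
    toy_encrypt(key, nonce, p2, c2, len);
    for (size_t i = 0; i < len; i++) {
        if ((c1[i] ^ c2[i]) != (p1[i] ^ p2[i])) {
            return 0;
        }
    }
    return 1;
}
```

An attacker holding the two flash dumps needs no key to compute c1 XOR c2, and if one plaintext is known or guessable the other follows directly. With real GCM, nonce reuse additionally opens the door to the forgery attacks referred to in SP 800-38D.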
The solutions I can propose:
1. I realise this is probably unrealistic, but it is the best solution nonetheless: require silicon vendors to add a small amount of some special non-volatile memory on their chip like FRAM (maybe 64 bits?) to get a "Security Lvl. 99 (TM): Extreme" certification that they can put on the marketing pages for their device.
2. Use an external device like the ATECC608 which has an encrypted, authenticated link and high endurance non-volatile storage for counters (except normal people like me have a 2+ year lead time on this and similar parts).
3. Don't use AES-GCM. There is a much newer, nonce-reuse resistant variant of AES-GCM called AES-GCM-SIV which has only a small performance penalty (I have seen ~30% quoted). I had planned to do it a while ago, but I have just spent the last few days or so hacking this cipher into mbedtls, and it isn't too far from AES-GCM in terms of functionality: https://github.com/Mbed-TLS/mbedtls/pull/6294 - I hope it can eventually be merged once I tidy it up. While this doesn't solve the problem of the rollback counter failing due to being stored in internal flash (you need option 1 or 2 for that), if the counter fails to a constant, at least you just lose the rollback protection rather than losing all security.
If you have any other suggestions as to how I can solve this problem or something that I am missing, I am all ears!
Thanks, Jeremy
On Fri, 16 Sept 2022 at 17:41, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
About which flash should be the limitation of the PS service usage, this is really a good perspective from which to assess the storage service. I thought it over again and I think it is hard to completely separate PS from the internal flash (as we need a trusted area for implementing the rollback protection). Also, the NV counter area is used not only by PS but also by other components like BL2. MCUboot writes the BL2 NV counter once on every boot. That can also wear out the NV counter flash area. Do you have any suggestions or proposals for making PS storage live only on the external flash?
I checked with someone who works on the CMSIS drivers and got confirmation that the CMSIS Flash driver API is intended only to check the status of the operation, without verifying the written data. I wonder, though, whether the flash device itself can detect the write error internally with an ECC check when the flash wears out or the data is corrupted. That may depend on the specific flash device. So I propose adding the check in the NV counter layer and making it optional. Users can decide whether to enable it based on their specific flash device. What do you think?
About your proposal of adding the warnings about endurance, this is a good suggestion. We will add that later. Thanks for the feedback!
The nonce value is stored in flash together with the PS object table. After a reboot, it is read back from flash and used as the starting value (it does not restart from 0 on every reboot). See the code here: https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/secure_fw/partitions/protected_storage/ps_object_table.c#n903 .
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Friday, September 16, 2022 7:48 AM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Thanks for your reply. (I also received an off-list reply from Nordic about this - thank you Sebastian). I have not yet encountered a write endurance failure, but I am developing a device which will be deployed for a long time. Using the nrf53 example, writing encrypted sensor data to external EEPROM/FRAM/MRAM once per hour would cause the device to fail in the field (in terms of the rated internal flash endurance) after only 1 year or so.
It does seem that indeed the rollback protection will fail insecure well before the actual limit of the NV counter. I am not sure I entirely agree about putting the check in the flash driver - I think it would be better placed in the nv_counters code, because as far as I can see, the CMSIS flash driver API does not care whether an erase/write is reliable or not (or at least, it seems to me that this is left unspecified), and this functionality is critical to the NV counters functionality as it will otherwise fail insecure as far as rollback protection is concerned.
I also realised that this could also have implications as far as voltage glitching resistance - if one was to voltage glitch at the flash erase/write, the counter value would not be incremented correctly (and I imagine flash writing is easier to glitch than most things).
Also, can I suggest that there are some clear warnings added to the PS documentation about write endurance? Basically, that you are limited by the write endurance of the internal flash, even if you are using external devices. My initial naive thought when seeing the functionality was that the write endurance was limited by the endurance of the external storage (I was planning on using an external FRAM which has much higher endurance), but this is not correct. Since TF-M is supposed to solve a lot of cryptographic problems for us mere mortals, I can definitely see people enabling it and then just using it to write sensor data to an external flash for example. If there was a readback check enabled, it would at least fail securely but no longer work.
As far as the nonce: I don't believe incrementing a variable is a secure method (unless it was a non-volatile variable), as the attacker could just reboot the device to start the counter again? The IV of 0 would be used again (or whatever the reset value was), meaning that the security is now completely broken with GCM. It's concerning if this is indeed how it is implemented in TF-M?
This is actually the fundamental problem I am trying to solve at the moment - how to realistically use AES-GCM on an MCU with a non-ephemeral key. As far as I can recall the NIST guidelines do not recommend a random IV as the 96 bit IV is considered a little bit small as far as the birthday paradox is concerned. At the moment it seems pretty close to impossible for my skill level ;)
Thanks, Jeremy
On Thu, 15 Sept 2022 at 19:40, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
Thanks for bringing up this question.
Hi Jeremy,
Many thanks for bringing up this topic and for sharing your experience to improve TF-M security and increase user awareness of certain limitations. The patches mentioned by Sherry are intended to fix the issue at least partially. You are welcome to join the project and comment on or contribute to the common codebase. That would be much appreciated.
In addition to wear levelling, a device's lifecycle can be moved to the decommissioned state when a storage integrity problem is detected. This needs more thought, and the best solution is probably use-case dependent. In some cases, flash performance and footprint might matter more than the choice of crypto algorithm (without compromising security, of course).
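To make the two ideas concrete, here is a rough sketch of a counter that rotates across several slots and marks the device as decommissioned once no slot will accept a new value. It is an illustration under assumptions, not TF-M code: `NUM_SLOTS`, the slot layout (value plus bitwise inverse as a validity check), the simulated `slot_dead` wear-out flags, and the `decommissioned` flag are all hypothetical. On real flash each slot would typically need its own erase page, which is the footprint cost raised earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 4u /* hypothetical number of pages reserved for the counter */

/* A slot is valid when inverse == ~value; a zeroed slot is therefore invalid. */
struct slot { uint32_t value; uint32_t inverse; };

static struct slot slots[NUM_SLOTS]; /* simulated flash */
static bool slot_dead[NUM_SLOTS];    /* simulated wear-out: writes are dropped */
static bool decommissioned;          /* hypothetical lifecycle flag */

static bool slot_valid(const struct slot *s)
{
    return s->inverse == ~s->value;
}

/* The counter is the highest value found in any valid slot (0 if none). */
static uint32_t wl_read(void)
{
    uint32_t best = 0;

    for (uint32_t i = 0; i < NUM_SLOTS; i++) {
        if (slot_valid(&slots[i]) && slots[i].value > best) {
            best = slots[i].value;
        }
    }
    return best;
}

/* Write value+1 into a rotating slot, verify it, and fall back to the other
 * slots on failure. If every slot fails, mark the device decommissioned. */
static bool wl_increment(void)
{
    uint32_t current = wl_read();
    uint32_t next;

    if (decommissioned || current == UINT32_MAX) {
        decommissioned = true;
        return false;
    }
    next = current + 1u;
    for (uint32_t tries = 0; tries < NUM_SLOTS; tries++) {
        uint32_t i = (next + tries) % NUM_SLOTS; /* rotation spreads the wear */

        if (!slot_dead[i]) { /* simulated program operation */
            slots[i].value = next;
            slots[i].inverse = ~next;
        }
        if (slot_valid(&slots[i]) && slots[i].value == next) {
            assert(wl_read() == next);
            return true;
        }
    }
    decommissioned = true;
    return false;
}
```

With N slots, each page sees roughly one write per N increments, so the usable counter range grows about N-fold at the cost of N pages.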
I assume you are aware of the Tech Forum (https://www.trustedfirmware.org/meetings/tf-m-technical-forum/), which is a good place to discuss such topics online.
Thanks and best regards, Anton
From: Sherry Zhang via TF-M tf-m@lists.trustedfirmware.org Sent: Thursday, September 22, 2022 5:51 AM To: Jeremy Herbert jeremy.006@gmail.com Cc: tf-m@lists.trustedfirmware.org; nd nd@arm.com Subject: [TF-M] Re: Failure mode of protected storage with faulty NV counters?
Hi Jeremy,
a) the flash wearout
After this fix (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16723), flash wear-out should be detected when it happens. One possible later enhancement, as a rough idea, is to add wear-leveling support. How to select devices so that the PS service endurance is reasonable is up to users.
b) if you disable rollback protection nonce on the internal flash,…
In the PS integration guide, we can add a note that with GCM (the default configuration) rollback protection is essential. In the code, we can add a runtime check (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16798) to ensure that rollback protection is enabled.
- I haven't registered, I will try to do it soon and let you know.
If there is any problem in the process, do not hesitate to contact us.
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Wednesday, September 21, 2022 5:12 PM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
It is good enough in theory, but the problem is a) the flash wearout and b) if you disable rollback protection nonce on the internal flash, then you don't just lose rollback protection but you *also* lose the encryption on the external device because it is trivially vulnerable to nonce reuse.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 18:07, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
- is it correct that if replay protection is disabled, an attacker can force the nonce used as the IV for encryption to be reused by rolling back the external flash image?
Do you think that adding an NV counter in non-volatile memory to prevent rollback is not good enough to solve this problem?
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Wednesday, September 21, 2022 3:56 PM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I haven't registered, I will try to do it soon and let you know.
I guess my issue is more around the AES-GCM encryption for external flash - is it correct that if replay protection is disabled, an attacker can force the nonce used as the IV for encryption to be reused by rolling back the external flash image? If so, it seems like the AES-GCM encryption for PS is broken by design as you must not under any circumstances allow nonce reuse with the same key.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 14:57, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
Have you registered for access to Gerrit? If so, I can add you as a reviewer of this patch so that you can comment on it directly. Your description is very detailed. Thanks for that.
For the nonce reuse issue, I think what is missing is that the template implementation of the NV counter layer should check whether the NV counter was programmed successfully. There is a patch (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16723) that fixes that. Thanks for pointing this out. By the way, this is a template implementation. Users can implement their own versions of the NV counter APIs defined in tfm_plat_nv_counters.h by setting PLATFORM_DEFAULT_NV_COUNTERS to OFF.
(Removed part of the initial email thread as it reached the 80 KB maximum size.)
Thanks!
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Wednesday, September 21, 2022 11:13 AM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
Can I suggest that the following be added instead? I'm not sure that is clear enough to explain the pitfalls to non-experts like myself.
"If this flag is enabled, the lifecycle of the PS service depends on the minimum write endurance of both the device that stores the assets and the device that stores the NV counters - typically the NV counters can only be stored to on-die storage (such as internal flash) for security reasons. As an example, if the NV counter is stored inside the internal flash of a microcontroller with a write endurance of 10k writes, and the PS assets are stored in an external flash with a write endurance of 100k writes, the useful number of writes to the external flash is constrained by the endurance of the internal flash, after which point the rollback protection will fail as the internal flash can no longer correctly store the NV counter."
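The constraint described in that suggested wording can be written as a small calculation. This is only a sketch: the function names are made up for illustration, and it assumes every PS write costs exactly one NV counter update, which ignores other users of the same NV counter page (such as the BL2 counter mentioned earlier in the thread).

```c
#include <assert.h>
#include <stdint.h>

/* Usable PS writes are bounded by whichever device wears out first.
 * Assumption: one NV counter update per PS write. */
static uint32_t ps_usable_writes(uint32_t nv_counter_endurance,
                                 uint32_t asset_store_endurance)
{
    return (nv_counter_endurance < asset_store_endurance)
               ? nv_counter_endurance
               : asset_store_endurance;
}

/* Whole days until the write budget is exhausted at a fixed write rate. */
static uint32_t ps_lifetime_days(uint32_t usable_writes, uint32_t writes_per_day)
{
    assert(writes_per_day > 0u);
    return usable_writes / writes_per_day;
}
```

For the figures in the example (10k internal, 100k external), `ps_usable_writes(10000, 100000)` gives 10000, so the external flash's extra endurance is never used.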
Also, this doesn't mention that the AES-GCM encryption can be trivially broken if the rollback protection is not enabled. To quote SP 800-38D section 8:
The probability that the authenticated encryption function ever will be invoked with the same IV and the same key on two (or more) distinct sets of input data shall be no greater than 2^-32. Compliance with this requirement is crucial to the security of GCM. Across all instances of the authenticated encryption function with a given key, if even one IV is ever repeated, then the implementation may be vulnerable to the forgery attacks that are described in Ref [5] and summarized in Appendix A. In practice, this requirement is almost as important as the secrecy of the key.
I understand it may not be up to you and that there are already silicon vendors with shipped products, but if the scope of TF-M is advertised to end developers as taking care of security from the beginning, it does seem like it should probably be considered in-scope that the PS encryption is not trivially defeated using the default implementation.
Thanks, Jeremy
On Wed, 21 Sept 2022 at 12:50, Sherry Zhang <Sherry.Zhang2@arm.com> wrote:
Hi Jeremy,
I totally agree with your analysis of the necessity of the NV counter in the PS service. I think another reason why rollback protection is necessary is to prevent rollback of the PS data itself. For example, suppose a PS asset with a specific (uid, client_id) pair has been created and is later updated to another value. Without rollback protection, the old data may be restored and used by an attacker. So we need a trusted non-volatile memory to store the NV counter. Which memory device should be used (FRAM or other flash devices, for example) is beyond our scope. Users can make that decision based on their own specific requirements. For your option 3, if the algorithm is compatible with the PSA crypto interface, users can select it as the PS algorithm via the PS_CRYPTO_AEAD_ALG build flag. Even if nonce-reuse protection is not needed, I think rollback protection is still necessary.
I created this patch (https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/16797) to remind users that the lifecycle of the PS service may also depend on the device that stores the NV counter.
Regards, Sherry Zhang
From: Jeremy Herbert <jeremy.006@gmail.com> Sent: Sunday, September 18, 2022 10:15 AM To: Sherry Zhang <Sherry.Zhang2@arm.com> Cc: tf-m@lists.trustedfirmware.org; nd <nd@arm.com> Subject: Re: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi Sherry,
I have actually come up against this problem a bunch over the past few years. Here are my thoughts.
It seems like there are two major problems:
1. Nonce reuse in AES-GCM breaks the encryption to the point of it being near useless
2. A random nonce is not recommended for AES-GCM as the nonce size is a bit small, but having a secure non-volatile counter on an MCU requires specific hardware support as far as having high endurance non-volatile memory (see ATECC608 for example) - and basically nobody has this in their MCUs (though I know STM32s sometimes come with internal EEPROM which can take ~100k writes per word). Secure NFC tags like the DESFire EV3 are often rated for 1 million+ writes per word, and with wear leveling you can basically make this number beyond the useful life of the device.
This results in it being impossible to implement external encrypted flash with rollback protection if the device ever loses power or is reset (or at least as far as I can see):
1. If you don't use a non-volatile counter, resetting the device will cause nonce reuse => AES-GCM broken.
2. If you do use a non-volatile counter stored on internal flash, it seems that eventually the area of flash will fail to all bits cleared (i.e. 0) - given typical numbers, you are looking at a maximum of 50k erases *per page*, not per byte (though most vendors guarantee way less than this, e.g. Nordic is 10k). And given in TF-M all of the different counters are probably stored in the same page, that means that writing to any counter uses up one of your cycles. You could potentially wear level across pages, but with 1K/4K page sizes that is a lot of flash to consume just for a counter.
3. If you store the nonce for AES-GCM in external flash, and then read it back and increment for the next operation, the attacker can just roll back the flash => nonce reuse => AES-GCM broken.
Also it appears that you *must* always enable rollback protection in TF-M, otherwise you just need to dump the external flash, reset the device so it does another write, dump the flash again and you have caused nonce reuse => encryption is broken.
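Why two dumps taken under the same nonce are so damaging can be shown with a toy model. GCM encrypts by XORing the plaintext with a keystream that is fully determined by the key and nonce, so two ciphertexts produced with the same pair XOR to the XOR of the two plaintexts, with no key required. The sketch below is not AES and not GCM: `toy_keystream` is a deliberately trivial stand-in that only preserves that one property, and all names in it are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_MAX 64u /* arbitrary buffer limit for this sketch */

/* Toy keystream generator standing in for the counter-mode keystream inside
 * GCM. NOT a real cipher: it only models "same key + same nonce => same
 * keystream". */
static void toy_keystream(uint32_t key, uint32_t nonce, uint8_t *out, size_t len)
{
    uint32_t state = key ^ (nonce * 2654435761u);

    for (size_t i = 0; i < len; i++) {
        state = state * 1664525u + 1013904223u; /* simple LCG step */
        out[i] = (uint8_t)(state >> 24);
    }
}

/* Stream encryption: ciphertext = plaintext XOR keystream(key, nonce). */
static void toy_encrypt(uint32_t key, uint32_t nonce,
                        const uint8_t *pt, uint8_t *ct, size_t len)
{
    uint8_t ks[MSG_MAX];

    assert(len <= MSG_MAX);
    toy_keystream(key, nonce, ks, len);
    for (size_t i = 0; i < len; i++) {
        ct[i] = pt[i] ^ ks[i];
    }
}

/* What an attacker holding two flash dumps can compute without the key:
 * if both were written under the same nonce, the result is pt_a XOR pt_b. */
static void xor_dumps(const uint8_t *dump_a, const uint8_t *dump_b,
                      uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        out[i] = dump_a[i] ^ dump_b[i];
    }
}
```

If either plaintext is known or guessable (sensor readings often are), the other follows directly. With real GCM, a repeated nonce additionally opens the door to the forgery attacks referenced in SP 800-38D.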
The solutions I can propose:
1. I realise this is probably unrealistic, but it is the best solution nonetheless: require silicon vendors to add a small amount of some special non-volatile memory on their chip like FRAM (maybe 64 bits?) to get a "Security Lvl. 99 (TM): Extreme" certification that they can put on the marketing pages for their device
2. Use an external device like the ATECC608 which has an encrypted, authenticated link and high endurance non-volatile storage for counters (except normal people like me have a 2+ year lead time on this and similar parts)
3. Don't use AES-GCM. There is a much newer, nonce-reuse resistant variant of AES-GCM called AES-GCM-SIV which has only a small performance penalty (I have seen ~30% quoted). I had planned to do it a while ago, but I have just spent the last few days or so hacking this cipher into mbedtls, and it isn't too far from AES-GCM in terms of functionality: https://github.com/Mbed-TLS/mbedtls/pull/6294 - I hope it can eventually be merged once I tidy it up. While this doesn't solve the problem of the rollback counter failing due to being stored in internal flash (you need option 1 or 2 for that), if the counter fails to a constant, at least you just lose the rollback protection rather than losing all security.
If you have any other suggestions as to how I can solve this problem or something that I am missing, I am all ears!
Thanks, Jeremy
tf-m@lists.trustedfirmware.org