Hi Jeremy,
Thanks for bringing up this question.
Yes, the flash area for the non-volatile counters is fixed, and there is no wear levelling for it. But I wonder whether the PS NV counter area really wears faster than other areas, such as the areas where the ITS assets are stored or where the image itself lies. The PS NV counter area should be written only three times (three PS NV counters are reserved) in one PS asset creation cycle, i.e. when updating the PS object table. The area storing the PS/ITS assets is very likely to be accessed more than three times (as a file system is used) in one PS asset creation. Was it the PS NV counter area that wore out first on your device?
In PS, we only ensure that an NV counter change across a reset is detected: the PS object table authentication will fail in that case. But no check like “if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR” is used. As you described, the implementation depends on the return value from the CMSIS flash driver to detect any error. If SUCCESS is returned, the value is assumed to have actually been written to flash. If flash wear-out happens and the flash driver still returns SUCCESS, the PS partition cannot detect it, and a lower NV counter value can be used silently. I think the right place to add the check you mentioned is in the flash driver implementation, as it is common behaviour for any flash.
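To illustrate the kind of read-back check discussed here, below is a minimal sketch in C. It is not TF-M or CMSIS code: the simulated flash, the status codes and the function names (sim_flash_program, sim_flash_read, nv_counter_write_verified) are all hypothetical and exist only to show the idea of comparing the written value with what is read back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes; these are not TF-M or CMSIS names. */
enum nv_status { NV_OK = 0, NV_ERR_DRIVER = -1, NV_ERR_READBACK = -2 };

/* Simulated flash: a small RAM array standing in for the NV counter area. */
#define SIM_FLASH_SIZE 16u
static uint8_t sim_flash[SIM_FLASH_SIZE];
static int sim_flash_worn_out; /* nonzero: writes are silently dropped */

/* Simulated driver write. It always reports success, like a driver that
 * does not notice worn-out cells. */
static int sim_flash_program(uint32_t addr, const void *data, size_t len)
{
    assert(addr + len <= SIM_FLASH_SIZE);
    if (!sim_flash_worn_out) {
        memcpy(&sim_flash[addr], data, len);
    }
    return 0;
}

/* Simulated driver read. */
static int sim_flash_read(uint32_t addr, void *data, size_t len)
{
    assert(addr + len <= SIM_FLASH_SIZE);
    memcpy(data, &sim_flash[addr], len);
    return 0;
}

/* Write a counter value, then read it back and compare. A mismatch is
 * reported even though the driver itself returned success. */
enum nv_status nv_counter_write_verified(uint32_t addr, uint32_t value)
{
    uint32_t read_back = 0;

    if (sim_flash_program(addr, &value, sizeof(value)) != 0) {
        return NV_ERR_DRIVER;
    }
    if (sim_flash_read(addr, &read_back, sizeof(read_back)) != 0) {
        return NV_ERR_DRIVER;
    }
    return (read_back == value) ? NV_OK : NV_ERR_READBACK;
}
```

With the simulated flash marked as worn out, the driver call still succeeds but the wrapper returns NV_ERR_READBACK, which is the case the PS partition currently cannot see.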
PSA_ALG_GCM is used for the PS table encryption. The NV counter, together with the table data, is used as the additional data in the encryption. The nonce is not the NV counter; it is a separate value that is increased by 1 each time.
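As a rough illustration of a nonce that is "increased by 1 each time", here is a small sketch in C. The 12-byte length (the usual GCM nonce size), the big-endian layout and the name nonce_increment are assumptions made for the example; the actual layout used by the PS partition is not described in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed nonce length: 96 bits, the size commonly used with GCM. */
#define NONCE_LEN 12u

/* Treat the nonce as a big-endian counter and add 1 to it.
 * Returns 0 on success, or -1 if the counter wrapped around to all
 * zeros, in which case the nonce space is exhausted and the value
 * must not be reused with the same key. */
int nonce_increment(uint8_t nonce[NONCE_LEN])
{
    for (size_t i = NONCE_LEN; i > 0; i--) {
        /* Increment the current byte; stop unless it overflowed to 0. */
        if (++nonce[i - 1] != 0) {
            return 0;
        }
    }
    return -1;
}
```

The point of the sketch is that the nonce advances independently of the NV counter value, so a counter stuck at a constant does not by itself force the same nonce to be used again.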
Regards, Sherry Zhang
From: Jeremy Herbert via TF-M tf-m@lists.trustedfirmware.org
Sent: Thursday, September 15, 2022 3:39 PM
To: tf-m@lists.trustedfirmware.org
Subject: [TF-M] Failure mode of protected storage with faulty NV counters?
Hi,
I am not too familiar with TF-M, so please forgive me if this is a silly question.
The protected storage APIs appear to require the use of on-die flash to store a non-volatile counter that is used for rollback protection. This is severely limiting in terms of the number of writes, because basically you get as many writes as the endurance of the flash on the MCU (for example, the Nordic Cortex-M33 devices have a rated write endurance of 10k cycles per page, and I don't think there is any wear levelling in TF-M). For example, assuming that a device was configured to write to the protected storage on boot, one could pretty easily exhaust this flash in a few hours by continuously power cycling it. Even if the 10k writes is a very conservative rating, it seems pretty likely that the counter flash will fail long before the counter reaches UINT32_MAX.
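The "few hours" estimate above can be checked with a small calculation, sketched here in C. The function name and the parameters are hypothetical, and the sketch assumes that every boot costs a fixed number of endurance cycles on the same flash page, which may not match how a given flash driver actually erases and programs the counter area.

```c
#include <stdint.h>

/* Rough time, in seconds, until a flash page reaches its rated
 * endurance, assuming each boot consumes `cycles_per_boot` cycles on
 * that page and one boot takes `seconds_per_boot` seconds. */
uint64_t seconds_to_wear_out(uint32_t rated_cycles,
                             uint32_t cycles_per_boot,
                             uint32_t seconds_per_boot)
{
    if (cycles_per_boot == 0) {
        return UINT64_MAX; /* the page is never written: no wear */
    }
    /* Number of whole boots the page survives, times the boot period. */
    return (uint64_t)(rated_cycles / cycles_per_boot) * seconds_per_boot;
}
```

With a 10k-cycle rating, one cycle per boot and one power cycle per second, this gives 10,000 seconds, or a little under three hours.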
My question is: what happens to the security and functionality of the protected storage if the internal NV flash write fails silently? I don't know much about the semiconductor physics at play here, but presumably it could fail to make the counter a constant number, or fail to a random number.
I had a quick look but there don't appear to be any checks in the code to ensure that a value was actually written correctly to the NV counters flash in case of silent corruption - it seems to just assume that any error would be detectable by some return code from the flash write driver. I was looking for some check like:
if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR;
But I wasn't able to find it. So assuming it isn't actually there, if the counter fails to a constant (which is not UINT32_MAX) then presumably the rollback protection would be broken for all writes after that point (and maybe some before depending on the constant). If it fails to a random number, then it would be broken in a more "random" way - ie it would randomly work/not work depending on the value of the counter, until all UINT32_MAX numbers are randomly selected as the counter value.
Also, given that typical AEAD ciphers like AES-GCM typically fail catastrophically with nonce reuse and the protected storage is indeed AEAD (though I can't quite work out yet which cipher is used), if these non-volatile counters are used to generate a nonce then potentially the encryption of the device could be broken just by rebooting the device until the flash is worn out, and then the nonce will be reused if the flash fails to a constant value.
Could someone please help me clear up if my understanding here is correct? As is, I am struggling a bit to understand how to use the protected storage API in a secure way with this constraint, because if an attacker has any way to repeatedly cause a flash write it is basically game over. Any help would be greatly appreciated.
Thanks, Jeremy