[TF-M] Re: Failure mode of protected storage with faulty NV counters?

16 Sep 2022


Hi Sherry,
Thanks for your reply. (I also received an off-list reply from Nordic about
this - thank you Sebastian). I have not yet encountered a write endurance
failure, but I am developing a device which will be deployed for a long
time. Using the nRF53 as an example, writing encrypted sensor data to external
EEPROM/FRAM/MRAM once per hour would cause the device to fail in the field
(in terms of the rated internal flash endurance) after only 1 year or so.
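As a rough check of that estimate, here is a minimal sketch of the arithmetic. It assumes the 10k-cycle rating quoted for the Nordic parts in my original message below, and that each protected storage update costs a fixed number of erase/write cycles on the counter page. The function names and the cycles-per-update parameter are illustrative only, not taken from TF-M.

```c
#include <assert.h>
#include <stdint.h>

/* Hours until a flash page reaches its rated erase/write cycle count.
 * All inputs are assumptions supplied by the caller, not TF-M values. */
static double hours_to_wear_out(uint32_t rated_cycles,
                                double updates_per_hour,
                                uint32_t page_cycles_per_update)
{
    assert(updates_per_hour > 0.0 && page_cycles_per_update > 0u);
    return (double)rated_cycles /
           (updates_per_hour * (double)page_cycles_per_update);
}

/* Converts hours to years, using a 365-day year. */
static double hours_to_years(double hours)
{
    return hours / (24.0 * 365.0);
}
```

With 10000 rated cycles, one update per hour, and one page cycle per update, this gives 10000 hours, or about 1.14 years. If each update cost three cycles on the same page, the same rating would be reached in roughly 3333 hours (about 139 days).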
It does seem that indeed the rollback protection will fail insecure well
before the actual limit of the NV counter. I am not sure I entirely agree
about putting the check in the flash driver - I think it would be better
placed in the nv_counters code, because as far as I can see, the CMSIS
flash driver API does not care whether an erase/write is reliable or not
(or at least, it seems to me that this is left unspecified), and this
functionality is critical to the NV counters functionality as it will
otherwise fail insecure as far as rollback protection is concerned.
I also realised that this could have implications for voltage glitching
resistance: if one were to glitch the supply during the flash
erase/write, the counter value would not be incremented correctly (and I
imagine flash writing is easier to glitch than most things).
Also, can I suggest that there are some clear warnings added to the PS
documentation about write endurance? Basically, that you are limited by the
write endurance of the internal flash, even if you are using external
devices. My initial naive thought when seeing the functionality was that
the write endurance was limited by the endurance of the external storage (I
was planning on using an external FRAM which has much higher endurance),
but this is not correct. Since TF-M is supposed to solve a lot of
cryptographic problems for us mere mortals, I can definitely see people
enabling it and then just using it to write sensor data to an external
flash for example. If a readback check were enabled, the storage would at
least fail securely, although it would stop working.
As for the nonce: I don't believe incrementing a variable is a secure
method (unless it is a non-volatile variable), as an attacker could just
reboot the device to restart the counter. The IV of 0 (or whatever the
reset value is) would then be used again, meaning that the security of GCM
is completely broken. It would be concerning if this is indeed how it is
implemented in TF-M.
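To make the concern concrete, here is a small model of a nonce source that lives only in RAM. This is a hypothetical illustration, not TF-M's implementation: the struct, the function names, and the big-endian counter layout are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NONCE_LEN 12u /* 96-bit GCM nonce */

/* Hypothetical nonce source whose state is held only in RAM. */
struct volatile_nonce_source {
    uint64_t counter; /* lost on every reset */
};

/* Models a reboot: the RAM state returns to its reset value. */
static void simulate_reset(struct volatile_nonce_source *src)
{
    src->counter = 0;
}

/* Emits the counter as a zero-padded big-endian nonce, then increments it. */
static void next_nonce(struct volatile_nonce_source *src,
                       uint8_t out[NONCE_LEN])
{
    assert(src != NULL && out != NULL);
    memset(out, 0, NONCE_LEN);
    for (unsigned i = 0; i < 8u; i++) {
        out[NONCE_LEN - 1u - i] = (uint8_t)(src->counter >> (8u * i));
    }
    src->counter++;
}

/* Returns 1 if the first nonce after a reset equals the first nonce of
 * the previous boot, i.e. the same (key, nonce) pair would be reused. */
static int nonce_repeats_across_reset(void)
{
    struct volatile_nonce_source src;
    uint8_t boot1_first[NONCE_LEN];
    uint8_t scratch[NONCE_LEN];
    uint8_t boot2_first[NONCE_LEN];

    simulate_reset(&src);          /* first boot */
    next_nonce(&src, boot1_first);
    next_nonce(&src, scratch);     /* further writes during this boot */

    simulate_reset(&src);          /* attacker power-cycles the device */
    next_nonce(&src, boot2_first);

    return memcmp(boot1_first, boot2_first, NONCE_LEN) == 0;
}
```

In this model nonce_repeats_across_reset() returns 1: with a long-lived key, the first write after every reboot reuses the nonce of the first write of the previous boot.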
This is actually the fundamental problem I am trying to solve at the moment
- how to realistically use AES-GCM on an MCU with a non-ephemeral key. As
far as I can recall the NIST guidelines do not recommend a random IV as the
96 bit IV is considered a little bit small as far as the birthday paradox
is concerned. At the moment it seems pretty close to impossible for my
skill level ;)
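For a sense of scale on the birthday concern, the sketch below computes a simple union-bound estimate for random nonces: with n nonces drawn uniformly from b bits, the collision probability is at most n^2 / 2^(b+1). The helper names are hypothetical. As I understand NIST SP 800-38D, random IVs are allowed, but the number of encryptions per key is then limited to 2^32; please check that against the document itself.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest k such that 2^k >= n, for n >= 1. */
static unsigned ceil_log2_u64(uint64_t n)
{
    unsigned k = 0;

    assert(n >= 1u);
    while (k < 64u && ((uint64_t)1 << k) < n) {
        k++;
    }
    return k;
}

/* Upper bound on log2 of the probability that n uniformly random
 * nonce_bits-bit nonces contain a repeat:
 *   P <= n^2 / 2^(nonce_bits + 1) <= 2^(2k - nonce_bits - 1),
 * where k = ceil(log2(n)). */
static int log2_collision_bound(uint64_t n, unsigned nonce_bits)
{
    return 2 * (int)ceil_log2_u64(n) - (int)nonce_bits - 1;
}
```

For 2^32 encryptions with a 96-bit nonce this gives a bound of 2^-33. For one write per hour over 20 years (175200 writes) it gives 2^-61, provided the random number generator is sound.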
Thanks,
Jeremy
On Thu, 15 Sept 2022 at 19:40, Sherry Zhang Sherry.Zhang2@arm.com wrote:
...
Hi Jeremy,
Thanks for bringing up this question.
Yes, the flash area for the non-volatile counters is fixed and there is no
wear levelling for it. But I wonder whether the PS NV counter area wears
faster than other areas, such as those where the ITS assets are stored or
where the image itself lies. One PS NV counter area should only be
written three times (three PS NV counters are reserved) in one PS asset
creation cycle, i.e. when updating the PS object table. The area storing the
PS/ITS assets is very likely to be accessed more than three times (as a file
system is used) in one PS asset creation. Is it the PS NV counter area that
wore out first on your device?
In PS, we only ensure that a change to the NV counter between resets is
detected: the PS object table authentication will fail in that case. But no
check like “if (value_to_write != value_read_back) return
FLASH_WORN_OUT_ERROR” is used. As you described, the implementation depends
on the return value from the CMSIS flash driver to detect any error. If
SUCCESS is returned, the value is assumed to have actually been written to
flash. If flash wear-out happens and the flash driver still returns SUCCESS,
then the PS partition cannot detect it, and a lower NV counter value can be
used silently. I think the right place to add the check you mentioned is in
the flash driver implementation, as it is common behavior for flash.
PSA_ALG_GCM is used for the PS table encryption. The NV counter, together
with the table data, is used as the additional data in the encryption. A
value that is incremented by 1 each time, rather than the NV counter, is
used as the nonce in the encryption.
Regards,
Sherry Zhang
*From:* Jeremy Herbert via TF-M tf-m@lists.trustedfirmware.org
*Sent:* Thursday, September 15, 2022 3:39 PM
*To:* tf-m@lists.trustedfirmware.org
*Subject:* [TF-M] Failure mode of protected storage with faulty NV
counters?
Hi,
I am not too familiar with TF-M, so please forgive me if this is a silly
question.
The protected storage APIs appear to require the use of on-die flash to
store a non-volatile counter that is used for rollback protection. This is
severely limiting in terms of the number of writes, because basically you
get as many writes as the endurance of the flash on the MCU (for example,
the Nordic Cortex-M33 devices have a rated write endurance of 10k cycles
per page, and I don't think there is any wear levelling in TF-M). For
example, assuming that a device was configured to write to the protected
storage on boot, one could pretty easily exhaust this flash in a few hours
by continuously power cycling it. Even if the 10k writes is a very
conservative rating, it seems pretty likely that the counter flash will
fail before UINT32_MAX.
My question is: what happens to the security and functionality of the
protected storage if the internal NV flash write fails silently? I don't
know much about the semiconductor physics at play here, but presumably it
could fail to make the counter a constant number, or fail to a random
number.
I had a quick look but there don't appear to be any checks in the code to
ensure that a value was actually written correctly to the NV counters flash
in case of silent corruption - it seems to just assume that any error would
be detectable by some return code from the flash write driver. I was
looking for some check like:
if (value_to_write != value_read_back) return FLASH_WORN_OUT_ERROR;
But I wasn't able to find it. So assuming it isn't actually there, if the
counter fails to a constant (which is not UINT32_MAX) then presumably the
rollback protection would be broken for all writes after that point (and
maybe some before, depending on the constant). If it fails to a random
number, then it would be broken in a more "random" way, i.e. it would
randomly work or not work depending on the value of the counter, until all
UINT32_MAX numbers have been randomly selected as the counter value.
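The kind of check I have in mind could be sketched roughly as follows. This is only an illustration under stated assumptions: struct nv_backend, its callbacks, the status codes, and the stuck-cell test double are all hypothetical and are not the TF-M NV counter code or the CMSIS flash driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum nv_status { NV_OK = 0, NV_WRITE_ERROR, NV_READ_ERROR, NV_VERIFY_ERROR };

/* Hypothetical storage backend; both callbacks return 0 on success. */
struct nv_backend {
    int (*write_u32)(void *ctx, uint32_t value);
    int (*read_u32)(void *ctx, uint32_t *value);
    void *ctx;
};

/* Writes a counter value, then reads it back and compares, so that a
 * write reported as successful but not actually stored is detected. */
static enum nv_status nv_counter_store_verified(const struct nv_backend *be,
                                                uint32_t value)
{
    uint32_t readback = 0;

    assert(be != NULL && be->write_u32 != NULL && be->read_u32 != NULL);
    if (be->write_u32(be->ctx, value) != 0) {
        return NV_WRITE_ERROR;
    }
    if (be->read_u32(be->ctx, &readback) != 0) {
        return NV_READ_ERROR;
    }
    return (readback == value) ? NV_OK : NV_VERIFY_ERROR;
}

/* Test double: a cell that, once worn, ignores writes yet reports success. */
struct stuck_cell {
    uint32_t stored;
    int worn;
};

static int stuck_write(void *ctx, uint32_t value)
{
    struct stuck_cell *cell = ctx;

    if (!cell->worn) {
        cell->stored = value;
    }
    return 0; /* always claims success, like the silent failure above */
}

static int stuck_read(void *ctx, uint32_t *value)
{
    const struct stuck_cell *cell = ctx;

    *value = cell->stored;
    return 0;
}
```

With the test double marked as worn, nv_counter_store_verified() returns NV_VERIFY_ERROR instead of silently keeping the old value. A readback straight after the write only shows that the value can be read at that moment; it says nothing about retention, and on real hardware the read would have to bypass any cache.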
Also, given that typical AEAD ciphers like AES-GCM typically fail
catastrophically with nonce reuse and the protected storage is indeed AEAD
(though I can't quite work out yet which cipher is used), if these
non-volatile counters are used to generate a nonce then potentially the
encryption of the device could be broken just by rebooting the device until
the flash is worn out, and then the nonce will be reused if the flash fails
to a constant value.
Could someone please help me clear up if my understanding here is correct?
As is, I am struggling a bit to understand how to use the protected storage
API in a secure way with this constraint, because if an attacker has any
way to repeatedly cause a flash write it is basically game over. Any help
would be greatly appreciated.
Thanks,
Jeremy
