Hi Louis,
The term PAL is also used for platform abstraction layer so it "could" be confusing to have this being used for something different.
Not necessarily suggesting to change anything but it might be worth considering this aspect.
Thanks,
Abhishek
-----Original Message-----
From: TF-A <tf-a-bounces(a)lists.trustedfirmware.org> On Behalf Of Louis Mayencourt via TF-A
Sent: 19 November 2019 13:40
To: tf-a(a)lists.trustedfirmware.org
Subject: [TF-A] Design proposal: Property Access Layer
Hi everyone,
Please let me introduce the `Property Access Layer` prototype:
The Property Access Layer (PAL) is an abstraction layer for platform-specific data, allowing a "property" to be queried and a value retrieved without the requesting entity knowing what backing store is being used to hold the data. It is used to bridge new and old ways of providing platform-specific data:
Today, information like the Chain of Trust is held within several nested platform-defined tables. In the future, it may be provided as part of a device tree blob, along with the information about images to load.
Introducing this abstraction layer will make migration easier and will preserve functionality for platforms that cannot or do not want to use a device tree.
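To make the idea concrete, here is a minimal sketch of what such a query interface could look like. It is not taken from the patches: pal_get_property(), pal_backend, the table layout and the "cot/nv-counter-id" property name are all hypothetical names chosen for illustration. The caller only sees one lookup function, while the backing store (here a static table) sits behind a function pointer that could later point at a device tree parser instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend interface: each backing store implements one lookup. */
typedef int (*pal_get_fn)(const char *name, const void **value, size_t *len);

/* Backend A: a static, platform-defined table (the "old" way). */
struct pal_entry {
	const char *name;
	const void *value;
	size_t len;
};

static const unsigned int nv_ctr_id = 3U; /* made-up sample data */

static const struct pal_entry pal_table[] = {
	{ "cot/nv-counter-id", &nv_ctr_id, sizeof(nv_ctr_id) },
};

static int pal_table_get(const char *name, const void **value, size_t *len)
{
	for (size_t i = 0; i < sizeof(pal_table) / sizeof(pal_table[0]); i++) {
		if (strcmp(pal_table[i].name, name) == 0) {
			*value = pal_table[i].value;
			*len = pal_table[i].len;
			return 0;
		}
	}
	return -1; /* property not found */
}

/* The active backend; a device tree backend could be plugged in here. */
static pal_get_fn pal_backend = pal_table_get;

/* The only function callers use: they never see the backing store. */
int pal_get_property(const char *name, const void **value, size_t *len)
{
	assert(name != NULL && value != NULL && len != NULL);
	return pal_backend(name, value, len);
}
```

A caller would ask for a property by name and check the return code. Switching a platform from tables to a device tree would then only mean changing what pal_backend points to.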
Please have a look at the patches: https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/2559/1
Regards,
Louis
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
--
TF-A mailing list
TF-A(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Hi Andre,
I'm trying to get CPU offlining to work for Linux on the RPi4. In fact,
this is working already with current ATF master and the Raspberry kernel
4.19.85. Now it would be "nice" if onlining worked as well:
[ 94.959294] CPU1: shutdown
[ 94.959311] psci: CPU1 killed.
[ 106.750960] CPU1: failed to come online
[ 106.755425] CPU1: failed in unknown state : 0x0
Is this a known limitation? Or am I using the wrong kernel (obviously,
it's a downstream one ;) )?
Thanks,
Jan
PS: Current RPi firmware requires an explicit "armstub=armstub8.bin" in
config.txt, no automatic pickup.
Hi Soby,
Thanks for your response.
>>it is needed to ensure the ordering of the succeeding sev().
Agree. Thanks for the clarification.
>>Was this an issue that actually manifested on a hardware or is this
something that you caught while reviewing the code?
Noticed it while reviewing code and I have not observed it on hardware.
Thanks
-Raghu
On November 26, 2019 at 8:55 AM, Soby Mathew <Soby.Mathew(a)arm.com> wrote:
> On 26/11/2019 16:30, Raghupathy Krishnamurthy via TF-A wrote:
> > Hello!
> >
> > Reposting this from (https://developer.trustedfirmware.org/T589).
> >
> > bakery_lock_get() uses a dmbld() after lock acquisition which is insufficient in a lock acquire situation. With just dmbld(), stores in a critical section can be reordered before the dmbld() and indeed before the lock acquisition has taken place. similarly, bakery_lock_release() only uses dmbst(). A load in the critical section could be reordered after the dmbst() and write to the lock data structure releasing the lock. This is likely less of a problem but lock release needs to provide release semantics, and dmbst() is insufficient. For ex: A load in the critical section of CPU1 can be reordered after the store for the lock release, and it could read from a store that is executed on CPU2 in the same critical section, since CPU2 saw the store for the lock release first, and raced into the critical section.
> Hi Raghu,
> You are right on this. The dmbld() and dmbst() does not provide
> sufficient guarantees in the cases you mention.
> Was this an issue that actually manifested on a hardware or is this
> something that you caught while reviewing the code ?
> > Also the dsb() after the write to the lock seems unnecessary. Am I missing something here ? It looks like the same issue is present even in bakery_lock_normal.
> If you are referring to the dsb() at this line :
> https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/lib/locks/…
> it is needed to ensure the ordering of the succeeding sev().
> Best Regards
> Soby Mathew
> > Thanks
> > Raghu
Hello Sumit,
Sorry for not getting back to you earlier on this. I started looking at
your patches and although I've not completely got my head around them
yet, I've got some early comments and questions. I've tried to classify
them by themes below.
First of all, let me say that I'm no security expert and I did not know
about the concept of authenticated encryption/decryption before looking
at your patches. I spent a couple of hours reading about such algorithms
in general and AES-GCM in particular. Most of what I've learned so far
is based on my understanding of RFC 5116 [1].
What's the motivation for choosing GCM to start with? I've seen that it
is free of patents [2], which I am guessing was a strong argument for it.
I've also read that it is supposed to be quite lightweight and can take
full advantage of parallel processing, although I've not looked into the
details. Were these the reasons? Any other reasons?
Key management
--------------
fip_file_read() retrieves the key from the platform and stores it in a
buffer on the stack. I don't see any code wiping it out of memory once
we're done with it. Did I miss it? Unlike the root of trust public key,
this is a (symmetric) secret key, so it is sensitive data that we must
not leave up for grabs, even if the stack is in Trusted RAM and is
likely to be overwritten by subsequent stack usage.
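As an illustration of what I mean by wiping, here is a rough sketch. It is not the code from the patches: secure_wipe(), get_key_stub(), decrypt_image_sketch() and the 32-byte KEY_LEN are hypothetical names and values I made up. The point is only that the key buffer is cleared on every exit path, through a volatile pointer so that the compiler cannot drop the stores as dead writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN 32U /* assumed key size, for illustration only */

/* Wipe through a volatile pointer so the stores are not optimised away. */
static void secure_wipe(void *buf, size_t len)
{
	volatile uint8_t *p = buf;

	while (len-- > 0U) {
		*p++ = 0U;
	}
}

/* Stand-in for the platform hook that hands out the secret key. */
static int get_key_stub(uint8_t *key, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		key[i] = (uint8_t)(i + 1U);
	}
	return 0;
}

/*
 * Sketch of a read path that keeps the key on the stack.
 * Returns -1 on error, otherwise the number of key bytes that are still
 * non-zero after the wipe (a demo-only check so the sketch can be tested).
 */
int decrypt_image_sketch(void)
{
	uint8_t key[KEY_LEN];
	int leftover = 0;

	if (get_key_stub(key, sizeof(key)) != 0) {
		secure_wipe(key, sizeof(key)); /* wipe on the error path too */
		return -1;
	}

	/* ... the image would be decrypted with 'key' here ... */

	secure_wipe(key, sizeof(key));

	for (size_t i = 0; i < sizeof(key); i++) {
		leftover += (key[i] != 0U);
	}
	return leftover;
}
```

If the code base already has a helper with the same guarantee, using that would of course be preferable to a new function.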
Also, I am still trying to get my head around how this would integrate
with a cryptographic engine where the key does not leave the chip. I can
imagine that we could get the address of the encrypted firmware image
from the FIP, pass that to a cryptographic engine, request it to decrypt
it and store the result somewhere in Trusted RAM. In this case, we
wouldn't call plat_get_fip_encryption_key(). Do you have any idea how we
would pull this off? Like how the different modules (IO layer, crypto
module, image parser module, ...) would integrate together?
I have some concerns around the generation of the initialization vectors
in the encrypt_fw tool. Right now, IVs are simply a random sequence of
bytes (obtained through a call to OpenSSL's RAND_bytes() API). Now, I
would imagine that RAND_bytes() is typically based on a good random
number generator and thus will generate different sequences every time
it is called. At least, as long as it is called from the same machine
every time. But what if we encrypt a new FIP bundle from a different
machine, say in the context of a firmware update? Is it not possible
that it might choose the same IV out of bad luck?
Perhaps that's an issue left to provisioning/manufacturing time and is
out of the scope here. But it worries me because AFAIU, the security of
AES-GCM is critically undermined if the same nonce is used multiple
times with the same key (see section 5.1.1. "Nonce reuse" in RFC 5116).
If the encryption key is the SSK (rather than the BSSK) then I guess the
probability is even higher, as it is shared amongst a class of devices.
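To put a rough number on that worry, the usual birthday-bound approximation can be used. The sketch below assumes the IVs are uniformly random and independent, and the 96-bit IV length in the usage note is my assumption (I have not checked what length encrypt_fw actually uses). iv_collision_probability() is just a name I picked for this example.

```c
#include <assert.h>

/*
 * Birthday-bound approximation: the probability that n IVs drawn uniformly
 * at random from an iv_bits-bit space contain at least one repeat is about
 * n * (n - 1) / 2 / 2^iv_bits. The approximation only holds while the
 * result is much smaller than 1.
 */
double iv_collision_probability(double n, unsigned int iv_bits)
{
	double p = n * (n - 1.0) / 2.0; /* number of IV pairs */

	/* Divide by 2^iv_bits without needing the maths library. */
	for (unsigned int i = 0U; i < iv_bits; i++) {
		p /= 2.0;
	}
	return p;
}
```

With 96-bit IVs, iv_collision_probability(4294967296.0, 96U), i.e. 2^32 images encrypted under the same key, comes out at roughly 2^-33. Under these assumptions the result depends only on how many IVs are generated for a given key, so a key shared across a class of devices sees more encryptions and therefore a higher probability.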
Impact on memory footprint and performance
------------------------------------------
Do you know what the performance impact is when this feature is enabled
in TF-A, to decrypt images at boot time? Obviously it depends on the
platform and whether there is a dedicated cryptographic engine, and I
suppose you cannot really get any relevant measurements out of QEMU but
I would be interested if you've got any rough numbers.
And what's the memory footprint impact? IIUC, AES-GCM almost does not
inflate the size of the data it encrypts. The size of the ciphertext
seems to be the same as the plaintext + the size of the authentication
tag. So I guess there's no real impact on flash storage and Trusted RAM
usage to hold decrypted firmware. But what about the mbedTLS primitives
to decrypt the images? How much code and data does this add?
encrypt_fw tool
---------------
We have some floating ideas around re-implementing the tools (fiptool,
certtool) in a scripting language (possibly python) in the future and
also doing a better job at sharing a common description of the list of
images to boot/authenticate between the firmware and the host tools. But
we're not there yet, so I agree that implementing this new tool in C
from the same "mold" as fiptool and certtool is what makes the most
sense today. It's just another tool we will have to rework if and when
we get there.
I did not understand why this new tool needs to know what image it is
encrypting. For example, one possible invocation could be:
  tools/encrypt_fw/encrypt_fw \
    -k 1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef \
    --soc-fw bl31.bin \
    --soc-fw-enc bl31_enc.bin \
    --tos-fw bl32.bin \
    --tos-fw-enc bl32_enc.bin
Why not invoke the tool once per image instead? As in:
  encrypt_fw -k key -in ifile -out ofile
for BL31, then for BL32? Does the tool do anything different based on
the type of image it receives?
Regards,
Sandrine
[1] https://tools.ietf.org/html/rfc5116
[2] https://csrc.nist.gov/csrc/media/projects/block-cipher-techniques/documents…
Hi All,
The build system of TF-A has become complex and loaded with technical debt over the years, and it's time to do something about this.
We have made some plans and done some prototyping work to move to a CMake-based solution, and we would like to get feedback on the idea.
Why CMake?
In summary, CMake is a mature tool with wide acceptance in C and C++ projects.
It also has the benefit of decreasing fragmentation in the developer community if we sync up with TF-M.
How will it happen?
This will be a slow process where the old build system will co-exist for a period with the new one. How long that period will be is an open question.
For a more detailed summary please see https://developer.trustedfirmware.org/w/tf_a/cmake-buildsystem-proposal/
The design discussion will follow the design review proposal process of TF.org, as described on this page:
https://ci.trustedfirmware.org/job/tf-m-build-test-nightly/lastSuccessfulBu…
You can find the patch for capturing the design decisions and discussion here: https://review.trustedfirmware.org/#/c/TF-A/trusted-firmware-a/+/2662/
If you would like to contribute, or have an opinion or any ideas, please reply to this email or add a comment on Gerrit (link above).
Regards,
Balint
On 26/11/2019 16:30, Raghupathy Krishnamurthy via TF-A wrote:
> Hello!
>
> Reposting this from (https://developer.trustedfirmware.org/T589).
>
> bakery_lock_get() uses a dmbld() after lock acquisition which is insufficient in a lock acquire situation. With just dmbld(), stores in a critical section can be reordered before the dmbld() and indeed before the lock acquisition has taken place. similarly, bakery_lock_release() only uses dmbst(). A load in the critical section could be reordered after the dmbst() and write to the lock data structure releasing the lock. This is likely less of a problem but lock release needs to provide release semantics, and dmbst() is insufficient. For ex: A load in the critical section of CPU1 can be reordered after the store for the lock release, and it could read from a store that is executed on CPU2 in the same critical section, since CPU2 saw the store for the lock release first, and raced into the critical section.
Hi Raghu,
You are right on this. The dmbld() and dmbst() do not provide
sufficient guarantees in the cases you mention.
Was this an issue that actually manifested on hardware, or is this
something that you caught while reviewing the code?
> Also the dsb() after the write to the lock seems unnecessary. Am I missing something here ? It looks like the same issue is present even in bakery_lock_normal.
>
If you are referring to the dsb() at this line:
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/lib/locks/…
it is needed to ensure the ordering of the succeeding sev().
Best Regards
Soby Mathew
> Thanks
> Raghu
>
Hello!
Reposting this from (https://developer.trustedfirmware.org/T589).
bakery_lock_get() uses a dmbld() after lock acquisition, which is insufficient in a lock acquire situation. With just dmbld(), stores in a critical section can be reordered before the dmbld() and indeed before the lock acquisition has taken place. Similarly, bakery_lock_release() only uses dmbst(). A load in the critical section could be reordered after the dmbst() and after the write to the lock data structure that releases the lock. This is likely less of a problem, but lock release needs to provide release semantics, and dmbst() is insufficient. For example: a load in the critical section of CPU1 can be reordered after the store for the lock release, and it could read from a store that is executed on CPU2 in the same critical section, since CPU2 saw the store for the lock release first and raced into the critical section.
Also, the dsb() after the write to the lock seems unnecessary. Am I missing something here? It looks like the same issue is present even in bakery_lock_normal.
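To spell out what I mean by acquire and release semantics, here is a small sketch using C11 atomics. This is not the bakery lock and not TF-A code: demo_lock_t, demo_lock_get(), demo_lock_release() and demo_critical_section() are made-up names, and the lock is a plain test-and-set spinlock. It only shows the ordering a lock is expected to give: nothing from the critical section may move above the acquire, and nothing may move below the release.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
	atomic_flag taken;
} demo_lock_t;

static void demo_lock_get(demo_lock_t *l)
{
	/*
	 * Acquire: loads and stores that follow in program order cannot be
	 * reordered before this operation.
	 */
	while (atomic_flag_test_and_set_explicit(&l->taken,
						 memory_order_acquire)) {
		/* spin until the flag was previously clear */
	}
}

static void demo_lock_release(demo_lock_t *l)
{
	/*
	 * Release: loads and stores that precede in program order cannot be
	 * reordered after this operation.
	 */
	atomic_flag_clear_explicit(&l->taken, memory_order_release);
}

static demo_lock_t lock = { ATOMIC_FLAG_INIT };
static int shared_counter;

/* Both the store and the load below stay inside the critical section. */
int demo_critical_section(void)
{
	int seen;

	demo_lock_get(&lock);
	shared_counter++;
	seen = shared_counter;
	demo_lock_release(&lock);
	return seen;
}
```

Which barrier instructions give the equivalent guarantee in bakery_lock_get() and bakery_lock_release() is the open question; the sketch only states the property the lock needs to provide.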
Thanks
Raghu