Hafnium December 2021

hafnium@lists.trustedfirmware.org

8 participants
7 discussions

ffa_msg_send from Linux Primary to Secondary Fails

by Friedrich

Hi, Can someone help me figure this out? I am trying to send a message from my Linux primary VM to a secondary VM, but for some reason ffa_msg_send keeps on failing and I keep getting an EAGAIN error: resource temporarily unavailable. Any suggestions? Best, Friedrich

4 years, 1 month

How to debug with pc value?

by Chenxu Wang

Hi all, When a data or instruction abort happens, Hafnium prints the esr_el2, far and pc values, among others. This information is useful, but I would like to know how to debug with the PC value. I know TF-A provides dump files (e.g., bl31.dump), so when a panic or error happens I can quickly locate the faulting instruction, its function, and other useful context. Does Hafnium provide similar dump files? If not, can we create them, and how? Sincerely, Wang
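One way to use a reported PC value is to map it back to a function in a disassembly listing. The sketch below assumes you already have such a listing in the usual objdump text format, where each function starts with a label line like `0000000000001000 <name>:`. The sample listing and the function names in it are made up for illustration and do not come from Hafnium.

```python
import re

# Matches objdump-style function labels, e.g. "0000000000001000 <vm_init>:".
LABEL_RE = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:\s*$")


def parse_symbols(listing):
    """Return (start_address, name) pairs sorted by address."""
    symbols = []
    for line in listing.splitlines():
        match = LABEL_RE.match(line.strip())
        if match:
            symbols.append((int(match.group(1), 16), match.group(2)))
    return sorted(symbols)


def locate_pc(symbols, pc):
    """Return (name, offset) for the last symbol at or below pc, or None."""
    best = None
    for start, name in symbols:
        if start <= pc:
            best = (name, pc - start)
        else:
            break
    return best


# Hypothetical listing; a real one would be read from a dump file.
SAMPLE = """\
0000000000001000 <hypothetical_init>:
    1000:\td503201f \tnop
    1004:\td65f03c0 \tret

0000000000001008 <hypothetical_handler>:
    1008:\td503201f \tnop
    100c:\td65f03c0 \tret
"""

if __name__ == "__main__":
    syms = parse_symbols(SAMPLE)
    name, offset = locate_pc(syms, 0x100C)
    print(f"pc=0x100c is in {name}+{offset:#x}")
```

The lookup only knows where functions start, so a PC beyond the end of the last function is still attributed to that function. If the image is relocated at load time, the load offset has to be subtracted from the PC before the lookup.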

4 years, 1 month

Re: [Hafnium] .git submodules increase hafnium code size

by Olivier Deprez

Hi Varun,

Just a quick sync on what's been going on since this last email. A number of changes have been made:
-removed the gcc dependency; hafnium now only requires clang to build.
-ability to provide a toolchain different from the one stored in the prebuilt submodule (merged yesterday)
-migrated to and tested with recent clang versions (was clang 9)
-ability to build on an arm64 host (our need)

The prebuilt submodule still exists but the dependency on hardcoded toolchains has weakened. I intend to say more about those recent changes and what remains during a tech forum early next year, among:
-remove toolchains from prebuilt?
-split the hypervisor and spmc test configs into separate project/* directories
-lessen the dependency on other 3rd party projects

Regards, Olivier.

________________________________________
From: Hafnium <hafnium-bounces(a)lists.trustedfirmware.org> on behalf of Olivier Deprez via Hafnium <hafnium(a)lists.trustedfirmware.org>
Sent: 08 June 2021 16:00
To: Varun Wadekar
Cc: Bo Yan; hafnium(a)lists.trustedfirmware.org
Subject: Re: [Hafnium] .git submodules increase hafnium code size

Hi Varun,

As indicated earlier, we (Arm side) don't expect to progress on this front before Q3. I'm gathering requirements and expect to discuss them through a short presentation later in a tech forum or on the ML.

Regards, Olivier.

________________________________________
From: Varun Wadekar <vwadekar(a)nvidia.com>
Sent: 08 June 2021 15:21
To: Arunachalam Ganapathy; Olivier Deprez
Cc: Bo Yan; hafnium(a)lists.trustedfirmware.org
Subject: RE: .git submodules increase hafnium code size

Thanks Arun. We have taken the same approach. The changes I posted earlier expect all dependencies to be out of tree and provide mechanisms to pass the locations to the make system during compilation. This is a real problem for us, and we would like a solution that works for the community too.

@Olivier how should we move forward?
-----Original Message-----
From: Arunachalam Ganapathy <Arunachalam.Ganapathy(a)arm.com>
Sent: Tuesday, June 8, 2021 2:08 PM
To: Varun Wadekar <vwadekar(a)nvidia.com>; Olivier Deprez <Olivier.Deprez(a)arm.com>
Cc: Bo Yan <byan(a)nvidia.com>; hafnium(a)lists.trustedfirmware.org
Subject: Re: .git submodules increase hafnium code size

Hi Varun,

>> 1- First in context of Total Compute delivery from Arm OSS platforms:
>> a. ability to build only the SPMC on TC0 platform (not all reference targets such as qemu, rpi4, fvp)
>> b. use a Yocto provided toolchain.
>> @Arun, your view on how those two items were solved is beneficial to further elaborate our plans.

For Total Compute we wanted to skip cloning submodules (like driver/linux, linux, dtc) as only secure_hafnium.bin was required. Basically, build only the reference spm, like:
make PROJECT=reference_spm PLAT=TC
This builds only secure hafnium for one platform.

Some effort was put into 1.a and we were able to build reference_spm for one platform. But the Hafnium build inside yocto was forced to clone all submodules (due to dependencies on prebuilt toolchains), so the changes were dropped; they also weren't clean enough to be upstreamed. Regarding 1.b, we didn't try a hafnium build using the yocto toolchain.

Thanks, Arun

-----Original Message-----
From: Varun Wadekar <vwadekar(a)nvidia.com>
Sent: Monday, June 7, 2021 14:43
To: Varun Wadekar <vwadekar(a)nvidia.com>; Olivier Deprez <Olivier.Deprez(a)arm.com>; Arunachalam Ganapathy <Arunachalam.Ganapathy(a)arm.com>
Cc: Bo Yan <byan(a)nvidia.com>; hafnium(a)lists.trustedfirmware.org <hafnium(a)lists.trustedfirmware.org>
Subject: RE: .git submodules increase hafnium code size

Hi,

>> @Arun, your view on how those two items were solved is beneficial to further elaborate our plans.

@Arunachalam Ganapathy your comments on this topic would be very helpful. Thanks.
-----Original Message-----
From: Hafnium <hafnium-bounces(a)lists.trustedfirmware.org> On Behalf Of Varun Wadekar via Hafnium
Sent: Monday, May 31, 2021 1:49 PM
To: Olivier Deprez <Olivier.Deprez(a)arm.com>; hafnium(a)lists.trustedfirmware.org; Arunachalam Ganapathy <Arunachalam.Ganapathy(a)arm.com>
Cc: Bo Yan <byan(a)nvidia.com>
Subject: Re: [Hafnium] .git submodules increase hafnium code size

Hi Olivier,

Thanks for answering my queries. We are looking to deploy the following use case at NVIDIA.

<snip>
-ability to build only the SPMC (not all reference targets such as qemu, rpi4, fvp)
-A distribution only requiring the Hypervisor/SPMC output binary ("out/reference/.../hafnium.bin") using any toolchain (be it arm64 or x86 host, and arbitrary clang version).
<snip>

>> As you noticed, the Hafnium Hypervisor/SPMC and test environment builds are closely coupled by the use of ninja/gn flow and scripts. We intend to approach those problems in the course of Q3 in Arm OSS roadmap.

[VW] Are there any local changes to decouple hafnium from its dependencies? We can evaluate Arm's approach against what we use internally. Our changes moved the dependencies out of the tree and passed file locations to the build system with the help of command line arguments.

-Varun

-----Original Message-----
From: Olivier Deprez <Olivier.Deprez(a)arm.com>
Sent: Monday, May 31, 2021 11:03 AM
To: hafnium(a)lists.trustedfirmware.org; Varun Wadekar <vwadekar(a)nvidia.com>; Arunachalam Ganapathy <Arunachalam.Ganapathy(a)arm.com>
Cc: Bo Yan <byan(a)nvidia.com>
Subject: Re: .git submodules increase hafnium code size

Hi Varun,

We had similar requests raised internally.

1- First in context of Total Compute delivery from Arm OSS platforms:
a. ability to build only the SPMC on TC0 platform (not all reference targets such as qemu, rpi4, fvp)
b. use a Yocto provided toolchain.
@Arun, your view on how those two items were solved is beneficial to further elaborate our plans.

2- A similar request as 1.b to build Hafnium as part of a distribution on arm64 host: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdeveloper…

In my view there are two consumers:
-A distribution only requiring the Hypervisor/SPMC output binary ("out/reference/.../hafnium.bin") using any toolchain (be it arm64 or x86 host, and arbitrary clang version).
-The Hf CI framework/automation, which needs the above plus the test framework and tests (dependency on googletest, linux submodules etc). It's important to keep this item alive while trying to solve the item above.

As you noticed, the Hafnium Hypervisor/SPMC and test environment builds are closely coupled by the use of ninja/gn flow and scripts. They use a fixed toolchain version through prebuilts to ensure builds are "reproducible", in particular with regard to the Hafnium CI.

We intend to approach those problems in the course of Q3 in Arm OSS roadmap. As an early exploration we already have:
-clang 12 compiler upgrade. This is necessary if willing to use any arbitrary clang version: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freview.tr…
-Ability to build on arm64 host (done, internally).
-Identify the flow/script changes such that external dependencies can be used (on-going, internally). I thought of localizing common dependencies to python/shell scripts by the use of definition files included in the mentioned scripts.

This is only an early investigation; I will check how this intersects with the changes you provided.

Regards, Olivier.
From: Hafnium <hafnium-bounces(a)lists.trustedfirmware.org> on behalf of Varun Wadekar via Hafnium <hafnium(a)lists.trustedfirmware.org>
Sent: 28 May 2021 16:47
To: hafnium(a)lists.trustedfirmware.org <hafnium(a)lists.trustedfirmware.org>
Cc: Bo Yan <byan(a)nvidia.com>
Subject: [Hafnium] .git submodules increase hafnium code size

Hi,

We at NVIDIA are evaluating Hafnium. During the initial investigation, we found that the repository size (in terms of MB) is huge. This is mostly because of the "git submodules" used by the project. This is a great way to deliver Hafnium with its dependencies in one go. But we think that the size can be trimmed by moving the toolchain, linux folder, googletest and dtc compiler out, leaving just the Hafnium code in the project. This way, companies like us can pick and choose instead of having to use everything.

In a bid to ease the pain internally and only use the Hafnium code base, we have crafted the following changes:
1. hafnium: support external projects (I10a07de3) * Gerrit Code Review (trustedfirmware.org)<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freview.tr…>
2. hafnium: build with dtc and googletest out of tree (I057c9ad6) * Gerrit Code Review (trustedfirmware.org)<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freview.tr…>
3. build: support external toolchain (Iafd029c1) * Gerrit Code Review (trustedfirmware.org)<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freview.tr…>

This series does not have the patch to use an out of tree linux codebase. I assume these patches won't be acceptable in their current state, so I would like to know how the community plans to handle this situation. The code size is a real concern for us, as we already have copies of the dependencies in our codebase, so we have no use for these duplicates.

Thanks.
--
Hafnium mailing list
Hafnium(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/hafnium

4 years, 1 month

Re: [Hafnium] Bug in hftest.py

by Federico Recanati

Hi Raghu,

a fix for hftest.py has been merged: https://review.trustedfirmware.org/c/hafnium/hafnium/+/12481
It addresses the random failures of both-world tests and supports connecting to telnet ports other than 5000.

Cheers, Federico

____________________________________
> From: raghu.ncstate(a)icloud.com <raghu.ncstate(a)icloud.com>
> Sent: 04 August 2021 21:01
> To: Olivier Deprez; 'Raghu Krishnamurthy via Hafnium'
> Subject: RE: [Hafnium] Bug in hftest.py
>
> Thanks Olivier. I've created https://developer.trustedfirmware.org/T955 to track. Understood all of this is new. I do have local fixes to get around the issue so I'm not in a hurry to have a fix merged, but it is something to consider and fix since it will eventually show up.
>
> >>> the both worlds test scenario is not 100% stable on my machine
> [RK] Likewise. I've noticed that this is caused by lingering FVP processes. Usually I ps -ax | grep for FVP instances, kill them and then run tests, and I never see failures after that. The issue that I faced was that the lingering FVP would take up telnet ports and the newly spawned ones use different ports (>5004) than what hftest.py expects. It appears that when tests fail, we may not be cleaning up/exiting processes properly, but I haven't checked. Or the code may be just fine and a ctrl+c leaves those processes lingering.
>
> Thanks
> Raghu
>
> -----Original Message-----
> From: Olivier Deprez <Olivier.Deprez(a)arm.com>
> Sent: Tuesday, August 3, 2021 11:51 PM
> To: 'Raghu Krishnamurthy via Hafnium' <hafnium(a)lists.trustedfirmware.org>; raghu.ncstate(a)icloud.com
> Subject: Re: [Hafnium] Bug in hftest.py
>
> Hi Raghu,
>
> Thanks for reporting.
> This part of the test infrastructure (testing the SPMC) is still very fresh and requires improvement iterations, so please bear with us. This is also a reason it's not yet part of the automated non-regression with jenkins (as opposed to the legacy kokoro/test.sh). For the time being we still mostly rely on the TF-A CI for testing on the secure side.
>
> IIUC this change was made to help with the test time as the FVP takes long to reload on every test.
> But indeed it might have the side effect you describe.
> So either we revert to reloading the FVP on every test, or another (somewhat hackish) possibility is to clear the mentioned variables from within the test (or make them part of BSS)?
>
> To be fair, the both worlds test scenario is not 100% stable on my machine (for some reason the connection is not always successful between the FVP and hftest), hence limiting confidence/robustness of my testing and investigations. So I wonder if the scripting is still somewhat fragile.
>
> Regards,
> Olivier.
>
> ________________________________________
> From: Hafnium <hafnium-bounces(a)lists.trustedfirmware.org> on behalf of Raghu Krishnamurthy via Hafnium <hafnium(a)lists.trustedfirmware.org>
> Sent: 03 August 2021 23:47
> To: 'Raghu Krishnamurthy via Hafnium'
> Subject: [Hafnium] Bug in hftest.py
>
> Hi All,
>
> Wanted to report to you that commit 18a25f9241f86ba2d637011ff465ce3869e8651b in hafnium "appears" broken. The issue with the optimization in this patch is that the partition images are not reloaded for each test run, which means a previous test could have written data to, say, SRAM, and the following test would use the old values from the previous test when the same image is executed again from SRAM. This would be a problem for pretty much anything in the data section of a partition. In my case, I have a counter in the data section of my partition, which does not get reset back to its original value.
>
> I've attached a patch to help repro the issue. Fix is to disable the optimization or somehow reload the images for each run. This affects only "both world" tests.
>
> Let me know if I'm missing something here.
>
> Apply patch and run
> timeout --foreground 300s ./test/hftest/hftest.py --out_partitions out/reference/secure_aem_v8a_fvp_vm_clang --log out/reference/kokoro_log --spmc out/reference/secure_aem_v8a_fvp_clang/hafnium.bin --driver=fvp --hypervisor out/reference/aem_v8a_fvp_clang/hafnium.bin --partitions_json test/vmapi/ffa_secure_partitions/ffa_both_world_partitions_test.json
>
> The command line is from kokoro/test_spmc.sh.
>
> Thanks
> Raghu
>
> --
> Hafnium mailing list
> Hafnium(a)lists.trustedfirmware.org
> https://lists.trustedfirmware.org/mailman/listinfo/hafnium
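The lingering-FVP symptom described in this thread (telnet ports already taken before a test run starts) can be checked up front. The following is a minimal sketch, not part of hftest.py: it assumes the ports of interest are 5000 to 5004 on localhost, which is inferred from the port numbers mentioned above, and the helper names are hypothetical.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind host:port, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def busy_ports(first=5000, count=5, host="127.0.0.1"):
    """List the ports in [first, first + count) that are already taken."""
    return [p for p in range(first, first + count) if not port_is_free(p, host)]


if __name__ == "__main__":
    taken = busy_ports()
    if taken:
        print("ports already in use (possibly a lingering FVP):", taken)
    else:
        print("ports 5000-5004 look free")
```

A failed bind only says that something holds the port. It does not identify the process, and a recently closed connection can also make a port look busy for a short while, so treat the result as a hint to go and look for leftover FVP instances.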

4 years, 2 months

Hafnium toolchain switch

by Olivier Deprez

Hi,

A small heads up that, from this change:
https://review.trustedfirmware.org/c/hafnium/hafnium/+/11613
a developer needs to provide the LLVM/clang toolchain through the PATH environment variable. See the documentation update:
https://review.trustedfirmware.org/plugins/gitiles/hafnium/hafnium/+/HEAD/d…

Until now the toolchain was hardcoded to use the version stored in the prebuilt submodule. From now on, this weakens the dependency on the prebuilt toolchain and provides the flexibility to use an alternate toolchain on an x86 host. This also opens the way to building on an aarch64 host.

This has been tested with different combinations of hosts, ubuntu releases and toolchains downloaded from https://releases.llvm.org/download.html
-x86_64, Ubuntu 18.04 / 20.04: clang/llvm 12.0.0, 13.0.0
-aarch64, Ubuntu 19.04 (Ampere eMAG) / 20.04, 21.04 (Rpi4): clang/llvm 12.0.0, 12.0.1, 13.0.0

It is still possible to point PATH to the prebuilt toolchain version (Android llvm/clang 12.0.5) as indicated in the documentation.

If you have a live tree, please clean the out directory or run make clobber once you update master. Builds run as before after the switch.

Limitations:
-the build breaks if using a native toolchain installed on the host (apt-get install clang..)
-the build breaks with Ubuntu 21.10/AArch64 (under investigation).

Regards, Olivier.
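Since the toolchain is now picked up from PATH, it can help to confirm which clang the build will actually see before starting. This is a small sketch using only the Python standard library; the helper names are hypothetical and it is not part of the Hafnium build scripts.

```python
import shutil
import subprocess


def find_tool(name="clang"):
    """Return the full path of the first `name` found on PATH, or None."""
    return shutil.which(name)


def tool_version(path):
    """Return the first line printed by `<path> --version`."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    clang = find_tool("clang")
    if clang is None:
        print("clang not found on PATH")
    else:
        print(f"{clang}: {tool_version(clang)}")
```

If the reported path points at a distribution-installed clang rather than the downloaded or prebuilt toolchain, the first limitation listed in the announcement may apply.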

4 years, 2 months

Re: [Hafnium] [TF-A] A problem about assert failed in TF-A

by Joanna Farley

TF-A List.

This issue has also been discussed on the Hafnium list before being posted here. Cross posting so we can have a single thread to track going forward. See https://lists.trustedfirmware.org/pipermail/hafnium/2021-December/000209.ht… with Olivier's last reply copied below. See the archive above for the full history of the thread.

> Hi Wang,
>
> With this level of detail, this is difficult to say. You can extend to the TF-A ML if you wish. I'm hinting at the SPMD because you are mentioning spmd_smc_forward and cm_el1/2_sysregs_context_restore, which are within the SPMD/EL3 space. I wouldn't expect such an assert to happen in any regular use case of the reference implementation (because this is a hard EL3 failure). But yes, the problem can be elsewhere in Hafnium or Cactus, although I'd say those are less likely to alter the EL3 state, unless Hafnium has a bug leading to corrupting a secure memory region which doesn't belong to it.
> Beyond this, notice the assert is taken in cm_el1_sysregs_context_restore. It is called by cm_prepare_el3_exit, which means it can be related to power management, e.g. on a psci resume event. This can be a hint as you say this is occurring 'randomly'.
>
> Regards,
> Olivier.

Joanna

On 14/12/2021, 19:39, "TF-A on behalf of Chenxu Wang via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:

Hi all,

I am running FVP with 2 CPUs, Cactus SP (SEL1), Hafnium (SEL2) and KVM VHE. Sometimes I send the "FFA_MSG_SEND_DIRECT_REQ" smc call from KVM (I put 0x8400006f in x0, the VMID and SP ID in x1, and set x2 to 0). It reports a failed assert, like this:

ASSERT: lib/el3_runtime/aarch64/context_mgmt.c:651
BACKTRACE: START: assert
0: EL3: 0x4005cac
1: EL3: 0x400323c
2: EL3: 0x400620c
3: EL3: 0x400e180
4: EL3: 0x4005a94
BACKTRACE: END: assert

After checking bl31.dump, I notice that when services/std_svc/spmd/spmd_main.c sends the FFA call (from NS to S) via "spmd_smc_forward(smc_fid, secure_origin,x1, x2, x3, x4, handle)", it goes to cm_el1_sysregs_context_restore(secure_state_out) and cm_el2_sysregs_context_restore(secure_state_out), which then assert on the result of cm_get_context(). It gets a NULL context, so the assert fails.

Before the problem appeared, I had modified a lot of code on a dirty TF-A v2.4 tree (commit hash is 0aa70f4c4c023ca58dea2d093d3c08c69b652113), Hafnium and TF-A-TESTS. I also wrote to the Hafnium mailing list; they consider it may be a problem in EL3.

The assert does NOT ALWAYS fail. I mean, when I run FVP and send the "smc" now, it may fail. But when I shut down, run FVP again, and send the same instruction with the same parameters, it is OK.

I want to know what the possible reasons are for suddenly losing the secure context. Can you give me some advice on debugging? E.g., where should I check? Do I need to provide more info?

Sincerely, Wang

--
TF-A mailing list
TF-A(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a

4 years, 2 months

A problem about assert failed in TF-A

by Chenxu Wang

Hi all,

I am running Hafnium on FVP, with Cactus SP in SEL1 and KVM VHE enabled. Sometimes I send the "FFA_MSG_SEND_DIRECT_REQ" smc call in KVM (I put 0x8400006f in x0, the VMID and SP ID in x1, and set x2 to 0). It reports a failed assert, like this:

ASSERT: lib/el3_runtime/aarch64/context_mgmt.c:651
BACKTRACE: START: assert
0: EL3: 0x4005cac
1: EL3: 0x400323c
2: EL3: 0x400620c
3: EL3: 0x400e180
4: EL3: 0x4005a94
BACKTRACE: END: assert

I notice that when services/std_svc/spmd/spmd_main.c sends the FFA call (from NS to S) via "spmd_smc_forward(smc_fid, secure_origin,x1, x2, x3, x4, handle)", it goes to cm_el1_sysregs_context_restore(secure_state_out) and cm_el2_sysregs_context_restore(secure_state_out), which then assert on the result of cm_get_context(). It gets a NULL context, so the assert fails.

The assert does NOT ALWAYS fail, but I still want to solve this problem. Since I have modified many lines of code in Hafnium and Cactus SP, I cannot show them here. Can you give me some advice on debugging? E.g., where should I check?
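When debugging a call like this, it can be useful to double-check the register values being passed. The sketch below decodes the function ID 0x8400006f and packs the two endpoint IDs into x1. The field layout is my reading of the SMC Calling Convention and FF-A v1.0 (sender ID in bits [31:16] of w1, receiver ID in bits [15:0]) and should be checked against those specifications. The endpoint IDs used in the example are hypothetical.

```python
FFA_MSG_SEND_DIRECT_REQ_32 = 0x8400006F  # value quoted in the post above


def decode_smc_fid(fid):
    """Split an SMC function ID into its SMCCC fields."""
    return {
        "fast_call": bool((fid >> 31) & 1),  # bit 31: fast (1) or yielding (0)
        "smc64": bool((fid >> 30) & 1),      # bit 30: SMC64 (1) or SMC32 (0)
        "owner": (fid >> 24) & 0x3F,         # bits 29:24: owning entity number
        "function": fid & 0xFFFF,            # bits 15:0: function number
    }


def pack_endpoints(sender, receiver):
    """Build w1 for a direct request: sender in [31:16], receiver in [15:0]."""
    for value in (sender, receiver):
        if not 0 <= value <= 0xFFFF:
            raise ValueError("endpoint IDs are 16-bit values")
    return (sender << 16) | receiver


def unpack_endpoints(w1):
    """Inverse of pack_endpoints: return (sender, receiver)."""
    return (w1 >> 16) & 0xFFFF, w1 & 0xFFFF


if __name__ == "__main__":
    print(decode_smc_fid(FFA_MSG_SEND_DIRECT_REQ_32))
    # Hypothetical IDs: VM 0x0001 sending to SP 0x8001.
    w1 = pack_endpoints(0x0001, 0x8001)
    print(f"x1 = {w1:#x}")
```

A wrong sender or receiver ID would normally produce an FF-A error return rather than an EL3 assert, so this only rules out one class of mistakes.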

4 years, 2 months
