Hi,
On Thu, 9 Feb 2023 at 13:21, Jens Wiklander jens.wiklander@linaro.org wrote:
Hi Olivier,
Comments below.
On Tue, Feb 7, 2023 at 2:13 PM Olivier Deprez Olivier.Deprez@arm.com wrote:
Hi Jens,
Thanks for your feedback. See comments inline [OD].
Regards, Olivier.
From: Jens Wiklander jens.wiklander@linaro.org Sent: 06 February 2023 10:55 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org <hafnium@lists.trustedfirmware.org
Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE
Hi Olivier,
On Fri, Feb 3, 2023 at 6:02 PM Olivier Deprez Olivier.Deprez@arm.com
wrote:
Hi Jens,
We should now have all needed changes merged to TF-A/Hafnium upstream
in particular:
TF-A:
https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)
Hafnium: https://review.trustedfirmware.org/c/hafnium/hafnium/+/19042 https://review.trustedfirmware.org/c/hafnium/hafnium/+/19111 https://review.trustedfirmware.org/c/hafnium/project/reference/+/16322 https://review.trustedfirmware.org/c/hafnium/hafnium/+/18310
I'm reproducing your qemu setup by using TF-A + Hafnium tips of master. It passes full xtest suite (incl. regression_1034 that was failing
earlier).
That's good news! I tried updating to the latest on Hafnium. However, when booting it fails with: INFO: Initializing Hafnium (SPMC) INFO: text: 0xe100000 - 0xe125000 INFO: rodata: 0xe125000 - 0xe12b000 INFO: data: 0xe12b000 - 0xe214000 INFO: stacks: 0xe214000 - 0xe224000 INFO: Supported bits in physical address: 48 INFO: Stage 2 has 4 page table levels with 1 pages at the root. INFO: Stage 1 has 4 page table levels with 1 pages at the root. ERROR: Data abort: pc=0xe102a10, esr=0x96000051, ec=0x25, far=0xe215ef8
I'm not sure what's wrong.
[OD] That's unfortunate. My best bet is an issue with MTE happening
between qemu v7.1.0 and v7.2.0.
We do all our testing with v7.1.0 eventually. And I'll have a closer
look at what's going with v7.2.0.
So I reverted to qemu v7.1.0 and booted TF-A+Hafnium.
[JW] Jerfome Forissier reminded me that this is a known bug in qemu v7.2.0, it's fixed with: 28fb921f02ef ("target/arm: Fix physical address resolution for MTE")
With that applied everything seems to work.
Afterwards it traps some GIC register access panic to EL3 at early linux
boot ( I can give more details). I worked this around by using -machine virt-6.2
The overall MTE story needs more investigation as I had noticed issues
with booting Hafnium+OP-TEE on FVP some time ago:
see bullet 3 from
https://lists.trustedfirmware.org/archives/list/op-tee@lists.trustedfirmware...
BTW noticed a few random findings in the Makefile:
https://github.com/jenswi-linaro/build/blob/qemu_sel2/common.mk#L471 This seems to cause issues with qemu v7.2.0: qemu-system-aarch64: -netdev user,id=vmnic: network backend 'user' is
not compiled into this binary
[JW] That's odd, I'm using that.
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L198 CTX_INCLUDE_MTE_REGS should be taken care of by line 226.
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L229 I appreciate this is a temp. change, but actually prevents doing an
initial build.
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L497 QEMU_MTE option overrides settings done lines 492-496.
[JW] Thanks, I'll fix these.
Here's what I have on my list as next steps. Let me know your opinion on 1 and 2:
- Ack the ME interrupt in foreign interrupt exit. See attached
0001-fix-ack-ME-in-foreign-interrupt-code-path.patch.
You might not have observed a difference by omitting this, because of
how OP-TEE handles foreign interrupts by returning to the SPMC with a direct response (implicitely ending the managed exit operation).
If possible, it would be preferable integrating this one way or the
other (I admit adding to the foreign interrupt handler code path might not look clean).
To me this looks like a pointless switch back and forth between S-EL1 and S-EL2. Hafnium is obviously able to handle the case where this isn't done. What is the advantage from a system point of view by acking the ME interrupt?
[OD] Yes I understand your position. And tbh this is actually more an
impdef behavior rather than mandated by the spec, so I won't be too strong about it. The counterargument is that we'd want to keep a consistency into how virtual interrupts are handled from the SP perspective. Whichever the INTID or interrupt type IRQ or FIQ, first thing would be to call HF_INTERRUPT_GET to 'deassert' the virtual interrupt at the hafnium virtual interrupt controller. The fact that hafnium deassert the ME interrupt upon return by a direct response looks a bit counterintuitive (even if actually mandated by the spec!).
- Omit saving/restoring NS VFP context when S-EL2 is used. Hafnium
saves the NS SIMD (and SVE if implememented) before returning to OP-TEE. So it is not necessary for OP-TEE to save again the NS context. See 0002-fix-omit-ns-vfp-context-save-restore-if-sel2-present.patch. This change is just a tentative, not confident it properly handles all cases, this is just to get the idea.
This makes sense. I'll take a close look and add it to my branch later.
- Earlier findings around managed exit virq and HF_INTERRUPT_GET.
- Earlier finding with SEL2 trapping cache operation
(CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME).
I'm interested to know whether you're able to progress with this
configuration, also in particular if you intend to integrate into an automated CI mid term.
I'm trying to update to upstream only, we just merged the OP-TEE patches https://github.com/OP-TEE/optee_os/pull/5667. So far I'm stuck on the data abort in Hafnium above.
[OD] Would you be able to progress with v7.1.0 as stated above?
I checked out 28fb921f02ef ("target/arm: Fix physical address resolution for MTE") instead. I'll publish my changes on the qemu_sel2 setup soon.
We would like to integrate this in our CI loop in a not too distant future. We usually try to use TF-A release tags so we may want to wait for v2.9. However, we have a larger problem with Hafnium sources. Since we have one common repo setup for all QEMU v8 based configuration it's not reasonable to add 3GB to configurations that doesn't care about Hafnium. We have to find some way to address this. Perhaps we'll have to add a special repo configuration for Hafnium only, but that's not ideal either since it increases the maintenance burden a bit.
[OD] Yes I fully understand this concern. Though just to be sure I understood, in the CI job, aren't trees cloned
from scratch, built and deleted at the end of the run? Did you say you have to store the trees permanently beyond just one run?
[JW] We're normally basing the CI setup on the usual setup. You can in principle repeat what CI is doing with "make check".
[JF] FWIW a typical CI job is here: https://github.com/OP-TEE/optee_os/blob/master/.github/workflows/ci.yml#L247... We would simply add a similar section for the Hafnium build, with the proper flags set.
We have in mind to remove the clang toolchain from prebuilts submodule.
That would save 1.5GB worth of space for starters. The side effect is that you'd have to download a clang toolchain as no longer provided in the hafnium trees. Could this be added to 'make toolchains' perhaps?
[JW] Absolutely
[JF] Already there: make clang-toolchains :)
Would it be considered as an option to use this same clang toolchain for
all targets (linux, optee, tfa, hafnium etc.)? I appreciate this is a bit more of work though.
[JW] I'm not sure if it's possible to compile OP-TEE with clang only. There's something missing in objcopy or the linker if I remember correctly.
[JW] It is :) You may be thinking about this: https://github.com/OP-TEE/optee_os/commit/33017d856e8c But as the commit says it's fine now. There is a gotcha with Clang though, we need the compiler-rt libraries for both 32 and 64-bit and that's why we have a special download script https://github.com/OP-TEE/build/blob/3.20.0/get_clang.sh (invoked by make clang-toolchains)
The other 'improvement' would be to find a way to get rid of the linux
sources. I'm brainstorming on what can be done. That would save an additional 1G and leaves the complete tree is a more palatable size.
[JW] That's a welcome change. Another improvement that comes to mind is to only build the needed target instead of all targets.
Cheers, Jens
Regards,