[Hafnium] Re: Hafnium with QEMU and OP-TEE

9 Feb 2023

      Hi Olivier,
Comments below.
On Tue, Feb 7, 2023 at 2:13 PM Olivier Deprez Olivier.Deprez@arm.com wrote:
...
Hi Jens,
Thanks for your feedback.
See comments inline [OD].
Regards,
Olivier.

From: Jens Wiklander jens.wiklander@linaro.org
Sent: 06 February 2023 10:55
To: Olivier Deprez Olivier.Deprez@arm.com
Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org
Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE
Hi Olivier,
On Fri, Feb 3, 2023 at 6:02 PM Olivier Deprez Olivier.Deprez@arm.com wrote:
...
Hi Jens,
We should now have all needed changes merged to TF-A/Hafnium upstream in particular:
TF-A:
https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)
Hafnium:
https://review.trustedfirmware.org/c/hafnium/hafnium/+/19042
https://review.trustedfirmware.org/c/hafnium/hafnium/+/19111
https://review.trustedfirmware.org/c/hafnium/project/reference/+/16322
https://review.trustedfirmware.org/c/hafnium/hafnium/+/18310
I'm reproducing your qemu setup by using TF-A + Hafnium tips of master.
It passes full xtest suite (incl. regression_1034 that was failing earlier).
That's good news! I tried updating to the latest on Hafnium. However,
when booting it fails with:
INFO: Initializing Hafnium (SPMC)
INFO: text: 0xe100000 - 0xe125000
INFO: rodata: 0xe125000 - 0xe12b000
INFO: data: 0xe12b000 - 0xe214000
INFO: stacks: 0xe214000 - 0xe224000
INFO: Supported bits in physical address: 48
INFO: Stage 2 has 4 page table levels with 1 pages at the root.
INFO: Stage 1 has 4 page table levels with 1 pages at the root.
ERROR: Data abort: pc=0xe102a10, esr=0x96000051, ec=0x25, far=0xe215ef8
I'm not sure what's wrong.
[OD] That's unfortunate. My best bet is an issue with MTE happening between qemu v7.1.0 and v7.2.0.
We do all our testing with v7.1.0 eventually. And I'll have a closer look at what's going with v7.2.0.
So I reverted to qemu v7.1.0 and booted TF-A+Hafnium.
[JW] Jerfome Forissier reminded me that this is a known bug in qemu
v7.2.0, it's fixed with:
28fb921f02ef ("target/arm: Fix physical address resolution for MTE")
With that applied everything seems to work.
...
Afterwards it traps some GIC register access panic to EL3 at early linux boot ( I can give more details). I worked this around by using -machine virt-6.2
The overall MTE story needs more investigation as I had noticed issues with booting Hafnium+OP-TEE on FVP some time ago:
see bullet 3 from https://lists.trustedfirmware.org/archives/list/op-tee@lists.trustedfirmware...
BTW noticed a few random findings in the Makefile:
https://github.com/jenswi-linaro/build/blob/qemu_sel2/common.mk#L471
This seems to cause issues with qemu v7.2.0:
qemu-system-aarch64: -netdev user,id=vmnic: network backend 'user' is not compiled into this binary
[JW] That's odd, I'm using that.
...
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L198
CTX_INCLUDE_MTE_REGS should be taken care of by line 226.
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L229
I appreciate this is a temp. change, but actually prevents doing an initial build.
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L497
QEMU_MTE option overrides settings done lines 492-496.
[JW] Thanks, I'll fix these.
...
...
Here's what I have on my list as next steps.
Let me know your opinion on 1 and 2:

Ack the ME interrupt in foreign interrupt exit. See attached 0001-fix-ack-ME-in-foreign-interrupt-code-path.patch.

You might not have observed a difference by omitting this, because of how OP-TEE handles foreign interrupts by returning to the SPMC with a direct response (implicitely ending the managed exit operation).
If possible, it would be preferable integrating this one way or the other (I admit adding to the foreign interrupt handler code path might not look clean).
To me this looks like a pointless switch back and forth between S-EL1
and S-EL2. Hafnium is obviously able to handle the case where this
isn't done. What is the advantage from a system point of view by
acking the ME interrupt?
[OD] Yes I understand your position. And tbh this is actually more an impdef behavior rather than mandated by the spec, so I won't be too strong about it. The counterargument is that we'd want to keep a consistency into how virtual interrupts are handled from the SP perspective. Whichever the INTID or interrupt type IRQ or FIQ, first thing would be to call HF_INTERRUPT_GET to 'deassert' the virtual interrupt at the hafnium virtual interrupt controller. The fact that hafnium deassert the ME interrupt upon return by a direct response looks a bit counterintuitive (even if actually mandated by the spec!).
...

Omit saving/restoring NS VFP context when S-EL2 is used. Hafnium saves the NS SIMD (and SVE if implememented) before returning to OP-TEE. So it is not necessary for OP-TEE to save again the NS context. See 0002-fix-omit-ns-vfp-context-save-restore-if-sel2-present.patch. This change is just a tentative, not confident it properly handles all cases, this is just to get the idea.

This makes sense. I'll take a close look and add it to my branch later.
...

Earlier findings around managed exit virq and HF_INTERRUPT_GET.
Earlier finding with SEL2 trapping cache operation (CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME).

I'm interested to know whether you're able to progress with this configuration, also in particular if you intend to integrate into an automated CI mid term.
I'm trying to update to upstream only, we just merged the OP-TEE
patches https://github.com/OP-TEE/optee_os/pull/5667. So far I'm stuck
on the data abort in Hafnium above.
[OD] Would you be able to progress with v7.1.0 as stated above?
I checked out 28fb921f02ef ("target/arm: Fix physical address
resolution for MTE") instead. I'll publish my changes on the qemu_sel2
setup soon.
...
We would like to integrate this in our CI loop in a not too distant
future. We usually try to use TF-A release tags so we may want to wait
for v2.9. However, we have a larger problem with Hafnium sources.
Since we have one common repo setup for all QEMU v8 based
configuration it's not reasonable to add 3GB to configurations that
doesn't care about Hafnium. We have to find some way to address this.
Perhaps we'll have to add a special repo configuration for Hafnium
only, but that's not ideal either since it increases the maintenance
burden a bit.
[OD] Yes I fully understand this concern.
Though just to be sure I understood, in the CI job, aren't trees cloned from scratch, built and deleted at the end of the run? Did you say you have to store the trees permanently beyond just one run?
[JW] We're normally basing the CI setup on the usual setup. You can in
principle repeat what CI is doing with "make check".
...
We have in mind to remove the clang toolchain from prebuilts submodule. That would save 1.5GB worth of space for starters. The side effect is that you'd have to download a clang toolchain as no longer provided in the hafnium trees. Could this be added to 'make toolchains' perhaps?
[JW] Absolutely
...
Would it be considered as an option to use this same clang toolchain for all targets (linux, optee, tfa, hafnium etc.)? I appreciate this is a bit more of work though.
[JW] I'm not sure if it's possible to compile OP-TEE with clang only.
There's something missing in objcopy or the linker if I remember
correctly.
...
The other 'improvement' would be to find a way to get rid of the linux sources. I'm brainstorming on what can be done. That would save an additional 1G and leaves the complete tree is a more palatable size.
[JW] That's a welcome change. Another improvement that comes to mind
is to only build the needed target instead of all targets.
Cheers,
Jens

2025

2024

2023

2022

2021

2020

[Hafnium] Re: Hafnium with QEMU and OP-TEE