Hi Olivier,
Comments below.
On Tue, Feb 7, 2023 at 2:13 PM Olivier Deprez Olivier.Deprez@arm.com wrote:
Hi Jens,
Thanks for your feedback. See comments inline [OD].
Regards, Olivier.
From: Jens Wiklander jens.wiklander@linaro.org Sent: 06 February 2023 10:55 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE
Hi Olivier,
On Fri, Feb 3, 2023 at 6:02 PM Olivier Deprez Olivier.Deprez@arm.com wrote:
Hi Jens,
We should now have all needed changes merged to TF-A/Hafnium upstream in particular:
TF-A: https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)
Hafnium: https://review.trustedfirmware.org/c/hafnium/hafnium/+/19042 https://review.trustedfirmware.org/c/hafnium/hafnium/+/19111 https://review.trustedfirmware.org/c/hafnium/project/reference/+/16322 https://review.trustedfirmware.org/c/hafnium/hafnium/+/18310
I'm reproducing your qemu setup by using TF-A + Hafnium tips of master. It passes full xtest suite (incl. regression_1034 that was failing earlier).
That's good news! I tried updating to the latest on Hafnium. However, when booting it fails with: INFO: Initializing Hafnium (SPMC) INFO: text: 0xe100000 - 0xe125000 INFO: rodata: 0xe125000 - 0xe12b000 INFO: data: 0xe12b000 - 0xe214000 INFO: stacks: 0xe214000 - 0xe224000 INFO: Supported bits in physical address: 48 INFO: Stage 2 has 4 page table levels with 1 pages at the root. INFO: Stage 1 has 4 page table levels with 1 pages at the root. ERROR: Data abort: pc=0xe102a10, esr=0x96000051, ec=0x25, far=0xe215ef8
I'm not sure what's wrong.
[OD] That's unfortunate. My best bet is an issue with MTE happening between qemu v7.1.0 and v7.2.0. We do all our testing with v7.1.0 eventually. And I'll have a closer look at what's going with v7.2.0. So I reverted to qemu v7.1.0 and booted TF-A+Hafnium.
[JW] Jerfome Forissier reminded me that this is a known bug in qemu v7.2.0, it's fixed with: 28fb921f02ef ("target/arm: Fix physical address resolution for MTE")
With that applied everything seems to work.
Afterwards it traps some GIC register access panic to EL3 at early linux boot ( I can give more details). I worked this around by using -machine virt-6.2 The overall MTE story needs more investigation as I had noticed issues with booting Hafnium+OP-TEE on FVP some time ago: see bullet 3 from https://lists.trustedfirmware.org/archives/list/op-tee@lists.trustedfirmware...
BTW noticed a few random findings in the Makefile:
https://github.com/jenswi-linaro/build/blob/qemu_sel2/common.mk#L471 This seems to cause issues with qemu v7.2.0: qemu-system-aarch64: -netdev user,id=vmnic: network backend 'user' is not compiled into this binary
[JW] That's odd, I'm using that.
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L198 CTX_INCLUDE_MTE_REGS should be taken care of by line 226.
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L229 I appreciate this is a temp. change, but actually prevents doing an initial build.
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L497 QEMU_MTE option overrides settings done lines 492-496.
[JW] Thanks, I'll fix these.
Here's what I have on my list as next steps. Let me know your opinion on 1 and 2:
- Ack the ME interrupt in foreign interrupt exit. See attached 0001-fix-ack-ME-in-foreign-interrupt-code-path.patch.
You might not have observed a difference by omitting this, because of how OP-TEE handles foreign interrupts by returning to the SPMC with a direct response (implicitely ending the managed exit operation). If possible, it would be preferable integrating this one way or the other (I admit adding to the foreign interrupt handler code path might not look clean).
To me this looks like a pointless switch back and forth between S-EL1 and S-EL2. Hafnium is obviously able to handle the case where this isn't done. What is the advantage from a system point of view by acking the ME interrupt?
[OD] Yes I understand your position. And tbh this is actually more an impdef behavior rather than mandated by the spec, so I won't be too strong about it. The counterargument is that we'd want to keep a consistency into how virtual interrupts are handled from the SP perspective. Whichever the INTID or interrupt type IRQ or FIQ, first thing would be to call HF_INTERRUPT_GET to 'deassert' the virtual interrupt at the hafnium virtual interrupt controller. The fact that hafnium deassert the ME interrupt upon return by a direct response looks a bit counterintuitive (even if actually mandated by the spec!).
- Omit saving/restoring NS VFP context when S-EL2 is used. Hafnium saves the NS SIMD (and SVE if implememented) before returning to OP-TEE. So it is not necessary for OP-TEE to save again the NS context. See 0002-fix-omit-ns-vfp-context-save-restore-if-sel2-present.patch. This change is just a tentative, not confident it properly handles all cases, this is just to get the idea.
This makes sense. I'll take a close look and add it to my branch later.
- Earlier findings around managed exit virq and HF_INTERRUPT_GET.
- Earlier finding with SEL2 trapping cache operation (CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME).
I'm interested to know whether you're able to progress with this configuration, also in particular if you intend to integrate into an automated CI mid term.
I'm trying to update to upstream only, we just merged the OP-TEE patches https://github.com/OP-TEE/optee_os/pull/5667. So far I'm stuck on the data abort in Hafnium above.
[OD] Would you be able to progress with v7.1.0 as stated above?
I checked out 28fb921f02ef ("target/arm: Fix physical address resolution for MTE") instead. I'll publish my changes on the qemu_sel2 setup soon.
We would like to integrate this in our CI loop in a not too distant future. We usually try to use TF-A release tags so we may want to wait for v2.9. However, we have a larger problem with Hafnium sources. Since we have one common repo setup for all QEMU v8 based configuration it's not reasonable to add 3GB to configurations that doesn't care about Hafnium. We have to find some way to address this. Perhaps we'll have to add a special repo configuration for Hafnium only, but that's not ideal either since it increases the maintenance burden a bit.
[OD] Yes I fully understand this concern. Though just to be sure I understood, in the CI job, aren't trees cloned from scratch, built and deleted at the end of the run? Did you say you have to store the trees permanently beyond just one run?
[JW] We're normally basing the CI setup on the usual setup. You can in principle repeat what CI is doing with "make check".
We have in mind to remove the clang toolchain from prebuilts submodule. That would save 1.5GB worth of space for starters. The side effect is that you'd have to download a clang toolchain as no longer provided in the hafnium trees. Could this be added to 'make toolchains' perhaps?
[JW] Absolutely
Would it be considered as an option to use this same clang toolchain for all targets (linux, optee, tfa, hafnium etc.)? I appreciate this is a bit more of work though.
[JW] I'm not sure if it's possible to compile OP-TEE with clang only. There's something missing in objcopy or the linker if I remember correctly.
The other 'improvement' would be to find a way to get rid of the linux sources. I'm brainstorming on what can be done. That would save an additional 1G and leaves the complete tree is a more palatable size.
[JW] That's a welcome change. Another improvement that comes to mind is to only build the needed target instead of all targets.
Cheers, Jens