Hi Olivier,
Thanks for the encouragement. :-) Some comments inline below.
Cheers, Jens
On Mon, Nov 7, 2022 at 4:32 PM Olivier Deprez Olivier.Deprez@arm.com wrote:
Hi Jens,
Thanks for this update, this is extremely useful!
See few comments [OD]
Regards, Olivier.
From: Jens Wiklander via Hafnium hafnium@lists.trustedfirmware.org Sent: 07 November 2022 10:41 To: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: [Hafnium] Hafnium with QEMU and OP-TEE
Hi,
I'm testing with Hafnium as SPMC at S-EL2 and OP-TEE as an SP at S-EL1 on QEMU v7.0.0. I've run into a few problems and fixed most of them.
I believe the setup is similar to what Shiju is using in this mail thread https://lists.trustedfirmware.org/archives/list/hafnium@lists.trustedfirmwar...
My setup can be duplicated with:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \ -b qemu_sel2 repo sync -j8 (cd hafnium && git submodule init && git submodule update) cd build make -j8 toolchains make -j8 all make run-only
With this xtest -x 1034 passes, xtest 1034 often causes ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c Panic: EL2 exception
[OD] I reproduce it, I suspect this might have to do with how OP-TEE workspace size (IPA range) is defined either in the platform makefile or the Hafnium VM configuration.
[JW] This test case often causes fragmented memory share operations, so the problem might relate to that too.
Xtest runs dreadfully slow, I haven't investigated why yet, but at least it works.
[OD] I think this might be because of some missing bit with managed exit, e.g. the FIQ exit path has to ack the ME virtual interrupt (similarly to how it's done on TC platform). I have a change for it that I may be able to share.
[JW] I think I've found it. "[PATCH] WIP: Enable managed exit", right? I've tried something similar. I'm first enabling it with HF_INTERRUPT_ENABLE (which returns 0) then when a IRQ (I'm using managed-exit-virq) is received I'm calling HF_INTERRUPT_GET, but HF_INTERRUPT_GET returns 0xffffffff so I guess something isn't quite right. However, if I remove managed-exit-virq and expect FIQ to indicate managed exit interrupt things works as expected.
What's the difference when using HF_INTERRUPT_ENABLE for managed exit? Stuff seems to run at a similar speed and OP-TEE does managed exit and is later resumed at the same rate as far as I can tell.
I've found out why QEMU was so slow, stupid mistake on my side, I was using a debug build.
This is based on patches provided by Olivier at: [1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16412/2 [2] https://review.trustedfirmware.org/c/hafnium/hafnium/+/16323/7
[OD] Tbh the TF-A changes is really experimental/enabler. Feel free to amend directly, or add on top if you wish. I can also abandon if you have a cleaner version that we can follow up upon.
[JW] OK, I'll let you know when I'm uploading my version so you can abandon yours.
I've also encountered the problem cache maintenance problem Shiju described in the mail thread above: NOTICE: Trapped access to system register write: op0=1, op1=0, crn=7, crm=14, op2=2, rt=11.
It can be worked around by compiling OP-TEE with CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME=n. I'm pretty sure we do dcache clean+inv elsewhere so I'm surprised it fails here. Is Hafnium expected to block dcache clean+inv?
[OD] This is interesting, not something we see with FVP so worth investigating more.
For Hafnium I've added two patches on top of [2], available at https://github.com/jenswi-linaro/hafnium/tree/qemu_sel2:
- 79b4d2cbe06e SPMC: add missing ME initialization for secondary cores
- 659c79d5eacf feat(mm): fix FEAT_LPA workaround
[OD] Noted. Eventually we might have to upstream those changes.
[JW] Agreed
For TF-A I've added a few patches on top of [1], available at https://github.com/jenswi-linaro/arm-trusted-firmware/tree/qemu_sel2:
- a040396cae9e feat(qemu): add tos-fw-config for sel2 spmc
- 4f7d91723485 fix(qemu): change TOS_FW_CONFIG_NAME value
- fbfc9a222c7f spmd_smc_handler() add s/ns state to SMC traces
- ca65081b9cdc feat(sptool): add dependency to SP image
- b1e1b46a0680 fix(qemu): restore code to added needed psci nodes
For OP-TEE I've also added a few patches, available at https://github.com/jenswi-linaro/optee_os/tree/qemu_sel2:
- 1057def23777 plat-vexpress: sel2 spmc: update for hafnium
- f18a54ed3524 core: ffa: use hvc instead of smc with S-EL2
[OD] Not sure this is really required. Hafnium should accept SMC/HVC indifferently.
[JW] I noticed that in order to reach hvc_handler() the HVC instruction must be used.
- d18bbc92f7c1 core: mobj_ffa_add_pages_at() trust addresses from SPMC
There's also one patch for QEMU on top of v7.0.0, available at: https://github.com/jenswi-linaro/qemu/tree/qemu_sel2
- 0c1e39672dcb Read PS bits from VTCR_EL2
[OD] Great.
The QEMU problem is fixed in v.7.1.0, but I can't get that version of QEMU to work with TF-A. I guess it's because of yet another new CPU feature since I'm running with "-cpu max".
[OD] I had only done my testing with 6.0.0 that we have versioned in Hafnium CI, so I didn't notice those FEAT_LPA2 (or other extension) problems. Great you had a look.
I'll try to upstream the Hafnium and TF-A patches that make sense on their own.
What's the plan with the interrupt controller? How will OP-TEE be able to handle secure interrupts?
[OD] Did I notice that the serial device (secure UART) is triggering secure interrupts? Is this a mainline use case you're trying to enable? We should be able to provide the right interrupt configuration in the OP-TEE SP manifest file for the secure UART and OP-TEE be delivered with the appropriate virtual secure interrupt.
[JW] Thanks for the tip, I got it to work (patches pushed on my branches). The purpose of this is just to see that OP-TEE can process secure interrupts.
The paravirtualized interrupt controller is a bit fragile. If HF_INTERRUPT_ENABLE isn't called to enable the interrupt to OP-TEE is still signaled with an IRQ or FIQ, but HF_INTERRUPT_GET will return -1. If HF_INTERRUPT_DEACTIVATE isn't called no more interrupts (secure or non-secure it seems) are delivered. Things only seem to work if managed-exit-virq isn't specified in the manifest.
The hafnium git pulls in a few git submodules and even the source code for a Linux kernel. I guess this is useful in your internal CI setup, but when used isolated as in my setup it makes no sense at all.
[OD] It's not only an internal CI setup. but required for a developer to run local tests and submit a change. I appreciate this isn't super convenient for your use case though.
It would also be nice to be able to build with an external toolchain. I hope this is a temporary situation, I don't see why Hafnium should be pickier about toolchain than for instance TF-A.
[OD] You should be able to override the clang toolchain by doing: PATH=<path>/clang/bin:$PATH run make clobber, then make PROJECT=reference again (note gcc is not supported, and clang only is)
[JW] Missing GCC support is also a pity (and a bit inconvenient too when setting up a build environment), it seems that GDB is a bit confused by the code generated by LLVM. It's still possible to debug with GDB, you just need to work it a bit harder.
If Hafnium is supposed to become _the_ SPMC it's quite important to address inconveniences for users too at some point since it will be used more or less everywhere you need to populate S-EL2.
Hmm, I realize I'm getting dangerously close to the question: "If you care so much, why don't you fix it?" :-)
Speaking of building, I haven't been able to figure out how to build only for the QEMU variant I need so right now I'm building for everything and that takes a bit longer than necessary.
[OD] Yes this is a known problem. I don't have a immediate clean solution atm, but noted.
I'm going to maintain the setup above as long as it's relevant to me. I may add more patches on the branches or even rebase as needed. So if anyone is using this, keep in mind that my branches may change without warning.
[OD] Thanks for this. At least me is watching carefully 🙂
:-)