Hi Olivier,
I've just pushed my TF-A changes for S-EL2 with QEMU, available at https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)
Cheers, Jens
On Mon, Nov 14, 2022 at 7:53 AM Jens Wiklander jens.wiklander@linaro.org wrote:
Hi Olivier,
Comments below.
Cheers, Jens
On Thu, Nov 10, 2022 at 9:48 PM Olivier Deprez Olivier.Deprez@arm.com wrote:
Hi Jens,
See more comments below [OD].
I'm thinking a lot of your qemu integration might be useful to FVP/VExpress/Total Compute platforms when used with S-EL2/Hafnium. Something to talk about for later, perhaps.
[JW] Sure
Regards, Olivier.
From: Jens Wiklander jens.wiklander@linaro.org Sent: 09 November 2022 09:47 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE
Hi Olivier,
Thanks for the encouragement. :-) Some comments inline below.
Cheers, Jens
On Mon, Nov 7, 2022 at 4:32 PM Olivier Deprez Olivier.Deprez@arm.com wrote:
Hi Jens,
Thanks for this update, this is extremely useful!
See few comments [OD]
Regards, Olivier.
From: Jens Wiklander via Hafnium hafnium@lists.trustedfirmware.org Sent: 07 November 2022 10:41 To: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: [Hafnium] Hafnium with QEMU and OP-TEE
Hi,
I'm testing with Hafnium as SPMC at S-EL2 and OP-TEE as an SP at S-EL1 on QEMU v7.0.0. I've run into a few problems and fixed most of them.
I believe the setup is similar to what Shiju is using in this mail thread https://lists.trustedfirmware.org/archives/list/hafnium@lists.trustedfirmwar...
My setup can be duplicated with:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \ -b qemu_sel2 repo sync -j8 (cd hafnium && git submodule init && git submodule update) cd build make -j8 toolchains make -j8 all make run-only
With this xtest -x 1034 passes, xtest 1034 often causes ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c Panic: EL2 exception
[OD] I reproduce it, I suspect this might have to do with how OP-TEE workspace size (IPA range) is defined either in the platform makefile or the Hafnium VM configuration.
[JW] This test case often causes fragmented memory share operations, so the problem might relate to that too.
[OD] Yes, I see it panics here: https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/api.c#n3233 because "to" pointer is NULL. I was expecting fragmented mem sharing to work, but I'll check further next week on a valid reason. I checked 1034 passes with TC SW stack, based on OP-TEE ~ 3.18 so I wonder if there were changes in 3.19 that may trigger this now.
[JW] I guess it could depend on the Linux kernel too, which pages are used etc.
Xtest runs dreadfully slow, I haven't investigated why yet, but at least it works.
[OD] I think this might be because of some missing bit with managed exit, e.g. the FIQ exit path has to ack the ME virtual interrupt (similarly to how it's done on TC platform). I have a change for it that I may be able to share.
[JW] I think I've found it. "[PATCH] WIP: Enable managed exit", right? I've tried something similar. I'm first enabling it with HF_INTERRUPT_ENABLE (which returns 0) then when a IRQ (I'm using managed-exit-virq) is received I'm calling HF_INTERRUPT_GET, but HF_INTERRUPT_GET returns 0xffffffff so I guess something isn't quite right. However, if I remove managed-exit-virq and expect FIQ to indicate managed exit interrupt things works as expected.
[OD] Ok I wasn't expecting the OP-TEE design to use vIRQ for foreign interrupts 🙂
[JW] I was trying out different options. We have both configurations in the code anyway to deal with GICv2 vs GICv3. I prefer using vFIQ for foreign interrupts though.
Though I'll investigate more what's going on when routed as vIRQ. Meanwhile, I was referring to the fact that for either of vFIQ or vIRQ, the TEE is meant to call HF_INTERRUPT_GET in order to ack the interrupt at the virtual interrupt controller (notice this is an implementation detail, as FF-A v1.x doesn't specify this). For ME, the TEE doesn't have to call deactivate as this is a pure virtual interrupt. In OP-TEE's foreign interrupt code path, there isn't this acknowledgement presently (see attached diff where it's added). I assume it works still, because the next immediate action in the foreign interrupt code path is a direct resp. in which case the SPMC assumes the managed exit operation completed and de-asserts the virtual interrupt. I can see that both IRQ/FIQ could merge to vIRQ and determine based on HF_INTERRUPT_GET, if going for foreign or native interrupt code path. Is this what you had in mind?
[JW] No, the preferred configuration is to signal that a managed exit is due with an FIQ and use IRQ for OP-TEE native interrupt handling.
What's the difference when using HF_INTERRUPT_ENABLE for managed exit? Stuff seems to run at a similar speed and OP-TEE does managed exit and is later resumed at the same rate as far as I can tell.
[OD] Since recently, ME interrupt is meant to be automatically enabled by Hafnium on partition loading. So it should no longer be required for a SP to call HF_INTERRUPT_ENABLE. Do you mean you can't spot a difference to when managed exit is enabled or not?
[JW] No, that I can see.
If NS interrupts are not serviced timely (here through managed exit), this can trigger linux warnings saying a core is stalled while doing a long operation (like crypto). If you remove the managed exit field from manifest, OP-TEE would just be pre-empted and execution returned to normal world without notice which I don't believe is what we want.
[JW] I tried managed exit with and without HF_INTERRUPT_GET and couldn't see any benefit in using HF_INTERRUPT_GET. When done like in your patch there's the disadvantage that the managed exit interrupt must have the highest priority of all interrupts or we may not get the one we're expecting. Normally I'd expect a managed exit interrupt to have lower priority than the native interrupts.
I've found out why QEMU was so slow, stupid mistake on my side, I was using a debug build.
This is based on patches provided by Olivier at: [1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16412/2 [2] https://review.trustedfirmware.org/c/hafnium/hafnium/+/16323/7
[OD] Tbh the TF-A changes is really experimental/enabler. Feel free to amend directly, or add on top if you wish. I can also abandon if you have a cleaner version that we can follow up upon.
[JW] OK, I'll let you know when I'm uploading my version so you can abandon yours.
I've also encountered the problem cache maintenance problem Shiju described in the mail thread above: NOTICE: Trapped access to system register write: op0=1, op1=0, crn=7, crm=14, op2=2, rt=11.
It can be worked around by compiling OP-TEE with CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME=n. I'm pretty sure we do dcache clean+inv elsewhere so I'm surprised it fails here. Is Hafnium expected to block dcache clean+inv?
[OD] This is interesting, not something we see with FVP so worth investigating more.
[OD] Hafnium traps cache operations by set/way https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/arch/aarch64/sy... I did not write this code, but I believe this is to avoid a VM/SP to clean/invalidate cache lines that doesn't belong to it, like from another VM/SP or even the SPMC.
[JW] Makes sense
I will check with arch folks, whether for this case of a security mitigation implemented by a TEE, if Hafnium needs to implement the same mitigation (?)
[JW] Perhaps this mitigation isn't needed on post Armv8.4 hardware.
For Hafnium I've added two patches on top of [2], available at https://github.com/jenswi-linaro/hafnium/tree/qemu_sel2:
- 79b4d2cbe06e SPMC: add missing ME initialization for secondary cores
- 659c79d5eacf feat(mm): fix FEAT_LPA workaround
[OD] Noted. Eventually we might have to upstream those changes.
[JW] Agreed
For TF-A I've added a few patches on top of [1], available at https://github.com/jenswi-linaro/arm-trusted-firmware/tree/qemu_sel2:
- a040396cae9e feat(qemu): add tos-fw-config for sel2 spmc
- 4f7d91723485 fix(qemu): change TOS_FW_CONFIG_NAME value
- fbfc9a222c7f spmd_smc_handler() add s/ns state to SMC traces
- ca65081b9cdc feat(sptool): add dependency to SP image
- b1e1b46a0680 fix(qemu): restore code to added needed psci nodes
For OP-TEE I've also added a few patches, available at https://github.com/jenswi-linaro/optee_os/tree/qemu_sel2:
- 1057def23777 plat-vexpress: sel2 spmc: update for hafnium
- f18a54ed3524 core: ffa: use hvc instead of smc with S-EL2
[OD] Not sure this is really required. Hafnium should accept SMC/HVC indifferently.
[JW] I noticed that in order to reach hvc_handler() the HVC instruction must be used.
[OD] I see how it works. Though I'd suggest to implement thread_hvc (similarly to thread_smc) to hint the Hafnium specific hypercalls (see attached diff).
[JW] Thanks, I'll consider it. By the way, do you have recommendations for when to use SMC instead of HVC and vice versa with Hafnium?
- d18bbc92f7c1 core: mobj_ffa_add_pages_at() trust addresses from SPMC
There's also one patch for QEMU on top of v7.0.0, available at: https://github.com/jenswi-linaro/qemu/tree/qemu_sel2
- 0c1e39672dcb Read PS bits from VTCR_EL2
[OD] Great.
The QEMU problem is fixed in v.7.1.0, but I can't get that version of QEMU to work with TF-A. I guess it's because of yet another new CPU feature since I'm running with "-cpu max".
[OD] I had only done my testing with 6.0.0 that we have versioned in Hafnium CI, so I didn't notice those FEAT_LPA2 (or other extension) problems. Great you had a look.
I'll try to upstream the Hafnium and TF-A patches that make sense on their own.
What's the plan with the interrupt controller? How will OP-TEE be able to handle secure interrupts?
[OD] Did I notice that the serial device (secure UART) is triggering secure interrupts? Is this a mainline use case you're trying to enable? We should be able to provide the right interrupt configuration in the OP-TEE SP manifest file for the secure UART and OP-TEE be delivered with the appropriate virtual secure interrupt.
[JW] Thanks for the tip, I got it to work (patches pushed on my branches). The purpose of this is just to see that OP-TEE can process secure interrupts.
[OD] Yes I see it working and this is a great test harness!
The paravirtualized interrupt controller is a bit fragile. If HF_INTERRUPT_ENABLE isn't called to enable the interrupt to OP-TEE is still signaled with an IRQ or FIQ, but HF_INTERRUPT_GET will return -1. If HF_INTERRUPT_DEACTIVATE isn't called no more interrupts (secure or non-secure it seems) are delivered. Things only seem to work if managed-exit-virq isn't specified in the manifest.
[OD] Hum, the HF_INTERRUPT_ENABLE behavior doesn't look good indeed. The managed-exit-virq toggle is very recent and possibly lacking sensible testing. Let me add to our TODOs for review. About omitting HF_INTERRUPT_DEACTIVE I'd say this is an expected side effect because this corresponds to the physical EOI. Hence if not done, the physical interrupt is not deactivated and the GIC doesn't signal further interrupts.
[JW] OK
The hafnium git pulls in a few git submodules and even the source code for a Linux kernel. I guess this is useful in your internal CI setup, but when used isolated as in my setup it makes no sense at all.
[OD] It's not only an internal CI setup. but required for a developer to run local tests and submit a change. I appreciate this isn't super convenient for your use case though.
It would also be nice to be able to build with an external toolchain. I hope this is a temporary situation, I don't see why Hafnium should be pickier about toolchain than for instance TF-A.
[OD] You should be able to override the clang toolchain by doing: PATH=<path>/clang/bin:$PATH run make clobber, then make PROJECT=reference again (note gcc is not supported, and clang only is)
[JW] Missing GCC support is also a pity (and a bit inconvenient too when setting up a build environment), it seems that GDB is a bit confused by the code generated by LLVM. It's still possible to debug with GDB, you just need to work it a bit harder.
If Hafnium is supposed to become _the_ SPMC it's quite important to address inconveniences for users too at some point since it will be used more or less everywhere you need to populate S-EL2.
Hmm, I realize I'm getting dangerously close to the question: "If you care so much, why don't you fix it?" :-)
[OD] Fair enough. Please raise such concerns in our TF-A tech forum, or to trustedfirmware.org's technical steering committee. Need for GCC support wasn't mentioned that many times as I recall, but I get the point. We made a presentation some time ago on general build system improvements to get the history/rationale, and as we've tried to improve user experience problems: https://www.trustedfirmware.org/docs/HafniumBuildSystem.pdf (video recording at https://www.trustedfirmware.org/meetings/tf-a-technical-forum/)
[JW] Thanks
Speaking of building, I haven't been able to figure out how to build only for the QEMU variant I need so right now I'm building for everything and that takes a bit longer than necessary.
[OD] Yes this is a known problem. I don't have a immediate clean solution atm, but noted.
I'm going to maintain the setup above as long as it's relevant to me. I may add more patches on the branches or even rebase as needed. So if anyone is using this, keep in mind that my branches may change without warning.
[OD] Thanks for this. At least me is watching carefully 🙂
:-)