Hafnium with QEMU and OP-TEE

Jens Wiklander

7 Nov 2022

9:41 a.m.

Hi,

I'm testing with Hafnium as SPMC at S-EL2 and OP-TEE as an SP at S-EL1 on QEMU v7.0.0. I've run into a few problems and fixed most of them.

I believe the setup is similar to what Shiju is using in this mail thread https://lists.trustedfirmware.org/archives/list/hafnium@lists.trustedfirmwar...

My setup can be duplicated with:

repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
        -b qemu_sel2
repo sync -j8
(cd hafnium && git submodule init && git submodule update)
cd build
make -j8 toolchains
make -j8 all
make run-only

With this, xtest -x 1034 passes, but xtest 1034 often causes:
ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c
Panic: EL2 exception

Xtest runs dreadfully slowly. I haven't investigated why yet, but at least it works.

This is based on patches provided by Olivier at:
[1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16412/2
[2] https://review.trustedfirmware.org/c/hafnium/hafnium/+/16323/7

I've also encountered the cache maintenance problem Shiju described in the mail thread above:
NOTICE: Trapped access to system register write: op0=1, op1=0, crn=7, crm=14, op2=2, rt=11.

It can be worked around by compiling OP-TEE with CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME=n. I'm pretty sure we do dcache clean+inv elsewhere so I'm surprised it fails here. Is Hafnium expected to block dcache clean+inv?

For Hafnium I've added two patches on top of [2], available at https://github.com/jenswi-linaro/hafnium/tree/qemu_sel2:
- 79b4d2cbe06e SPMC: add missing ME initialization for secondary cores
- 659c79d5eacf feat(mm): fix FEAT_LPA workaround

For TF-A I've added a few patches on top of [1], available at https://github.com/jenswi-linaro/arm-trusted-firmware/tree/qemu_sel2:
- a040396cae9e feat(qemu): add tos-fw-config for sel2 spmc
- 4f7d91723485 fix(qemu): change TOS_FW_CONFIG_NAME value
- fbfc9a222c7f spmd_smc_handler() add s/ns state to SMC traces
- ca65081b9cdc feat(sptool): add dependency to SP image
- b1e1b46a0680 fix(qemu): restore code to added needed psci nodes

For OP-TEE I've also added a few patches, available at https://github.com/jenswi-linaro/optee_os/tree/qemu_sel2:
- 1057def23777 plat-vexpress: sel2 spmc: update for hafnium
- f18a54ed3524 core: ffa: use hvc instead of smc with S-EL2
- d18bbc92f7c1 core: mobj_ffa_add_pages_at() trust addresses from SPMC

There's also one patch for QEMU on top of v7.0.0, available at https://github.com/jenswi-linaro/qemu/tree/qemu_sel2:
- 0c1e39672dcb Read PS bits from VTCR_EL2

The QEMU problem is fixed in v7.1.0, but I can't get that version of QEMU to work with TF-A. I guess it's because of yet another new CPU feature, since I'm running with "-cpu max".

I'll try to upstream the Hafnium and TF-A patches that make sense on their own.

What's the plan with the interrupt controller? How will OP-TEE be able to handle secure interrupts?

The hafnium git pulls in a few git submodules and even the source code for a Linux kernel. I guess this is useful in your internal CI setup, but when used in isolation, as in my setup, it makes no sense at all.

It would also be nice to be able to build with an external toolchain. I hope this is a temporary situation; I don't see why Hafnium should be pickier about its toolchain than, for instance, TF-A.

Speaking of building, I haven't been able to figure out how to build only for the QEMU variant I need, so right now I'm building for everything and that takes a bit longer than necessary.

I'm going to maintain the setup above as long as it's relevant to me. I may add more patches on the branches or even rebase as needed. So if anyone is using this, keep in mind that my branches may change without warning.

Thanks, Jens

Olivier Deprez

7 Nov 2022

3:32 p.m.

Hi Jens,

Thanks for this update, this is extremely useful!

See few comments [OD]

Regards, Olivier.

From: Jens Wiklander via Hafnium hafnium@lists.trustedfirmware.org
Sent: 07 November 2022 10:41
To: hafnium@lists.trustedfirmware.org
Subject: [Hafnium] Hafnium with QEMU and OP-TEE

Hi,

I'm testing with Hafnium as SPMC at S-EL2 and OP-TEE as an SP at S-EL1 on QEMU v7.0.0. I've run into a few problems and fixed most of them.

I believe the setup is similar to what Shiju is using in this mail thread https://lists.trustedfirmware.org/archives/list/hafnium@lists.trustedfirmwar...

My setup can be duplicated with:

With this, xtest -x 1034 passes, but xtest 1034 often causes:
ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c
Panic: EL2 exception

[OD] I can reproduce it. I suspect this might have to do with how the OP-TEE workspace size (IPA range) is defined, either in the platform makefile or in the Hafnium VM configuration.

Xtest runs dreadfully slow, I haven't investigated why yet, but at least it works.

[OD] I think this might be because of some missing bit with managed exit, e.g. the FIQ exit path has to ack the ME virtual interrupt (similarly to how it's done on TC platform). I have a change for it that I may be able to share.

This is based on patches provided by Olivier at: [1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16412/2 [2] https://review.trustedfirmware.org/c/hafnium/hafnium/+/16323/7

[OD] Tbh the TF-A changes are really experimental enablers. Feel free to amend them directly, or add on top if you wish. I can also abandon mine if you have a cleaner version that we can follow up on.

I've also encountered the cache maintenance problem Shiju described in the mail thread above:
NOTICE: Trapped access to system register write: op0=1, op1=0, crn=7, crm=14, op2=2, rt=11.

[OD] This is interesting, not something we see with FVP so worth investigating more.

[OD] Noted. Eventually we might have to upstream those changes.

[OD] Not sure this is really required. Hafnium should accept SMC/HVC indifferently.

- d18bbc92f7c1 core: mobj_ffa_add_pages_at() trust addresses from SPMC

There's also one patch for QEMU on top of v7.0.0, available at: https://github.com/jenswi-linaro/qemu/tree/qemu_sel2 - 0c1e39672dcb Read PS bits from VTCR_EL2

[OD] Great.

The QEMU problem is fixed in v.7.1.0, but I can't get that version of QEMU to work with TF-A. I guess it's because of yet another new CPU feature since I'm running with "-cpu max".

[OD] I had only done my testing with 6.0.0 that we have versioned in Hafnium CI, so I didn't notice those FEAT_LPA2 (or other extension) problems. Great you had a look.

I'll try to upstream the Hafnium and TF-A patches that make sense on their own.

What's the plan with the interrupt controller? How will OP-TEE be able to handle secure interrupts?

[OD] Did I notice that the serial device (secure UART) is triggering secure interrupts? Is this a mainline use case you're trying to enable? We should be able to provide the right interrupt configuration for the secure UART in the OP-TEE SP manifest file, so that the appropriate virtual secure interrupt is delivered to OP-TEE.

[OD] It's not only an internal CI setup, but is also required for a developer to run local tests and submit a change. I appreciate this isn't super convenient for your use case though.

It would also be nice to be able to build with an external toolchain. I hope this is a temporary situation, I don't see why Hafnium should be pickier about toolchain than for instance TF-A.

[OD] You should be able to override the clang toolchain by setting PATH=<path>/clang/bin:$PATH, running make clobber, and then running make PROJECT=reference again (note that GCC is not supported, only clang).

Speaking of building, I haven't been able to figure out how to build only for the QEMU variant I need so right now I'm building for everything and that takes a bit longer than necessary.

[OD] Yes, this is a known problem. I don't have an immediate clean solution atm, but noted.

[OD] Thanks for this. I, at least, am watching carefully 🙂

Thanks, Jens

Jens Wiklander

9 Nov 2022

8:47 a.m.

Hi Olivier,

Thanks for the encouragement. :-) Some comments inline below.

Cheers, Jens

On Mon, Nov 7, 2022 at 4:32 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

With this, xtest -x 1034 passes, but xtest 1034 often causes:
ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c
Panic: EL2 exception

[OD] I reproduce it, I suspect this might have to do with how OP-TEE workspace size (IPA range) is defined either in the platform makefile or the Hafnium VM configuration.

[JW] This test case often causes fragmented memory share operations, so the problem might relate to that too.

...

Xtest runs dreadfully slow, I haven't investigated why yet, but at least it works.

[OD] I think this might be because of some missing bit with managed exit, e.g. the FIQ exit path has to ack the ME virtual interrupt (similarly to how it's done on TC platform). I have a change for it that I may be able to share.

[JW] I think I've found it. "[PATCH] WIP: Enable managed exit", right? I've tried something similar. I first enable it with HF_INTERRUPT_ENABLE (which returns 0), then when an IRQ (I'm using managed-exit-virq) is received I call HF_INTERRUPT_GET, but HF_INTERRUPT_GET returns 0xffffffff, so I guess something isn't quite right. However, if I remove managed-exit-virq and expect FIQ to indicate the managed exit interrupt, things work as expected.

What's the difference when using HF_INTERRUPT_ENABLE for managed exit? Stuff seems to run at a similar speed and OP-TEE does managed exit and is later resumed at the same rate as far as I can tell.

I've found out why QEMU was so slow, stupid mistake on my side, I was using a debug build.

...

This is based on patches provided by Olivier at: [1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16412/2 [2] https://review.trustedfirmware.org/c/hafnium/hafnium/+/16323/7

[OD] Tbh the TF-A changes are really experimental enablers. Feel free to amend them directly, or add on top if you wish. I can also abandon mine if you have a cleaner version that we can follow up on.

[JW] OK, I'll let you know when I'm uploading my version so you can abandon yours.

...

I've also encountered the cache maintenance problem Shiju described in the mail thread above:
NOTICE: Trapped access to system register write: op0=1, op1=0, crn=7, crm=14, op2=2, rt=11.

It can be worked around by compiling OP-TEE with CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME=n. I'm pretty sure we do dcache clean+inv elsewhere so I'm surprised it fails here. Is Hafnium expected to block dcache clean+inv?

[OD] This is interesting, not something we see with FVP so worth investigating more.

For Hafnium I've added two patches on top of [2], available at https://github.com/jenswi-linaro/hafnium/tree/qemu_sel2:

79b4d2cbe06e SPMC: add missing ME initialization for secondary cores

659c79d5eacf feat(mm): fix FEAT_LPA workaround

[OD] Noted. Eventually we might have to upstream those changes.

[JW] Agreed

...

For TF-A I've added a few patches on top of [1], available at https://github.com/jenswi-linaro/arm-trusted-firmware/tree/qemu_sel2:

a040396cae9e feat(qemu): add tos-fw-config for sel2 spmc

4f7d91723485 fix(qemu): change TOS_FW_CONFIG_NAME value

fbfc9a222c7f spmd_smc_handler() add s/ns state to SMC traces

ca65081b9cdc feat(sptool): add dependency to SP image

b1e1b46a0680 fix(qemu): restore code to added needed psci nodes

For OP-TEE I've also added a few patches, available at https://github.com/jenswi-linaro/optee_os/tree/qemu_sel2:

1057def23777 plat-vexpress: sel2 spmc: update for hafnium

f18a54ed3524 core: ffa: use hvc instead of smc with S-EL2

[OD] Not sure this is really required. Hafnium should accept SMC/HVC indifferently.

[JW] I noticed that in order to reach hvc_handler() the HVC instruction must be used.

...

d18bbc92f7c1 core: mobj_ffa_add_pages_at() trust addresses from SPMC

There's also one patch for QEMU on top of v7.0.0, available at: https://github.com/jenswi-linaro/qemu/tree/qemu_sel2

0c1e39672dcb Read PS bits from VTCR_EL2

[OD] Great.

The QEMU problem is fixed in v.7.1.0, but I can't get that version of QEMU to work with TF-A. I guess it's because of yet another new CPU feature since I'm running with "-cpu max".

[OD] I had only done my testing with 6.0.0 that we have versioned in Hafnium CI, so I didn't notice those FEAT_LPA2 (or other extension) problems. Great you had a look.

I'll try to upstream the Hafnium and TF-A patches that make sense on their own.

What's the plan with the interrupt controller? How will OP-TEE be able to handle secure interrupts?

[OD] Did I notice that the serial device (secure UART) is triggering secure interrupts? Is this a mainline use case you're trying to enable? We should be able to provide the right interrupt configuration in the OP-TEE SP manifest file for the secure UART and OP-TEE be delivered with the appropriate virtual secure interrupt.

[JW] Thanks for the tip, I got it to work (patches pushed on my branches). The purpose of this is just to see that OP-TEE can process secure interrupts.

The paravirtualized interrupt controller is a bit fragile. If HF_INTERRUPT_ENABLE isn't called to enable the interrupt, OP-TEE is still signaled with an IRQ or FIQ, but HF_INTERRUPT_GET will return -1. If HF_INTERRUPT_DEACTIVATE isn't called, no more interrupts (secure or non-secure, it seems) are delivered. Things only seem to work if managed-exit-virq isn't specified in the manifest.
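The 0xffffffff reported earlier in this mail and the -1 mentioned here are most likely the same return value seen through an unsigned and a signed 32-bit view. A minimal sketch of that equivalence follows; it assumes the return value is a 32-bit quantity, and the name NO_PENDING_INTID is hypothetical, made up for this illustration rather than taken from Hafnium.

```python
# Sketch only: shows that 0xffffffff and -1 are the same 32-bit pattern.
# NO_PENDING_INTID is a hypothetical name used for illustration.

NO_PENDING_INTID = 0xFFFFFFFF


def as_int32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def as_uint32(value):
    """Interpret the low 32 bits of value as an unsigned integer."""
    return value & 0xFFFFFFFF


def is_no_pending(ret):
    """True if a 32-bit return value matches the 'nothing pending' pattern."""
    return as_uint32(ret) == NO_PENDING_INTID


print(as_int32(0xFFFFFFFF), hex(as_uint32(-1)))
```

A caller that compares the return value against only one of the two spellings can miss the other if the value was sign-extended on the way, which is why the helper normalizes to 32 bits before comparing.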

...

The hafnium git pulls in a few git submodules and even the source code for a Linux kernel. I guess this is useful in your internal CI setup, but when used isolated as in my setup it makes no sense at all.

[OD] It's not only an internal CI setup, but is also required for a developer to run local tests and submit a change. I appreciate this isn't super convenient for your use case though.

It would also be nice to be able to build with an external toolchain. I hope this is a temporary situation, I don't see why Hafnium should be pickier about toolchain than for instance TF-A.

[OD] You should be able to override the clang toolchain by doing: PATH=<path>/clang/bin:$PATH run make clobber, then make PROJECT=reference again (note gcc is not supported, and clang only is)

[JW] Missing GCC support is also a pity (and a bit inconvenient too when setting up a build environment). It seems that GDB is a bit confused by the code generated by LLVM. It's still possible to debug with GDB, you just need to work at it a bit harder.

If Hafnium is supposed to become _the_ SPMC it's quite important to address inconveniences for users too at some point since it will be used more or less everywhere you need to populate S-EL2.

Hmm, I realize I'm getting dangerously close to the question: "If you care so much, why don't you fix it?" :-)

...

Speaking of building, I haven't been able to figure out how to build only for the QEMU variant I need so right now I'm building for everything and that takes a bit longer than necessary.

[OD] Yes, this is a known problem. I don't have an immediate clean solution atm, but noted.

I'm going to maintain the setup above as long as it's relevant to me. I may add more patches on the branches or even rebase as needed. So if anyone is using this, keep in mind that my branches may change without warning.

[OD] Thanks for this. At least me is watching carefully 🙂

:-)

Olivier Deprez

10 Nov 2022

8:48 p.m.

Hi Jens,

See more comments below [OD].

I'm thinking a lot of your qemu integration might be useful to FVP/VExpress/Total Compute platforms when used with S-EL2/Hafnium. Something to talk about for later, perhaps.

Regards, Olivier.

From: Jens Wiklander jens.wiklander@linaro.org
Sent: 09 November 2022 09:47
To: Olivier Deprez Olivier.Deprez@arm.com
Cc: hafnium@lists.trustedfirmware.org
Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi Olivier,

Thanks for the encouragement. :-) Some comments inline below.

Cheers, Jens

On Mon, Nov 7, 2022 at 4:32 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

With this, xtest -x 1034 passes, but xtest 1034 often causes:
ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c
Panic: EL2 exception

[OD] I reproduce it, I suspect this might have to do with how OP-TEE workspace size (IPA range) is defined either in the platform makefile or the Hafnium VM configuration.

[JW] This test case often causes fragmented memory share operations, so the problem might relate to that too.

[OD] Yes, I see it panics here: https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/api.c#n3233 because the "to" pointer is NULL. I was expecting fragmented mem sharing to work, but I'll look further into the reason next week. I checked that 1034 passes with the TC SW stack, based on OP-TEE ~ 3.18, so I wonder if there were changes in 3.19 that may trigger this now.
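The register values in the panic log are consistent with a NULL pointer dereference. As a rough sketch (not part of Hafnium or OP-TEE), the ESR value can be split into its architectural fields: EC in bits [31:26], IL in bit 25, and ISS in bits [24:0], where for data aborts ISS bit 6 is WnR and ISS bits [5:0] are the fault status code. The function and table names below are made up for this illustration.

```python
# Sketch only: decode the architectural fields of an AArch64 ESR_ELx value
# reported for a data abort. Names are illustrative, not from any project.

# Upper four bits of the 6-bit DFSC select the fault kind; the low two bits
# give the translation table level for these kinds.
FAULT_KINDS = {
    0b0000: "Address size fault",
    0b0001: "Translation fault",
    0b0010: "Access flag fault",
    0b0011: "Permission fault",
}


def decode_data_abort(esr):
    ec = (esr >> 26) & 0x3F      # exception class
    il = (esr >> 25) & 0x1       # instruction length bit
    iss = esr & 0x1FFFFFF        # instruction specific syndrome
    dfsc = iss & 0x3F            # data fault status code
    kind = FAULT_KINDS.get(dfsc >> 2)
    if kind is not None:
        fault = f"{kind}, level {dfsc & 0x3}"
    else:
        fault = f"other (DFSC={dfsc:#x})"
    return {
        "ec": ec,
        "il": il,
        "write": bool(iss & (1 << 6)),  # WnR: True for a write access
        "dfsc": dfsc,
        "fault": fault,
    }


info = decode_data_abort(0x96000006)
print(f"ec={info['ec']:#x} write={info['write']} fault={info['fault']}")
```

For esr=0x96000006 this yields EC 0x25 (a data abort taken without a change in exception level, here EL2 itself), a read access, and a level 2 translation fault. Together with far=0x9c, that looks like a read of a field at offset 0x9c from a NULL base pointer, which matches the observation above.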

...

Xtest runs dreadfully slow, I haven't investigated why yet, but at least it works.

[OD] I think this might be because of some missing bit with managed exit, e.g. the FIQ exit path has to ack the ME virtual interrupt (similarly to how it's done on TC platform). I have a change for it that I may be able to share.

[OD] OK, I wasn't expecting the OP-TEE design to use vIRQ for foreign interrupts 🙂 I'll investigate further what's going on when routed as vIRQ, though. Meanwhile, I was referring to the fact that for either vFIQ or vIRQ, the TEE is meant to call HF_INTERRUPT_GET in order to ack the interrupt at the virtual interrupt controller (note that this is an implementation detail, as FF-A v1.x doesn't specify this). For ME, the TEE doesn't have to call deactivate, as this is a purely virtual interrupt. OP-TEE's foreign interrupt code path doesn't presently perform this acknowledgement (see attached diff where it's added). I assume it still works because the next immediate action in the foreign interrupt code path is a direct response, in which case the SPMC assumes the managed exit operation has completed and de-asserts the virtual interrupt. I can see that both IRQ and FIQ could be merged into vIRQ, with the result of HF_INTERRUPT_GET determining whether to take the foreign or the native interrupt code path. Is this what you had in mind?

What's the difference when using HF_INTERRUPT_ENABLE for managed exit? Stuff seems to run at a similar speed and OP-TEE does managed exit and is later resumed at the same rate as far as I can tell.

[OD] Since recently, the ME interrupt is meant to be automatically enabled by Hafnium on partition loading, so it should no longer be required for an SP to call HF_INTERRUPT_ENABLE. Do you mean you can't spot a difference between managed exit being enabled or not? If NS interrupts are not serviced in a timely manner (here through managed exit), this can trigger Linux warnings saying a core is stalled while doing a long operation (like crypto). If you remove the managed exit field from the manifest, OP-TEE would just be pre-empted and execution returned to the normal world without notice, which I don't believe is what we want.

I've found out why QEMU was so slow, stupid mistake on my side, I was using a debug build.

...

This is based on patches provided by Olivier at: [1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16412/2 [2] https://review.trustedfirmware.org/c/hafnium/hafnium/+/16323/7

[OD] Tbh the TF-A changes are really experimental enablers. Feel free to amend them directly, or add on top if you wish. I can also abandon mine if you have a cleaner version that we can follow up on.

[JW] OK, I'll let you know when I'm uploading my version so you can abandon yours.

...

I've also encountered the cache maintenance problem Shiju described in the mail thread above:
NOTICE: Trapped access to system register write: op0=1, op1=0, crn=7, crm=14, op2=2, rt=11.

It can be worked around by compiling OP-TEE with CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME=n. I'm pretty sure we do dcache clean+inv elsewhere so I'm surprised it fails here. Is Hafnium expected to block dcache clean+inv?

[OD] This is interesting, not something we see with FVP so worth investigating more.

[OD] Hafnium traps cache operations by set/way https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/arch/aarch64/sy... I did not write this code, but I believe this is to prevent a VM/SP from cleaning/invalidating cache lines that don't belong to it, such as those of another VM/SP or even the SPMC. I will check with the arch folks whether, for this case of a security mitigation implemented by a TEE, Hafnium needs to implement the same mitigation (?)
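The encoding in the NOTICE line can be matched against the architectural encodings of the AArch64 data cache maintenance instructions. The lookup below is a small sketch with a partial table (only a handful of DC operations are listed) and made-up function names; it is not how Hafnium identifies the access.

```python
# Sketch only: map an (op0, op1, CRn, CRm, op2) tuple from a trapped system
# instruction to the AArch64 data cache maintenance operation it encodes.
# The table is deliberately partial.

DC_OPS = {
    (1, 0, 7, 6, 1): "DC IVAC",    # invalidate by VA to PoC
    (1, 0, 7, 6, 2): "DC ISW",     # invalidate by set/way
    (1, 0, 7, 10, 2): "DC CSW",    # clean by set/way
    (1, 0, 7, 14, 2): "DC CISW",   # clean and invalidate by set/way
    (1, 3, 7, 10, 1): "DC CVAC",   # clean by VA to PoC
    (1, 3, 7, 11, 1): "DC CVAU",   # clean by VA to PoU
    (1, 3, 7, 14, 1): "DC CIVAC",  # clean and invalidate by VA to PoC
}


def name_dc_op(op0, op1, crn, crm, op2):
    """Return the instruction name, or None if not in the partial table."""
    return DC_OPS.get((op0, op1, crn, crm, op2))


def is_set_way(name):
    """Set/way operations are the ones whose mnemonic ends in 'SW'."""
    return name is not None and name.endswith("SW")


trapped = name_dc_op(op0=1, op1=0, crn=7, crm=14, op2=2)
print(trapped, is_set_way(trapped))
```

The values from the log decode to DC CISW, a clean and invalidate by set/way, which fits the explanation above. Clean and invalidate by virtual address (DC CIVAC) has a different encoding, which would explain why dcache clean+inv works elsewhere in OP-TEE.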

...

For Hafnium I've added two patches on top of [2], available at https://github.com/jenswi-linaro/hafnium/tree/qemu_sel2:

79b4d2cbe06e SPMC: add missing ME initialization for secondary cores

659c79d5eacf feat(mm): fix FEAT_LPA workaround

[OD] Noted. Eventually we might have to upstream those changes.

[JW] Agreed

...

For TF-A I've added a few patches on top of [1], available at https://github.com/jenswi-linaro/arm-trusted-firmware/tree/qemu_sel2:

a040396cae9e feat(qemu): add tos-fw-config for sel2 spmc

4f7d91723485 fix(qemu): change TOS_FW_CONFIG_NAME value

fbfc9a222c7f spmd_smc_handler() add s/ns state to SMC traces

ca65081b9cdc feat(sptool): add dependency to SP image

b1e1b46a0680 fix(qemu): restore code to added needed psci nodes

For OP-TEE I've also added a few patches, available at https://github.com/jenswi-linaro/optee_os/tree/qemu_sel2:

1057def23777 plat-vexpress: sel2 spmc: update for hafnium

f18a54ed3524 core: ffa: use hvc instead of smc with S-EL2

[OD] Not sure this is really required. Hafnium should accept SMC/HVC indifferently.

[JW] I noticed that in order to reach hvc_handler() the HVC instruction must be used.

[OD] I see how it works. Though I'd suggest implementing thread_hvc (similarly to thread_smc) to hint at the Hafnium-specific hypercalls (see attached diff).

...

d18bbc92f7c1 core: mobj_ffa_add_pages_at() trust addresses from SPMC

There's also one patch for QEMU on top of v7.0.0, available at: https://github.com/jenswi-linaro/qemu/tree/qemu_sel2

0c1e39672dcb Read PS bits from VTCR_EL2

[OD] Great.

The QEMU problem is fixed in v.7.1.0, but I can't get that version of QEMU to work with TF-A. I guess it's because of yet another new CPU feature since I'm running with "-cpu max".

[OD] I had only done my testing with 6.0.0 that we have versioned in Hafnium CI, so I didn't notice those FEAT_LPA2 (or other extension) problems. Great you had a look.

I'll try to upstream the Hafnium and TF-A patches that make sense on their own.

What's the plan with the interrupt controller? How will OP-TEE be able to handle secure interrupts?

[OD] Did I notice that the serial device (secure UART) is triggering secure interrupts? Is this a mainline use case you're trying to enable? We should be able to provide the right interrupt configuration in the OP-TEE SP manifest file for the secure UART and OP-TEE be delivered with the appropriate virtual secure interrupt.

[JW] Thanks for the tip, I got it to work (patches pushed on my branches). The purpose of this is just to see that OP-TEE can process secure interrupts.

[OD] Yes I see it working and this is a great test harness!

[OD] Hum, the HF_INTERRUPT_ENABLE behavior doesn't look good indeed. The managed-exit-virq toggle is very recent and possibly lacks sensible testing. Let me add it to our TODOs for review. About omitting HF_INTERRUPT_DEACTIVATE, I'd say this is an expected side effect because it corresponds to the physical EOI. Hence if it isn't done, the physical interrupt is not deactivated and the GIC doesn't signal further interrupts.

...

The hafnium git pulls in a few git submodules and even the source code for a Linux kernel. I guess this is useful in your internal CI setup, but when used isolated as in my setup it makes no sense at all.

[OD] It's not only an internal CI setup, but is also required for a developer to run local tests and submit a change. I appreciate this isn't super convenient for your use case though.

It would also be nice to be able to build with an external toolchain. I hope this is a temporary situation, I don't see why Hafnium should be pickier about toolchain than for instance TF-A.

[OD] You should be able to override the clang toolchain by doing: PATH=<path>/clang/bin:$PATH run make clobber, then make PROJECT=reference again (note gcc is not supported, and clang only is)

If Hafnium is supposed to become _the_ SPMC it's quite important to address inconveniences for users too at some point since it will be used more or less everywhere you need to populate S-EL2.

Hmm, I realize I'm getting dangerously close to the question: "If you care so much, why don't you fix it?" :-)

[OD] Fair enough. Please raise such concerns in our TF-A tech forum, or with trustedfirmware.org's technical steering committee. The need for GCC support hasn't been mentioned that many times as far as I recall, but I get the point. Some time ago we gave a presentation on general build system improvements, covering the history/rationale and how we've tried to address user experience problems: https://www.trustedfirmware.org/docs/HafniumBuildSystem.pdf (video recording at https://www.trustedfirmware.org/meetings/tf-a-technical-forum/)

...

Speaking of building, I haven't been able to figure out how to build only for the QEMU variant I need so right now I'm building for everything and that takes a bit longer than necessary.

[OD] Yes, this is a known problem. I don't have an immediate clean solution atm, but noted.

I'm going to maintain the setup above as long as it's relevant to me. I may add more patches on the branches or even rebase as needed. So if anyone is using this, keep in mind that my branches may change without warning.

[OD] Thanks for this. At least me is watching carefully 🙂

:-)

Jens Wiklander

14 Nov 2022

6:53 a.m.

Hi Olivier,

Comments below.

Cheers, Jens

On Thu, Nov 10, 2022 at 9:48 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...

Hi Jens,

See more comments below [OD].

I'm thinking a lot of your qemu integration might be useful to FVP/VExpress/Total Compute platforms when used with S-EL2/Hafnium. Something to talk about for later, perhaps.

[JW] Sure

...

Regards, Olivier.

With this, xtest -x 1034 passes, but xtest 1034 often causes:
ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c
Panic: EL2 exception

[OD] I reproduce it, I suspect this might have to do with how OP-TEE workspace size (IPA range) is defined either in the platform makefile or the Hafnium VM configuration.

[JW] This test case often causes fragmented memory share operations, so the problem might relate to that too.

[OD] Yes, I see it panics here: https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/api.c#n3233 because "to" pointer is NULL. I was expecting fragmented mem sharing to work, but I'll check further next week on a valid reason. I checked 1034 passes with TC SW stack, based on OP-TEE ~ 3.18 so I wonder if there were changes in 3.19 that may trigger this now.

[JW] I guess it could depend on the Linux kernel too, which pages are used etc.

...

...
Xtest runs dreadfully slow, I haven't investigated why yet, but at least it works.

[OD] I think this might be because of some missing bit with managed exit, e.g. the FIQ exit path has to ack the ME virtual interrupt (similarly to how it's done on TC platform). I have a change for it that I may be able to share.

[JW] I think I've found it. "[PATCH] WIP: Enable managed exit", right? I've tried something similar. I first enable it with HF_INTERRUPT_ENABLE (which returns 0), then when an IRQ (I'm using managed-exit-virq) is received I call HF_INTERRUPT_GET, but HF_INTERRUPT_GET returns 0xffffffff, so I guess something isn't quite right. However, if I remove managed-exit-virq and expect FIQ to indicate the managed exit interrupt, things work as expected.

[OD] Ok I wasn't expecting the OP-TEE design to use vIRQ for foreign interrupts 🙂

[JW] I was trying out different options. We have both configurations in the code anyway to deal with GICv2 vs GICv3. I prefer using vFIQ for foreign interrupts though.

...

Though I'll investigate more what's going on when routed as vIRQ. Meanwhile, I was referring to the fact that for either of vFIQ or vIRQ, the TEE is meant to call HF_INTERRUPT_GET in order to ack the interrupt at the virtual interrupt controller (notice this is an implementation detail, as FF-A v1.x doesn't specify this). For ME, the TEE doesn't have to call deactivate as this is a pure virtual interrupt. In OP-TEE's foreign interrupt code path, there isn't this acknowledgement presently (see attached diff where it's added). I assume it works still, because the next immediate action in the foreign interrupt code path is a direct resp. in which case the SPMC assumes the managed exit operation completed and de-asserts the virtual interrupt. I can see that both IRQ/FIQ could merge to vIRQ and determine based on HF_INTERRUPT_GET, if going for foreign or native interrupt code path. Is this what you had in mind?

[JW] No, the preferred configuration is to signal that a managed exit is due with an FIQ and use IRQ for OP-TEE native interrupt handling.

...

What's the difference when using HF_INTERRUPT_ENABLE for managed exit? Stuff seems to run at a similar speed and OP-TEE does managed exit and is later resumed at the same rate as far as I can tell.

[OD] Since recently, the ME interrupt is meant to be automatically enabled by Hafnium on partition loading, so it should no longer be required for an SP to call HF_INTERRUPT_ENABLE. Do you mean you can't spot a difference between managed exit being enabled or not?

[JW] No, that I can see.

...

If NS interrupts are not serviced in a timely manner (here through managed exit), this can trigger Linux warnings saying a core is stalled while doing a long operation (like crypto). If you remove the managed exit field from the manifest, OP-TEE would just be pre-empted and execution returned to the normal world without notice, which I don't believe is what we want.

[JW] I tried managed exit with and without HF_INTERRUPT_GET and couldn't see any benefit in using HF_INTERRUPT_GET. When done like in your patch there's the disadvantage that the managed exit interrupt must have the highest priority of all interrupts or we may not get the one we're expecting. Normally I'd expect a managed exit interrupt to have lower priority than the native interrupts.

...

I've found out why QEMU was so slow, stupid mistake on my side, I was using a debug build.

...
This is based on patches provided by Olivier at: [1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16412/2 [2] https://review.trustedfirmware.org/c/hafnium/hafnium/+/16323/7

[OD] Tbh the TF-A changes are really experimental enablers. Feel free to amend directly, or add on top if you wish. I can also abandon mine if you have a cleaner version that we can follow up on.

[JW] OK, I'll let you know when I'm uploading my version so you can abandon yours.

...
I've also encountered the cache maintenance problem Shiju described in the mail thread above: NOTICE: Trapped access to system register write: op0=1, op1=0, crn=7, crm=14, op2=2, rt=11.

It can be worked around by compiling OP-TEE with CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME=n. I'm pretty sure we do dcache clean+inv elsewhere so I'm surprised it fails here. Is Hafnium expected to block dcache clean+inv?

[OD] This is interesting, not something we see with FVP so worth investigating more.

[OD] Hafnium traps cache operations by set/way https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/arch/aarch64/sy... I did not write this code, but I believe this is to prevent a VM/SP from cleaning/invalidating cache lines that don't belong to it, such as those of another VM/SP or even the SPMC.

[JW] Makes sense
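The fields printed in the trap notice can be matched against the AArch64 encodings of the data cache maintenance instructions that operate by set/way. The sketch below does that lookup; the struct, enum and function names are hypothetical and exist only for this illustration, while the (op0, op1, CRn, CRm, op2) values follow the Arm architecture's encodings for DC ISW, DC CSW and DC CISW.

```c
#include <assert.h>
#include <stdint.h>

/* Fields as printed in the "Trapped access to system register" notice. */
struct sysreg_enc {
    uint8_t op0, op1, crn, crm, op2;
};

/* Hypothetical classification used only in this sketch. */
enum dc_sw_op { DC_SW_NONE, DC_SW_ISW, DC_SW_CSW, DC_SW_CISW };

/* Classify an encoding as one of the DC set/way operations, if it is one. */
static enum dc_sw_op classify_dc_set_way(struct sysreg_enc e)
{
    /* Field widths: op0 is 2 bits, op1/op2 are 3 bits, CRn/CRm are 4 bits. */
    assert(e.op0 <= 3 && e.op1 <= 7 && e.crn <= 15 && e.crm <= 15 && e.op2 <= 7);

    if (e.op0 != 1 || e.op1 != 0 || e.crn != 7 || e.op2 != 2)
        return DC_SW_NONE;

    switch (e.crm) {
    case 6:
        return DC_SW_ISW;   /* invalidate by set/way */
    case 10:
        return DC_SW_CSW;   /* clean by set/way */
    case 14:
        return DC_SW_CISW;  /* clean and invalidate by set/way */
    default:
        return DC_SW_NONE;
    }
}

/* Printable mnemonic for a classified operation. */
static const char *dc_sw_name(enum dc_sw_op op)
{
    switch (op) {
    case DC_SW_ISW:
        return "DC ISW";
    case DC_SW_CSW:
        return "DC CSW";
    case DC_SW_CISW:
        return "DC CISW";
    default:
        return "not a DC set/way operation";
    }
}
```

With the values from the notice (op0=1, op1=0, crn=7, crm=14, op2=2) the lookup gives DC CISW, a clean and invalidate by set/way. That is consistent with Hafnium trapping set/way operations specifically, while maintenance by virtual address (for example DC CIVAC, which has op1=3 and op2=1) uses a different encoding.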

...

I will check with arch folks whether, for this case of a security mitigation implemented by a TEE, Hafnium needs to implement the same mitigation.

[JW] Perhaps this mitigation isn't needed on post Armv8.4 hardware.

...

...
For Hafnium I've added two patches on top of [2], available at https://github.com/jenswi-linaro/hafnium/tree/qemu_sel2:

79b4d2cbe06e SPMC: add missing ME initialization for secondary cores

659c79d5eacf feat(mm): fix FEAT_LPA workaround

[OD] Noted. Eventually we might have to upstream those changes.

[JW] Agreed

...
For TF-A I've added a few patches on top of [1], available at https://github.com/jenswi-linaro/arm-trusted-firmware/tree/qemu_sel2:

a040396cae9e feat(qemu): add tos-fw-config for sel2 spmc

4f7d91723485 fix(qemu): change TOS_FW_CONFIG_NAME value

fbfc9a222c7f spmd_smc_handler() add s/ns state to SMC traces

ca65081b9cdc feat(sptool): add dependency to SP image

b1e1b46a0680 fix(qemu): restore code to added needed psci nodes

For OP-TEE I've also added a few patches, available at https://github.com/jenswi-linaro/optee_os/tree/qemu_sel2:

1057def23777 plat-vexpress: sel2 spmc: update for hafnium

f18a54ed3524 core: ffa: use hvc instead of smc with S-EL2

[OD] Not sure this is really required. Hafnium should accept SMC/HVC indifferently.

[JW] I noticed that in order to reach hvc_handler() the HVC instruction must be used.

[OD] I see how it works. Though I'd suggest implementing thread_hvc (similarly to thread_smc) to hint at the Hafnium-specific hypercalls (see attached diff).

[JW] Thanks, I'll consider it. By the way, do you have recommendations for when to use SMC instead of HVC and vice versa with Hafnium?

...

...

d18bbc92f7c1 core: mobj_ffa_add_pages_at() trust addresses from SPMC

There's also one patch for QEMU on top of v7.0.0, available at: https://github.com/jenswi-linaro/qemu/tree/qemu_sel2

0c1e39672dcb Read PS bits from VTCR_EL2

[OD] Great.

The QEMU problem is fixed in v7.1.0, but I can't get that version of QEMU to work with TF-A. I guess it's because of yet another new CPU feature since I'm running with "-cpu max".

[OD] I had only done my testing with 6.0.0 that we have versioned in Hafnium CI, so I didn't notice those FEAT_LPA2 (or other extension) problems. Great you had a look.

I'll try to upstream the Hafnium and TF-A patches that make sense on their own.

What's the plan with the interrupt controller? How will OP-TEE be able to handle secure interrupts?

[OD] Did I notice that the serial device (secure UART) is triggering secure interrupts? Is this a mainline use case you're trying to enable? We should be able to provide the right interrupt configuration in the OP-TEE SP manifest file for the secure UART, and OP-TEE would then be delivered the appropriate virtual secure interrupt.

[JW] Thanks for the tip, I got it to work (patches pushed on my branches). The purpose of this is just to see that OP-TEE can process secure interrupts.

[OD] Yes I see it working and this is a great test harness!

The paravirtualized interrupt controller is a bit fragile. If HF_INTERRUPT_ENABLE isn't called to enable the interrupt, OP-TEE is still signaled with an IRQ or FIQ, but HF_INTERRUPT_GET will return -1. If HF_INTERRUPT_DEACTIVATE isn't called, no more interrupts (secure or non-secure, it seems) are delivered. Things only seem to work if managed-exit-virq isn't specified in the manifest.

[OD] Hum, the HF_INTERRUPT_ENABLE behavior doesn't look good indeed. The managed-exit-virq toggle is very recent and possibly lacking sensible testing. Let me add it to our TODOs for review. About omitting HF_INTERRUPT_DEACTIVATE, I'd say this is an expected side effect because this corresponds to the physical EOI. Hence if not done, the physical interrupt is not deactivated and the GIC doesn't signal further interrupts.

[JW] OK
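The enable/get/deactivate behaviour discussed above can be pictured with a toy state machine for a single interrupt line. This is only a sketch of the observations in this exchange, under the assumption that the physical interrupt stays active from the moment it is signaled until the deactivate call. It is not Hafnium's implementation, and every name in it (toy_vint, toy_raise, toy_get and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: "no interrupt" sentinel, a 32-bit -1 (hypothetical name). */
#define INVALID_INTID UINT32_C(0xffffffff)

/* Toy state for one interrupt line; not Hafnium's data structures. */
struct toy_vint {
    uint32_t id;
    bool enabled;  /* set by the enable call */
    bool pending;  /* virtual interrupt signaled to the partition */
    bool active;   /* physical interrupt taken but not yet deactivated */
};

/* Device asserts the line. Returns true if the partition gets signaled. */
static bool toy_raise(struct toy_vint *v)
{
    assert(v != NULL);
    if (v->active)
        return false;  /* still active at the GIC: nothing new is delivered */
    v->active = true;
    v->pending = true;
    return true;
}

/* Models the enable call for this line. */
static void toy_enable(struct toy_vint *v)
{
    assert(v != NULL);
    v->enabled = true;
}

/* Models the get call: returns the ID only if enabled and pending. */
static uint32_t toy_get(struct toy_vint *v)
{
    assert(v != NULL);
    if (!v->enabled || !v->pending)
        return INVALID_INTID;
    v->pending = false;
    return v->id;
}

/* Models the deactivate call, i.e. the physical end-of-interrupt. */
static void toy_deactivate(struct toy_vint *v)
{
    assert(v != NULL);
    v->active = false;
}
```

In this model a raise without a prior enable still signals the partition, but the get call returns the sentinel, and a second raise is dropped until the deactivate call has run. Both match the symptoms reported in the thread.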

...

...
The hafnium git pulls in a few git submodules and even the source code for a Linux kernel. I guess this is useful in your internal CI setup, but when used isolated as in my setup it makes no sense at all.

[OD] It's not only an internal CI setup, but also required for a developer to run local tests and submit a change. I appreciate this isn't super convenient for your use case though.

It would also be nice to be able to build with an external toolchain. I hope this is a temporary situation, I don't see why Hafnium should be pickier about toolchain than for instance TF-A.

[OD] You should be able to override the clang toolchain by prepending it to PATH (PATH=<path>/clang/bin:$PATH), running make clobber, then make PROJECT=reference again (note that GCC is not supported, only clang is).

[JW] Missing GCC support is also a pity (and a bit inconvenient too when setting up a build environment). It seems that GDB is a bit confused by the code generated by LLVM. It's still possible to debug with GDB, you just need to work at it a bit harder.

If Hafnium is supposed to become _the_ SPMC it's quite important to address inconveniences for users too at some point since it will be used more or less everywhere you need to populate S-EL2.

Hmm, I realize I'm getting dangerously close to the question: "If you care so much, why don't you fix it?" :-)

[OD] Fair enough. Please raise such concerns in our TF-A tech forum, or to trustedfirmware.org's technical steering committee. The need for GCC support wasn't mentioned that many times as I recall, but I get the point. We gave a presentation some time ago on general build system improvements, which covers the history/rationale and how we've tried to address user experience problems: https://www.trustedfirmware.org/docs/HafniumBuildSystem.pdf (video recording at https://www.trustedfirmware.org/meetings/tf-a-technical-forum/)

[JW] Thanks

...

...
Speaking of building, I haven't been able to figure out how to build only for the QEMU variant I need so right now I'm building for everything and that takes a bit longer than necessary.

[OD] Yes, this is a known problem. I don't have an immediate clean solution atm, but noted.

I'm going to maintain the setup above as long as it's relevant to me. I may add more patches on the branches or even rebase as needed. So if anyone is using this, keep in mind that my branches may change without warning.

[OD] Thanks for this. At least I am watching carefully 🙂

:-)

Jens Wiklander

23 Nov 2022

8:24 a.m.

Hi Olivier,

I've just pushed my TF-A changes for S-EL2 with QEMU, available at https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)

Cheers, Jens

Olivier Deprez

3 Feb

5:02 p.m.

Hi Jens,

We should now have all the needed changes merged upstream in TF-A/Hafnium, in particular:

TF-A: https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)

Hafnium:

https://review.trustedfirmware.org/c/hafnium/hafnium/+/19042

https://review.trustedfirmware.org/c/hafnium/hafnium/+/19111

https://review.trustedfirmware.org/c/hafnium/project/reference/+/16322

https://review.trustedfirmware.org/c/hafnium/hafnium/+/18310

I'm reproducing your QEMU setup using the tips of master for TF-A and Hafnium. It passes the full xtest suite (incl. regression_1034, which was failing earlier).

Here's what I have on my list as next steps. Let me know your opinion on 1 and 2:

1. Ack the ME interrupt in foreign interrupt exit. See attached 0001-fix-ack-ME-in-foreign-interrupt-code-path.patch. You might not have observed a difference by omitting this, because of how OP-TEE handles foreign interrupts by returning to the SPMC with a direct response (implicitly ending the managed exit operation). If possible, it would be preferable to integrate this one way or the other (I admit adding it to the foreign interrupt handler code path might not look clean).

2. Omit saving/restoring the NS VFP context when S-EL2 is used. Hafnium saves the NS SIMD context (and SVE if implemented) before returning to OP-TEE, so it is not necessary for OP-TEE to save the NS context again. See 0002-fix-omit-ns-vfp-context-save-restore-if-sel2-present.patch. This change is only tentative; I'm not confident it properly handles all cases, it is just to convey the idea.

3. Earlier findings around managed exit virq and HF_INTERRUPT_GET.

4. Earlier finding with S-EL2 trapping cache operations (CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME).

I'm interested to know whether you're able to progress with this configuration, also in particular if you intend to integrate into an automated CI mid term.

Thanks, Olivier.

________________________________ From: Jens Wiklander jens.wiklander@linaro.org Sent: 23 November 2022 09:24 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi Olivier,

I've just pushed my TF-A changes for S-EL2 with QEMU, available at https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)

Cheers, Jens

On Mon, Nov 14, 2022 at 7:53 AM Jens Wiklander jens.wiklander@linaro.org wrote:

...

Hi Olivier,

Comments below.

Cheers, Jens

On Thu, Nov 10, 2022 at 9:48 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...
Hi Jens,

See more comments below [OD].

I'm thinking a lot of your qemu integration might be useful to FVP/VExpress/Total Compute platforms when used with S-EL2/Hafnium. Something to talk about for later, perhaps.

[JW] Sure

...
Regards, Olivier.

From: Jens Wiklander jens.wiklander@linaro.org Sent: 09 November 2022 09:47 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi Olivier,

Thanks for the encouragement. :-) Some comments inline below.

Cheers, Jens

On Mon, Nov 7, 2022 at 4:32 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...
Hi Jens,

Thanks for this update, this is extremely useful!

See few comments [OD]

Regards, Olivier.

From: Jens Wiklander via Hafnium hafnium@lists.trustedfirmware.org Sent: 07 November 2022 10:41 To: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: [Hafnium] Hafnium with QEMU and OP-TEE

Hi,

I'm testing with Hafnium as SPMC at S-EL2 and OP-TEE as an SP at S-EL1 on QEMU v7.0.0. I've run into a few problems and fixed most of them.

I believe the setup is similar to what Shiju is using in this mail thread https://lists.trustedfirmware.org/archives/list/hafnium@lists.trustedfirmwar...

My setup can be duplicated with:

repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \ -b qemu_sel2 repo sync -j8 (cd hafnium && git submodule init && git submodule update) cd build make -j8 toolchains make -j8 all make run-only

With this xtest -x 1034 passes, xtest 1034 often causes ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c Panic: EL2 exception

[OD] I reproduce it, I suspect this might have to do with how OP-TEE workspace size (IPA range) is defined either in the platform makefile or the Hafnium VM configuration.

[JW] This test case often causes fragmented memory share operations, so the problem might relate to that too.

[OD] Yes, I see it panics here: https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/api.c#n3233 because "to" pointer is NULL. I was expecting fragmented mem sharing to work, but I'll check further next week on a valid reason. I checked 1034 passes with TC SW stack, based on OP-TEE ~ 3.18 so I wonder if there were changes in 3.19 that may trigger this now.

[JW] I guess it could depend on the Linux kernel too, which pages are used etc.

...
...
Xtest runs dreadfully slow, I haven't investigated why yet, but at least it works.

[OD] I think this might be because of some missing bit with managed exit, e.g. the FIQ exit path has to ack the ME virtual interrupt (similarly to how it's done on TC platform). I have a change for it that I may be able to share.

[JW] I think I've found it. "[PATCH] WIP: Enable managed exit", right? I've tried something similar. I'm first enabling it with HF_INTERRUPT_ENABLE (which returns 0) then when a IRQ (I'm using managed-exit-virq) is received I'm calling HF_INTERRUPT_GET, but HF_INTERRUPT_GET returns 0xffffffff so I guess something isn't quite right. However, if I remove managed-exit-virq and expect FIQ to indicate managed exit interrupt things works as expected.

[OD] Ok I wasn't expecting the OP-TEE design to use vIRQ for foreign interrupts 🙂

[JW] I was trying out different options. We have both configurations in the code anyway to deal with GICv2 vs GICv3. I prefer using vFIQ for foreign interrupts though.

...
Though I'll investigate more what's going on when routed as vIRQ. Meanwhile, I was referring to the fact that for either of vFIQ or vIRQ, the TEE is meant to call HF_INTERRUPT_GET in order to ack the interrupt at the virtual interrupt controller (notice this is an implementation detail, as FF-A v1.x doesn't specify this). For ME, the TEE doesn't have to call deactivate as this is a pure virtual interrupt. In OP-TEE's foreign interrupt code path, there isn't this acknowledgement presently (see attached diff where it's added). I assume it works still, because the next immediate action in the foreign interrupt code path is a direct resp. in which case the SPMC assumes the managed exit operation completed and de-asserts the virtual interrupt. I can see that both IRQ/FIQ could merge to vIRQ and determine based on HF_INTERRUPT_GET, if going for foreign or native interrupt code path. Is this what you had in mind?

[JW] No, the preferred configuration is to signal that a managed exit is due with an FIQ and use IRQ for OP-TEE native interrupt handling.

...
What's the difference when using HF_INTERRUPT_ENABLE for managed exit? Stuff seems to run at a similar speed and OP-TEE does managed exit and is later resumed at the same rate as far as I can tell.

[OD] Since recently, ME interrupt is meant to be automatically enabled by Hafnium on partition loading. So it should no longer be required for a SP to call HF_INTERRUPT_ENABLE. Do you mean you can't spot a difference to when managed exit is enabled or not?

[JW] No, that I can see.

...
If NS interrupts are not serviced timely (here through managed exit), this can trigger linux warnings saying a core is stalled while doing a long operation (like crypto). If you remove the managed exit field from manifest, OP-TEE would just be pre-empted and execution returned to normal world without notice which I don't believe is what we want.

[JW] I tried managed exit with and without HF_INTERRUPT_GET and couldn't see any benefit in using HF_INTERRUPT_GET. When done like in your patch there's the disadvantage that the managed exit interrupt must have the highest priority of all interrupts or we may not get the one we're expecting. Normally I'd expect a managed exit interrupt to have lower priority than the native interrupts.

...
I've found out why QEMU was so slow, stupid mistake on my side, I was using a debug build.

...
This is based on patches provided by Olivier at: [1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16412/2 [2] https://review.trustedfirmware.org/c/hafnium/hafnium/+/16323/7

[OD] Tbh the TF-A changes is really experimental/enabler. Feel free to amend directly, or add on top if you wish. I can also abandon if you have a cleaner version that we can follow up upon.

[JW] OK, I'll let you know when I'm uploading my version so you can abandon yours.

...
I've also encountered the problem cache maintenance problem Shiju described in the mail thread above: NOTICE: Trapped access to system register write: op0=1, op1=0, crn=7, crm=14, op2=2, rt=11.

It can be worked around by compiling OP-TEE with CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME=n. I'm pretty sure we do dcache clean+inv elsewhere so I'm surprised it fails here. Is Hafnium expected to block dcache clean+inv?

[OD] This is interesting, not something we see with FVP so worth investigating more.

[OD] Hafnium traps cache operations by set/way https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/arch/aarch64/sy... I did not write this code, but I believe this is to prevent a VM/SP from cleaning/invalidating cache lines that don't belong to it, like those from another VM/SP or even the SPMC.

[JW] Makes sense
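
For reference, the encoding in the NOTICE above can be named with a small lookup. This is only a sketch: the function name dc_op_name is made up for illustration, the table covers just a few data cache instructions, and the encodings are quoted from the Arm architecture's system instruction encodings (worth double-checking against the Arm ARM). It suggests the trapped write is DC CISW, a set/way operation, while clean+invalidate by virtual address (DC CIVAC) has a different encoding, which may be why dcache clean+inv done elsewhere does not trap.

```shell
#!/bin/sh
# Name the data cache instruction behind a trapped system register write.
# Arguments: op0 op1 crn crm op2 (as printed in the Hafnium NOTICE line).
# dc_op_name is a hypothetical helper, not part of Hafnium or OP-TEE.
dc_op_name() {
    case "$1.$2.$3.$4.$5" in
        1.0.7.6.2)  echo "DC ISW (invalidate by set/way)" ;;
        1.0.7.10.2) echo "DC CSW (clean by set/way)" ;;
        1.0.7.14.2) echo "DC CISW (clean+invalidate by set/way)" ;;
        1.3.7.14.1) echo "DC CIVAC (clean+invalidate by VA to PoC)" ;;
        *)          echo "unknown to this sketch" ;;
    esac
}

# Values from the NOTICE: op0=1, op1=0, crn=7, crm=14, op2=2
dc_op_name 1 0 7 14 2
```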

...
I will check with arch folks, whether for this case of a security mitigation implemented by a TEE, if Hafnium needs to implement the same mitigation (?)

[JW] Perhaps this mitigation isn't needed on post Armv8.4 hardware.

...
...
For Hafnium I've added two patches on top of [2], available at https://github.com/jenswi-linaro/hafnium/tree/qemu_sel2:

79b4d2cbe06e SPMC: add missing ME initialization for secondary cores

659c79d5eacf feat(mm): fix FEAT_LPA workaround

[OD] Noted. Eventually we might have to upstream those changes.

[JW] Agreed

...
For TF-A I've added a few patches on top of [1], available at https://github.com/jenswi-linaro/arm-trusted-firmware/tree/qemu_sel2:

a040396cae9e feat(qemu): add tos-fw-config for sel2 spmc

4f7d91723485 fix(qemu): change TOS_FW_CONFIG_NAME value

fbfc9a222c7f spmd_smc_handler() add s/ns state to SMC traces

ca65081b9cdc feat(sptool): add dependency to SP image

b1e1b46a0680 fix(qemu): restore code to added needed psci nodes

For OP-TEE I've also added a few patches, available at https://github.com/jenswi-linaro/optee_os/tree/qemu_sel2:

1057def23777 plat-vexpress: sel2 spmc: update for hafnium

f18a54ed3524 core: ffa: use hvc instead of smc with S-EL2

[OD] Not sure this is really required. Hafnium should accept SMC/HVC indifferently.

[JW] I noticed that in order to reach hvc_handler() the HVC instruction must be used.

[OD] I see how it works. Though I'd suggest to implement thread_hvc (similarly to thread_smc) to hint the Hafnium specific hypercalls (see attached diff).

[JW] Thanks, I'll consider it. By the way, do you have recommendations for when to use SMC instead of HVC and vice versa with Hafnium?

...
...

d18bbc92f7c1 core: mobj_ffa_add_pages_at() trust addresses from SPMC

There's also one patch for QEMU on top of v7.0.0, available at: https://github.com/jenswi-linaro/qemu/tree/qemu_sel2

0c1e39672dcb Read PS bits from VTCR_EL2

[OD] Great.

The QEMU problem is fixed in v7.1.0, but I can't get that version of QEMU to work with TF-A. I guess it's because of yet another new CPU feature since I'm running with "-cpu max".

[OD] I had only done my testing with 6.0.0 that we have versioned in Hafnium CI, so I didn't notice those FEAT_LPA2 (or other extension) problems. Great you had a look.

I'll try to upstream the Hafnium and TF-A patches that make sense on their own.

What's the plan with the interrupt controller? How will OP-TEE be able to handle secure interrupts?

[OD] Did I notice that the serial device (secure UART) is triggering secure interrupts? Is this a mainline use case you're trying to enable? We should be able to provide the right interrupt configuration in the OP-TEE SP manifest file for the secure UART and OP-TEE be delivered with the appropriate virtual secure interrupt.

[JW] Thanks for the tip, I got it to work (patches pushed on my branches). The purpose of this is just to see that OP-TEE can process secure interrupts.

[OD] Yes I see it working and this is a great test harness!

The paravirtualized interrupt controller is a bit fragile. If HF_INTERRUPT_ENABLE isn't called to enable the interrupt, OP-TEE is still signaled with an IRQ or FIQ, but HF_INTERRUPT_GET will return -1. If HF_INTERRUPT_DEACTIVATE isn't called, no more interrupts (secure or non-secure, it seems) are delivered. Things only seem to work if managed-exit-virq isn't specified in the manifest.

[OD] Hmm, the HF_INTERRUPT_ENABLE behavior doesn't look good indeed. The managed-exit-virq toggle is very recent and possibly lacking sensible testing. Let me add it to our TODOs for review. About omitting HF_INTERRUPT_DEACTIVATE, I'd say this is an expected side effect because this corresponds to the physical EOI. Hence, if not done, the physical interrupt is not deactivated and the GIC doesn't signal further interrupts.

[JW] OK
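
A side note on the return values: the -1 mentioned above and the 0xffffffff reported for HF_INTERRUPT_GET elsewhere in this thread are plausibly the same 32-bit value read as signed versus unsigned. A minimal sketch of that conversion follows; the helper name to_s32 is made up for illustration and nothing here calls into Hafnium.

```shell
#!/bin/sh
# Interpret a 32-bit register value as a signed integer (two's complement).
# to_s32 is a hypothetical helper used only to illustrate the arithmetic.
to_s32() {
    v=$(( $1 & 0xffffffff ))
    # Flip the sign bit, then subtract its weight to sign-extend.
    echo $(( (v ^ 0x80000000) - 0x80000000 ))
}

to_s32 0xffffffff
```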

...
...
The hafnium git pulls in a few git submodules and even the source code for a Linux kernel. I guess this is useful in your internal CI setup, but when used isolated as in my setup it makes no sense at all.

[OD] It's not only an internal CI setup, but is also required for a developer to run local tests and submit a change. I appreciate this isn't super convenient for your use case though.

It would also be nice to be able to build with an external toolchain. I hope this is a temporary situation, I don't see why Hafnium should be pickier about toolchain than for instance TF-A.

[OD] You should be able to override the clang toolchain by putting it first in PATH (PATH=<path>/clang/bin:$PATH), then running make clobber followed by make PROJECT=reference again (note that GCC is not supported, only clang).

[JW] Missing GCC support is also a pity (and a bit inconvenient too when setting up a build environment). It seems that GDB is a bit confused by the code generated by LLVM. It's still possible to debug with GDB, you just need to work at it a bit harder.
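
The PATH override described above could be wrapped as below. This is a sketch under assumptions: use_clang is a made-up helper and /opt/clang/bin is a placeholder directory; only the make clobber and make PROJECT=reference steps come from the thread. The helper refuses to touch PATH unless the directory actually contains an executable named clang.

```shell
#!/bin/sh
# Prepend a clang toolchain directory to PATH, but only if it holds clang.
# use_clang is a hypothetical helper, not part of the Hafnium build system.
use_clang() {
    dir=$1
    if [ -x "$dir/clang" ]; then
        PATH="$dir:$PATH"
        export PATH
        return 0
    fi
    echo "no executable clang found in $dir" >&2
    return 1
}

# Intended use from the hafnium tree (placeholder path, left commented out):
# use_clang /opt/clang/bin && make clobber && make PROJECT=reference
```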

If Hafnium is supposed to become _the_ SPMC it's quite important to address inconveniences for users too at some point since it will be used more or less everywhere you need to populate S-EL2.

Hmm, I realize I'm getting dangerously close to the question: "If you care so much, why don't you fix it?" :-)

[OD] Fair enough. Please raise such concerns in our TF-A tech forum, or to trustedfirmware.org's technical steering committee. The need for GCC support wasn't mentioned that many times as I recall, but I get the point. We made a presentation some time ago on general build system improvements that gives the history/rationale and shows how we've tried to address user experience problems: https://www.trustedfirmware.org/docs/HafniumBuildSystem.pdf (video recording at https://www.trustedfirmware.org/meetings/tf-a-technical-forum/)

[JW] Thanks

...
...
Speaking of building, I haven't been able to figure out how to build only for the QEMU variant I need so right now I'm building for everything and that takes a bit longer than necessary.

[OD] Yes this is a known problem. I don't have an immediate clean solution atm, but noted.

I'm going to maintain the setup above as long as it's relevant to me. I may add more patches on the branches or even rebase as needed. So if anyone is using this, keep in mind that my branches may change without warning.

[OD] Thanks for this. At least I am watching carefully 🙂

:-)

Jens Wiklander

6 Feb

9:55 a.m.

Hi Olivier,

On Fri, Feb 3, 2023 at 6:02 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...

Hi Jens,

We should now have all needed changes merged to TF-A/Hafnium upstream in particular:

TF-A: https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)

Hafnium: https://review.trustedfirmware.org/c/hafnium/hafnium/+/19042 https://review.trustedfirmware.org/c/hafnium/hafnium/+/19111 https://review.trustedfirmware.org/c/hafnium/project/reference/+/16322 https://review.trustedfirmware.org/c/hafnium/hafnium/+/18310

I'm reproducing your qemu setup by using TF-A + Hafnium tips of master. It passes full xtest suite (incl. regression_1034 that was failing earlier).

That's good news! I tried updating to the latest on Hafnium. However, when booting it fails with:

INFO: Initializing Hafnium (SPMC)
INFO: text: 0xe100000 - 0xe125000
INFO: rodata: 0xe125000 - 0xe12b000
INFO: data: 0xe12b000 - 0xe214000
INFO: stacks: 0xe214000 - 0xe224000
INFO: Supported bits in physical address: 48
INFO: Stage 2 has 4 page table levels with 1 pages at the root.
INFO: Stage 1 has 4 page table levels with 1 pages at the root.
ERROR: Data abort: pc=0xe102a10, esr=0x96000051, ec=0x25, far=0xe215ef8

I'm not sure what's wrong.
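
As an aid for reading the data abort above (esr=0x96000051) alongside the earlier xtest 1034 abort (esr=0x96000006), the syndrome fields can be pulled apart with shell arithmetic. This is a sketch, not a diagnosis: decode_esr is a made-up helper, the field positions are those of ESR_ELx for data aborts as I understand the Arm ARM, and only the two fault status codes seen in this thread are given names. Under those assumptions the boot failure decodes to a synchronous tag check fault on a write, which would point toward MTE, while the xtest failure decodes to a level 2 translation fault on a read.

```shell
#!/bin/sh
# Decode an AArch64 ESR_ELx value reported for a data abort.
# Assumed layout: EC = bits [31:26]; for data aborts WnR = bit 6 and
# DFSC = bits [5:0]. decode_esr is a hypothetical helper.
decode_esr() {
    esr=$(($1))
    ec=$(( (esr >> 26) & 0x3f ))
    wnr=$(( (esr >> 6) & 0x1 ))
    dfsc=$(( esr & 0x3f ))
    # Only the two fault status codes seen in this thread are named.
    case $dfsc in
        6)  name="translation fault, level 2" ;;
        17) name="synchronous tag check fault" ;;
        *)  name="not decoded by this sketch" ;;
    esac
    if [ "$wnr" -eq 1 ]; then access=write; else access=read; fi
    printf 'ec=0x%x dfsc=0x%x (%s) access=%s\n' "$ec" "$dfsc" "$name" "$access"
}

decode_esr 0x96000006
decode_esr 0x96000051
```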

...

Here's what I have on my list as next steps. Let me know your opinion on 1 and 2:

Ack the ME interrupt in foreign interrupt exit. See attached 0001-fix-ack-ME-in-foreign-interrupt-code-path.patch.

You might not have observed a difference by omitting this, because of how OP-TEE handles foreign interrupts by returning to the SPMC with a direct response (implicitly ending the managed exit operation). If possible, it would be preferable to integrate this one way or the other (I admit adding it to the foreign interrupt handler code path might not look clean).

To me this looks like a pointless switch back and forth between S-EL1 and S-EL2. Hafnium is obviously able to handle the case where this isn't done. What is the advantage from a system point of view by acking the ME interrupt?

...

Omit saving/restoring NS VFP context when S-EL2 is used. Hafnium saves the NS SIMD (and SVE if implemented) context before returning to OP-TEE, so it is not necessary for OP-TEE to save the NS context again. See 0002-fix-omit-ns-vfp-context-save-restore-if-sel2-present.patch. This change is only tentative: I'm not confident it properly handles all cases, it is just to convey the idea.

This makes sense. I'll take a close look and add it to my branch later.

...

Earlier findings around managed exit virq and HF_INTERRUPT_GET.

Earlier finding with SEL2 trapping cache operation (CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME).

I'm interested to know whether you're able to progress with this configuration, also in particular if you intend to integrate into an automated CI mid term.

I'm trying to update to upstream only, we just merged the OP-TEE patches https://github.com/OP-TEE/optee_os/pull/5667. So far I'm stuck on the data abort in Hafnium above.

We would like to integrate this in our CI loop in the not too distant future. We usually try to use TF-A release tags so we may want to wait for v2.9. However, we have a larger problem with the Hafnium sources. Since we have one common repo setup for all QEMU v8 based configurations, it's not reasonable to add 3GB to configurations that don't care about Hafnium. We have to find some way to address this. Perhaps we'll have to add a special repo configuration for Hafnium only, but that's not ideal either since it increases the maintenance burden a bit.

Cheers, Jens

...

Thanks, Olivier.

From: Jens Wiklander jens.wiklander@linaro.org Sent: 23 November 2022 09:24 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi Olivier,

I've just pushed my TF-A changes for S-EL2 with QEMU, available at https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)

Cheers, Jens

On Mon, Nov 14, 2022 at 7:53 AM Jens Wiklander jens.wiklander@linaro.org wrote:

...
Hi Olivier,

Comments below.

Cheers, Jens

On Thu, Nov 10, 2022 at 9:48 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...
Hi Jens,

See more comments below [OD].

I'm thinking a lot of your qemu integration might be useful to FVP/VExpress/Total Compute platforms when used with S-EL2/Hafnium. Something to talk about for later, perhaps.

[JW] Sure

...
Regards, Olivier.

From: Jens Wiklander jens.wiklander@linaro.org Sent: 09 November 2022 09:47 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi Olivier,

Thanks for the encouragement. :-) Some comments inline below.

Cheers, Jens

On Mon, Nov 7, 2022 at 4:32 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...
Hi Jens,

Thanks for this update, this is extremely useful!

See few comments [OD]

Regards, Olivier.

From: Jens Wiklander via Hafnium hafnium@lists.trustedfirmware.org Sent: 07 November 2022 10:41 To: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: [Hafnium] Hafnium with QEMU and OP-TEE

Hi,

I'm testing with Hafnium as SPMC at S-EL2 and OP-TEE as an SP at S-EL1 on QEMU v7.0.0. I've run into a few problems and fixed most of them.

I believe the setup is similar to what Shiju is using in this mail thread https://lists.trustedfirmware.org/archives/list/hafnium@lists.trustedfirmwar...

My setup can be duplicated with:

repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
        -b qemu_sel2
repo sync -j8
(cd hafnium && git submodule init && git submodule update)
cd build
make -j8 toolchains
make -j8 all
make run-only

With this, xtest -x 1034 passes, but xtest 1034 often causes:

ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c
Panic: EL2 exception

[OD] I can reproduce it. I suspect this might have to do with how the OP-TEE workspace size (IPA range) is defined, either in the platform makefile or the Hafnium VM configuration.

[JW] This test case often causes fragmented memory share operations, so the problem might relate to that too.

[OD] Yes, I see it panics here: https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/api.c#n3233 because "to" pointer is NULL. I was expecting fragmented mem sharing to work, but I'll check further next week on a valid reason. I checked 1034 passes with TC SW stack, based on OP-TEE ~ 3.18 so I wonder if there were changes in 3.19 that may trigger this now.

[JW] I guess it could depend on the Linux kernel too, which pages are used etc.

...
...
Xtest runs dreadfully slow, I haven't investigated why yet, but at least it works.

[OD] I think this might be because of some missing bit with managed exit, e.g. the FIQ exit path has to ack the ME virtual interrupt (similarly to how it's done on TC platform). I have a change for it that I may be able to share.

[JW] I think I've found it. "[PATCH] WIP: Enable managed exit", right? I've tried something similar. I'm first enabling it with HF_INTERRUPT_ENABLE (which returns 0), then when an IRQ (I'm using managed-exit-virq) is received I'm calling HF_INTERRUPT_GET, but HF_INTERRUPT_GET returns 0xffffffff so I guess something isn't quite right. However, if I remove managed-exit-virq and expect FIQ to indicate the managed exit interrupt, things work as expected.

[OD] Ok I wasn't expecting the OP-TEE design to use vIRQ for foreign interrupts 🙂

[JW] I was trying out different options. We have both configurations in the code anyway to deal with GICv2 vs GICv3. I prefer using vFIQ for foreign interrupts though.

...
Though I'll investigate more what's going on when routed as vIRQ. Meanwhile, I was referring to the fact that for either of vFIQ or vIRQ, the TEE is meant to call HF_INTERRUPT_GET in order to ack the interrupt at the virtual interrupt controller (notice this is an implementation detail, as FF-A v1.x doesn't specify this). For ME, the TEE doesn't have to call deactivate as this is a pure virtual interrupt. In OP-TEE's foreign interrupt code path, there isn't this acknowledgement presently (see attached diff where it's added). I assume it works still, because the next immediate action in the foreign interrupt code path is a direct resp. in which case the SPMC assumes the managed exit operation completed and de-asserts the virtual interrupt. I can see that both IRQ/FIQ could merge to vIRQ and determine based on HF_INTERRUPT_GET, if going for foreign or native interrupt code path. Is this what you had in mind?

[JW] No, the preferred configuration is to signal that a managed exit is due with an FIQ and use IRQ for OP-TEE native interrupt handling.

...
What's the difference when using HF_INTERRUPT_ENABLE for managed exit? Stuff seems to run at a similar speed and OP-TEE does managed exit and is later resumed at the same rate as far as I can tell.


Olivier Deprez

7 Feb

1:12 p.m.

Hi Jens,

Thanks for your feedback. See comments inline [OD].

Regards, Olivier.

From: Jens Wiklander jens.wiklander@linaro.org Sent: 06 February 2023 10:55 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi Olivier,

On Fri, Feb 3, 2023 at 6:02 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...

Hi Jens,

We should now have all needed changes merged to TF-A/Hafnium upstream in particular:

TF-A: https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)

Hafnium: https://review.trustedfirmware.org/c/hafnium/hafnium/+/19042 https://review.trustedfirmware.org/c/hafnium/hafnium/+/19111 https://review.trustedfirmware.org/c/hafnium/project/reference/+/16322 https://review.trustedfirmware.org/c/hafnium/hafnium/+/18310

I'm reproducing your qemu setup by using TF-A + Hafnium tips of master. It passes full xtest suite (incl. regression_1034 that was failing earlier).

I'm not sure what's wrong.

[OD] That's unfortunate. My best bet is an issue with MTE happening between qemu v7.1.0 and v7.2.0. We do all our testing with v7.1.0 eventually, and I'll have a closer look at what's going on with v7.2.0. So I reverted to qemu v7.1.0 and booted TF-A+Hafnium. Afterwards it traps on some GIC register access and panics at EL3 at early Linux boot (I can give more details). I worked around this by using -machine virt-6.2. The overall MTE story needs more investigation, as I had noticed issues with booting Hafnium+OP-TEE on FVP some time ago: see bullet 3 from https://lists.trustedfirmware.org/archives/list/op-tee@lists.trustedfirmware...

BTW noticed a few random findings in the Makefile:

https://github.com/jenswi-linaro/build/blob/qemu_sel2/common.mk#L471 This seems to cause issues with qemu v7.2.0: qemu-system-aarch64: -netdev user,id=vmnic: network backend 'user' is not compiled into this binary

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L198 CTX_INCLUDE_MTE_REGS should be taken care of by line 226.

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L229 I appreciate this is a temporary change, but it actually prevents doing an initial build.

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L497 The QEMU_MTE option overrides the settings done on lines 492-496.

...

Here's what I have on my list as next steps. Let me know your opinion on 1 and 2:

Ack the ME interrupt in foreign interrupt exit. See attached 0001-fix-ack-ME-in-foreign-interrupt-code-path.patch.

You might not have observed a difference by omitting this, because of how OP-TEE handles foreign interrupts by returning to the SPMC with a direct response (implicitly ending the managed exit operation). If possible, it would be preferable to integrate this one way or the other (I admit adding it to the foreign interrupt handler code path might not look clean).

[OD] Yes I understand your position. And tbh this is actually more impdef behavior than something mandated by the spec, so I won't be too strong about it. The counterargument is that we'd want to keep consistency in how virtual interrupts are handled from the SP perspective. Whatever the INTID or interrupt type (IRQ or FIQ), the first thing would be to call HF_INTERRUPT_GET to 'deassert' the virtual interrupt at the Hafnium virtual interrupt controller. The fact that Hafnium deasserts the ME interrupt upon return by a direct response looks a bit counterintuitive (even if actually mandated by the spec!).

...

Omit saving/restoring NS VFP context when S-EL2 is used. Hafnium saves the NS SIMD (and SVE if implemented) context before returning to OP-TEE, so it is not necessary for OP-TEE to save the NS context again. See 0002-fix-omit-ns-vfp-context-save-restore-if-sel2-present.patch. This change is only tentative: I'm not confident it properly handles all cases, it is just to convey the idea.

This makes sense. I'll take a close look and add it to my branch later.

...

Earlier findings around managed exit virq and HF_INTERRUPT_GET.

Earlier finding with SEL2 trapping cache operation (CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME).

I'm interested to know whether you're able to progress with this configuration, also in particular if you intend to integrate into an automated CI mid term.

I'm trying to update to upstream only, we just merged the OP-TEE patches https://github.com/OP-TEE/optee_os/pull/5667. So far I'm stuck on the data abort in Hafnium above.

[OD] Would you be able to progress with v7.1.0 as stated above?

[OD] Yes I fully understand this concern. Though just to be sure I understood, in the CI job, aren't trees cloned from scratch, built and deleted at the end of the run? Did you say you have to store the trees permanently beyond just one run?

We have in mind to remove the clang toolchain from the prebuilts submodule. That would save 1.5GB worth of space for starters. The side effect is that you'd have to download a clang toolchain, as it would no longer be provided in the hafnium trees. Could this be added to 'make toolchains' perhaps? Would it be considered as an option to use this same clang toolchain for all targets (linux, optee, tfa, hafnium etc.)? I appreciate this is a bit more work though.

The other 'improvement' would be to find a way to get rid of the linux sources. I'm brainstorming on what can be done. That would save an additional 1GB and leave the complete tree at a more palatable size.

Cheers, Jens

...

Thanks, Olivier.

From: Jens Wiklander jens.wiklander@linaro.org Sent: 23 November 2022 09:24 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi Olivier,

I've just pushed my TF-A changes for S-EL2 with QEMU, available at https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)

Cheers, Jens

On Mon, Nov 14, 2022 at 7:53 AM Jens Wiklander jens.wiklander@linaro.org wrote:

...
Hi Olivier,

Comments below.

Cheers, Jens

On Thu, Nov 10, 2022 at 9:48 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...
Hi Jens,

See more comments below [OD].

I'm thinking a lot of your qemu integration might be useful to FVP/VExpress/Total Compute platforms when used with S-EL2/Hafnium. Something to talk about for later, perhaps.

[JW] Sure

...
Regards, Olivier.

From: Jens Wiklander jens.wiklander@linaro.org Sent: 09 November 2022 09:47 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi Olivier,

Thanks for the encouragement. :-) Some comments inline below.

Cheers, Jens

On Mon, Nov 7, 2022 at 4:32 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...
Hi Jens,

Thanks for this update, this is extremely useful!

See few comments [OD]

Regards, Olivier.

From: Jens Wiklander via Hafnium hafnium@lists.trustedfirmware.org Sent: 07 November 2022 10:41 To: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: [Hafnium] Hafnium with QEMU and OP-TEE

Hi,

I'm testing with Hafnium as SPMC at S-EL2 and OP-TEE as an SP at S-EL1 on QEMU v7.0.0. I've run into a few problems and fixed most of them.

I believe the setup is similar to what Shiju is using in this mail thread https://lists.trustedfirmware.org/archives/list/hafnium@lists.trustedfirmwar...

My setup can be duplicated with:

repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
        -b qemu_sel2
repo sync -j8
(cd hafnium && git submodule init && git submodule update)
cd build
make -j8 toolchains
make -j8 all
make run-only

With this, xtest -x 1034 passes, but xtest 1034 often causes:
ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c
Panic: EL2 exception

[OD] I reproduce it, I suspect this might have to do with how OP-TEE workspace size (IPA range) is defined either in the platform makefile or the Hafnium VM configuration.

[JW] This test case often causes fragmented memory share operations, so the problem might relate to that too.

[OD] Yes, I see it panics here: https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/api.c#n3233 because "to" pointer is NULL. I was expecting fragmented mem sharing to work, but I'll check further next week on a valid reason. I checked 1034 passes with TC SW stack, based on OP-TEE ~ 3.18 so I wonder if there were changes in 3.19 that may trigger this now.

[JW] I guess it could depend on the Linux kernel too, which pages are used etc.
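
The register values in the abort message can be unpacked by hand to see what kind of fault this is. Below is a minimal sketch in Python, assuming the standard AArch64 ESR_ELx layout for data aborts (EC in bits [31:26], IL in bit 25, WnR in bit 6, DFSC in bits [5:0]). The function and table names are illustrative only and are not part of Hafnium, and only a handful of fault codes are listed.

```python
# Partial tables; codes that are not listed are reported as "unknown".
EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

DFSC_NAMES = {
    0b000100: "Translation fault, level 0",
    0b000101: "Translation fault, level 1",
    0b000110: "Translation fault, level 2",
    0b000111: "Translation fault, level 3",
    0b010001: "Synchronous Tag Check Fault",
}


def decode_data_abort(esr):
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    ec = (esr >> 26) & 0x3F      # exception class
    il = (esr >> 25) & 1         # instruction length bit
    wnr = (esr >> 6) & 1         # write-not-read
    dfsc = esr & 0x3F            # data fault status code
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "unknown"),
        "il": il,
        "access": "write" if wnr else "read",
        "dfsc": dfsc,
        "dfsc_name": DFSC_NAMES.get(dfsc, "unknown"),
    }


info = decode_data_abort(0x96000006)
print(hex(info["ec"]), info["ec_name"])
print(info["access"], "access:", info["dfsc_name"])
```

For esr=0x96000006 this reports a read that took a level 2 translation fault within the current exception level. Together with far=0x9c, a small offset from address zero, that is consistent with a structure field being read through a NULL pointer.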

...
...
Xtest runs dreadfully slow, I haven't investigated why yet, but at least it works.

[OD] I think this might be because of some missing bit with managed exit, e.g. the FIQ exit path has to ack the ME virtual interrupt (similarly to how it's done on TC platform). I have a change for it that I may be able to share.

[JW] I think I've found it. "[PATCH] WIP: Enable managed exit", right? I've tried something similar. I'm first enabling it with HF_INTERRUPT_ENABLE (which returns 0), then when an IRQ (I'm using managed-exit-virq) is received I'm calling HF_INTERRUPT_GET, but HF_INTERRUPT_GET returns 0xffffffff so I guess something isn't quite right. However, if I remove managed-exit-virq and expect FIQ to indicate the managed exit interrupt, things work as expected.

[OD] Ok I wasn't expecting the OP-TEE design to use vIRQ for foreign interrupts 🙂

[JW] I was trying out different options. We have both configurations in the code anyway to deal with GICv2 vs GICv3. I prefer using vFIQ for foreign interrupts though.

...
Though I'll investigate more what's going on when routed as vIRQ. Meanwhile, I was referring to the fact that for either of vFIQ or vIRQ, the TEE is meant to call HF_INTERRUPT_GET in order to ack the interrupt at the virtual interrupt controller (notice this is an implementation detail, as FF-A v1.x doesn't specify this). For ME, the TEE doesn't have to call deactivate as this is a pure virtual interrupt. In OP-TEE's foreign interrupt code path, there isn't this acknowledgement presently (see attached diff where it's added). I assume it works still, because the next immediate action in the foreign interrupt code path is a direct resp. in which case the SPMC assumes the managed exit operation completed and de-asserts the virtual interrupt. I can see that both IRQ/FIQ could merge to vIRQ and determine based on HF_INTERRUPT_GET, if going for foreign or native interrupt code path. Is this what you had in mind?

[JW] No, the preferred configuration is to signal that a managed exit is due with an FIQ and use IRQ for OP-TEE native interrupt handling.

...
What's the difference when using HF_INTERRUPT_ENABLE for managed exit? Stuff seems to run at a similar speed and OP-TEE does managed exit and is later resumed at the same rate as far as I can tell.

[OD] Since recently, ME interrupt is meant to be automatically enabled by Hafnium on partition loading. So it should no longer be required for a SP to call HF_INTERRUPT_ENABLE. Do you mean you can't spot a difference to when managed exit is enabled or not?

[JW] No, not that I can see.

...
If NS interrupts are not serviced timely (here through managed exit), this can trigger linux warnings saying a core is stalled while doing a long operation (like crypto). If you remove the managed exit field from manifest, OP-TEE would just be pre-empted and execution returned to normal world without notice which I don't believe is what we want.

[JW] I tried managed exit with and without HF_INTERRUPT_GET and couldn't see any benefit in using HF_INTERRUPT_GET. When done like in your patch there's the disadvantage that the managed exit interrupt must have the highest priority of all interrupts or we may not get the one we're expecting. Normally I'd expect a managed exit interrupt to have lower priority than the native interrupts.

...
I've found out why QEMU was so slow, stupid mistake on my side, I was using a debug build.

...
This is based on patches provided by Olivier at: [1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16412/2 [2] https://review.trustedfirmware.org/c/hafnium/hafnium/+/16323/7

[OD] Tbh the TF-A changes are really experimental/enablers. Feel free to amend directly, or add on top if you wish. I can also abandon them if you have a cleaner version that we can follow up on.

[JW] OK, I'll let you know when I'm uploading my version so you can abandon yours.

...
I've also encountered the cache maintenance problem Shiju described in the mail thread above:
NOTICE: Trapped access to system register write: op0=1, op1=0, crn=7, crm=14, op2=2, rt=11.

It can be worked around by compiling OP-TEE with CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME=n. I'm pretty sure we do dcache clean+inv elsewhere so I'm surprised it fails here. Is Hafnium expected to block dcache clean+inv?

[OD] This is interesting, not something we see with FVP so worth investigating more.

[OD] Hafnium traps cache operations by set/way https://git.trustedfirmware.org/hafnium/hafnium.git/tree/src/arch/aarch64/sy... I did not write this code, but I believe this is to prevent a VM/SP from cleaning/invalidating cache lines that don't belong to it, like those of another VM/SP or even the SPMC.

[JW] Makes sense
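
The trapped encoding in the NOTICE line can be matched against the AArch64 data cache maintenance instructions. Below is a small Python sketch, assuming the instruction encodings from the Arm architecture reference manual. The table covers only a few DC operations, and the helper names are hypothetical.

```python
import re

# (op0, op1, CRn, CRm, op2) -> instruction; a partial table for illustration.
DC_OPS = {
    (1, 0, 7, 6, 1): "DC IVAC",
    (1, 0, 7, 6, 2): "DC ISW",
    (1, 0, 7, 10, 2): "DC CSW",
    (1, 0, 7, 14, 2): "DC CISW",
    (1, 3, 7, 10, 1): "DC CVAC",
    (1, 3, 7, 11, 1): "DC CVAU",
    (1, 3, 7, 14, 1): "DC CIVAC",
}

SET_WAY_OPS = {"DC ISW", "DC CSW", "DC CISW"}


def parse_trap(line):
    """Extract the name=value pairs from a trap message into integers."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


def describe_trap(fields):
    """Return (instruction name, is it a set/way operation)."""
    key = (fields["op0"], fields["op1"], fields["crn"],
           fields["crm"], fields["op2"])
    name = DC_OPS.get(key, "unknown")
    return name, name in SET_WAY_OPS


log = ("NOTICE: Trapped access to system register write: "
       "op0=1, op1=0, crn=7, crm=14, op2=2, rt=11.")
print(describe_trap(parse_trap(log)))
```

The tuple (1, 0, 7, 14, 2) corresponds to DC CISW, clean and invalidate by set/way, which matches the set/way trapping described above. By-VA operations such as DC CIVAC use op1=3 and have a different encoding.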

...
I will check with the arch folks whether, for a security mitigation implemented by a TEE as in this case, Hafnium needs to implement the same mitigation (?)

[JW] Perhaps this mitigation isn't needed on post Armv8.4 hardware.

...
...
For Hafnium I've added two patches on top of [2], available at https://github.com/jenswi-linaro/hafnium/tree/qemu_sel2:

79b4d2cbe06e SPMC: add missing ME initialization for secondary cores

659c79d5eacf feat(mm): fix FEAT_LPA workaround

[OD] Noted. Eventually we might have to upstream those changes.

[JW] Agreed

...
For TF-A I've added a few patches on top of [1], available at https://github.com/jenswi-linaro/arm-trusted-firmware/tree/qemu_sel2:

a040396cae9e feat(qemu): add tos-fw-config for sel2 spmc

4f7d91723485 fix(qemu): change TOS_FW_CONFIG_NAME value

fbfc9a222c7f spmd_smc_handler() add s/ns state to SMC traces

ca65081b9cdc feat(sptool): add dependency to SP image

b1e1b46a0680 fix(qemu): restore code to added needed psci nodes

For OP-TEE I've also added a few patches, available at https://github.com/jenswi-linaro/optee_os/tree/qemu_sel2:

1057def23777 plat-vexpress: sel2 spmc: update for hafnium

f18a54ed3524 core: ffa: use hvc instead of smc with S-EL2

[OD] Not sure this is really required. Hafnium should accept SMC/HVC indifferently.

[JW] I noticed that in order to reach hvc_handler() the HVC instruction must be used.

[OD] I see how it works. Though I'd suggest to implement thread_hvc (similarly to thread_smc) to hint the Hafnium specific hypercalls (see attached diff).

[JW] Thanks, I'll consider it. By the way, do you have recommendations for when to use SMC instead of HVC and vice versa with Hafnium?

...
...

d18bbc92f7c1 core: mobj_ffa_add_pages_at() trust addresses from SPMC

There's also one patch for QEMU on top of v7.0.0, available at: https://github.com/jenswi-linaro/qemu/tree/qemu_sel2

0c1e39672dcb Read PS bits from VTCR_EL2

[OD] Great.

The QEMU problem is fixed in v7.1.0, but I can't get that version of QEMU to work with TF-A. I guess it's because of yet another new CPU feature since I'm running with "-cpu max".

[OD] I had only done my testing with 6.0.0 that we have versioned in Hafnium CI, so I didn't notice those FEAT_LPA2 (or other extension) problems. Great you had a look.

I'll try to upstream the Hafnium and TF-A patches that make sense on their own.

What's the plan with the interrupt controller? How will OP-TEE be able to handle secure interrupts?

[OD] Did I notice that the serial device (secure UART) is triggering secure interrupts? Is this a mainline use case you're trying to enable? We should be able to provide the right interrupt configuration in the OP-TEE SP manifest file for the secure UART and OP-TEE be delivered with the appropriate virtual secure interrupt.

[JW] Thanks for the tip, I got it to work (patches pushed on my branches). The purpose of this is just to see that OP-TEE can process secure interrupts.

[OD] Yes I see it working and this is a great test harness!

The paravirtualized interrupt controller is a bit fragile. If HF_INTERRUPT_ENABLE isn't called to enable the interrupt, OP-TEE is still signaled with an IRQ or FIQ, but HF_INTERRUPT_GET will return -1. If HF_INTERRUPT_DEACTIVATE isn't called, no more interrupts (secure or non-secure, it seems) are delivered. Things only seem to work if managed-exit-virq isn't specified in the manifest.

[OD] Hum, the HF_INTERRUPT_ENABLE behavior doesn't look good indeed. The managed-exit-virq toggle is very recent and possibly lacking sensible testing. Let me add it to our TODOs for review. About omitting HF_INTERRUPT_DEACTIVATE, I'd say this is an expected side effect because this corresponds to the physical EOI. Hence if not done, the physical interrupt is not deactivated and the GIC doesn't signal further interrupts.

[JW] OK
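
The behaviour reported above can be captured in a toy model. The Python sketch below is not Hafnium code and does not use its ABI: the class, its methods and the "lowest INTID first" ordering are assumptions made purely for illustration. It mirrors only three observations from this thread: a get on an interrupt that was never enabled yields 0xffffffff (the 32-bit pattern of -1), a get acknowledges a pending enabled interrupt, and nothing further is signalled until the acknowledged interrupt is deactivated. It treats every interrupt alike, so the special handling of the managed exit interrupt is left out.

```python
INVALID_INTID = 0xFFFFFFFF  # reads as -1 when treated as a signed 32-bit value


class ToyInterruptController:
    """Illustrative model only; names and ordering are assumptions."""

    def __init__(self):
        self.enabled = set()   # INTIDs the partition asked to receive
        self.pending = set()   # INTIDs signalled but not yet acknowledged
        self.active = set()    # INTIDs acknowledged but not yet deactivated

    def enable(self, intid):
        self.enabled.add(intid)
        return 0

    def signal(self, intid):
        """Raise an interrupt; returns False if it is held back."""
        if self.active:
            # Models the missing EOI: nothing new is signalled meanwhile.
            return False
        self.pending.add(intid)
        return True

    def get(self):
        """Acknowledge the lowest pending and enabled INTID, if any."""
        candidates = self.pending & self.enabled
        if not candidates:
            return INVALID_INTID
        intid = min(candidates)
        self.pending.remove(intid)
        self.active.add(intid)
        return intid

    def deactivate(self, intid):
        self.active.discard(intid)
        return 0


def as_signed32(value):
    """Interpret a 32-bit pattern as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


vic = ToyInterruptController()
vic.signal(5)
print(hex(vic.get()), as_signed32(vic.get()))  # interrupt 5 was never enabled
vic.enable(5)
print(vic.get())                               # now it can be acknowledged
print(vic.signal(6))                           # held back until 5 is deactivated
vic.deactivate(5)
print(vic.signal(6))
```

In this model the "get returns -1" case and the "no more interrupts" case are both plain consequences of the bookkeeping, not separate failure modes.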

...
...
The hafnium git pulls in a few git submodules and even the source code for a Linux kernel. I guess this is useful in your internal CI setup, but when used isolated as in my setup it makes no sense at all.

[OD] It's not only an internal CI setup, but is required for a developer to run local tests and submit a change. I appreciate this isn't super convenient for your use case though.

It would also be nice to be able to build with an external toolchain. I hope this is a temporary situation, I don't see why Hafnium should be pickier about toolchain than for instance TF-A.

[OD] You should be able to override the clang toolchain by setting PATH=<path>/clang/bin:$PATH, then running make clobber followed by make PROJECT=reference again (note that gcc is not supported, only clang is)

[JW] Missing GCC support is also a pity (and a bit inconvenient too when setting up a build environment), it seems that GDB is a bit confused by the code generated by LLVM. It's still possible to debug with GDB, you just need to work it a bit harder.

If Hafnium is supposed to become _the_ SPMC it's quite important to address inconveniences for users too at some point since it will be used more or less everywhere you need to populate S-EL2.

Hmm, I realize I'm getting dangerously close to the question: "If you care so much, why don't you fix it?" :-)

[OD] Fair enough. Please raise such concerns in our TF-A tech forum, or to trustedfirmware.org's technical steering committee. The need for GCC support wasn't mentioned that many times as I recall, but I get the point. We made a presentation some time ago on general build system improvements, which gives the history/rationale and shows how we've tried to address user experience problems: https://www.trustedfirmware.org/docs/HafniumBuildSystem.pdf (video recording at https://www.trustedfirmware.org/meetings/tf-a-technical-forum/)

[JW] Thanks

...
...
Speaking of building, I haven't been able to figure out how to build only for the QEMU variant I need so right now I'm building for everything and that takes a bit longer than necessary.

[OD] Yes this is a known problem. I don't have an immediate clean solution atm, but noted.

I'm going to maintain the setup above as long as it's relevant to me. I may add more patches on the branches or even rebase as needed. So if anyone is using this, keep in mind that my branches may change without warning.

[OD] Thanks for this. At least I am watching carefully 🙂

:-)

Jens Wiklander

9 Feb 9 Feb

12:21 p.m.

Hi Olivier,

Comments below.

On Tue, Feb 7, 2023 at 2:13 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...

Hi Jens,

Thanks for your feedback. See comments inline [OD].

Regards, Olivier.

From: Jens Wiklander jens.wiklander@linaro.org Sent: 06 February 2023 10:55 To: Olivier Deprez Olivier.Deprez@arm.com Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi Olivier,

On Fri, Feb 3, 2023 at 6:02 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...
Hi Jens,

We should now have all needed changes merged to TF-A/Hafnium upstream in particular:

TF-A: https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)

Hafnium: https://review.trustedfirmware.org/c/hafnium/hafnium/+/19042 https://review.trustedfirmware.org/c/hafnium/hafnium/+/19111 https://review.trustedfirmware.org/c/hafnium/project/reference/+/16322 https://review.trustedfirmware.org/c/hafnium/hafnium/+/18310

I'm reproducing your qemu setup by using TF-A + Hafnium tips of master. It passes full xtest suite (incl. regression_1034 that was failing earlier).

That's good news! I tried updating to the latest on Hafnium. However, when booting it fails with:
INFO: Initializing Hafnium (SPMC)
INFO: text: 0xe100000 - 0xe125000
INFO: rodata: 0xe125000 - 0xe12b000
INFO: data: 0xe12b000 - 0xe214000
INFO: stacks: 0xe214000 - 0xe224000
INFO: Supported bits in physical address: 48
INFO: Stage 2 has 4 page table levels with 1 pages at the root.
INFO: Stage 1 has 4 page table levels with 1 pages at the root.
ERROR: Data abort: pc=0xe102a10, esr=0x96000051, ec=0x25, far=0xe215ef8

I'm not sure what's wrong.

[OD] That's unfortunate. My best bet is an issue with MTE happening between qemu v7.1.0 and v7.2.0. We do all our testing with v7.1.0 eventually. And I'll have a closer look at what's going on with v7.2.0. So I reverted to qemu v7.1.0 and booted TF-A+Hafnium.

[JW] Jérôme Forissier reminded me that this is a known bug in qemu v7.2.0; it's fixed with: 28fb921f02ef ("target/arm: Fix physical address resolution for MTE")

With that applied everything seems to work.
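
To place the fault, the addresses in the abort line can be compared with the image layout Hafnium printed just before it. Below is a short Python sketch, assuming each printed range is start-inclusive and end-exclusive and the usual ESR_ELx data abort encoding (WnR in bit 6, DFSC in bits [5:0]). The names are illustrative.

```python
# Ranges copied from the boot log; the end address is treated as exclusive.
LAYOUT = {
    "text": (0xE100000, 0xE125000),
    "rodata": (0xE125000, 0xE12B000),
    "data": (0xE12B000, 0xE214000),
    "stacks": (0xE214000, 0xE224000),
}


def region_of(addr):
    """Return the name of the region containing addr, or None."""
    for name, (start, end) in LAYOUT.items():
        if start <= addr < end:
            return name
    return None


ESR = 0x96000051
is_write = bool((ESR >> 6) & 1)   # WnR bit
dfsc = ESR & 0x3F                 # data fault status code

print("pc  in", region_of(0xE102A10))
print("far in", region_of(0xE215EF8))
print("write access:", is_write)
print("DFSC:", bin(dfsc))
```

The faulting access is a write into the stacks region, and DFSC 0b010001 is the code the architecture assigns to a synchronous tag check fault, which is in line with the MTE-related QEMU fix named above.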

...

Afterwards it traps some GIC register access and panics to EL3 at early linux boot (I can give more details). I worked around this by using -machine virt-6.2. The overall MTE story needs more investigation as I had noticed issues with booting Hafnium+OP-TEE on FVP some time ago: see bullet 3 from https://lists.trustedfirmware.org/archives/list/op-tee@lists.trustedfirmware...

BTW noticed a few random findings in the Makefile:

https://github.com/jenswi-linaro/build/blob/qemu_sel2/common.mk#L471 This seems to cause issues with qemu v7.2.0: qemu-system-aarch64: -netdev user,id=vmnic: network backend 'user' is not compiled into this binary

[JW] That's odd, I'm using that.

...

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L198 CTX_INCLUDE_MTE_REGS should be taken care of by line 226.

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L229 I appreciate this is a temp. change, but actually prevents doing an initial build.

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L497 QEMU_MTE option overrides settings done lines 492-496.

[JW] Thanks, I'll fix these.

...

...
Here's what I have on my list as next steps. Let me know your opinion on 1 and 2:

Ack the ME interrupt in foreign interrupt exit. See attached 0001-fix-ack-ME-in-foreign-interrupt-code-path.patch.

You might not have observed a difference by omitting this, because of how OP-TEE handles foreign interrupts by returning to the SPMC with a direct response (implicitly ending the managed exit operation). If possible, it would be preferable to integrate this one way or the other (I admit adding to the foreign interrupt handler code path might not look clean).

To me this looks like a pointless switch back and forth between S-EL1 and S-EL2. Hafnium is obviously able to handle the case where this isn't done. What is the advantage from a system point of view by acking the ME interrupt?

[OD] Yes I understand your position. And tbh this is actually more an impdef behavior rather than mandated by the spec, so I won't be too strong about it. The counterargument is that we'd want to keep consistency in how virtual interrupts are handled from the SP perspective. Whichever the INTID or interrupt type, IRQ or FIQ, the first thing would be to call HF_INTERRUPT_GET to 'deassert' the virtual interrupt at the hafnium virtual interrupt controller. The fact that hafnium deasserts the ME interrupt upon return by a direct response looks a bit counterintuitive (even if actually mandated by the spec!).

...

Omit saving/restoring NS VFP context when S-EL2 is used. Hafnium saves the NS SIMD (and SVE if implemented) before returning to OP-TEE. So it is not necessary for OP-TEE to save the NS context again. See 0002-fix-omit-ns-vfp-context-save-restore-if-sel2-present.patch. This change is just tentative; I'm not confident it properly handles all cases, this is just to get the idea.

This makes sense. I'll take a close look and add it to my branch later.

...

Earlier findings around managed exit virq and HF_INTERRUPT_GET.

Earlier finding with SEL2 trapping cache operation (CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME).

I'm interested to know whether you're able to progress with this configuration, also in particular if you intend to integrate into an automated CI mid term.

I'm trying to update to upstream only, we just merged the OP-TEE patches https://github.com/OP-TEE/optee_os/pull/5667. So far I'm stuck on the data abort in Hafnium above.

[OD] Would you be able to progress with v7.1.0 as stated above?

I checked out 28fb921f02ef ("target/arm: Fix physical address resolution for MTE") instead. I'll publish my changes on the qemu_sel2 setup soon.

...

We would like to integrate this in our CI loop in a not too distant future. We usually try to use TF-A release tags so we may want to wait for v2.9. However, we have a larger problem with Hafnium sources. Since we have one common repo setup for all QEMU v8 based configurations it's not reasonable to add 3GB to configurations that don't care about Hafnium. We have to find some way to address this. Perhaps we'll have to add a special repo configuration for Hafnium only, but that's not ideal either since it increases the maintenance burden a bit.

[OD] Yes I fully understand this concern. Though just to be sure I understood, in the CI job, aren't trees cloned from scratch, built and deleted at the end of the run? Did you say you have to store the trees permanently beyond just one run?

[JW] We're normally basing the CI setup on the usual setup. You can in principle repeat what CI is doing with "make check".

...

We have in mind to remove the clang toolchain from prebuilts submodule. That would save 1.5GB worth of space for starters. The side effect is that you'd have to download a clang toolchain as no longer provided in the hafnium trees. Could this be added to 'make toolchains' perhaps?

[JW] Absolutely

...

Would it be considered as an option to use this same clang toolchain for all targets (linux, optee, tfa, hafnium etc.)? I appreciate this is a bit more work though.

[JW] I'm not sure if it's possible to compile OP-TEE with clang only. There's something missing in objcopy or the linker if I remember correctly.

...

The other 'improvement' would be to find a way to get rid of the linux sources. I'm brainstorming on what can be done. That would save an additional 1G and leave the complete tree at a more palatable size.

[JW] That's a welcome change. Another improvement that comes to mind is to only build the needed target instead of all targets.

Cheers, Jens

Jérôme Forissier

1:13 p.m.

Hi,

On Thu, 9 Feb 2023 at 13:21, Jens Wiklander jens.wiklander@linaro.org wrote:

...

We would like to integrate this in our CI loop in a not too distant future. We usually try to use TF-A release tags so we may want to wait for v2.9. However, we have a larger problem with Hafnium sources. Since we have one common repo setup for all QEMU v8 based configurations it's not reasonable to add 3GB to configurations that don't care about Hafnium. We have to find some way to address this. Perhaps we'll have to add a special repo configuration for Hafnium only, but that's not ideal either since it increases the maintenance burden a bit.

[OD] Yes I fully understand this concern. Though just to be sure I understood, in the CI job, aren't trees cloned from scratch, built and deleted at the end of the run? Did you say you have to store the trees permanently beyond just one run?

[JW] We're normally basing the CI setup on the usual setup. You can in principle repeat what CI is doing with "make check".

[JF] FWIW a typical CI job is here: https://github.com/OP-TEE/optee_os/blob/master/.github/workflows/ci.yml#L247... We would simply add a similar section for the Hafnium build, with the proper flags set.

...

...
We have in mind to remove the clang toolchain from prebuilts submodule. That would save 1.5GB worth of space for starters. The side effect is that you'd have to download a clang toolchain as no longer provided in the hafnium trees. Could this be added to 'make toolchains' perhaps?

[JW] Absolutely

[JF] Already there: make clang-toolchains :)

...

...
Would it be considered as an option to use this same clang toolchain for all targets (linux, optee, tfa, hafnium etc.)? I appreciate this is a bit more work though.

[JW] I'm not sure if it's possible to compile OP-TEE with clang only. There's something missing in objcopy or the linker if I remember correctly.

[JF] It is :) You may be thinking about this: https://github.com/OP-TEE/optee_os/commit/33017d856e8c But as the commit says it's fine now. There is a gotcha with Clang though: we need the compiler-rt libraries for both 32 and 64-bit and that's why we have a special download script https://github.com/OP-TEE/build/blob/3.20.0/get_clang.sh (invoked by make clang-toolchains)

...

...
The other 'improvement' would be to find a way to get rid of the linux sources. I'm brainstorming on what can be done. That would save an additional 1G and leave the complete tree at a more palatable size.

[JW] That's a welcome change. Another improvement that comes to mind is to only build the needed target instead of all targets.

Cheers, Jens

Regards,

-- Jerome

Olivier Deprez

4:45 p.m.

Hi, Thanks both. See three small comments inline [OD]. Regards, Olivier.

From: Jérôme Forissier jerome.forissier@linaro.org Sent: 09 February 2023 14:13 To: Jens Wiklander jens.wiklander@linaro.org Cc: Olivier Deprez Olivier.Deprez@arm.com; hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi,

On Thu, 9 Feb 2023 at 13:21, Jens Wiklander jens.wiklander@linaro.org wrote:

Hi Olivier,

...

BTW noticed a few random findings in the Makefile:

https://github.com/jenswi-linaro/build/blob/qemu_sel2/common.mk#L471 This seems to cause issues with qemu v7.2.0: qemu-system-aarch64: -netdev user,id=vmnic: network backend 'user' is not compiled into this binary

[JW] That's odd, I'm using that.

[OD] Found this https://wiki.qemu.org/ChangeLog/7.2#Removal_of_the_.22slirp.22_submodule_.28... I passed --enable-slirp to configure, but the qemu build fails because the host libslirp package is missing. Unfortunately I did not find a way to install libslirp-dev per the recommendation. I also tried booting without the -netdev user option, but linux hangs when trying to configure the network interface. Never mind, I'll find a way...

...

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L198 CTX_INCLUDE_MTE_REGS should be taken care of by line 226.

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L229 I appreciate this is a temporary change, but it actually prevents doing an initial build.

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L497 The QEMU_MTE option overrides the settings done on lines 492-496.

[JW] Thanks, I'll fix these.

[OD] I'm thinking of another cosmetic one: https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L197 This line might not be needed as both CTX_INCLUDE_EL2_REGS=1 SPMD_SPM_AT_SEL2=1 are TF-A default options.

...

...
Here's what I have on my list as next steps. Let me know your opinion on 1 and 2:

1. Ack the ME interrupt in foreign interrupt exit. See attached 0001-fix-ack-ME-in-foreign-interrupt-code-path.patch.

You might not have observed a difference by omitting this, because of how OP-TEE handles foreign interrupts by returning to the SPMC with a direct response (implicitly ending the managed exit operation). If possible, it would be preferable to integrate this one way or another (I admit adding to the foreign interrupt handler code path might not look clean).

To me this looks like a pointless switch back and forth between S-EL1 and S-EL2. Hafnium is obviously able to handle the case where this isn't done. What is the advantage from a system point of view by acking the ME interrupt?

[OD] Yes, I understand your position. And tbh this is actually more of an impdef behavior than something mandated by the spec, so I won't insist on it. The counterargument is that we'd want to keep consistency in how virtual interrupts are handled from the SP perspective. Whatever the INTID or interrupt type (IRQ or FIQ), the first thing would be to call HF_INTERRUPT_GET to 'deassert' the virtual interrupt at the Hafnium virtual interrupt controller. The fact that Hafnium deasserts the ME interrupt upon return by a direct response looks a bit counterintuitive (even if actually mandated by the spec!).

...

2. Omit saving/restoring NS VFP context when S-EL2 is used. Hafnium saves the NS SIMD (and SVE if implemented) context before returning to OP-TEE, so it is not necessary for OP-TEE to save the NS context again. See 0002-fix-omit-ns-vfp-context-save-restore-if-sel2-present.patch. This change is only tentative; I'm not confident it properly handles all cases, it is just to convey the idea.

This makes sense. I'll take a close look and add it to my branch later.

...

3. Earlier findings around managed exit virq and HF_INTERRUPT_GET.

4. Earlier finding with SEL2 trapping cache operation (CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME).

I'm interested to know whether you're able to progress with this configuration, also in particular if you intend to integrate into an automated CI mid term.

I'm trying to update to upstream only, we just merged the OP-TEE patches https://github.com/OP-TEE/optee_os/pull/5667. So far I'm stuck on the data abort in Hafnium above.

[OD] Would you be able to progress with v7.1.0 as stated above?

I checked out 28fb921f02ef ("target/arm: Fix physical address resolution for MTE") instead. I'll publish my changes on the qemu_sel2 setup soon.

...

We would like to integrate this in our CI loop in the not too distant future. We usually try to use TF-A release tags, so we may want to wait for v2.9. However, we have a larger problem with the Hafnium sources. Since we have one common repo setup for all QEMU v8 based configurations, it's not reasonable to add 3GB to configurations that don't care about Hafnium. We have to find some way to address this. Perhaps we'll have to add a special repo configuration for Hafnium only, but that's not ideal either since it increases the maintenance burden a bit.

[OD] Yes I fully understand this concern. Though just to be sure I understood, in the CI job, aren't trees cloned from scratch, built and deleted at the end of the run? Did you say you have to store the trees permanently beyond just one run?

[JW] We're normally basing the CI setup on the usual setup. You can in principle repeat what CI is doing with "make check".

...

We have in mind to remove the clang toolchain from the prebuilts submodule. That would save 1.5GB worth of space for starters. The side effect is that you'd have to download a clang toolchain, as it would no longer be provided in the Hafnium tree. Could this be added to 'make toolchains' perhaps?

[JW] Absolutely

[JF] Already there: make clang-toolchains :)

...

Would it be an option to use this same clang toolchain for all targets (linux, optee, tfa, hafnium etc.)? I appreciate this is a bit more work though.

[JW] I'm not sure if it's possible to compile OP-TEE with clang only. There's something missing in objcopy or the linker if I remember correctly.

...

The other 'improvement' would be to find a way to get rid of the linux sources. I'm brainstorming on what can be done. That would save an additional 1GB and leave the complete tree at a more palatable size.

[JW] That's a welcome change. Another improvement that comes to mind is to only build the needed target instead of all targets.

[OD] Agree, noted.

Cheers, Jens

Regards, -- Jerome

Jens Wiklander

10 Feb

9:10 a.m.

Hi Olivier,

Thanks for the feedback. I've updated the patches accordingly. I've rebased the setup so now we're at the latest on the OP-TEE gits and using upstream on Hafnium. Here are the steps to test it all, note that the build step handles the Hafnium git submodules:

repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
        -b qemu_sel2
repo sync -j8
cd build
make -j8 toolchains
make -j8 all
make run-only

Thanks, Jens

On Thu, Feb 9, 2023 at 5:46 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...

Hi, Thanks both. See three small comments inline [OD]. Regards, Olivier.

From: Jérôme Forissier jerome.forissier@linaro.org
Sent: 09 February 2023 14:13
To: Jens Wiklander jens.wiklander@linaro.org
Cc: Olivier Deprez Olivier.Deprez@arm.com; hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org
Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi,

On Thu, 9 Feb 2023 at 13:21, Jens Wiklander jens.wiklander@linaro.org wrote:

Hi Olivier,

Comments below.

On Tue, Feb 7, 2023 at 2:13 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...
Hi Jens,

Thanks for your feedback. See comments inline [OD].

Regards, Olivier.

From: Jens Wiklander jens.wiklander@linaro.org
Sent: 06 February 2023 10:55
To: Olivier Deprez Olivier.Deprez@arm.com
Cc: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org
Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hi Olivier,

On Fri, Feb 3, 2023 at 6:02 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...
Hi Jens,

We should now have all needed changes merged to TF-A/Hafnium upstream in particular:

TF-A: https://review.trustedfirmware.org/q/topic:%22qemu_sel2%22+(status:open%20OR...)

Hafnium: https://review.trustedfirmware.org/c/hafnium/hafnium/+/19042 https://review.trustedfirmware.org/c/hafnium/hafnium/+/19111 https://review.trustedfirmware.org/c/hafnium/project/reference/+/16322 https://review.trustedfirmware.org/c/hafnium/hafnium/+/18310

I'm reproducing your qemu setup by using TF-A + Hafnium tips of master. It passes full xtest suite (incl. regression_1034 that was failing earlier).

That's good news! I tried updating to the latest on Hafnium. However, when booting it fails with:
INFO: Initializing Hafnium (SPMC)
INFO: text: 0xe100000 - 0xe125000
INFO: rodata: 0xe125000 - 0xe12b000
INFO: data: 0xe12b000 - 0xe214000
INFO: stacks: 0xe214000 - 0xe224000
INFO: Supported bits in physical address: 48
INFO: Stage 2 has 4 page table levels with 1 pages at the root.
INFO: Stage 1 has 4 page table levels with 1 pages at the root.
ERROR: Data abort: pc=0xe102a10, esr=0x96000051, ec=0x25, far=0xe215ef8

I'm not sure what's wrong.

[OD] That's unfortunate. My best guess is an MTE issue introduced between qemu v7.1.0 and v7.2.0. We do all our testing with v7.1.0, and I'll have a closer look at what's going on with v7.2.0. So I reverted to qemu v7.1.0 and booted TF-A+Hafnium.

[JW] Jerome Forissier reminded me that this is a known bug in qemu v7.2.0. It's fixed with: 28fb921f02ef ("target/arm: Fix physical address resolution for MTE")

With that applied everything seems to work.

...
Afterwards a GIC register access traps to EL3 and panics early in the Linux boot (I can give more details). I worked around this by using -machine virt-6.2. The overall MTE story needs more investigation, as I had noticed issues with booting Hafnium+OP-TEE on FVP some time ago: see bullet 3 from https://lists.trustedfirmware.org/archives/list/op-tee@lists.trustedfirmware...

BTW noticed a few random findings in the Makefile:

https://github.com/jenswi-linaro/build/blob/qemu_sel2/common.mk#L471 This seems to cause issues with qemu v7.2.0: qemu-system-aarch64: -netdev user,id=vmnic: network backend 'user' is not compiled into this binary

[JW] That's odd, I'm using that.

[OD] Found this: https://wiki.qemu.org/ChangeLog/7.2#Removal_of_the_.22slirp.22_submodule_.28... I passed --enable-slirp to configure, but the qemu build then fails because the libslirp package is missing on the host. Unfortunately I did not find a way to install libslirp-dev per the recommendation. I also tried booting without the -netdev user option, but Linux hangs when trying to configure the network interface. Never mind, I'll find a way...

...
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L198 CTX_INCLUDE_MTE_REGS should be taken care of by line 226.

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L229 I appreciate this is a temporary change, but it actually prevents doing an initial build.

https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L497 The QEMU_MTE option overrides the settings done on lines 492-496.

[JW] Thanks, I'll fix these.

[OD] I'm thinking of another cosmetic one: https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8.mk#L197 This line might not be needed as both CTX_INCLUDE_EL2_REGS=1 SPMD_SPM_AT_SEL2=1 are TF-A default options.

...
...
Here's what I have on my list as next steps. Let me know your opinion on 1 and 2:

1. Ack the ME interrupt in foreign interrupt exit. See attached 0001-fix-ack-ME-in-foreign-interrupt-code-path.patch.

You might not have observed a difference by omitting this, because of how OP-TEE handles foreign interrupts by returning to the SPMC with a direct response (implicitly ending the managed exit operation). If possible, it would be preferable to integrate this one way or another (I admit adding to the foreign interrupt handler code path might not look clean).

To me this looks like a pointless switch back and forth between S-EL1 and S-EL2. Hafnium is obviously able to handle the case where this isn't done. What is the advantage from a system point of view by acking the ME interrupt?

[OD] Yes, I understand your position. And tbh this is actually more of an impdef behavior than something mandated by the spec, so I won't insist on it. The counterargument is that we'd want to keep consistency in how virtual interrupts are handled from the SP perspective. Whatever the INTID or interrupt type (IRQ or FIQ), the first thing would be to call HF_INTERRUPT_GET to 'deassert' the virtual interrupt at the Hafnium virtual interrupt controller. The fact that Hafnium deasserts the ME interrupt upon return by a direct response looks a bit counterintuitive (even if actually mandated by the spec!).

...

2. Omit saving/restoring NS VFP context when S-EL2 is used. Hafnium saves the NS SIMD (and SVE if implemented) context before returning to OP-TEE, so it is not necessary for OP-TEE to save the NS context again. See 0002-fix-omit-ns-vfp-context-save-restore-if-sel2-present.patch. This change is only tentative; I'm not confident it properly handles all cases, it is just to convey the idea.

This makes sense. I'll take a close look and add it to my branch later.

...

3. Earlier findings around managed exit virq and HF_INTERRUPT_GET.

4. Earlier finding with SEL2 trapping cache operation (CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME).

I'm interested to know whether you're able to progress with this configuration, also in particular if you intend to integrate into an automated CI mid term.

I'm trying to update to upstream only, we just merged the OP-TEE patches https://github.com/OP-TEE/optee_os/pull/5667. So far I'm stuck on the data abort in Hafnium above.

[OD] Would you be able to progress with v7.1.0 as stated above?

I checked out 28fb921f02ef ("target/arm: Fix physical address resolution for MTE") instead. I'll publish my changes on the qemu_sel2 setup soon.

...
We would like to integrate this in our CI loop in the not too distant future. We usually try to use TF-A release tags, so we may want to wait for v2.9. However, we have a larger problem with the Hafnium sources. Since we have one common repo setup for all QEMU v8 based configurations, it's not reasonable to add 3GB to configurations that don't care about Hafnium. We have to find some way to address this. Perhaps we'll have to add a special repo configuration for Hafnium only, but that's not ideal either since it increases the maintenance burden a bit.

[OD] Yes I fully understand this concern. Though just to be sure I understood, in the CI job, aren't trees cloned from scratch, built and deleted at the end of the run? Did you say you have to store the trees permanently beyond just one run?

[JW] We're normally basing the CI setup on the usual setup. You can in principle repeat what CI is doing with "make check".

[JF] FWIW a typical CI job is here: https://github.com/OP-TEE/optee_os/blob/master/.github/workflows/ci.yml#L247... We would simply add a similar section for the Hafnium build, with the proper flags set.

...
We have in mind to remove the clang toolchain from the prebuilts submodule. That would save 1.5GB worth of space for starters. The side effect is that you'd have to download a clang toolchain, as it would no longer be provided in the Hafnium tree. Could this be added to 'make toolchains' perhaps?

[JW] Absolutely

[JF] Already there: make clang-toolchains :)

...
Would it be an option to use this same clang toolchain for all targets (linux, optee, tfa, hafnium etc.)? I appreciate this is a bit more work though.

[JW] I'm not sure if it's possible to compile OP-TEE with clang only. There's something missing in objcopy or the linker if I remember correctly.

[JF] It is :) You may be thinking about this: https://github.com/OP-TEE/optee_os/commit/33017d856e8c But as the commit says, it's fine now. There is a gotcha with Clang though: we need the compiler-rt libraries for both 32 and 64-bit, and that's why we have a special download script https://github.com/OP-TEE/build/blob/3.20.0/get_clang.sh (invoked by make clang-toolchains).

...
The other 'improvement' would be to find a way to get rid of the linux sources. I'm brainstorming on what can be done. That would save an additional 1GB and leave the complete tree at a more palatable size.

[JW] That's a welcome change. Another improvement that comes to mind is to only build the needed target instead of all targets.

[OD] Agree, noted.

Cheers, Jens

Regards,

Jerome

Olivier Deprez

5:20 p.m.

Hi Jens,

I reproduced this setup and it works well! Thanks for this brilliant work!

I still had to add --enable-slirp to the qemu build for some reason.

I'm thinking of adding PAuth to the picture. BTI might be another one, but I guess it's not easy to enable (for TAs) if it requires a BTI-enabled toolchain. See a few suggestions below.

Regards, Olivier.

--- a/qemu_v8.mk
+++ b/qemu_v8.mk
@@ -4,6 +4,7 @@ TF_A_LOGLVL = 40
 QEMU_SMP=2
 QEMU_KERNEL_BOOTARGS=nokaslr
 MEMTAG=y
+PAUTH=y
 
 ################################################################################
 # Following variables defines how the NS_USER (Non Secure User - Client
@@ -57,12 +58,15 @@ ifneq ($(filter-out n 1 2 3,$(SPMC_AT_EL)),)
 $(error Unsupported SPMC_AT_EL value $(SPMC_AT_EL))
 endif
 
-# Option to configure Pointer Authentication for TA's
+# Option to configure Pointer Authentication for core and TAs
 PAUTH ?= n
 
 # Option to configure Memory Tagging Extension
 MEMTAG ?= n
 
+# Option to configure BTI for core and TAs
+BTI ?= n
+
 ################################################################################
 # Paths to git projects and various binaries
 ################################################################################
@@ -177,8 +181,6 @@ TF_A_FLAGS ?= \
 	QEMU_USE_GIC_DRIVER=$(TFA_GIC_DRIVER) \
 	ENABLE_SVE_FOR_NS=1 \
 	ENABLE_SVE_FOR_SWD=1 \
-	ENABLE_SME_FOR_NS=1 \
-	ENABLE_SME_FOR_SWD=1 \
 	BL32_RAM_LOCATION=tdram \
 	DEBUG=$(TF_A_DEBUG) \
 	LOG_LEVEL=$(TF_A_LOGLVL)
@@ -195,7 +197,6 @@ TF_A_FLAGS_SPMC_AT_EL_1 += QEMU_TOS_FW_CONFIG_DTS=../build/qemu_v8/spmc_el1_mani
 TF_A_FLAGS_SPMC_AT_EL_1 += SPMC_OPTEE=1
 TF_A_FLAGS_SPMC_AT_EL_1 += QEMU_TOS_FW_CONFIG_DTS=../build/qemu_v8/spmc_el1_manifest.dts
 TF_A_FLAGS_SPMC_AT_EL_2 = SPD=spmd
-TF_A_FLAGS_SPMC_AT_EL_2 += CTX_INCLUDE_PAUTH_REGS=1
 TF_A_FLAGS_SPMC_AT_EL_2 += ENABLE_SPE_FOR_LOWER_ELS=0
 TF_A_FLAGS_SPMC_AT_EL_2 += ENABLE_SME_FOR_NS=0 ENABLE_SME_FOR_SWD=0
 TF_A_FLAGS_SPMC_AT_EL_2 += SP_LAYOUT_FILE=../build/qemu_v8/sp_layout.json
@@ -221,6 +222,9 @@ endif
 ifeq ($(PAUTH),y)
 TF_A_FLAGS += CTX_INCLUDE_PAUTH_REGS=1
 endif
+ifeq ($(BTI),y)
+TF_A_FLAGS += BRANCH_PROTECTION=1
+endif
 ifeq ($(MEMTAG),y)
 TF_A_FLAGS += CTX_INCLUDE_MTE_REGS=1
 endif
@@ -388,6 +392,10 @@ ifeq ($(PAUTH),y)
 OPTEE_OS_COMMON_FLAGS += CFG_TA_PAUTH=y
 OPTEE_OS_COMMON_FLAGS += CFG_CORE_PAUTH=y
 endif
+ifeq ($(BTI),y)
+OPTEE_OS_COMMON_FLAGS += CFG_TA_BTI=y
+OPTEE_OS_COMMON_FLAGS += CFG_CORE_BTI=y
+endif
 ifeq ($(MEMTAG),y)
 OPTEE_OS_COMMON_FLAGS += CFG_MEMTAG=y
 endif


Jérôme Forissier

5:30 p.m.

Hello Olivier,

On Fri, 10 Feb 2023 at 18:20, Olivier Deprez Olivier.Deprez@arm.com wrote:

...

I'm thinking of adding PAuth to the picture. BTI might be another one, but I guess it's not easy to enable (for TAs) if it requires a BTI-enabled toolchain.

Already enabled in optee_os CI :) along with MTE and PAuth, see https://github.com/OP-TEE/optee_os/blob/5ddda749c60d/.github/workflows/ci.ym...

The BTI-enabled toolchain comes with the Docker image; it is built like so: https://github.com/jforissier/docker_optee_os_ci/blob/60d8c0105fa2/Dockerfil... (crosstool-ng is a fantastic tool!)
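
One way to see whether an AArch64 object or library from a given toolchain is marked as BTI-compatible is to look at its GNU property note with readelf -n. The sketch below assumes the note is printed on a line containing "AArch64 feature:" followed by the feature names; the has_bti_note helper and the file path in the comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "readelf -n" output on stdin and succeed if
# the GNU property note advertises BTI.
has_bti_note() {
    grep -q 'AArch64 feature:.*BTI'
}

# Possible use (tool prefix and object path are assumptions):
#   aarch64-linux-gnu-readelf -n path/to/crti.o | has_bti_note \
#       && echo "object is marked BTI-compatible"
```

The linker only marks its output as BTI-compatible when every input object carries this note, which is presumably why the toolchain's own runtime objects have to be built with BTI as well.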

Cheers,

-- Jerome

Jens Wiklander

20 Feb

9:20 a.m.

Hi Olivier,

On Fri, Feb 10, 2023 at 6:20 PM Olivier Deprez Olivier.Deprez@arm.com wrote:

...

I'm thinking of adding PAuth to the picture.

Makes sense, I've added PAUTH=y to the defaults in this branch.

...

BTI might be another one, but I guess it's not easy to enable (for TAs) if it requires a BTI-enabled toolchain.

Yes, that's tricky. As Jerome said, we have this in CI but it's not so easy to support here.

...

See a few suggestions below.

Thanks, took everything but BTI.

Cheers, Jens

Olivier Deprez

25 May

4:34 p.m.

Hi Jens,

I gave it a try with fresh TF-A/Hafnium v2.9 and qemu, using the following SHAs:

TF-A: 60df3d75edb7eae87f51b356c35307e3011202d7
Hafnium: 0715b8e002cdfb92e6b7efb71128cb24557b70cb
qemu: b300c134465465385045ab705b68a42699688332

See the attached small changes I applied to the build directory.
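
The revisions listed above could be pinned in an existing checkout roughly as follows. This is only a sketch: the directory names are assumptions about the repo layout, and the print_pin_commands helper is made up for illustration. It prints the git commands instead of running them, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "git checkout" command per tree for the
# SHAs quoted in this mail. Directory names are assumptions; adjust them
# to match the actual checkout.
print_pin_commands() {
    while read -r dir sha; do
        printf 'git -C %s checkout %s\n' "$dir" "$sha"
    done <<'EOF'
trusted-firmware-a 60df3d75edb7eae87f51b356c35307e3011202d7
hafnium 0715b8e002cdfb92e6b7efb71128cb24557b70cb
qemu b300c134465465385045ab705b68a42699688332
EOF
}

print_pin_commands
```

After moving Hafnium to another revision its git submodules most likely need updating too; the build step in this setup is said to take care of that.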

Out of curiosity, where do we stand on getting the overall changes upstream (is it still on the radar)? Are you still thinking of integrating this into a CI loop?

(BTW we made progress with the idea of removing the clang toolchain from the prebuilts submodule: https://review.trustedfirmware.org/c/hafnium/prebuilts/+/19954 We may think of downloading the appropriate LLVM toolchain via make toolchains and using it for building.)

Regards, Olivier.


Jérôme Forissier

26 May

12:58 p.m.

Hello Olivier,

On Thu, 25 May 2023 at 18:34, Olivier Deprez Olivier.Deprez@arm.com wrote:

...

(BTW we made progress with the idea of removing the clang toolchain from prebuilts submodule https://review.trustedfirmware.org/c/hafnium/prebuilts/+/19954 We may think of downloading the appropriate LLVM toolchain by make toolchains and use it for building..)

FYI there is a "make clang-toolchains" already which currently downloads 12.0.

-- Jerome

Olivier Deprez

30 May 2023

8:40 a.m.

Hi Jerome,

Thank you, yes this works, I added a 4th change to use the downloaded toolchain instead of the copy held in Hafnium's prebuilts submodule. See summary of changes below.

Is there a reason to store the llvm toolchain in $(ROOT) rather than $(ROOT)/toolchains?

Regards, Olivier.

commit d122092ec930b38466e369aa61b6f6b4ccaba14c
Author: Olivier Deprez olivier.deprez@arm.com
Date: Tue May 30 10:31:18 2023 +0200

    build: use downloaded llvm toolchain to build hafnium

    Use the toolchain downloaded by make clang-toolchains rather than the
    toolchain provided in Hafnium prebuilts submodule.

    Note the default clang12 toolchain requires:
    sudo apt install libncurses5

    Signed-off-by: Olivier Deprez olivier.deprez@arm.com

diff --git a/qemu_v8.mk b/qemu_v8.mk
index 65a7a1d..f0ab955 100644
--- a/qemu_v8.mk
+++ b/qemu_v8.mk
@@ -400,7 +400,7 @@ optee-os-clean: optee-os-clean-common
 # Hafnium
 ################################################################################

-HAFNIUM_EXPORTS = PATH=$(HAFNIUM_PATH)/prebuilts/linux-x64/clang/bin:$(HAFNIUM_PATH)/prebuilts/linux-x64/dtc:$(PATH)
+HAFNIUM_EXPORTS = PATH=$(ROOT)/clang-12.0.0/bin:$(HAFNIUM_PATH)/prebuilts/linux-x64/dtc:$(PATH)

 .hafnium_checkout:
        (cd $(HAFNIUM_PATH) && git submodule init && git submodule update)

commit b74289cdba1334954a20c72c7037af81566c9f21
Author: Olivier Deprez olivier.deprez@arm.com
Date: Fri May 12 14:44:31 2023 +0200

    tfa: options required to build with TF-A v2.9

    For SEL2 SPMC, set ENABLE_FEAT_SEL2=2 ENABLE_FEAT_FGT=2
    as they're not default on the baseline qemu v8.

    Signed-off-by: Olivier Deprez olivier.deprez@arm.com

diff --git a/.hafnium_checkout b/.hafnium_checkout
new file mode 100644
index 0000000..e69de29
diff --git a/qemu_v8.mk b/qemu_v8.mk
index 0542656..65a7a1d 100644
--- a/qemu_v8.mk
+++ b/qemu_v8.mk
@@ -58,7 +58,7 @@ ifneq ($(filter-out n 1 2 3,$(SPMC_AT_EL)),)
 $(error Unsupported SPMC_AT_EL value $(SPMC_AT_EL))
 endif

-# Option to configure Pointer Authentication for TA's
+# Option to configure Pointer Authentication for core and TAs
 PAUTH ?= n

 # Option to configure Memory Tagging Extension
@@ -193,7 +193,7 @@ TF_A_FLAGS_SPMC_AT_EL_1 += ENABLE_SME_FOR_NS=0 ENABLE_SME_FOR_SWD=0
 TF_A_FLAGS_SPMC_AT_EL_1 += QEMU_TOS_FW_CONFIG_DTS=../build/qemu_v8/spmc_el1_manifest.dts
 TF_A_FLAGS_SPMC_AT_EL_1 += SPMC_OPTEE=1
 TF_A_FLAGS_SPMC_AT_EL_1 += QEMU_TOS_FW_CONFIG_DTS=../build/qemu_v8/spmc_el1_manifest.dts
-TF_A_FLAGS_SPMC_AT_EL_2 = SPD=spmd
+TF_A_FLAGS_SPMC_AT_EL_2 = SPD=spmd ENABLE_FEAT_SEL2=2 ENABLE_FEAT_FGT=2
 TF_A_FLAGS_SPMC_AT_EL_2 += ENABLE_SPE_FOR_LOWER_ELS=0
 TF_A_FLAGS_SPMC_AT_EL_2 += ENABLE_SME_FOR_NS=0 ENABLE_SME_FOR_SWD=0
 TF_A_FLAGS_SPMC_AT_EL_2 += SP_LAYOUT_FILE=../build/qemu_v8/sp_layout.json

commit 326b604b0c59ed7c0c1fab026435e9ab32e15806
Author: Olivier Deprez olivier.deprez@arm.com
Date: Fri May 12 14:40:55 2023 +0200

    build: add qemu enable slirp option

    For netdev user command line option.

    Requires apt-get install libslirp-dev

    Signed-off-by: Olivier Deprez olivier.deprez@arm.com

diff --git a/common.mk b/common.mk
index e5f2333..aadaa11 100644
--- a/common.mk
+++ b/common.mk
@@ -448,6 +448,11 @@ edk2-clean-common:
 ################################################################################
 QEMU_CONFIGURE_PARAMS_COMMON = --cc="$(CCACHE)gcc" --extra-cflags="-Wno-error" \
        --disable-docs
+
+#TODO: slirp submodule deprecated from qemu v7.1.2
+# https://wiki.qemu.org/ChangeLog/7.2#Removal_of_the_.22slirp.22_submodule_.28...
+QEMU_CONFIGURE_PARAMS_COMMON += --enable-slirp
+
 QEMU_EXTRA_ARGS +=\
        -object rng-random,filename=/dev/urandom,id=rng0 \
        -device virtio-rng-pci,rng=rng0,max-bytes=1024,period=1000

commit 120c99122fbd2238d64671737db8db672d211634
Author: Olivier Deprez olivier.deprez@arm.com
Date: Mon Mar 13 10:25:12 2023 +0100

    qemu: add ns mem ranges to spmc manifest

    Per [1], define secure and non-secure memory ranges in the SPMC
    manifest.

    [1] https://trustedfirmware-a.readthedocs.io/en/latest/components/secure-partiti...

    Signed-off-by: Olivier Deprez olivier.deprez@arm.com

diff --git a/qemu_v8/spmc_el2_manifest.dts b/qemu_v8/spmc_el2_manifest.dts
index 7acaa38..80d0d93 100644
--- a/qemu_v8/spmc_el2_manifest.dts
+++ b/qemu_v8/spmc_el2_manifest.dts
@@ -60,8 +60,13 @@
     };

     /* VIRT_SECURE_MEM */
-    memory@e000000 {
+    memory@0 {
         device_type = "memory";
         reg = <0x0 0xe000000 0x1000000>;
     };
+
+    memory@1 {
+        device_type = "ns-memory";
+        reg = <0x0 0x40000000 0x80000000>;
+    };
 };
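The two reg properties in the manifest diff above can be read as base and size values. The following is a minimal sketch, not part of the patch, that computes the address range each node covers. It assumes each reg entry is two address cells followed by one size cell; the cell counts are not shown in the diff, so that layout is an assumption, and decode_reg is a hypothetical helper name.

```python
# Sketch: decode the reg triplets from qemu_v8/spmc_el2_manifest.dts.
# Assumption: #address-cells = 2 and #size-cells = 1 (not shown in the diff).

def decode_reg(cells):
    """Return (base, last_byte) for an <addr_hi addr_lo size> triplet."""
    addr_hi, addr_lo, size = cells
    base = (addr_hi << 32) | addr_lo
    return base, base + size - 1

# Values copied from the manifest diff.
ranges = {
    "memory": (0x0, 0xE000000, 0x1000000),       # VIRT_SECURE_MEM
    "ns-memory": (0x0, 0x40000000, 0x80000000),
}

for device_type, cells in ranges.items():
    base, last = decode_reg(cells)
    print(f"{device_type}: {base:#x} - {last:#x}")
```

Under that assumption the secure node covers 0xe000000 - 0xeffffff, which is the same range Hafnium prints as "Memory range" in the boot log quoted later in this thread.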

From: Jérôme Forissier jerome.forissier@linaro.org
Sent: 26 May 2023 14:58
To: Olivier Deprez Olivier.Deprez@arm.com
Cc: Jens Wiklander jens.wiklander@linaro.org; hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org
Subject: Re: [Hafnium] Hafnium with QEMU and OP-TEE

Hello Olivier,

On Thu, 25 May 2023 at 18:34, Olivier Deprez Olivier.Deprez@arm.com wrote:

Hi Jens,

I gave it a try with fresh TF-A/Hafnium v2.9 and QEMU, using the following SHAs:

TF-A: 60df3d75edb7eae87f51b356c35307e3011202d7
Hafnium: 0715b8e002cdfb92e6b7efb71128cb24557b70cb
qemu: b300c134465465385045ab705b68a42699688332

See the attached small changes I applied to the build directory.

Out of curiosity, where do we stand on upstreaming the overall changes (is it still on the radar)? Are you still thinking of integrating a CI loop?

FYI there is a "make clang-toolchains" already which currently downloads 12.0.

-- Jerome

Jérôme Forissier

9:04 a.m.

On Tue, 30 May 2023 at 10:40, Olivier Deprez Olivier.Deprez@arm.com wrote:

...

Hi Jerome,

Thank you, yes this works, I added a 4th change to use the downloaded toolchain instead of the copy held in Hafnium's prebuilts submodule. See summary of changes below.

Is there a reason to store the llvm toolchain in $(ROOT) rather than $(ROOT)/toolchains?

Not really. I suppose the latter would work equally well.

Thanks,

-- Jerome

shiju.jose＠huawei.com

21 Jun 2023

8:57 a.m.

Hi Jens,

We get the following failure (a Stage-2 page fault) when building and running the S-EL2 stack downloaded from https://github.com/jenswi-linaro/manifest.git (-m qemu_v8.xml -b qemu_sel2). It seems to happen while booting OP-TEE.

Debugging showed that this is related to the relocatable OP-TEE binary support: the failure does not occur when CFG_CORE_PHYS_RELOCATABLE is disabled in optee_os/core/arch/arm/arm.mk.

Thanks, Shiju

===========================================
NOTICE: BL31: v2.8(debug):v2.8-151-gf4d8ed50d
NOTICE: BL31: Built : 17:11:47, Jun 20 2023
INFO: GICv3 without legacy support detected.
INFO: ARM GICv3 driver initialized in EL3
INFO: Maximum SPI INTID supported: 287
INFO: BL31: Initializing runtime services
INFO: SPM Core setup done.
INFO: BL31: Initializing BL32
INFO: Initializing Hafnium (SPMC)
INFO: text: 0xe100000 - 0xe125000
INFO: rodata: 0xe125000 - 0xe12b000
INFO: data: 0xe12b000 - 0xe214000
INFO: stacks: 0xe214000 - 0xe224000
INFO: Supported bits in physical address: 48
INFO: Stage 2 has 4 page table levels with 1 pages at the root.
INFO: Stage 1 has 4 page table levels with 1 pages at the root.
INFO: Memory range: 0xe000000 - 0xeffffff
INFO: Loading VM id 0x8001: op-tee.
INFO: Loaded with 4 vCPUs, entry at 0xe300000.
INFO: Hafnium initialisation completed
WARNING: Stage-2 page fault: pc=0xe3105f4, vmid=0x8001, vcpu=0, vaddr=0xd00000, ipaddr=0xd00000, mode=0x1 0xd0000000000007c
NOTICE: Injecting Data Abort exception into VM 0x8001.
============================================
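To make the fault in the log above easier to read, here is a minimal sketch that pulls the addresses out of the WARNING line and checks them against the "Memory range" line. It only assumes the log text looks exactly as pasted in this message; parse_range and parse_fault are hypothetical helper names, not part of Hafnium or any existing tool.

```python
import re

# Three lines copied from the boot log in this message.
log = """\
INFO: Memory range: 0xe000000 - 0xeffffff
INFO: Loaded with 4 vCPUs, entry at 0xe300000.
WARNING: Stage-2 page fault: pc=0xe3105f4, vmid=0x8001, vcpu=0, vaddr=0xd00000, ipaddr=0xd00000, mode=0x1 0xd0000000000007c
"""

def parse_range(text):
    """Return (first, last) from the 'Memory range' line."""
    m = re.search(r"Memory range: (0x[0-9a-f]+) - (0x[0-9a-f]+)", text)
    return int(m.group(1), 16), int(m.group(2), 16)

def parse_fault(text):
    """Return the hex-valued key=value fields of the page fault line."""
    m = re.search(r"Stage-2 page fault: (.*)", text)
    fields = re.findall(r"(\w+)=(0x[0-9a-f]+)", m.group(1))
    return {key: int(value, 16) for key, value in fields}

first, last = parse_range(log)
fault = parse_fault(log)
for name in ("pc", "ipaddr"):
    inside = first <= fault[name] <= last
    print(f"{name}={fault[name]:#x} inside {first:#x}-{last:#x}: {inside}")
```

With the values from this log, the faulting pc lies inside the reported range while ipaddr=0xd00000 lies below it.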

Olivier Deprez

9:01 a.m.

Hi,

You need to add the following properties to the SP manifest:

diff --git a/qemu_v8/optee_sp_manifest.dts b/qemu_v8/optee_sp_manifest.dts
index e576295..4e813e6 100644
--- a/qemu_v8/optee_sp_manifest.dts
+++ b/qemu_v8/optee_sp_manifest.dts
@@ -29,6 +29,9 @@
     messaging-method = <0x3>; /* Direct messaging only */
     ns-interrupts-action = <1>; /* NS_ACTION_ME */

+    /* mem-size OP-TEE specific binding. */
+    mem-size = <0xd00000>;
+
     /* Boot protocol */
     gp-register-num = <0x0>;

@@ -43,4 +46,10 @@
             interrupts = <0x28 0xb01>;
         };
     };
+
+    /* Boot Info */
+    boot-info {
+        compatible = "arm,ffa-manifest-boot-info";
+        ffa_manifest;
+    };
 };
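One way to sanity-check the mem-size value in the diff above is to compare it with the other numbers quoted in this thread. This is only a sketch under a stated assumption: that the memory described by mem-size starts at the SP entry address reported by Hafnium. The thread does not say this, so treat the result as a consistency check and not as a description of how OP-TEE or Hafnium use the property.

```python
# Values taken from this thread.
SECURE_BASE = 0xE000000   # reg base of the secure memory node in the SPMC manifest
SECURE_SIZE = 0x1000000   # reg size of the secure memory node in the SPMC manifest
SP_ENTRY = 0xE300000      # "entry at 0xe300000" in the Hafnium boot log
MEM_SIZE = 0xD00000       # mem-size added to the OP-TEE SP manifest

# Assumption (not stated in the thread): the SP memory starts at its entry address.
sp_end = SP_ENTRY + MEM_SIZE
secure_end = SECURE_BASE + SECURE_SIZE

print(f"SP end: {sp_end:#x}, secure memory end: {secure_end:#x}")
print("fits in secure memory:", sp_end <= secure_end)
```

Under that assumption the SP region ends exactly where the secure memory range ends (0xf000000).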

Regards, Olivier.

________________________________
From: shiju.jose--- via Hafnium hafnium@lists.trustedfirmware.org
Sent: 21 June 2023 10:57
To: hafnium@lists.trustedfirmware.org hafnium@lists.trustedfirmware.org
Subject: [Hafnium] Re: Hafnium with QEMU and OP-TEE

Hi Jens,

Debugging showed that this is related to the relocatable OP-TEE binary support: the failure does not occur when CFG_CORE_PHYS_RELOCATABLE is disabled in optee_os/core/arch/arm/arm.mk.

Thanks, Shiju

shiju.jose＠huawei.com

9:20 a.m.

Hi Olivier,

Thanks, that solved the failure; it boots now.

Regards, Shiju

hafnium@lists.trustedfirmware.org

22 comments

participants (4)

Jens Wiklander
Jérôme Forissier
Olivier Deprez
shiju.jose＠huawei.com