Hi,
I'm testing with Hafnium as SPMC at S-EL2 and OP-TEE as an SP at S-EL1 on
QEMU v7.0.0. I've run into a few problems and fixed most of them.
I believe the setup is similar to what Shiju is using in this mail thread
https://lists.trustedfirmware.org/archives/list/hafnium@lists.trustedfirmwa…
My setup can be duplicated with:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
-b qemu_sel2
repo sync -j8
(cd hafnium && git submodule init && git submodule update)
cd build
make -j8 toolchains
make -j8 all
make run-only
With this, xtest -x 1034 passes, but xtest 1034 often causes:
ERROR: Data abort: pc=0xe1198b8, esr=0x96000006, ec=0x25, far=0x9c
Panic: EL2 exception
Xtest runs dreadfully slowly. I haven't investigated why yet, but at
least it works.
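
The esr value in the abort above can be split into its fields to see what kind of fault was taken. The following is a minimal sketch, assuming the standard Armv8-A ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]); the function name and the short fault-status table are illustrative only and not taken from Hafnium.

```python
# Minimal ESR_ELx decoder (illustrative helper, not Hafnium code).
# Only a few data fault status codes are listed.
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR_ELx value into EC, IL and ISS, plus data-abort details."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    fields = {"ec": ec, "il": il, "iss": iss}
    if ec in (0x24, 0x25):  # data abort from a lower EL / from the same EL
        fields["wnr"] = (iss >> 6) & 0x1  # 1 means the access was a write
        fields["dfsc"] = iss & 0x3F
        fields["dfsc_name"] = DFSC_NAMES.get(iss & 0x3F, "other")
    return fields


info = decode_esr(0x96000006)
print(hex(info["ec"]), info["wnr"], info["dfsc_name"])
```

For esr=0x96000006 this gives EC 0x25 (a data abort taken without a change of exception level, matching the "EL2 exception" panic), a read access, and a level 2 translation fault. Together with far=0x9c this might point at a read through a near-null pointer inside Hafnium, but that is only a guess from the register values.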
This is based on patches provided by Olivier at:
[1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16412/2
[2] https://review.trustedfirmware.org/c/hafnium/hafnium/+/16323/7
I've also encountered the cache maintenance problem Shiju
described in the mail thread above:
NOTICE: Trapped access to system register write: op0=1, op1=0, crn=7,
crm=14, op2=2, rt=11.
It can be worked around by compiling OP-TEE with
CFG_CORE_WORKAROUND_NSITR_CACHE_PRIME=n. I'm pretty sure we do dcache
clean+inv elsewhere so I'm surprised it fails here. Is Hafnium expected
to block dcache clean+inv?
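
The encoding in the NOTICE line can be matched against the AArch64 data cache maintenance instructions. The table below is a sketch with a handful of DC operations and their (op0, op1, CRn, CRm, op2) encodings as given in the Arm architecture; the helper names are made up for this example.

```python
# A few AArch64 "DC" cache maintenance encodings: (op0, op1, CRn, CRm, op2).
DC_OPS = {
    (1, 0, 7, 6, 1): "DC IVAC",    # invalidate by VA to PoC
    (1, 0, 7, 6, 2): "DC ISW",     # invalidate by set/way
    (1, 0, 7, 10, 2): "DC CSW",    # clean by set/way
    (1, 0, 7, 14, 2): "DC CISW",   # clean and invalidate by set/way
    (1, 3, 7, 10, 1): "DC CVAC",   # clean by VA to PoC
    (1, 3, 7, 11, 1): "DC CVAU",   # clean by VA to PoU
    (1, 3, 7, 14, 1): "DC CIVAC",  # clean and invalidate by VA to PoC
}


def name_dc_op(op0, op1, crn, crm, op2):
    """Return the instruction name for an encoding, or 'unknown'."""
    return DC_OPS.get((op0, op1, crn, crm, op2), "unknown")


def is_set_way(name):
    """Set/way operations end in 'SW' (ISW, CSW, CISW)."""
    return name.endswith("SW")


trapped = name_dc_op(op0=1, op1=0, crn=7, crm=14, op2=2)
print(trapped, is_set_way(trapped))
```

The trapped write decodes to DC CISW, a clean+invalidate by set/way, rather than the by-VA DC CIVAC. The architecture allows a hypervisor to trap set/way operations from EL1 through HCR_EL2.TSW; whether Hafnium sets that bit in this configuration is an assumption that would need checking in the source. It would at least be consistent with by-VA clean+invalidate working elsewhere in OP-TEE.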
For Hafnium I've added two patches on top of [2], available at
https://github.com/jenswi-linaro/hafnium/tree/qemu_sel2:
- 79b4d2cbe06e SPMC: add missing ME initialization for secondary cores
- 659c79d5eacf feat(mm): fix FEAT_LPA workaround
For TF-A I've added a few patches on top of [1], available at
https://github.com/jenswi-linaro/arm-trusted-firmware/tree/qemu_sel2:
- a040396cae9e feat(qemu): add tos-fw-config for sel2 spmc
- 4f7d91723485 fix(qemu): change TOS_FW_CONFIG_NAME value
- fbfc9a222c7f spmd_smc_handler() add s/ns state to SMC traces
- ca65081b9cdc feat(sptool): add dependency to SP image
- b1e1b46a0680 fix(qemu): restore code to added needed psci nodes
For OP-TEE I've also added a few patches, available at
https://github.com/jenswi-linaro/optee_os/tree/qemu_sel2:
- 1057def23777 plat-vexpress: sel2 spmc: update for hafnium
- f18a54ed3524 core: ffa: use hvc instead of smc with S-EL2
- d18bbc92f7c1 core: mobj_ffa_add_pages_at() trust addresses from SPMC
There's also one patch for QEMU on top of v7.0.0, available at:
https://github.com/jenswi-linaro/qemu/tree/qemu_sel2
- 0c1e39672dcb Read PS bits from VTCR_EL2
The QEMU problem is fixed in v7.1.0, but I can't get that version of
QEMU to work with TF-A. I guess it's because of yet another new CPU
feature since I'm running with "-cpu max".
I'll try to upstream the Hafnium and TF-A patches that make sense on
their own.
What's the plan with the interrupt controller?
How will OP-TEE be able to handle secure interrupts?
The hafnium git pulls in a few git submodules and even the source code
for a Linux kernel.
I guess this is useful in your internal CI setup, but when Hafnium is
used in isolation, as in my setup, it makes no sense at all.
It would also be nice to be able to build with an external toolchain.
I hope this is a temporary situation. I don't see why Hafnium should
be pickier about toolchain than for instance TF-A.
Speaking of building, I haven't been able to figure out how to build
only for the QEMU variant I need so right now I'm building for
everything and that takes a bit longer than necessary.
I'm going to maintain the setup above as long as it's relevant to me. I may
add more patches on the branches or even rebase as needed. So if anyone is
using this, keep in mind that my branches may change without warning.
Thanks,
Jens
Hi Jens,
I have a couple of Hafnium changes implementing the use of VSTTBR_EL2/VTTBR_EL2 to split an SP IPA into secure and non-secure IPA spaces.
They're still very much experimental, so they are difficult to share just now (I will do so some time later in February).
However, I'd like to report a possible issue observed with QEMU.
Essentially, when the normal world driver initializes, it performs a first share operation for a single NS page:
INFO: 1> 1 0 FFA_MEM_SHARE_32(84000073) 50 50 0 0 0 0 0
[...]
VERBOSE: Marked sending complete.
Current share states:
SHARE 0x0 (from VM 0x0, attributes 0x6f (NS), flags 0x8, tag 0, to 1 recipients [VM 0x8001: 0x6 (offset 48)]): fully sent with 1 fragments, 0 retrieved, sender's original mode: 0x87
INFO: 1< 1 0 FFA_SUCCESS_32(84000061) 0 0 0 0 0 0 0
INFO: 1> 1 0 FFA_MSG_SEND_DIRECT_REQ_32(8400006f) 8001 0 80000000 0 0 0 0
E/TC:1 0 mobj_ffa_get_by_cookie:387 Populating mobj from rx buffer, cookie 0
Retrieve operation happens:
E/TC:1 0 spmc_retrieve_req:1415 spmc_retrieve_req enter.
INFO: 1> 1 8001 FFA_MEM_RETRIEVE_REQ_32(84000074) 30 30 0 0 0 0 0
Current share states:
SHARE 0x0 (from VM 0x0, attributes 0x6f (NS), flags 0x8, tag 0, to 1 recipients [VM 0x8001: 0x6 (offset 48)]): fully sent with 1 fragments, 0 retrieved, sender's original mode: 0x87
Current share states:
SHARE 0x0 (from VM 0x0, attributes 0x6f (NS), flags 0x8, tag 0, to 1 recipients [VM 0x8001: 0x6 (offset 48)]): fully sent with 1 fragments, 1 retrieved, sender's original mode: 0x87
INFO: 1< 1 8001 Unknown(84000075) 50 50 0 0 0 0 0
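
The function identifiers in these traces follow the SMC calling convention, so the "Unknown(84000075)" entry can be decoded by hand. This is a small sketch: the first five names are the ones printed in the log itself, and 0x84000075 is taken from the FF-A specification, where it is FFA_MEM_RETRIEVE_RESP_32. The helper names are illustrative.

```python
# FF-A function IDs seen in the trace, plus the one the log prints as Unknown.
FFA_FUNCTIONS = {
    0x84000061: "FFA_SUCCESS_32",
    0x84000065: "FFA_RX_RELEASE_32",
    0x8400006F: "FFA_MSG_SEND_DIRECT_REQ_32",
    0x84000073: "FFA_MEM_SHARE_32",
    0x84000074: "FFA_MEM_RETRIEVE_REQ_32",
    0x84000075: "FFA_MEM_RETRIEVE_RESP_32",
}


def decode_smccc_id(func_id):
    """Split an SMCCC function identifier into its architectural fields."""
    return {
        "fast_call": bool(func_id >> 31 & 1),
        "smc64": bool(func_id >> 30 & 1),
        "owner": (func_id >> 24) & 0x3F,  # 4 = standard secure service
        "number": func_id & 0xFFFF,
    }


def name_ffa(func_id):
    return FFA_FUNCTIONS.get(func_id, "Unknown")


print(name_ffa(0x84000075), decode_smccc_id(0x84000075))
```

So the last line of the trace is the retrieve response returned to VM 0x8001; the trace helper most likely just lacks a name for that ID.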
Hafnium maps the NS page into OP-TEE's S2 page tables rooted at VTTBR_EL2:
0: e178003 S
1: e179003 S
f: e17a003 S
186: 240000041f867ff NS
(a similar dump from VSTTBR_EL2 shows OP-TEE secure pages properly mapped)
OP-TEE then maps the page in its S1 PTs as NS:
E/TC:1 0 spmc_retrieve_req:1428 spmc_retrieve_req exit.
E/TC:1 0 thread_spmc_populate_mobj_from_rx:1506 thread_spmc_populate_mobj_from_rx exit.
E/TC:1 0 set_pages:1461 set_pages 0 addr 41f86000 count 1
E/TC:1 0 mobj_ffa_add_pages_at:220 mobj_ffa_add_pages_at is_ns 0
INFO: 1> 1 8001 FFA_RX_RELEASE_32(84000065) 0 0 0 0 0 0 0
INFO: 1< 1 8001 FFA_SUCCESS_32(84000061) 0 0 0 0 0 0 0
E/TC:1 0 ffa_inc_map:566 ffa_inc_map addr fa00000 pages 0x90000000e3eadd0 sz 4096
D/TC:1 0 core_mmu_xlat_table_alloc:526 xlat tables used 4 / 5
A page fault is hit when OP-TEE accesses the page from its VA:
WARNING: Stage-2 page fault: pc=0xe30b764, vmid=0x8001, vcpu=1, vaddr=0xfa0001c, ipaddr=0x41f8601c, mode=0x1 0x40000000000007c
This issue is not observed with the TC2 FVP and a similar Hafnium+OP-TEE SW stack, at the same point of initialization.
So it seems QEMU is not doing the translation properly from VTTBR_EL2 for a page mapped NS by OP-TEE (hence in the NS IPA space).
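
As a sanity check, the faulting IPA can be compared with the NS entry in the table dump. The sketch below assumes a 4 KiB translation granule and that entry 186 (hex) is a level 3 page descriptor; the field positions are those of the Armv8-A stage 2 page descriptor, and the function names are invented for the example.

```python
PAGE_SHIFT = 12          # assumes a 4 KiB granule
INDEX_MASK = 0x1FF       # 512 entries per table
OA_MASK = 0x0000FFFFFFFFF000  # output address bits [47:12]


def level3_index(ipa):
    """Index into the level 3 table that covers this IPA."""
    return (ipa >> PAGE_SHIFT) & INDEX_MASK


def decode_s2_page_desc(desc):
    """Pick out a few fields of a stage 2 level 3 page descriptor."""
    return {
        "valid_page": (desc & 0x3) == 0x3,
        "output_addr": desc & OA_MASK,
        "s2ap": (desc >> 6) & 0x3,   # 0b11 = read/write
        "af": (desc >> 10) & 0x1,    # access flag
    }


fault_ipa = 0x41F8601C
entry = decode_s2_page_desc(0x240000041F867FF)
print(hex(level3_index(fault_ipa)), hex(entry["output_addr"]), entry)
```

Under those assumptions the faulting IPA indexes entry 0x186, and that entry is a valid, read/write page with the access flag set whose output address is the page containing the fault. That supports the reading that the mapping is present in the table rooted at VTTBR_EL2 and the fault comes from how the walk is performed.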
Who should I report this problem to?
Regards,
Olivier.
Hi,
With the introduction of FFA_CONSOLE_LOG ABI [1], we are intending to replace and remove support for HF_DEBUG_LOG.
This proposal is in review in the following stages:
1) Remove the dependency of hftest VMs on HF_DEBUG_LOG and move to FFA_CONSOLE_LOG [2]
2) Remove the support for HF_DEBUG_LOG (i.e. api_debug_log) from the Hafnium project. [3]
The adoption of FFA_CONSOLE_LOG will allow us to make use of its ability to log multiple characters at a time, as opposed to HF_DEBUG_LOG which writes one character at a time.
This improvement will be enabled in a future patch. Also, should [3] be adopted, we will make accompanying changes to tf-a-tests Cactus-based tests.
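
To illustrate the difference in call count, here is a rough sketch of how a message could be batched for the 32-bit variant of FFA_CONSOLE_LOG. It assumes four characters per 32-bit register, six payload registers per call (24 characters), and the first character in the least significant byte; these details should be checked against the FF-A specification, and the function name is hypothetical.

```python
CHARS_PER_REG = 4   # assumption: 32-bit registers, one byte per character
PAYLOAD_REGS = 6    # assumption: six payload registers per call


def pack_console_log_calls(message):
    """Split a message into (char_count, [register values]) per call."""
    data = message.encode("ascii")
    per_call = CHARS_PER_REG * PAYLOAD_REGS
    calls = []
    for start in range(0, len(data), per_call):
        chunk = data[start:start + per_call]
        regs = [
            # assumption: first character goes in the lowest byte
            int.from_bytes(chunk[i:i + CHARS_PER_REG], "little")
            for i in range(0, len(chunk), CHARS_PER_REG)
        ]
        calls.append((len(chunk), regs))
    return calls


message = "Hello, Hafnium!"
batched = pack_console_log_calls(message)
print(len(message), "calls one character at a time,", len(batched), "batched")
```

With one character per call, a 15-character message costs 15 calls; batched as above it fits in a single call.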
We want to know if there are any concerns about removing support for HF_DEBUG_LOG at this time as we realize other downstream SPs may rely on its support.
Thank you,
Kathleen Capella
[1] feat(console_log): add FFA_CONSOLE_LOG ABI https://review.trustedfirmware.org/c/hafnium/hafnium/+/15334
[2] feat(ffa_console_log): replace hf_debug_log https://review.trustedfirmware.org/c/hafnium/hafnium/+/19513
[3] refactor: remove support for HF_DEBUG_LOG https://review.trustedfirmware.org/c/hafnium/hafnium/+/19681
Hi,
Current Hafnium build and test configs cover both cases of non-VHE (HCR_EL2.E2H=0) and VHE (HCR_EL2.E2H=1).
From an Arm architecture perspective the latter is a superset of the former.
We understand that non-VHE is becoming legacy, and other projects generally tend to enable the Armv8.1 VHE extension early at boot.
Since our focus is S-EL2 (Armv8.4+), it looks reasonable to assume that Armv8.1's VHE is present on chipsets loading and booting Hafnium.
That said, we're exploring the possibility of abandoning 'non-vhe' builds and focusing build and test on VHE-enabled builds.
The intent is to simplify build configurations and improve build and test time.
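
For reference, the two architectural bits involved can be checked as follows. This is only a sketch of the register fields (ID_AA64MMFR1_EL1.VH in bits [11:8] and HCR_EL2.E2H at bit 34); the sample register values are made up and the helper names are not from Hafnium.

```python
HCR_EL2_E2H = 1 << 34  # HCR_EL2.E2H: EL2 host mode (VHE) enable


def vhe_implemented(id_aa64mmfr1_el1):
    """ID_AA64MMFR1_EL1.VH, bits [11:8]: non-zero means VHE is implemented."""
    return ((id_aa64mmfr1_el1 >> 8) & 0xF) != 0


def vhe_enabled(hcr_el2):
    """True when HCR_EL2.E2H is set."""
    return bool(hcr_el2 & HCR_EL2_E2H)


# Hypothetical register snapshots, for illustration only.
print(vhe_implemented(0x0000_0100), vhe_enabled(HCR_EL2_E2H | 0x1))
```

A VHE-only build would in effect assume the first check always holds on the target and would always run with E2H set.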
Joao experimented in two steps, first removing the said build configurations, and then the runtime checks.
This gives good results, at least in terms of build time and simplification of the build/test scripts.
https://review.trustedfirmware.org/c/hafnium/hafnium/+/18925/
https://review.trustedfirmware.org/c/hafnium/project/reference/+/18926/
This message is to poll the community for feedback on whether this is foreseen as an issue (or not!) for ongoing deployments, before we engage further in this refactoring.
Thanks & Regards,
Olivier.
Hi Yuye,
See comments inline [OD].
Regards,
Olivier.
________________________________
From: 梅建强(禹夜) <meijianqiang.mjq(a)alibaba-inc.com>
Sent: 22 February 2023 04:17
To: Olivier Deprez <Olivier.Deprez(a)arm.com>; hafnium <hafnium(a)lists.trustedfirmware.org>
Cc: 高海源(码源) <haiyuan.ghy(a)alibaba-inc.com>; 王一蒙(北见) <wym389994(a)alibaba-inc.com>
Subject: Re: [Hafnium] hafnium page table configuration
Hi, Olivier,
Sorry for the delay in replying to this query.
As I've seen in the documentation, in the case of hafnium implemented as S-EL2,
NWd uses shared memory to communicate with SWd and there are two steps to register the page table.
Please correct me if there are any errors in my description.
Firstly, Linux sends FFA_MEM_SHARE to the SPMC (Hafnium), which completes the memory mapping for S-EL1's stage-2 translation.
Secondly, Linux sends FFA_MSG_SEND_DIRECT_REQ to OPTEE, and OPTEE then retrieves the IPA space from the SPMC according to the cookie received
and then completes the memory mapping for S-EL1's stage-1 translation.
[OD] That's quite correct, with the subtle difference that the region is mapped in SP's S2 page tables upon the receiver SP emitting the memory retrieve request.
The SP maps the region in the S1 page tables after receiving the memory retrieve response from the SPMC.
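
The ordering described in this reply can be modelled with a toy example. Everything here is hypothetical (class names, method names, the page address); it only illustrates that the stage 2 mapping appears at retrieve time, not at share time, and that the stage 1 mapping follows the retrieve response.

```python
class Spmc:
    """Toy SPMC: records shares and per-receiver stage 2 mappings."""

    def __init__(self):
        self.shares = {}   # handle -> (receiver id, tuple of page addresses)
        self.stage2 = {}   # receiver id -> set of mapped page addresses
        self.next_handle = 0

    def mem_share(self, receiver, pages):
        """Sender side: record the share, map nothing yet."""
        handle = self.next_handle
        self.next_handle += 1
        self.shares[handle] = (receiver, tuple(pages))
        return handle

    def mem_retrieve(self, receiver, handle):
        """Receiver side: map into the receiver's stage 2, return the pages."""
        owner, pages = self.shares[handle]
        if owner != receiver:
            raise PermissionError("handle was not shared with this receiver")
        self.stage2.setdefault(receiver, set()).update(pages)
        return pages


class SecurePartition:
    """Toy SP: retrieves on a direct request, then maps in stage 1."""

    def __init__(self, spmc, vm_id):
        self.spmc = spmc
        self.vm_id = vm_id
        self.stage1 = set()

    def on_direct_request(self, cookie):
        pages = self.spmc.mem_retrieve(self.vm_id, cookie)
        self.stage1.update(pages)  # done after the retrieve response


spmc = Spmc()
optee = SecurePartition(spmc, vm_id=0x8001)
cookie = spmc.mem_share(receiver=0x8001, pages=[0x41F86000])
mapped_after_share = set(spmc.stage2.get(0x8001, set()))
optee.on_direct_request(cookie)
print(mapped_after_share, spmc.stage2[0x8001], optee.stage1)
```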
Our workaround for the problem is to add a TLB invalidation after the first step completes,
which seems to greatly reduce the probability of the problem occurring when testing the optee_examples.
The root cause still needs to be identified.
[OD] There is probably a missing invalidation of the S2 TLB entries for the NS IPA space.
Interestingly, this issue is not observed with models, but is likely to happen on real silicon.
Can you try the suggested temporary fix from https://github.com/OP-TEE/optee_os/issues/5803#issuecomment-1436084763
(while reverting your own TLB invalidation fix)?
I'm working on a cleaner fix, and will prioritize it if this issue is confirmed at your end.
It should be noted that our OPTEE version is based on 3.19-rc,
and Hafnium version is based on the commit:
https://git.trustedfirmware.org/hafnium/hafnium.git/commit/?id=dd883207ee9b…
Anyway, thanks for the support.
Regards,
Yuye.
------------------------------------------------------------------
From: Olivier Deprez <Olivier.Deprez(a)arm.com>
Sent: 11 February 2023 (Saturday) 00:56
To: hafnium <hafnium(a)lists.trustedfirmware.org>; 梅建强(禹夜) <meijianqiang.mjq(a)alibaba-inc.com>
Subject: Re: [Hafnium] hafnium page table configuration
Hi Yuye,
Quick feedback, we have a test case testing SP to SP mem sharing with a large physical address (for a NS memory region), so I expect this is covered:
https://git.trustedfirmware.org/TF-A/tf-a-tests.git/tree/tftf/tests/runtime…
https://git.trustedfirmware.org/TF-A/tf-a-tests.git/tree/spm/cactus/cactus_…
https://git.trustedfirmware.org/TF-A/tf-a-tests.git/tree/spm/cactus/plat/ar…
There may certainly be other reasons, but before investigating further, is this still an issue at your end?
Regards,
Olivier.
________________________________
From: 梅建强(禹夜) via Hafnium <hafnium(a)lists.trustedfirmware.org>
Sent: 06 February 2023 07:41
To: hafnium <hafnium(a)lists.trustedfirmware.org>
Subject: [Hafnium] hafnium page table configuration
Hi,
At present, I suspect that this may be a Hafnium problem:
the secure_storage CA/TA may use addresses above 34G before communication.
Does Hafnium currently support configuring the page table for the 0x8 80000000-xxxxxxxxx address range?
If yes, how is it configured?
The error log is as follows:
Current share states:
SHARE 0x0 (from VM 0x0, attributes 0x2f, flags 0x8, tag 0, to 1 recipients [VM 0x8001: 0x6 (offset 48)]): fully sent with 1 fragments, 1 retrieved, sender's original mode: 0x7
SHARE 0x1 (from VM 0x0, attributes 0x2f, flags 0x8, tag 0, to 1 recipients [VM 0x8001: 0x6 (offset 48)]): fully sent with 1 fragments, 1 retrieved, sender's original mode: 0x7
SHARE 0x2 (from VM 0x0, attributes 0x2f, flags 0x8, tag 0, to 1 recipients [VM 0x8001: 0x6 (offset 48)]): fully sent with 1 fragments, 1 retrieved, sender's original mode: 0x7
WARNING: Stage-2 page fault: pc=0x40032106, vmid=0x8001, vcpu=13, vaddr=0x40046ce0, ipaddr=0x8a9b28ce0, mode=0x81 0x63
NOTICE: Injecting Data Abort exception into VM 0x8001.
D/TC:013 0 abort_handler:550 [abort] abort in User mode (TA will panic)
E/TC:??? 0
E/TC:??? 0 User mode data-abort at address 0x40046ce0 (translation fault)
E/TC:??? 0 esr 0x94020007 ttbr0 0x20000f03061a0 ttbr1 0x00000000 cidr 0x0
E/TC:??? 0 cpu #13 cpsr 0x00000130
E/TC:??? 0 x0 000000004003e348 x1 000000004003e349
E/TC:??? 0 x2 0000000040046ce0 x3 000000004003e348
E/TC:??? 0 x4 0000000040036088 x5 0000000000000000
E/TC:??? 0 x6 0000000000000000 x7 0000000040042e28
E/TC:??? 0 x8 0000000000000000 x9 0000000000000000
E/TC:??? 0 x10 0000000000000000 x11 0000000000000000
E/TC:??? 0 x12 0000000000000000 x13 0000000040042e28
E/TC:??? 0 x14 00000000400213cf x15 0000000000000000
E/TC:??? 0 x16 0000000000000000 x17 0000000000000000
E/TC:??? 0 x18 0000000000000000 x19 0000000000000000
E/TC:??? 0 x20 0000000000000000 x21 0000000000000000
E/TC:??? 0 x22 0000000000000000 x23 0000000000000000
E/TC:??? 0 x24 0000000000000000 x25 0000000000000000
E/TC:??? 0 x26 0000000000000000 x27 0000000000000000
E/TC:??? 0 x28 0000000000000000 x29 0000000000000000
E/TC:??? 0 x30 0000000000000000 elr 0000000040032106
E/TC:??? 0 sp_el0 0000000040042f80
E/LD: Status of TA f4e750bb-1437-4fbf-8785-8d3580c34994
E/LD: arch: arm
E/LD: region 0: va 0x40006000 pa 0xf0404000 size 0x002000 flags rw-s (ldelf)
E/LD: region 1: va 0x40008000 pa 0xf0406000 size 0x011000 flags r-xs (ldelf)
E/LD: region 2: va 0x40019000 pa 0xf0417000 size 0x001000 flags rw-s (ldelf)
E/LD: region 3: va 0x4001a000 pa 0xf0418000 size 0x004000 flags rw-s (ldelf)
E/LD: region 4: va 0x4001e000 pa 0xf041c000 size 0x001000 flags r--s
E/LD: region 5: va 0x4001f000 pa 0x00001000 size 0x017000 flags r-xs [0]
E/LD: region 6: va 0x40036000 pa 0x00018000 size 0x00c000 flags rw-s [0]
E/LD: region 7: va 0x40042000 pa 0xf0440000 size 0x001000 flags rw-s (stack)
E/LD: region 8: va 0x40043000 pa 0x8b9101620 size 0x003000 flags rw-- (param)
E/LD: region 9: va 0x40046000 pa 0x8a9b28ce0 size 0x001000 flags rw-- (param)
E/LD: [0] f4e750bb-1437-4fbf-8785-8d3580c34994 @ 0x4001f000
E/LD: Call stack
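
The faulting address in this log can be checked against the address sizes the architecture defines. The sketch below only does the arithmetic; the PARange table is the ID_AA64MMFR0_EL1.PARange encoding from the Arm architecture, the helper names are invented, and nothing here says which range a given Hafnium build is actually configured for.

```python
GIB = 1 << 30

# ID_AA64MMFR0_EL1.PARange encoding -> physical address bits
PARANGE_BITS = {0: 32, 1: 36, 2: 40, 3: 42, 4: 44, 5: 48, 6: 52}


def required_bits(addr):
    """Number of address bits needed to represent addr."""
    return addr.bit_length()


def smallest_parange(addr):
    """Smallest PARange encoding whose address space contains addr."""
    for encoding, bits in sorted(PARANGE_BITS.items()):
        if addr < (1 << bits):
            return encoding, bits
    raise ValueError("address does not fit in any defined range")


fault_ipa = 0x8A9B28CE0  # ipaddr from the stage-2 page fault above
print(fault_ipa // GIB, required_bits(fault_ipa), smallest_parange(fault_ipa))
```

The faulting IPA lies between 34 GiB and 35 GiB and needs 36 address bits, so any stage 2 configuration limited to a 32-bit (4 GiB) space could not map it.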
Regards,
Yuye
------------------------------------------------------------------
From: 梅建强(禹夜) <meijianqiang.mjq(a)alibaba-inc.com>
Sent: 4 February 2023 (Saturday) 11:40
To: op-tee <op-tee(a)lists.trustedfirmware.org>
Cc: 赵哲(为哲) <weizhe.zz(a)alibaba-inc.com>; 高海源(码源) <haiyuan.ghy(a)alibaba-inc.com>; 王一蒙(北见) <wym389994(a)alibaba-inc.com>
Subject: OPTEE TA Crash
Hi,
Does anyone have a good solution to this problem?
https://github.com/OP-TEE/optee_os/issues/5803
regards,
yuye
Hi, experts
Here are the comments I saw in the Hafnium code.
/*
* Hafnium doesn't support fragmentation of memory retrieve requests
* (because it doesn't support caller-specified mappings, so a request
* will never be larger than a single page), so this must be part of a
* memory send (i.e. donate, lend or share) request.
*
* We can tell from the handle whether the memory transaction is for the
* TEE or not.
*/
I have a few questions about this description.
1. In the case of Hafnium as SPMC, OP-TEE should register memory fragments allocated by Linux.
What does Hafnium do with these memory fragments, since it doesn't support fragmentation of memory retrieve requests?
2. How does Hafnium map the stage-2 page tables for dynamic shared memory used by the CA/TA,
and are these page tables stored in the heap, or somewhere else?
3. What does HF_MAILBOX mean for dynamic shared memory? How does it relate to the RX/TX buffers?
Where can I find a detailed introduction to this?
Regards,
Yuye