Hi, experts,
We seem to have solved the problem. It is not caused by the load_address of optee, but by the secure memory space configured in the ATF.
The secure space we configured appears to be too large. Why this causes the problem still needs further investigation.
Regards,
Yuye.
------------------------------------------------------------------
From: 梅建强(禹夜) <meijianqiang.mjq(a)alibaba-inc.com>
Sent: Wednesday, 22 March 2023 11:50
To: Olivier Deprez <Olivier.Deprez(a)arm.com>; Jens Wiklander <jens.wiklander(a)linaro.org>
Cc: hafnium <hafnium(a)lists.trustedfirmware.org>; op-tee <op-tee(a)lists.trustedfirmware.org>
Subject: load optee at 36bit address
Hi, experts,
We are currently moving optee to a 36-bit address, booting in an environment
where hafnium runs at S-EL2 as the SPMC and optee runs at S-EL1 as an SP.
We have moved hafnium to 0x880000000 and it runs successfully.
We then tried moving optee to a 36-bit address (0x89000000) as well.
Although hafnium and optee were successfully initialized on the primary cpu,
psci_cpu_on does not seem to be entered when the secondary cpu is started.
The error is described here:
https://github.com/OP-TEE/optee_os/issues/5895
Is there any difference in how hafnium and optee initialize the secondary cpu
when they are loaded at different load_address values?
The log output shows that the secondary cpu never enters hafnium.
Could Hafnium be affected by the 36-bit address when handling PSCI-related transactions?
Regards,
Yuye.
Hi, Olivier
Let me answer the query in more detail.
>For building/running hafnium, did you have to apply a change similar to:
>https://review.trustedfirmware.org/c/hafnium/hafnium/+/18510
Yes.
Some time ago, we used a hafnium codebase based on commit ca03054ba6a351534b93e6d64c12e671578eb340.
I verified the fix on that base: compilation passed, but the system could not boot successfully.
Later, we updated the hafnium codebase to commit dd883207ee9b31c19169adf97c918d561dcb9a5c.
I verified the fix again, and the system boots successfully.
Regards,
Yuye.
------------------------------------------------------------------
From: Olivier Deprez <Olivier.Deprez(a)arm.com>
Sent: Thursday, 23 March 2023 16:54
To: 梅建强(禹夜) <meijianqiang.mjq(a)alibaba-inc.com>
Subject: Re: load optee at 36bit address
Hi Yuye, just to acknowledge receipt of this query.
Please accept a slow response as I'm currently away from my office.
I will do a few trials at my end to reproduce the issue and will help as best I can.
For building/running hafnium, did you have to apply a change similar to:
https://review.trustedfirmware.org/c/hafnium/hafnium/+/18510
Regards,
Olivier.
Hi Jens,
I have a couple of Hafnium changes implementing the use of VSTTBR_EL2/VTTBR_EL2 to split an SP IPA into secure and non-secure IPA spaces.
They are still very much at an experimental stage, so they are difficult to share just now (I will do so some time later in February).
However, I'd like to report a possible issue observed with qemu.
Essentially, when the normal world driver initializes, it performs a first share operation for a single NS page:
INFO: 1> 1 0 FFA_MEM_SHARE_32(84000073) 50 50 0 0 0 0 0
[...]
VERBOSE: Marked sending complete.
Current share states:
SHARE 0x0 (from VM 0x0, attributes 0x6f (NS), flags 0x8, tag 0, to 1 recipients [VM 0x8001: 0x6 (offset 48)]): fully sent with 1 fragments, 0 retrieved, sender's original mode: 0x87
INFO: 1< 1 0 FFA_SUCCESS_32(84000061) 0 0 0 0 0 0 0
INFO: 1> 1 0 FFA_MSG_SEND_DIRECT_REQ_32(8400006f) 8001 0 80000000 0 0 0 0
E/TC:1 0 mobj_ffa_get_by_cookie:387 Populating mobj from rx buffer, cookie 0
Retrieve operation happens:
E/TC:1 0 spmc_retrieve_req:1415 spmc_retrieve_req enter.
INFO: 1> 1 8001 FFA_MEM_RETRIEVE_REQ_32(84000074) 30 30 0 0 0 0 0
Current share states:
SHARE 0x0 (from VM 0x0, attributes 0x6f (NS), flags 0x8, tag 0, to 1 recipients [VM 0x8001: 0x6 (offset 48)]): fully sent with 1 fragments, 0 retrieved, sender's original mode: 0x87
Current share states:
SHARE 0x0 (from VM 0x0, attributes 0x6f (NS), flags 0x8, tag 0, to 1 recipients [VM 0x8001: 0x6 (offset 48)]): fully sent with 1 fragments, 1 retrieved, sender's original mode: 0x87
INFO: 1< 1 8001 Unknown(84000075) 50 50 0 0 0 0 0
Hafnium maps the NS page into OP-TEE's S2 page tables rooted at VTTBR_EL2:
0: e178003 S
1: e179003 S
f: e17a003 S
186: 240000041f867ff NS
(a similar dump from VSTTBR_EL2 shows OP-TEE secure pages properly mapped)
OP-TEE then maps the page in its S1 PTs as NS:
E/TC:1 0 spmc_retrieve_req:1428 spmc_retrieve_req exit.
E/TC:1 0 thread_spmc_populate_mobj_from_rx:1506 thread_spmc_populate_mobj_from_rx exit.
E/TC:1 0 set_pages:1461 set_pages 0 addr 41f86000 count 1
E/TC:1 0 mobj_ffa_add_pages_at:220 mobj_ffa_add_pages_at is_ns 0
INFO: 1> 1 8001 FFA_RX_RELEASE_32(84000065) 0 0 0 0 0 0 0
INFO: 1< 1 8001 FFA_SUCCESS_32(84000061) 0 0 0 0 0 0 0
E/TC:1 0 ffa_inc_map:566 ffa_inc_map addr fa00000 pages 0x90000000e3eadd0 sz 4096
D/TC:1 0 core_mmu_xlat_table_alloc:526 xlat tables used 4 / 5
A page fault is hit when OP-TEE accesses the page from its VA:
WARNING: Stage-2 page fault: pc=0xe30b764, vmid=0x8001, vcpu=1, vaddr=0xfa0001c, ipaddr=0x41f8601c, mode=0x1 0x40000000000007c
This issue is not observed with the TC2 FVP and a similar Hafnium+OP-TEE SW stack, at the same point of initialization.
So it seems qemu is not doing the translation properly from VTTBR_EL2 for a page mapped as NS by OP-TEE (hence in the NS IPA space).
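As a cross-check of the numbers in the log above, the following sketch derives the table index and page offset from the faulting IPA and extracts the output address from the dumped descriptor. It assumes a 4KB translation granule with 512-entry tables, which the dump does not state explicitly. The helper names are hypothetical and are not part of Hafnium or OP-TEE.

```c
#include <stdint.h>

/* Assumption: 4KB granule, so 12 offset bits and 9 index bits per level. */
#define PAGE_SHIFT       12u
#define LEVEL_INDEX_BITS 9u

/* Index of the last-level table entry that translates 'ipa'. */
static inline uint64_t s2_level3_index(uint64_t ipa)
{
	return (ipa >> PAGE_SHIFT) & ((1ull << LEVEL_INDEX_BITS) - 1);
}

/* Byte offset of 'addr' inside its 4KB page. */
static inline uint64_t page_offset(uint64_t addr)
{
	return addr & ((1ull << PAGE_SHIFT) - 1);
}

/* Output address field of a page descriptor: bits [47:12]. */
static inline uint64_t desc_output_address(uint64_t desc)
{
	return desc & 0x0000fffffffff000ull;
}
```

With ipaddr=0x41f8601c, s2_level3_index() gives 0x186 and page_offset() gives 0x1c, while desc_output_address(0x240000041f867ff) gives 0x41f86000. Under these assumptions the faulting access falls in the page that the dump lists as mapped NS at entry 186.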
Who should I report this problem to?
Regards,
Olivier.
Hi Jens,
We're preparing for the Hafnium changes introducing FF-A v1.1 mem sharing structures, up to:
https://review.trustedfirmware.org/c/hafnium/hafnium/+/17399
This is done in a backwards compatible manner, in which an FF-A v1.0 endpoint can still use the former mem sharing struct definitions.
The requirement is that the endpoint provides the right version, either in its manifest or by calling FFA_VERSION when booted.
For this transition to happen smoothly, OP-TEE would need to declare the v1.0 version here:
https://github.com/jenswi-linaro/build/blob/qemu_sel2/qemu_v8/optee_sp_mani…
or call FFA_VERSION first thing:
diff --git a/core/arch/arm/kernel/thread_spmc.c b/core/arch/arm/kernel/thread_spmc.c
index 240bcffe..893cb63b 100644
--- a/core/arch/arm/kernel/thread_spmc.c
+++ b/core/arch/arm/kernel/thread_spmc.c
@@ -1382,6 +1382,18 @@ static uint16_t spmc_get_id(void)
 	return args.a2;
 }
+static uint32_t spmc_version(void)
+{
+	struct thread_smc_args args = {
+		.a0 = FFA_VERSION,
+		.a1 = MAKE_FFA_VERSION(FFA_VERSION_MAJOR, FFA_VERSION_MINOR),
+	};
+
+	thread_smccc(&args);
+
+	return (uint32_t)args.a0;
+}
+
 static struct ffa_mem_transaction *spmc_retrieve_req(uint64_t cookie)
 {
 	struct ffa_mem_transaction *trans_descr = nw_rxtx.tx;
@@ -1519,6 +1531,10 @@ out:
 static TEE_Result spmc_init(void)
 {
+	DMSG("OP-TEE FF-A version %x, SPMC version %x",
+	     MAKE_FFA_VERSION(FFA_VERSION_MAJOR, FFA_VERSION_MINOR),
+	     spmc_version());
+
 	spmc_rxtx_map(&nw_rxtx);
 	my_endpoint_id = spmc_get_id();
 	DMSG("My endpoint ID %#x", my_endpoint_id);
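For reference, here is a minimal sketch of how an FF-A version word is packed and compared. It assumes the encoding from the FF-A specification (bit 31 zero, major version in bits [30:16], minor version in bits [15:0]) and a simplified compatibility rule (same major, caller minor not greater than callee minor). The helper names are illustrative and are not the actual OP-TEE or Hafnium macros.

```c
#include <stdbool.h>
#include <stdint.h>

#define FFA_MAJOR_SHIFT 16u
#define FFA_MAJOR_MASK  0x7fffu
#define FFA_MINOR_MASK  0xffffu

/* Pack major/minor into the 32-bit value exchanged by FFA_VERSION. */
static inline uint32_t ffa_make_version(uint32_t major, uint32_t minor)
{
	return ((major & FFA_MAJOR_MASK) << FFA_MAJOR_SHIFT) |
	       (minor & FFA_MINOR_MASK);
}

static inline uint32_t ffa_version_major(uint32_t version)
{
	return (version >> FFA_MAJOR_SHIFT) & FFA_MAJOR_MASK;
}

static inline uint32_t ffa_version_minor(uint32_t version)
{
	return version & FFA_MINOR_MASK;
}

/* Simplified rule: same major, and the caller's minor is not newer. */
static inline bool ffa_version_compatible(uint32_t caller, uint32_t callee)
{
	return ffa_version_major(caller) == ffa_version_major(callee) &&
	       ffa_version_minor(caller) <= ffa_version_minor(callee);
}
```

For example, ffa_make_version(1, 1) is 0x10001, and a v1.0 caller is compatible with a v1.1 callee but not the other way round. Since the major version sits above bit 15, a 16-bit variable can only hold the minor part of the value.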
I also noticed PR #5359 introducing the v1.1 mem sharing structures, so this may be another way forward that I did not investigate.
https://github.com/jenswi-linaro/optee_os/commit/ddd107f019386d035488e3b4e7…
Let me know your opinions.
Regards,
Olivier.
Hi Olivier,
I'm trying to implement a relocatable OP-TEE binary so it can be
loaded at different physical addresses without the need to recompile
it. This means that in the case with Hafnium when changing
"load-address" or "entrypoint-offset" in the OP-TEE SP manifest
there's no need to recompile OP-TEE. For this to work OP-TEE must be
able to figure out which memory range it's supposed to reside in.
Currently, OP-TEE knows the entry point address from PC and "memory
size" from X0. However, the "memory size" is from the "load-address"
so "entrypoint-offset" must be subtracted from PC in order to know the
allocated memory range.
Do you have ideas on how OP-TEE at runtime can determine the allocated
memory range?
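The arithmetic described above can be sketched as follows. This assumes the entry point address is taken from PC, the memory size is the value passed in X0, and the "entrypoint-offset" from the manifest is somehow known to OP-TEE, which is exactly the open question. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct mem_range {
	uint64_t base; /* first byte of the allocated range ("load-address") */
	uint64_t end;  /* one past the last byte of the allocated range */
};

/*
 * The memory size counts from "load-address", not from the entry point,
 * so the entry point offset has to be subtracted from PC first.
 */
static inline struct mem_range sp_allocated_range(uint64_t entry_pc,
						  uint64_t entrypoint_offset,
						  uint64_t mem_size)
{
	struct mem_range r;

	r.base = entry_pc - entrypoint_offset;
	r.end = r.base + mem_size;
	return r;
}
```

With made-up values, an entry point at 0x6284000, an offset of 0x4000 and a size of 0x200000 give the range 0x6280000 to 0x6480000.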
Thanks,
Jens
Hi, Jens,
We seem to have found the answer by following the comment you gave in this issue:
https://github.com/OP-TEE/optee_os/issues/5877
The reason seems to be that the system has more non-secure memory than OP-TEE is aware of.
On a device with 32 GB of DRAM, optee has the following configuration for non-secure memory:
#define DRAM1_BASE 0x880000000UL
#define DRAM1_SIZE 0x780000000UL
On another device with 64 GB of DRAM, optee uses the same configuration for non-secure memory,
and xtest does not run stably.
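To illustrate the mismatch, the sketch below uses the DRAM1_BASE and DRAM1_SIZE values quoted above and checks whether a physical range lies entirely inside the non-secure window known to OP-TEE. The helper name is hypothetical, and the actual memory layout of the 64 GB device is not given in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define DRAM1_BASE 0x880000000UL
#define DRAM1_SIZE 0x780000000UL

/* True when [pa, pa + len) lies entirely inside the configured window. */
static inline bool in_known_ns_dram(uint64_t pa, uint64_t len)
{
	uint64_t base = DRAM1_BASE;
	uint64_t end = base + DRAM1_SIZE; /* 0x1000000000, exclusive */

	return pa >= base && pa < end && len <= end - pa;
}
```

The configured window covers 30 GiB and ends at 0x1000000000, so any non-secure buffer at or above that address is outside what this configuration describes.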
In addition, I would like to ask a few questions to understand the problem better.
1.
When the system has more non-secure memory than OP-TEE is aware of, optee does not run stably.
Why? What is the root cause of the TEEC_ERROR_OUT_OF_MEMORY reported in the issue?
I'll do some research on it myself; meanwhile I hope you can give us some insight.
2.
When I execute the following command,
for i in {1..10}; do ./xtest & done
the xtest output is interleaved,
and the following errors are reported:
E/TC:032 002 mobj_ffa_get_by_cookie:356 possible spinlock deadlock reminder 1
E/TC:070 mobj_ffa_unregister_by_cookie:310 possible spinlock deadlock reminder 1
E/TC:004 003 mobj_ffa_get_by_cookie:356 possible spinlock deadlock reminder 1
E/TC:058 001 mobj_ffa_get_by_cookie:356 possible spinlock deadlock reminder 1
Then the system hangs and restarts automatically.
Does optee currently support running multiple TAs in parallel?
Is the maximum number of TAs running in parallel equal to the number of vcpus?
3.
The values of MAX_XLAT_TABLES and CFG_CORE_HEAP_SIZE also seem to affect the stability of the system.
Which parameters, such as the number of cores, should I take into account when choosing appropriate values for these options?
Thanks a lot.
Regards,
Yuye.
------------------------------------------------------------------
From: Jens Wiklander <jens.wiklander(a)linaro.org>
Sent: Monday, 20 March 2023 15:05
To: 梅建强(禹夜) <meijianqiang.mjq(a)alibaba-inc.com>
Cc: Olivier Deprez <Olivier.Deprez(a)arm.com>; hafnium <hafnium(a)lists.trustedfirmware.org>; op-tee <op-tee(a)lists.trustedfirmware.org>
Subject: Re: optee xtest cannot run success stably
Hi Yuye,
Comment below.
On Mon, Mar 20, 2023 at 4:43 AM 梅建强(禹夜)
<meijianqiang.mjq(a)alibaba-inc.com> wrote:
> Hi, experts
>
> Recently, we have been testing the stability of optee xtest in an environment where hafnium runs as the SPMC and optee runs on top of it as an SP.
> When we reboot the system, xtest fails in some cases with TEEC_ERROR_OUT_OF_MEMORY.
> It seems that a memory allocation somewhere in the chain is insufficient.
> We tried the following:
> booting with a single core,
> increasing optee MAX_XLAT_TABLES to 16,
> increasing optee CFG_CORE_HEAP_SIZE to 0x2000000,
> increasing optee CFG_TEE_RAM_VA_SIZE to 0x4000000,
> increasing hafnium heap_pages to 8192,
> but nothing seems to help.
> Can you offer any help or suggestions?
It would help if you could pinpoint the source of the out-of-memory
error. I guess it happens somewhere during mobj_ffa_get_by_cookie(),
where especially thread_spmc_populate_mobj_from_rx() is interesting.
It could also be worth setting CFG_CORE_DUMP_OOM=y, it's easy to
enable but I'm afraid it's more of a long shot.
Cheers,
Jens
> Some other configuration for optee is attached in the issue:
> https://github.com/OP-TEE/optee_os/issues/5893
>
> Regards,
> Yuye.
>
Hi,
With the introduction of FFA_CONSOLE_LOG ABI [1], we are intending to replace and remove support for HF_DEBUG_LOG.
This proposal is in review in the following stages:
1) Remove the dependency of hftest VMs on HF_DEBUG_LOG and move to FFA_CONSOLE_LOG [2]
2) Remove the support for HF_DEBUG_LOG (i.e. api_debug_log) from hafnium project. [3]
The adoption of FFA_CONSOLE_LOG will allow us to make use of its ability to log multiple characters at a time, as opposed to HF_DEBUG_LOG which writes one character at a time.
This improvement will be enabled in a future patch. Also, should [3] be adopted, we will make accompanying changes to tf-a-tests Cactus-based tests.
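To make the difference concrete, here is a rough sketch of batching characters into register-sized words instead of issuing one call per character. It assumes the 32-bit calling convention with six argument registers holding four characters each, packed from the least significant byte upwards; this layout should be checked against the FF-A specification, and the function and macro names are hypothetical, not Hafnium APIs.

```c
#include <stddef.h>
#include <stdint.h>

#define LOG_REGS      6u /* assumed: six argument registers for characters */
#define CHARS_PER_REG 4u /* assumed: 32-bit registers, one byte per char */
#define LOG_MAX_CHARS (LOG_REGS * CHARS_PER_REG)

/*
 * Pack up to LOG_MAX_CHARS characters of 's' into 'regs', lowest byte
 * first. Returns the number of characters packed for this call.
 */
static inline size_t pack_console_chars(const char *s, size_t len,
					uint32_t regs[LOG_REGS])
{
	size_t n = len < LOG_MAX_CHARS ? len : LOG_MAX_CHARS;
	size_t i;

	for (i = 0; i < LOG_REGS; i++)
		regs[i] = 0;

	for (i = 0; i < n; i++)
		regs[i / CHARS_PER_REG] |= (uint32_t)(unsigned char)s[i]
					   << (8 * (i % CHARS_PER_REG));

	return n;
}
```

Under these assumptions a 24-character line needs a single call rather than 24 calls, and a longer string is sent in chunks by calling the helper again on the remaining characters.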
We want to know if there are any concerns about removing support for HF_DEBUG_LOG at this time as we realize other downstream SPs may rely on its support.
Thank you,
Kathleen Capella
[1] feat(console_log): add FFA_CONSOLE_LOG ABI https://review.trustedfirmware.org/c/hafnium/hafnium/+/15334
[2] feat(ffa_console_log): replace hf_debug_log https://review.trustedfirmware.org/c/hafnium/hafnium/+/19513
[3] refactor: remove support for HF_DEBUG_LOG https://review.trustedfirmware.org/c/hafnium/hafnium/+/19681
Hi, Olivier
I forgot to mention one thing: hafnium invalidates the data cache for the whole image.
adrp x0, ORIGIN_ADDRESS
adrp x1, image_end
sub x1, x1, x0
bl arch_cache_data_invalidate_range
I solved this problem by using a variable at a fixed address outside the data section, instead of a global variable allocated in the data section.
In addition, I would like to ask:
if multiple optee instances run on Hafnium, how are the vCPUs of the different instances switched?
If one of the optee instances is expected to be booted from the host OS (post-boot load), how are its vCPUs switched?
Thanks for the support.
Regards,
Yuye.
------------------------------------------------------------------
From: 梅建强(禹夜) <meijianqiang.mjq(a)alibaba-inc.com>
Sent: Saturday, 11 March 2023 21:36
To: Olivier Deprez <Olivier.Deprez(a)arm.com>
Cc: Jens Wiklander <jens.wiklander(a)linaro.org>; hafnium(a)lists.trustedfirmware.org <hafnium(a)lists.trustedfirmware.org>
Subject: Re: Multi-OPTEE run with Hafnium
Hi, Olivier,
The framework message definition that I mentioned is the one referred to in the FF-A Specification.
I hope I didn't misinterpret it.
The work flow is as follows:
Once the system is running the host OS,
I want to implement post-boot loading of optee, as mentioned in the email, with Hafnium as SPMC:
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/18635
However, once the initialization of hafnium and optee has finished,
SMC_RET8 fails to return from the spmd to the optee driver.
I suspect that some special handling of the context may be needed here.
Can you provide some suggestions or help?
Regards,
Yuye.
------------------------------------------------------------------
From: Olivier Deprez <Olivier.Deprez(a)arm.com>
Sent: Tuesday, 7 March 2023 16:55
To: 梅建强(禹夜) <meijianqiang.mjq(a)alibaba-inc.com>
Cc: Jens Wiklander <jens.wiklander(a)linaro.org>; hafnium(a)lists.trustedfirmware.org <hafnium(a)lists.trustedfirmware.org>
Subject: Re: Multi-OPTEE run with Hafnium
Hi Yuye,
I sense those questions are related to downstream changes of yours, and I may not be able to provide accurate answers without seeing the code.
Please clarify if the questions are related to the upstream version of Hafnium.
See additional comments below [OD].
Regards,
Olivier.
From: 梅建强(禹夜) <meijianqiang.mjq(a)alibaba-inc.com>
Sent: 06 March 2023 14:15
To: Olivier Deprez <Olivier.Deprez(a)arm.com>
Cc: Jens Wiklander <jens.wiklander(a)linaro.org>
Subject: Re: Multi-OPTEE run with Hafnium
Hi, Olivier,
On this question, I should add something based on the results of my tests.
When I enter Hafnium through a framework message, the value of the global variable remains the same.
[OD] please explain what is a framework message.
When I re-enter the Hafnium image_entry via spmd_init, the global variable is restored to its original value.
[OD] I'm not sure about the flow here, it depends on where the variable is declared and whether you have local changes to the hafnium boot flows.
Do you know why?
Regards,
Yuye.
------------------------------------------------------------------
From: 梅建强(禹夜) <meijianqiang.mjq(a)alibaba-inc.com>
Sent: Monday, 6 March 2023 17:00
To: Olivier Deprez <olivier.deprez(a)arm.com>
Cc: Jens Wiklander <jens.wiklander(a)linaro.org>
Subject: Multi-OPTEE run with Hafnium
Hi, Olivier,
I want to ask a question about variable modification.
First, let me explain the background of the question.
I have now booted into the host OS with Hafnium as SPMC and OPTEE as SP.
The optee driver then sends a framework message to Hafnium to inform it whether the host OS is currently booting or running,
[OD] What is a 'framework message' in this context? Is this a custom direct request from driver to optee?
so that Hafnium knows which vm node (SP) it should load and initialize in the current state.
[OD] Can you please explain why this is required?
The current state is represented by a global variable.
[OD] do you mean a variable in Hafnium's BSS space?
Will the global variable change when the message flow exits Hafnium and enters it again?
[OD] This depends on which section the variable is declared in (see above).
Regards,
Yuye.