TF-A list: this issue was discussed on the Hafnium list before being posted here. I am cross-posting so that we have a single thread to track going forward.
See https://lists.trustedfirmware.org/pipermail/hafnium/2021-December/000209.htm...
with Olivier's last reply copied below. See the archive above for the full history of the thread.
Hi Wang,
With this level of detail, it is difficult to say. You can extend the discussion to the TF-A ML if you wish. I am pointing at the SPMD because you mention spmd_smc_forward and cm_el1/2_sysregs_context_restore, which are within the SPMD/EL3 space. I wouldn't expect such an assert to fire in any regular use case of the reference implementation (because this is a hard EL3 failure). The problem could be elsewhere, in Hafnium or Cactus, but I'd say those are less likely to alter EL3 state, unless Hafnium has a bug that corrupts a secure memory region which doesn't belong to it. Beyond this, notice that the assert is taken in cm_el1_sysregs_context_restore. That function is called by cm_prepare_el3_exit, which means the failure may be related to power management, e.g. a PSCI resume event. This may be a hint, as you say it occurs 'randomly'.
Regards, Olivier.
Joanna
On 14/12/2021, 19:39, "TF-A on behalf of Chenxu Wang via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Hi all, I am running FVP with 2 CPUs, a Cactus SP (S-EL1), Hafnium (S-EL2) and KVM in VHE mode. Sometimes, when I send the "FFA_MSG_SEND_DIRECT_REQ" SMC from KVM (0x8400006f in x0, the VM ID and SP ID in x1, and x2 left as 0), an assert fails like this:
ASSERT: lib/el3_runtime/aarch64/context_mgmt.c:651
BACKTRACE: START: assert
0: EL3: 0x4005cac
1: EL3: 0x400323c
2: EL3: 0x400620c
3: EL3: 0x400e180
4: EL3: 0x4005a94
BACKTRACE: END: assert
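For reference, the register setup described in this message can be sketched as below. This is a minimal illustration, not code from the reporter's tree: the struct and function names are hypothetical, and the ID values used in the note after the block are placeholders. The only facts relied on are the function ID quoted above and the FF-A convention that the sender ID sits in bits [31:16] and the receiver ID in bits [15:0] of the second register.

```c
#include <stdint.h>

/* SMC32 function ID for FFA_MSG_SEND_DIRECT_REQ, as quoted in the message. */
#define FFA_MSG_SEND_DIRECT_REQ_SMC32 UINT32_C(0x8400006F)

/* Hypothetical container for the first three SMC argument registers. */
struct ffa_direct_req_args {
    uint64_t x0; /* function ID */
    uint64_t x1; /* sender ID in bits [31:16], receiver ID in bits [15:0] */
    uint64_t x2; /* left as zero, as in the message */
};

/*
 * Pack the arguments for a direct request from a VM (sender) to an SP
 * (receiver). Both IDs are 16-bit endpoint IDs.
 */
static struct ffa_direct_req_args
ffa_direct_req_pack(uint16_t sender_id, uint16_t receiver_id)
{
    struct ffa_direct_req_args args = {
        .x0 = FFA_MSG_SEND_DIRECT_REQ_SMC32,
        .x1 = ((uint64_t)sender_id << 16) | receiver_id,
        .x2 = 0,
    };
    return args;
}
```

With placeholder IDs 0x0001 (VM) and 0x8001 (SP), ffa_direct_req_pack(0x0001, 0x8001) yields x1 = 0x00018001. Checking the packed value against what actually reaches EL3 is one way to rule out a malformed request before looking at context management.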
After checking bl31.dump, I noticed that when services/std_svc/spmd/spmd_main.c forwards the FF-A call (from NS to S) via "spmd_smc_forward(smc_fid, secure_origin, x1, x2, x3, x4, handle)", it goes on to cm_el1_sysregs_context_restore(secure_state_out) and cm_el2_sysregs_context_restore(secure_state_out), which assert on the result of cm_get_context(). cm_get_context() returns a NULL context, so the assert fails.
Before the problem appeared, I had made many modifications to TF-A v2.4 (a dirty tree based on commit 0aa70f4c4c023ca58dea2d093d3c08c69b652113), Hafnium and TF-A-TESTS. I also asked on the Hafnium mailing list, where it was suggested that the problem may be in EL3.
The assert does NOT ALWAYS fail. For example, I may run the FVP and send the SMC, and it fails; but when I shut down, run the FVP again, and send the same instruction with the same parameters, it works.
I would like to know the possible reasons for suddenly losing the secure context. Can you give me some advice on debugging, e.g. where I should check? Do I need to provide more information?
Sincerely,
Wang
--
TF-A mailing list
TF-A@lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Hi Wang,
Something else I may have missed.
About "running FVP with 2CPU", what does that mean exactly? Are you forcing the model to instantiate only 1 cluster and 2 CPUs? If yes, what are the model parameters? Are the Linux and SPMC DTs configured to declare only two physical CPUs? Are you forcing only two cores from the TF-A build command line?
Are you using a user-space app, and emitting SMCs from a Linux kernel driver?
The scenario I have in mind is that the system boots properly and emits SMC calls as long as only the primary boot cpu0 is involved. But then, once Linux is up with SMP enabled, an SMC (e.g. a direct request) may be emitted from a secondary core for which the EL3 CPU context does not exist (hence triggering the assert)?
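The scenario above can be modelled in a few lines. This is a simplified sketch, not TF-A code: every name prefixed with model_ or MODEL_ is hypothetical, and the table stands in for whatever per-core storage EL3 uses. It only shows why a lookup on a core whose secure context was never set up returns NULL, which is the condition the assert in the report checks.

```c
#include <stddef.h>

#define MODEL_CORE_COUNT 2 /* hypothetical: two booted cores */
#define MODEL_SECURE     0
#define MODEL_NON_SECURE 1

/* Stand-in for a saved register context; its contents do not matter here. */
struct model_cpu_context {
    int unused;
};

/* One slot per core and per security state; NULL until set for that core. */
static struct model_cpu_context *model_ctx[MODEL_CORE_COUNT][2];
static struct model_cpu_context model_storage[MODEL_CORE_COUNT][2];

/* Models the per-core setup of the context for one security state. */
static void model_set_context(unsigned int core, unsigned int state)
{
    model_ctx[core][state] = &model_storage[core][state];
}

/* Models the lookup; returns NULL if no context was set for this core. */
static struct model_cpu_context *
model_get_context(unsigned int core, unsigned int state)
{
    return model_ctx[core][state];
}

/* Returns 1 if forwarding an SMC to `state` on `core` would find a context. */
static int model_can_forward(unsigned int core, unsigned int state)
{
    return model_get_context(core, state) != NULL;
}
```

If only core 0 has had model_set_context(0, MODEL_SECURE) called, model_can_forward(1, MODEL_SECURE) returns 0. In the real system, logging the calling core next to the lookup result would show whether failures line up with SMCs issued from the secondary core.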
Regards, Olivier.
From: Hafnium hafnium-bounces@lists.trustedfirmware.org on behalf of Joanna Farley via Hafnium hafnium@lists.trustedfirmware.org
Sent: 14 December 2021 21:56
To: Chenxu Wang; tf-a@lists.trustedfirmware.org; Olivier Deprez via Hafnium
Subject: Re: [Hafnium] [TF-A] A problem about assert failed in TF-A
-- Hafnium mailing list Hafnium@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/hafnium
Hi Olivier,
Thanks for your kind reply.
Actually, I mean that I boot 2 clusters, and 1 CPU boots in each cluster. (I set NUM_CORES to 4, but only 1 CPU of each cluster boots, which confuses me.) Also, when I read TPIDR_EL3 in cm_get_context(), I see two different thread IDs. I did not run a user-space app; I believe I only send the SMC from KVM with VHE. Here is my boot command:
FVP_Base_RevC-2xAEMvA \
  -C pctl.startup=0.0.0.0 \
  -C bp.secure_memory=1 \
  -C cluster0.NUM_CORES=4 \
  -C cluster1.NUM_CORES=4 \
  -C cluster0.has_arm_v8-4=1 \
  -C cluster1.has_arm_v8-4=1 \
  -C cache_state_modelled=0 \
  -C bp.pl011_uart0.untimed_fifos=1 \
  -C bp.pl011_uart0.unbuffered_output=1 \
  -C bp.pl011_uart0.out_file=fvp-uart0.log \
  -C bp.pl011_uart1.out_file=fvp-uart1.log \
  -C bp.pl011_uart2.out_file=fvp-uart2.log \
  -C bp.secureflashloader.fname=bl1.bin \
  -C bp.flashloader0.fname=fip.bin \
  --data cluster0.cpu0=Image@0x80080000 \
  --data cluster0.cpu0=fvp-base-aemv8a-aemv8a.dtb@0x82000000 \
  --data cluster0.cpu0=ramdisk.img@0x84000000 \
  -C bp.ve_sysregs.mmbSiteDefault=0 \
  -C bp.ve_sysregs.exit_on_shutdown=1 \
  -C bp.virtioblockdevice.image_path=hostfs.img
Here are some key dmesg lines from booting Linux:
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd0f0]
...
[ 0.055843] smp: Bringing up secondary CPUs ...
[ 0.088729] psci: failed to boot CPU1 (-22)
[ 0.088778] CPU1: failed to boot: -22
[ 0.121008] psci: failed to boot CPU2 (-22)
[ 0.121057] CPU2: failed to boot: -22
[ 0.153336] psci: failed to boot CPU3 (-22)
[ 0.153373] CPU3: failed to boot: -22
[ 0.185701] Detected PIPT I-cache on CPU4
[ 0.185825] GICv3: CPU4: found redistributor 100 region 0:0x000000002f120000
[ 0.185825] GICv3: CPU4: using allocated LPI pending table @0x00000000ee5a0000
[ 0.185970] CPU4: Booted secondary processor 0x0000000100 [0x410fd0f0]
[ 0.217931] psci: failed to boot CPU5 (-22)
[ 0.218055] CPU5: failed to boot: -22
[ 0.250259] psci: failed to boot CPU6 (-22)
[ 0.250305] CPU6: failed to boot: -22
[ 0.282538] psci: failed to boot CPU7 (-22)
[ 0.282575] CPU7: failed to boot: -22
[ 0.282971] smp: Brought up 1 node, 2 CPUs
[ 0.283009] SMP: Total of 2 processors activated.
...
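Two values in the log above can be decoded mechanically, and the sketch below does so. The MPIDR_EL1 affinity field positions and the PSCI return codes come from the Arm architecture and the PSCI specification. The errno mapping is an assumption based on my recollection of the Linux PSCI driver, and all function and enum names are hypothetical. How the affinity fields map to cluster and core depends on the model's MPIDR format, which this thread does not establish.

```c
#include <errno.h>
#include <stdint.h>

/* MPIDR_EL1 affinity fields: Aff0 [7:0], Aff1 [15:8], Aff2 [23:16]. */
static unsigned int mpidr_aff0(uint64_t mpidr)
{
    return (unsigned int)(mpidr & 0xff);
}

static unsigned int mpidr_aff1(uint64_t mpidr)
{
    return (unsigned int)((mpidr >> 8) & 0xff);
}

static unsigned int mpidr_aff2(uint64_t mpidr)
{
    return (unsigned int)((mpidr >> 16) & 0xff);
}

/* A subset of the PSCI return codes defined by the PSCI specification. */
enum sketch_psci_ret {
    SKETCH_PSCI_SUCCESS = 0,
    SKETCH_PSCI_NOT_SUPPORTED = -1,
    SKETCH_PSCI_INVALID_PARAMETERS = -2,
    SKETCH_PSCI_DENIED = -3,
    SKETCH_PSCI_INVALID_ADDRESS = -9,
};

/*
 * Assumed mapping from a PSCI return code to a negative errno value, as the
 * kernel appears to do before printing "failed to boot CPUn (-22)".
 * Unrecognised codes fall back to -EINVAL in this sketch.
 */
static int sketch_psci_to_errno(int psci_ret)
{
    switch (psci_ret) {
    case SKETCH_PSCI_SUCCESS:
        return 0;
    case SKETCH_PSCI_NOT_SUPPORTED:
        return -EOPNOTSUPP;
    case SKETCH_PSCI_INVALID_PARAMETERS:
    case SKETCH_PSCI_INVALID_ADDRESS:
        return -EINVAL;
    case SKETCH_PSCI_DENIED:
        return -EPERM;
    default:
        return -EINVAL;
    }
}
```

For the secondary that did boot, 0x0000000100 decodes to Aff2 = 0, Aff1 = 1, Aff0 = 0. The -22 reported for the other cores is -EINVAL on Linux, so under the assumed mapping it would be consistent with the CPU_ON call being rejected for an invalid parameter such as the target MPIDR. Comparing the CPU reg values in the device tree against the MPIDR values the model actually exposes may be worth doing.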
Sincerely, Wang
Olivier Deprez Olivier.Deprez@arm.com wrote on Thursday, 16 December 2021 at 03:15: