TF-A list: this issue was discussed on the Hafnium list before being posted here. I am cross-posting so that we have a single thread to track it going forward.
See https://lists.trustedfirmware.org/pipermail/hafnium/2021-December/000209.htm...
with Olivier's last reply copied below. See the archive above for the full history of the thread.
Hi Wang,
With this level of detail, this is difficult to say. You can extend this to the TF-A ML if you wish. I'm pointing at the SPMD because you mention spmd_smc_forward and cm_el1/2_sysregs_context_restore, which are within the SPMD/EL3 space. I wouldn't expect such an assert to happen in any regular use case of the reference implementation (because this is a hard EL3 failure). The problem could be elsewhere, in Hafnium or Cactus, but I'd say those are less likely to alter the EL3 state, unless Hafnium has a bug that corrupts a secure memory region which doesn't belong to it. Beyond this, notice that the assert is taken in cm_el1_sysregs_context_restore. It is called by cm_prepare_el3_exit, which means it can be related to power management, e.g. on a PSCI resume event. This can be a hint, as you say this is occurring 'randomly'.
Regards, Olivier.
Joanna
On 14/12/2021, 19:39, "TF-A on behalf of Chenxu Wang via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Hi all, I am running FVP with 2 CPUs, Cactus SP (S-EL1), Hafnium (S-EL2) and KVM VHE. Sometimes, when I send the "FFA_MSG_SEND_DIRECT_REQ" SMC call from KVM (I put 0x8400006f in x0, the VMID and SP ID in x1, and set x2 to 0), an assert fails, like this:
ASSERT: lib/el3_runtime/aarch64/context_mgmt.c:651
BACKTRACE: START: assert
0: EL3: 0x4005cac
1: EL3: 0x400323c
2: EL3: 0x400620c
3: EL3: 0x400e180
4: EL3: 0x4005a94
BACKTRACE: END: assert
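For reference, the register setup described above can be sketched in C. This is only a minimal sketch, assuming the FF-A v1.0 layout in which w1 carries the sender endpoint ID in bits [31:16] and the receiver endpoint ID in bits [15:0]. The struct and helper names (smc_args, ffa_pack_endpoints, ffa_direct_req_args) are hypothetical and are not part of TF-A, Hafnium or KVM.

```c
#include <assert.h>
#include <stdint.h>

/* Function ID quoted in the report (SMC32 FFA_MSG_SEND_DIRECT_REQ). */
#define FFA_MSG_SEND_DIRECT_REQ_SMC32 UINT32_C(0x8400006F)

/* Hypothetical container for the first three SMC argument registers. */
struct smc_args {
    uint64_t x0;
    uint64_t x1;
    uint64_t x2;
};

/* Assumed layout: sender ID in bits [31:16], receiver ID in bits [15:0]. */
static inline uint32_t ffa_pack_endpoints(uint16_t sender, uint16_t receiver)
{
    return ((uint32_t)sender << 16) | (uint32_t)receiver;
}

/* Build the arguments as described in the report: x0 = FID, x1 = IDs, x2 = 0. */
static inline struct smc_args ffa_direct_req_args(uint16_t vm_id, uint16_t sp_id)
{
    struct smc_args args;

    args.x0 = FFA_MSG_SEND_DIRECT_REQ_SMC32;
    args.x1 = ffa_pack_endpoints(vm_id, sp_id);
    args.x2 = 0; /* left as zero, as in the report */
    return args;
}
```

Logging the packed x1 value on the KVM side just before the SMC is one way to confirm that the failing and the working runs really use identical parameters.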
After checking bl31.dump, I noticed that when services/std_svc/spmd/spmd_main.c forwards the FFA call (from NS to S) via "spmd_smc_forward(smc_fid, secure_origin, x1, x2, x3, x4, handle)", it goes to cm_el1_sysregs_context_restore(secure_state_out) and cm_el2_sysregs_context_restore(secure_state_out), which assert on the result of cm_get_context(). It returns a NULL context, so the assert fails.
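To make the failure mode concrete, here is a simplified model of a context lookup that is indexed by CPU and by security state. It is a hypothetical sketch, not TF-A's actual code: all model_* names are invented for illustration, and it assumes (as I understand the context management library) that the context pointer is stored per CPU, so a lookup can return NULL on one CPU while being valid on another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CPU_COUNT 2 /* matches the 2-CPU FVP setup in the report */

enum model_state { MODEL_SECURE = 0, MODEL_NON_SECURE = 1, MODEL_STATE_COUNT };

/* Stand-in for a saved register context; contents are irrelevant here. */
struct model_ctx {
    int placeholder;
};

/* One context pointer per CPU and per security state; all NULL at start. */
static struct model_ctx *model_ctx_ptr[MODEL_CPU_COUNT][MODEL_STATE_COUNT];

/* Register a context for a given CPU and security state. */
static void model_set_context(unsigned int cpu, enum model_state state,
                              struct model_ctx *ctx)
{
    assert(cpu < MODEL_CPU_COUNT);
    model_ctx_ptr[cpu][state] = ctx;
}

/* Look up the context; NULL means nothing was ever registered. */
static struct model_ctx *model_get_context(unsigned int cpu,
                                           enum model_state state)
{
    assert(cpu < MODEL_CPU_COUNT);
    return model_ctx_ptr[cpu][state];
}

/* True when a restore could proceed; false where an assert would fire. */
static bool model_can_restore(unsigned int cpu, enum model_state state)
{
    return model_get_context(cpu, state) != NULL;
}
```

In this model, registering a secure context only for CPU 0 makes model_can_restore(0, MODEL_SECURE) true and model_can_restore(1, MODEL_SECURE) false. A similar check, printing the CPU index before the restore call, may help tell whether the failing SMC always arrives on the same CPU.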
Before the problem appeared, I had made many modifications to a dirty TF-A v2.4 tree (commit hash 0aa70f4c4c023ca58dea2d093d3c08c69b652113), Hafnium and TF-A-TESTS. I also asked on the Hafnium mailing list; they think it may be a problem in EL3.
The assert does NOT ALWAYS fail. For example, I may run FVP and send the "smc" and it fails, but when I shut down, run FVP again, and send the same instruction with the same parameters, it works.
I would like to know the possible reasons for suddenly losing the secure context. Can you give me some advice on debugging, e.g. where should I check? Do I need to provide more info?
Sincerely, Wang