Hi Ming,
IMO, this is what's happening. The SError was caused by a lower EL and traps to EL3 (RAS_EXTENSION means SCR_EL3.EA=1) through the "serror_aarch64" vector entry. After SError is unmasked at EL3, an SError is taken again, this time at EL3. It ends up in the weak implementation of "plat_handle_el3_ea", which ends in report_unhandled_exception.
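As a quick cross-check of this explanation against the register dump quoted further down, the sketch below decodes the relevant ESR_EL3 and SPSR_EL3 fields. It uses the AArch64 ESR_ELx and SPSR_ELx bit layouts. The helper and macro names are made up for illustration and are not TF-A functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SERROR     0x2fu /* ESR_ELx.EC value for an SError interrupt */
#define ESR_DFSC_ASYNC_SE 0x11u /* ISS.DFSC: asynchronous SError interrupt */

/* ESR_ELx.EC, bits [31:26]. */
static inline unsigned esr_ec(uint64_t esr)
{
	return (unsigned)((esr >> 26) & 0x3f);
}

/* ESR_ELx.ISS.DFSC for an SError, bits [5:0]. */
static inline unsigned esr_serror_dfsc(uint64_t esr)
{
	return (unsigned)(esr & 0x3f);
}

/*
 * ESR_ELx.ISS.AET, bits [12:10]: 0 UC, 1 UEU, 2 UEO, 3 UER, 6 CE.
 * Only meaningful for an SError with IDS (bit 24) == 0 and
 * DFSC == asynchronous SError.
 */
static inline unsigned esr_serror_aet(uint64_t esr)
{
	assert(esr_ec(esr) == ESR_EC_SERROR);
	assert(((esr >> 24) & 1) == 0);
	assert(esr_serror_dfsc(esr) == ESR_DFSC_ASYNC_SE);
	return (unsigned)((esr >> 10) & 0x7);
}

/* SPSR_ELx.M[3:2]: EL the exception was taken from (AArch64, M[4] == 0). */
static inline unsigned spsr_from_el(uint64_t spsr)
{
	assert(((spsr >> 4) & 1) == 0);
	return (unsigned)((spsr >> 2) & 0x3);
}

/* SPSR_ELx.A, bit 8: true if SError was masked at the interrupted point. */
static inline bool spsr_serror_masked(uint64_t spsr)
{
	return ((spsr >> 8) & 1) != 0;
}
```

Applied to the first dump, esr_el3 = 0xbe000011 decodes to EC 0x2f (SError). spsr_el3 = 0x624002cd decodes to EL3 with the A bit clear. Together these show the fatal SError was taken from EL3 code after SError had been unmasked. In the second log, spsr_el3:82401009 decodes to EL2.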
SError is unmasked so that any SError caused by EL3 itself while it is handling the lower-EL SError gets caught. I think an error synchronization barrier (esb) before unmasking may prevent the SError from trapping multiple times. Just to let you know, I am actively working on RAS refactoring in TF-A (and fixing bugs), and I will add you to those patches.
To fix this particular issue, I am planning the change below. Would you please test whether it works for you? There is no need to unmask EAs if we are already handling an EA (unlike other exceptions).
--- a/bl31/aarch64/runtime_exceptions.S
+++ b/bl31/aarch64/runtime_exceptions.S
@@ -402,11 +402,8 @@ end_vector_entry fiq_aarch32
 vector_entry serror_aarch32
 	save_x30
 	apply_at_speculative_wa
-#if RAS_EXTENSION
+	esb
 	msr	daifclr, #DAIF_ABT_BIT
-#else
-	check_and_unmask_ea
-#endif
 	b	handle_lower_el_async_ea
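To illustrate why an esb before the daifclr may stop the repeated trap, here is a deliberately simplified C model. The names are hypothetical, and this is not TF-A code and not architecturally complete. It assumes two things: a further SError is still pending when the vector entry runs, as the analysis above suggests, and ESB defers a pending SError that is masked at the current EL by recording it in DISR_EL1 and clearing its pending state.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified model of the EL3 SError state at the
 * serror vector entry. Not TF-A code and not architecturally complete.
 */
struct el3_serror_model {
	bool a_masked;       /* PSTATE.A: set on exception entry to EL3 */
	bool serror_pending; /* assumption: a further SError is pending */
	bool disr_a;         /* DISR_EL1.A: an SError was deferred by ESB */
	int nested_el3;      /* SErrors taken while already executing at EL3 */
};

/* esb: a pending SError that is masked is deferred into DISR_EL1. */
static inline void model_esb(struct el3_serror_model *m)
{
	if (m->serror_pending && m->a_masked) {
		m->disr_a = true;
		m->serror_pending = false;
	}
}

/* msr daifclr, #DAIF_ABT_BIT: clear PSTATE.A; a pending SError is taken now. */
static inline void model_unmask_abt(struct el3_serror_model *m)
{
	m->a_masked = false;
	if (m->serror_pending) {
		m->nested_el3++;
		m->serror_pending = false;
		m->a_masked = true; /* taking the exception masks SError again */
	}
}

/* Run the vector-entry prologue with or without esb; return nested EL3 SErrors. */
static inline int run_vector_entry(bool use_esb)
{
	struct el3_serror_model m = {
		.a_masked = true,
		.serror_pending = true,
		.disr_a = false,
		.nested_el3 = 0,
	};

	if (use_esb)
		model_esb(&m);
	model_unmask_abt(&m);

	/* With esb, the error must have been deferred rather than dropped. */
	assert(!use_esb || m.disr_a);
	return m.nested_el3;
}
```

In this model, run_vector_entry(false) takes one nested SError at EL3 and run_vector_entry(true) takes none. In real code the deferred syndrome would still have to be read from DISR_EL1 and handled. The model only records that it was deferred.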
Thanks
Manish

________________________________
From: Ming Huang <huangming@linux.alibaba.com>
Sent: 08 March 2023 14:22
To: Jeenu Viswambharan <Jeenu.Viswambharan@arm.com>
Cc: Manish Pandey2 <Manish.Pandey2@arm.com>; tf-a@lists.trustedfirmware.org <tf-a@lists.trustedfirmware.org>
Subject: About Unmask the SError in serror_aarch64
Hi Jeenu,
vector_entry serror_aarch64
+	msr	daifclr, #DAIF_ABT_BIT
Why must SError be unmasked in serror_aarch64?
We found that unmasking SError makes TF-A panic with the output "Unhandled Exception in EL3".
--------------------------------------------------------------------------------------
log:
Detected DPC, skip AER
core[64] mm(925) return: 2
[ 109.883159] pcieport 0000:b0:00.0: DPC: containment event, status:0x0005 source:0xb000
[ 109.891195] pcieport 0000:b0:00.0: DPC: ERR_FATAL detected
[ 109.896776] igb 0000:b1:00.0 ens51f0: PCIe link lost

[root@localhost.localdomain /root/hm] #[ 110.068410] pcieport 0000:b0:00.0: pciehp: Slot(51): Link Down/Up ignored (recovered by DPC)
[ 110.076988] igb 0000:b1:00.0: enabling device (0000 -> 0002)
Unhandled Exception in EL3.
x30 = 0x00000000ff013b84
x0 = 0x0000000000000000
x1 = 0xffff800011e7500c
x2 = 0x0000000000000000
x3 = 0xffff800011e75004
x4 = 0xffff800011a82000
x5 = 0x000000000000000c
x6 = 0x00000000000000fb
x7 = 0xffff800011441e80
x8 = 0x0000000000000000
x9 = 0xffff800010655270
x10 = 0x00000000ffff8000
x11 = 0xffff800011701e80
x12 = 0x0000000000000001
x13 = 0xffff800010c08cc0
x14 = 0x0000000000000c80
x15 = 0x0000000000000001
x16 = 0x67692070552f6e77
x17 = 0x7228206465726f6e
x18 = 0x0000000000000030
x19 = 0xffff04000032de80
x20 = 0xffff04000032dea0
x21 = 0xffff040002c2f000
x22 = 0xffff000809280000
x23 = 0xffff00080a217800
x24 = 0xffff800011853f80
x25 = 0xffff040002c2e0c8
x26 = 0xffff800011923de8
x27 = 0xffff000817fc8740
x28 = 0x0000000000000000
x29 = 0xffff80001e9ebb00
scr_el3 = 0x000000000403073d
sctlr_el3 = 0x0000000030cd183f
cptr_el3 = 0x0000000000000100
tcr_el3 = 0x0000000080843514
daif = 0x00000000000003c0
mair_el3 = 0x00000000004404ff
spsr_el3 = 0x00000000624002cd
elr_el3 = 0x00000000ff013d84
ttbr0_el3 = 0x00000000ff093001
esr_el3 = 0x00000000be000011
far_el3 = 0x7abce97e90b5fee1
spsr_el1 = 0x0000000000000000
elr_el1 = 0x0000000000000000
spsr_abt = 0x0000000000000000
spsr_und = 0x0000000000000000
spsr_irq = 0x0000000000000000
spsr_fiq = 0x0000000000000000
sctlr_el1 = 0x0000000030d00800
actlr_el1 = 0x0000000000000000
cpacr_el1 = 0x0000000000000000
csselr_el1 = 0x0000000000000002
sp_el1 = 0x0000000000000000
esr_el1 = 0x0000000000000000
ttbr0_el1 = 0x0000000000000000
ttbr1_el1 = 0x0000000000000000
mair_el1 = 0x0000000000000000
amair_el1 = 0x0000000000000000
tcr_el1 = 0x0000000000000000
tpidr_el1 = 0xffff800f6e6f1000
tpidr_el0 = 0x00000000f6e1ede0
tpidrro_el0 = 0x0000000000000000
par_el1 = 0xff000000f4214b80
mpidr_el1 = 0x0000000081000000
afsr0_el1 = 0x0000000000000000
afsr1_el1 = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1 = 0x0000000000000000
cntp_ctl_el0 = 0x0000000000000000
cntp_cval_el0 = 0x000000010b938e04
cntv_ctl_el0 = 0x0000000000000000
cntv_cval_el0 = 0x0000000000000000
cntkctl_el1 = 0x0000000000000000
sp_el0 = 0xffff000809280000
isr_el1 = 0x0000000000000040
cpupwrctlr_el1 = 0x0000000000000000
--------------------------------------------------------------------------------------
After removing the above line (so SError stays masked), TF-A continues executing and prints some useful output.
--------------------------------------------------------------------------------------
log:
Detected DPC, skip AER
core[64] mm(925) return: 2
[ 278.193642] pcieport 0000:b0:00.0: DPC: containment event, status:0x0005 source:0xb000
[ 278.201680] pcieport 0000:b0:00.0: DPC: ERR_FATAL detected
[ 278.207262] igb 0000:b1:00.0 ens51f0: PCIe link lost

[root@localhost.localdomain /root/hm] #[ 278.378416] pcieport 0000:b0:00.0: pciehp: Slot(51): Link Down/Up ignored (recovered by DPC)
[ 278.386993] igb 0000:b1:00.0: enabling device (0000 -> 0002)
ERROR: Excepton received on 0x81000000, spsr_el3:82401009,reason:0 esr_el3:0xbe000411 ue_cnt:0x0
Exception Class = 2f: SError interrupt.
Print cpu register:
|-elr_el3: ffff8000106550f8
|-far_el3: 7abce97e92b57fe1
|-scr_el3: 403073d
|-sctlr_el3: 30cd183f
|-LR: ffff800010655270
|-SP: ffff0008091d1200
|-x0: 0000000000000000 x1: ffff800011e7500c
|-x2: 0000000000000000 x3: ffff800011e75004
|-x4: ffff800011a82000 x5: 000000000000000c
|-x6: 00000000000000fb x7: ffff800011441e80
|-x8: 0000000000000000 x9: ffff800010655270
|-x10: 00000000ffff8000 x11: ffff800011701e80
|-x12: 0000000000000001 x13: ffff800010c08cc0
|-x14: 0000000000000c80 x15: 0000000000000001
|-x16: 67692070552f6e77 x17: 7228206465726f6e
|-x18: 0000000000000030 x19: ffff040005a7e100
|-x20: ffff040005a7e120 x21: ffff040001e9f000
|-x22: ffff0008091d1200 x23: ffff000809cdf800
|-x24: ffff800011853f80 x25: ffff040001e9e0c8
|-x26: ffff800011923de8 x27: ffff000807549140
|-x28: 0000000000000000 x29: ffff80001cc9bb00
INFO: mpidr:81000000, stop s-wtd.
Return to lower EL by SDEI
->Core[0](0x81000000) received intr=0(exception), cnt=0x1
RTC: 2023-03-08 10:15:38
ERROR: Excepton received on 0x81000000, spsr_el3:620003c5,reason:2 esr_el3:0x80000011 ue_cnt:0x0
Exception Class = 17: SMC instruction execution in AArch64 state, when SMC is not disabled.
Print cpu register:
|-elr_el3: ff01a430
|-far_el3: 7abce97e92b57fe1
|-scr_el3: 4000e3c
|-sctlr_el3: 30cd183f
|-LR: ff208570
|-SP: ff631d80
|-x0: 00000000c4000061 x1: 0000000000000000
|-x2: 0000000000000000 x3: 0000000000000000
|-x4: 0000000000000000 x5: 0000000000000000
|-x6: 0000000000000000 x7: 0000000000000000
|-x8: 0000000000000000 x9: 0000000000000002
|-x10: 0000000000000002 x11: 0000000000000000
|-x12: 0000000000000002 x13: 0000000000000002
|-x14: 0000000000000001 x15: 00000000000000ff
|-x16: 00000000ffa97650 x17: 00000000000000f8
|-x18: 0000000000000000 x19: 0000000000000001
|-x20: 0000000000000000 x21: 00000000ff20c240
|-x22: 00000000ff20c25a x23: 8000000000000009
|-x24: 0000000076726473 x25: 00000000ff20d1e0
|-x26: 00000000ffbdcc10 x27: 00000000ff631ef8
|-x28: 00000000ffbff328 x29: 00000000ff631d80
INFO: mpidr:81000000, stop s-wtd.
Have report fatal
Exception Class = 17: SMC instruction execution in AArch64 state, when SMC is not disabled.
Print cpu register:
--------------------------------------------------------------------------------------
Regards,
Ming