Re: [TF-A] About kdump hang issue in SDEI dispatch

17 Dec 2021

      Hi Ming,
External-Abort is different because the CPU does this when it can't progress.
External-Abort can't be masked - and you wouldn't really want to! If the OS takes a RAS
error when running the RAS handler there is nothing you can do: it will never be able to
handle these errors. In this case, instead of trying to continue, firmware should log both
errors in some persistent memory, and reboot. Ideally both errors would be handed to the
OS on the next boot via ACPI's BERT table.
Kdump extends this window to the point the kdump kernel's SDEI driver calls reset, but the
kdump kernel should not be triggering external-abort, if it is, its very likely the system
is beyond recovery.
Firmware should already have a way of dealing with external-abort that occur when the SDEI
events are not registered or masked.
Thanks,
James
On 17/12/2021 06:42, Ming Huang wrote:
...
Hi Mark,
This is good news for these issues. I am not sure the solution (a selective masking of
the interrupts from the GIC) can fix the another similar issue:
A EA(external abort) tigger enter TF-A, TF-A report fatal via SDEI to kernel to kernal panic,
and then kdump without SDEI_EVENT_COMPLETE(or SDEI_EVENT_COMPLETE_AND_RESUME) being called.
Below code is not run:
bl31/aarch64/ea_delegate.S ea_proceed():
....
   /* Restore EL3 state and ESR */
   ldp	x1, x2, [sp, #CTX_EL3STATE_OFFSET + CTX_SPSR_EL3]
   msr	spsr_el3, x1
   msr	elr_el3, x2
....
Another exception may tigger enter TF-A while kdump and TF-A will hang here
ea_proceed():
   /*
    * If the ESR loaded earlier is not zero, we were processing an EA
    * already, and this is a double fault.
    */
   ldr	x5, [sp, #CTX_EL3STATE_OFFSET + CTX_ESR_EL3]
   cbz	x5, 1f
   no_ret	plat_handle_double_fault
Thanks,
Ming
在 12/16/21 11:53 PM, Mark Dykes 写道:
...
Ming,
   After some discussion with James and other team members we have determined that a selective masking of the interrupts from the GIC might be the best solution.  I have one more team member to run this by to make sure this is the right approach.  If this is the case I will talk with Doug and Joanna about how we can provide a patch probably after the holidays.  Let us know your view if you have any major concerns or questions.  Thanks for your patience!!...
	Mark

-----Original Message-----
From: Ming Huang huangming@linux.alibaba.com 
Sent: Tuesday, December 7, 2021 9:17 PM
To: James Morse James.Morse@arm.com; tf-a@lists.trustedfirmware.org; zhangliguang@linux.alibaba.com; Mark Dykes Mark.Dykes@arm.com; John Powell John.Powell@arm.com; Manish Pandey2 Manish.Pandey2@arm.com; Joanna Farley Joanna.Farley@arm.com
Subject: Re: [TF-A] About kdump hang issue in SDEI dispatch
Hi TF-A guys,
Any comments about this issue?
Thanks,
Ming
On 11/13/21 7:53 AM, Ming Huang via TF-A wrote:
...
Hi Mark & John & Manish,
Any comment about this?
In my mind, these issues was caused by cooperation between TF-A and kernel.
Thanks,
Ming
On 11/11/21 12:02 AM, James Morse wrote:
...
Hello!
On 05/11/2021 10:06, Ming Huang wrote:
...
When hest acpi table configure Hardware Error Notification type as 
Software Delegated Exception(0x0B) for RAS event, kernel RAS 
interacts with TF-A by SDEI mechanism. On the firmware first system, 
kernel was notified by TF-A sdei call.
The calling flow like as below when fatal RAS error happens:
TF-A notify kernel flow:
  sdei_dispatch_event()
    ehf_activate_priority()
      call sdei callback  // callback registered by kerenl
    ehf_deactivate_priority()
You've got an IRQ taken to EL3 somewhere in here, and EL3 then uses 
the GIC PMR to mask interrupts for secure world, which implicitly masks them for the normal world too.
...
Kernel sdei callback:
....
...
            panic()

What matters here is the kernel panic()d. It can do this for any 
number of reasons, it may run out of stack space, execute an illegal 
instruction etc. The kernel must always be able to panic().
We want to modify the panic case only in sdei_asm_handler flow, not 
other case like run out of stack space, execute an illagal instruction etc.
...
The problem is what happens next, the kernel triggers kdump, and kump 
inherits the state of the system that the EL3 firmware handed to the 
SDEI event.... where interrupts are masked by the GIC PMR.
...
If fatal RAS error occured, panic was called in sdei_asm_handle() 
without ehf_deactivate_priority executed, which lead interrupt masked.
...
So interrupt should be restored before panic otherwise kdump will hang.
'before kdump' is impossible for EL3 to know.
Linux makes a SDEI_PE_MASK() call from inside the handler as part of smp_send_stop().
If the Kdump kernel supports SDEI, it will eventually run 
SDEI_PRIVATE_RESET() and SDEI_SHARED_RESET(), but interrupts need to work well before that point.
...
In the process of sdei, a SDEI_EVENT_COMPLETE(or 
SDEI_EVENT_COMPLETE_AND_RESUME) call should be called before panic for a completed run of ehf_deactivate_priority().
The ehf_deactivate_priority() function restore pmr_el1 to original value(>0x80).
...
The SDEI dispatch flow was broken if SDEI_EVENT_COMPLETE was not be called.
The expected flow hasn't occurred, but nothing 'illegal' is 
happening. Section 6.1.2 lists
SDEI_PE_MASK() and the two reset calls as always being available, 
regardless of the event handler state.
The surprise here is that interrupts are broken by SDEI.
...
This will bring about two issue:
1 Kdump will hang for firmware reporting fatal RAS event by SDEI;
  (as explain above)
2 For NMI scene，TF-A enable a secure timer, the PPI 29 will trigger periodically.
  Kernel register a callback for hard lockup. The below code will not be
  called when panic in kernel callback:
... because it depends on an interrupt being taken by EL3, and 
interrupts are still masked by PMR.
...
How to fix above issues?
I think the root cause is that kernel broken the SDEI dispatch flow, 
so kernel should modify to fix these issues.
The surprise here is that interrupts are broken by SDEI. The OS 
always needs to be able to panic(), which might trigger kdump. Kdump needs interrupts to work before it probes SDEI.
The OS may panic() for reasons other than your GHES RAS example.
It is not possible for the OS to 'break out' of the SDEI event as the 
panic() call occurs deep on the stack, and completing the event 
allows another event to be delivered, which will overwrite the state that cause the kernel to call panic() in the first place.
The only scope for paper-ing over this problem in the OS is to also 
make the reset calls in smp_send_stop(). I suspect this won't solve the problem without EL3 changes too.
This needs tackling from the PMR angle.
I'm not familiar with the TFA code that uses PMR to mask all 
interrupts when an SDEI event is delivered. Could it stop doing that?
Is it upstream, or out of tree? (git grep doesn't find any references 
to pmr under
services/std_svc/sdei.)
We use the upstrean TF-A source code. PMR is modified in 
ehf_activate_priority();
...
As it is, this design allows the normal-world to prevent secure-world 
interrupts from being delivered by holding a CPU in the event handler.
Presumably the use of PMR is intended to ensure 'the' secure-world 
interrupt that caused the event to be delivered is not taken a second 
time. (EL3 can't mask secure-world interrupts using PSTATE when it 
enters the event handler as the event handler is a lower EL.)
Instead of masking all interrupts, could it disable 'the' 
secure-world interrupt that caused the event? The GIC has banks of 
I{S,C}ENABLER<n> registers for enabling and disabling interrupts. The 
interrupt can stay disabled until the event is completed, unregistered or a reset call unregisters it.
This is better as secure-world interrupts are still delivered and handled by secure-world.
The OS may take a long time to handle an SDEI event, starving a 
secure-world watchdog and causing a reset once the normal-world finishes.
Even better would be to leave the interrupt enabled, and handle the 
interrupt without delivering another SDEI event. You probably do want 
'the' interrupt to be delivered while the event handler is running, 
this may be the only way you can detect an error that occurs while the error-handler is running.
Thanks,
James

2025

2024

2023

2022

2021

2020

2019

2018

Re: [TF-A] About kdump hang issue in SDEI dispatch