Ming, After some discussion with James and other team members we have determined that a selective masking of the interrupts from the GIC might be the best solution. I have one more team member to run this by to make sure this is the right approach. If this is the case I will talk with Doug and Joanna about how we can provide a patch probably after the holidays. Let us know your view if you have any major concerns or questions. Thanks for your patience!!...
Mark
-----Original Message----- From: Ming Huang huangming@linux.alibaba.com Sent: Tuesday, December 7, 2021 9:17 PM To: James Morse James.Morse@arm.com; tf-a@lists.trustedfirmware.org; zhangliguang@linux.alibaba.com; Mark Dykes Mark.Dykes@arm.com; John Powell John.Powell@arm.com; Manish Pandey2 Manish.Pandey2@arm.com; Joanna Farley Joanna.Farley@arm.com Subject: Re: [TF-A] About kdump hang issue in SDEI dispatch
Hi TF-A guys,
Any comments about this issue?
Thanks, Ming
On 11/13/21 7:53 AM, Ming Huang via TF-A wrote:
Hi Mark & John & Manish,
Any comment about this? In my mind, these issues was caused by cooperation between TF-A and kernel.
Thanks, Ming
On 11/11/21 12:02 AM, James Morse wrote:
Hello!
On 05/11/2021 10:06, Ming Huang wrote:
When hest acpi table configure Hardware Error Notification type as Software Delegated Exception(0x0B) for RAS event, kernel RAS interacts with TF-A by SDEI mechanism. On the firmware first system, kernel was notified by TF-A sdei call.
The calling flow like as below when fatal RAS error happens: TF-A notify kernel flow: sdei_dispatch_event() ehf_activate_priority() call sdei callback // callback registered by kerenl ehf_deactivate_priority()
You've got an IRQ taken to EL3 somewhere in here, and EL3 then uses the GIC PMR to mask interrupts for secure world, which implicitly masks them for the normal world too.
Kernel sdei callback:
....
panic()
What matters here is the kernel panic()d. It can do this for any number of reasons, it may run out of stack space, execute an illegal instruction etc. The kernel must always be able to panic().
We want to modify the panic case only in sdei_asm_handler flow, not other case like run out of stack space, execute an illagal instruction etc.
The problem is what happens next, the kernel triggers kdump, and kump inherits the state of the system that the EL3 firmware handed to the SDEI event.... where interrupts are masked by the GIC PMR.
If fatal RAS error occured, panic was called in sdei_asm_handle() without ehf_deactivate_priority executed, which lead interrupt masked.
So interrupt should be restored before panic otherwise kdump will hang.
'before kdump' is impossible for EL3 to know.
Linux makes a SDEI_PE_MASK() call from inside the handler as part of smp_send_stop(). If the Kdump kernel supports SDEI, it will eventually run SDEI_PRIVATE_RESET() and SDEI_SHARED_RESET(), but interrupts need to work well before that point.
In the process of sdei, a SDEI_EVENT_COMPLETE(or SDEI_EVENT_COMPLETE_AND_RESUME) call should be called before panic for a completed run of ehf_deactivate_priority(). The ehf_deactivate_priority() function restore pmr_el1 to original value(>0x80).
The SDEI dispatch flow was broken if SDEI_EVENT_COMPLETE was not be called.
The expected flow hasn't occurred, but nothing 'illegal' is happening. Section 6.1.2 lists SDEI_PE_MASK() and the two reset calls as always being available, regardless of the event handler state.
The surprise here is that interrupts are broken by SDEI.
This will bring about two issue: 1 Kdump will hang for firmware reporting fatal RAS event by SDEI; (as explain above) 2 For NMI scene,TF-A enable a secure timer, the PPI 29 will trigger periodically. Kernel register a callback for hard lockup. The below code will not be called when panic in kernel callback:
... because it depends on an interrupt being taken by EL3, and interrupts are still masked by PMR.
How to fix above issues? I think the root cause is that kernel broken the SDEI dispatch flow, so kernel should modify to fix these issues.
The surprise here is that interrupts are broken by SDEI. The OS always needs to be able to panic(), which might trigger kdump. Kdump needs interrupts to work before it probes SDEI.
The OS may panic() for reasons other than your GHES RAS example.
It is not possible for the OS to 'break out' of the SDEI event as the panic() call occurs deep on the stack, and completing the event allows another event to be delivered, which will overwrite the state that cause the kernel to call panic() in the first place.
The only scope for paper-ing over this problem in the OS is to also make the reset calls in smp_send_stop(). I suspect this won't solve the problem without EL3 changes too.
This needs tackling from the PMR angle.
I'm not familiar with the TFA code that uses PMR to mask all interrupts when an SDEI event is delivered. Could it stop doing that? Is it upstream, or out of tree? (git grep doesn't find any references to pmr under services/std_svc/sdei.)
We use the upstrean TF-A source code. PMR is modified in ehf_activate_priority();
As it is, this design allows the normal-world to prevent secure-world interrupts from being delivered by holding a CPU in the event handler.
Presumably the use of PMR is intended to ensure 'the' secure-world interrupt that caused the event to be delivered is not taken a second time. (EL3 can't mask secure-world interrupts using PSTATE when it enters the event handler as the event handler is a lower EL.)
Instead of masking all interrupts, could it disable 'the' secure-world interrupt that caused the event? The GIC has banks of I{S,C}ENABLER<n> registers for enabling and disabling interrupts. The interrupt can stay disabled until the event is completed, unregistered or a reset call unregisters it. This is better as secure-world interrupts are still delivered and handled by secure-world. The OS may take a long time to handle an SDEI event, starving a secure-world watchdog and causing a reset once the normal-world finishes.
Even better would be to leave the interrupt enabled, and handle the interrupt without delivering another SDEI event. You probably do want 'the' interrupt to be delivered while the event handler is running, this may be the only way you can detect an error that occurs while the error-handler is running.
Thanks,
James