Hi Guys,
Does this platform need external aborts to be routed to EL3? If not, you can clear SCR_EL3.EA and be done with it. This allows the EL2 OS/hypervisor to take control of the routing of these exceptions, which sounds like what you want.
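At the bit level, "clearing SCR_EL3.EA" is a single-bit change. EA is bit 3 of SCR_EL3 in the Arm ARM. The sketch below only manipulates a register value. The macro and helper names are hypothetical. On real hardware the value would be read and written with MRS/MSR at EL3. Firmware that keeps a per-world copy of SCR_EL3 in its saved context would update that copy instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCR_EL3.EA is bit 3 (Arm ARM); the macro name is chosen for this sketch. */
#define SCR_EA (UINT64_C(1) << 3)

/* Hypothetical helper: return an SCR_EL3 value with EA cleared, so external
 * aborts and SErrors are no longer taken to EL3 and the EL2/EL1 routing
 * controls (e.g. HCR_EL2.AMO) decide where they go ("kernel first"). */
static uint64_t scr_kernel_first(uint64_t scr)
{
    return scr & ~SCR_EA;
}

/* Hypothetical helper: true when this SCR_EL3 value routes external aborts
 * and SErrors to EL3 ("firmware first"). */
static bool scr_routes_ea_to_el3(uint64_t scr)
{
    return (scr & SCR_EA) != 0;
}
```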
Otherwise, as Soby describes, the choices are SDEI, or emulating the exception according to the Arm ARM pseudocode as if EL3 weren't implemented. Emulation is best avoided as it's difficult to get right: you have to create a new PSTATE for the target exception level, and read the routing controls to work out which exception level that is.
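The "read the routing controls" step could look roughly like this. It is a minimal sketch, assuming a Non-secure, AArch64-only system in which EL3 is pretending it does not exist. In that case a physical SError goes to EL2 when EL2 is enabled and HCR_EL2.TGE (bit 27) or HCR_EL2.AMO (bit 5) is set, and to EL1 otherwise. The function and macro names are hypothetical. Secure EL2 and AArch32 are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define HCR_AMO (UINT64_C(1) << 5)  /* HCR_EL2.AMO: route SErrors to EL2 */
#define HCR_TGE (UINT64_C(1) << 27) /* HCR_EL2.TGE: route everything to EL2 */

/* Hypothetical helper: the EL a physical SError would be routed to if
 * SCR_EL3.EA were 0. Simplified: Non-secure state, AArch64 only. */
static unsigned int serror_routed_el(uint64_t hcr_el2, bool el2_enabled)
{
    if (el2_enabled && (hcr_el2 & (HCR_AMO | HCR_TGE)) != 0)
        return 2;
    return 1;
}
```

The caller still has to compare the result with the EL that was interrupted. An asynchronous exception is never taken to a lower EL than the one executing, so, for example, an SError routed to EL1 while EL2 was running has to stay pending.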
As Achin says, emulating the exception isn't always possible, as asynchronous exceptions can be masked. The hardware masks them automatically when it takes an exception (e.g. an IRQ); Linux unmasks them again once it has read the CPU state.
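Here is a hedged sketch of the "can I inject right now?" check. EL3 can see the interrupted PSTATE in SPSR_EL3: M[3:2] holds the interrupted EL and bit 8 holds PSTATE.A. The sketch assumes an AArch64 interrupted context and a target EL already worked out from the routing controls. It covers three cases:

- A target below the interrupted EL cannot be delivered yet.
- A target above it is not masked by PSTATE.A.
- A target at the same EL is masked when PSTATE.A is set.

It ignores the HCR_EL2.{E2H,TGE} == {1,1} corner cases. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPSR_A_BIT    (UINT64_C(1) << 8) /* PSTATE.A as saved in SPSR_EL3 */
#define SPSR_EL_SHIFT 2                  /* M[3:2]: interrupted EL */

static unsigned int spsr_el(uint64_t spsr)
{
    return (unsigned int)((spsr >> SPSR_EL_SHIFT) & 0x3);
}

/* Hypothetical helper: could EL3 emulate an SError targeting target_el in
 * the context described by spsr_el3 right now? Simplified masking rules. */
static bool serror_injectable(uint64_t spsr_el3, unsigned int target_el)
{
    unsigned int cur = spsr_el(spsr_el3);

    if (target_el < cur)
        return false;                     /* never taken to a lower EL */
    if (target_el > cur)
        return true;                      /* PSTATE.A does not mask it */
    return (spsr_el3 & SPSR_A_BIT) == 0;  /* same EL: honour PSTATE.A */
}
```

For example, SPSR_EL3 = 0x3c5 (EL1h with D, A, I and F all masked) would make an EL1-targeted SError undeliverable. An EL2-targeted one (HCR_EL2.AMO set) could still be delivered.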
This can leave you holding what may be an imprecise asynchronous abort in EL3. You are unable to emulate the exception, and unable to proceed without letting a RAS error become uncontained. If you can't inject the emulated exception, the error still has to be handled at EL3. If this is an ACPI system, you can do a soft restart of the normal world and present the error via ACPI's BERT (Boot Error Record Table), which describes an error that happened in a previous life.
If your platform is ACPI firmware-first, using SDEI will make life easier. You still need to handle the 'SDEI masked' case, but it is a lot less likely to happen: Linux only masks SDEI events around power-management events that (may) disable the MMU.
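The options discussed so far can be summarised as a small decision function. This is one possible ordering, not something the thread prescribes. It assumes a platform that prefers SDEI, falls back to emulated injection when that is possible, and otherwise uses the BERT soft-restart path or stops. The enum and function are hypothetical and make no calls into TF-A or Linux.

```c
#include <stdbool.h>

/* Hypothetical: the ways EL3 could hand an SError to the normal world. */
enum ea_action {
    EA_NOTIFY_SDEI,   /* dispatch an SDEI event to the OS */
    EA_INJECT_SERROR, /* emulate the exception entry at the lower EL */
    EA_RESTART_BERT,  /* record the error, soft-restart the normal world */
    EA_PANIC,         /* no way to report it: stop here */
};

/* Hypothetical policy: pick the first available reporting path. */
static enum ea_action choose_ea_action(bool sdei_registered, bool sdei_masked,
                                       bool injectable, bool acpi_bert)
{
    if (sdei_registered && !sdei_masked)
        return EA_NOTIFY_SDEI;
    if (injectable)
        return EA_INJECT_SERROR;
    if (acpi_bert)
        return EA_RESTART_BERT;
    return EA_PANIC;
}
```

A platform that never wants to emulate the exception would simply pass `injectable = false`. The 'SDEI masked' case then goes straight to the BERT or panic path.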
(EL2 doesn't have any of these problems, as errors are almost always contained by stage 2, and it has hardware features for injecting asynchronous exceptions that cope with the masking and deferring.)
Thanks,
James
On 25/05/2021 10:08, Achin Gupta wrote:
Hi,
The last time I checked, injecting an SError from a higher to a lower EL is a bad idea, since the lower EL could be running with SErrors masked.
EL3 could check this before injecting, but then there is no consistent contract with the lower EL about how these errors are reported. SDEI does not suffer from the same problem.
+James who knows more from the OS/Hypervisor perspective.
cheers, Achin
*From:* TF-A tf-a-bounces@lists.trustedfirmware.org on behalf of Soby Mathew via TF-A tf-a@lists.trustedfirmware.org
*Sent:* 25 May 2021 09:59
*To:* Pali Rohár pali@kernel.org
*Cc:* kabel@kernel.org; tf-a@lists.trustedfirmware.org
*Subject:* Re: [TF-A] Rethrow SError from EL3 to kernel on arm64
[+tf-a list]
Hi Pali,
There are two philosophies for handling SErrors in the system: kernel-first and firmware-first. Assuming you want to stick with firmware-first handling (i.e. SCR_EL3.EA is set to 1), then, as you mentioned, there are two ways to notify the kernel and delegate the error handling: SDEI, and injecting the SError back into the kernel. Upstream TF-A only supports SDEI at the moment.
For SError injection back to the lower EL, software at the higher EL has to set up the hardware state so that it appears the fault was taken to the exception vector at the lower exception level. The pseudocode function AArch64.TakeException() in the Arm ARM shows the behaviour when the PE takes an exception to an Exception level using AArch64 in Non-debug state. This behaviour has to be replicated. The higher EL sets up PSTATE correctly, along with the other registers for the lower EL (SPSR, ELR and the fault syndrome registers). It then jumps to the right offset in the vector table pointed to by the lower EL's VBAR_ELx. To the lower EL it appears as if an SError has been taken to its exception vector, and it can proceed with the fault handling.
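Below is a minimal sketch of the register bookkeeping this involves, assuming an AArch64-only system and an interrupted AArch64 context. Three values come from the Arm ARM:

- the SError vector offsets 0x180, 0x380, 0x580 and 0x780;
- the SError exception class 0x2F with ESR_ELx.IL set;
- the new PSTATE of ELxh with D, A, I and F all masked.

The struct, function and macro names are hypothetical. The sketch deliberately leaves out other PSTATE side effects that AArch64.TakeException() applies, such as PAN, SSBS, BTYPE and the single-step state. It also omits any FAR handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the values EL3 would write before its ERET. */
struct serror_injection {
    uint64_t elr_target;  /* ELR_ELx of target: preferred return address */
    uint64_t spsr_target; /* SPSR_ELx of target: the interrupted PSTATE */
    uint64_t esr_target;  /* ESR_ELx of target: SError syndrome */
    uint64_t elr_el3;     /* where EL3's ERET goes: the SError vector */
    uint64_t spsr_el3;    /* PSTATE the target EL starts with */
};

#define ESR_EC_SERROR UINT64_C(0x2f)
#define ESR_EC_SHIFT  26
#define ESR_IL        (UINT64_C(1) << 25)
#define SPSR_DAIF     (UINT64_C(0xf) << 6) /* D, A, I, F mask bits */

/* Offset of the SError entry in each group of the vector table. */
#define VEC_CUR_SP0   0x180
#define VEC_CUR_SPX   0x380
#define VEC_LOWER_A64 0x580
#define VEC_LOWER_A32 0x780

static struct serror_injection
build_serror_injection(uint64_t elr_el3, uint64_t spsr_el3,
                       uint64_t vbar_target, unsigned int target_el,
                       bool lower_is_aarch64, uint64_t iss)
{
    struct serror_injection r;
    unsigned int cur_el = (unsigned int)((spsr_el3 >> 2) & 0x3);
    uint64_t offset;

    if (cur_el == target_el) /* same EL: SP_EL0 or SP_ELx group, per M[0] */
        offset = (spsr_el3 & 1) ? VEC_CUR_SPX : VEC_CUR_SP0;
    else                     /* taken from a lower EL */
        offset = lower_is_aarch64 ? VEC_LOWER_A64 : VEC_LOWER_A32;

    r.elr_target = elr_el3;   /* return to where EL3 interrupted */
    r.spsr_target = spsr_el3; /* the lower EL sees the old PSTATE */
    r.esr_target = (ESR_EC_SERROR << ESR_EC_SHIFT) | ESR_IL |
                   (iss & UINT64_C(0x1ffffff));
    r.elr_el3 = vbar_target + offset;
    /* New PSTATE: AArch64, ELxh (SP_ELx), all of D, A, I, F masked. */
    r.spsr_el3 = SPSR_DAIF | ((uint64_t)target_el << 2) | 1;
    return r;
}
```

EL3 would then write ELR_ELx, SPSR_ELx and ESR_ELx of the target EL from the first three fields. It would load ELR_EL3 and SPSR_EL3 from the last two and ERET. Because the new mode is ELxh, the ERET itself selects SP_ELx. For an interrupted EL1h context with SErrors unmasked and VBAR_EL1 = 0xffff000010000000, the entry point comes out as 0xffff000010000380 and the syndrome as 0xbe000000.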
Best Regards Soby Mathew
-----Original Message----- From: Pali Rohár pali@kernel.org Sent: Monday, May 24, 2021 6:07 PM To: Soby Mathew Soby.Mathew@arm.com Subject: Rethrow SError from EL3 to kernel on arm64
Hello Soby!
I found the following discussion about the Armada 3720 PCIe SError issue: https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541/comment/ca882427_d142bde2/
TF-A on Armada 3720 redirects all SErrors to EL3 and panics in the TF-A handler. You wrote in that discussion:
Ideally you need to signal the SError back to kernel from EL3 using SDEI or inject the SError to the lower EL and the kernel can decide to die or not.
I would like to ask: could you help me implement this SError rethrow functionality? I have absolutely no idea how to do it. Catching all SErrors in EL3 causes problems, because some of them could be handled and recovered from by the kernel.
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a