Hi Pali, Thanks for the clarification. IMO, emulating an SError from EL3 is complicated and has numerous pitfalls, and its correctness is difficult to verify for all cases. SDEI would be the recommended approach for notifying the lower EL about an error.
Best Regards Soby Mathew
-----Original Message----- From: Pali Rohár pali@kernel.org Sent: Tuesday, May 25, 2021 11:09 AM To: James Morse James.Morse@arm.com Cc: Achin Gupta Achin.Gupta@arm.com; Soby Mathew Soby.Mathew@arm.com; kabel@kernel.org; tf-a@lists.trustedfirmware.org Subject: Re: Rethrow SError from EL3 to kernel on arm64
Hello!
The platform is not ACPI-based. In some cases the PCIe core sends External Aborts to the kernel, and these need to be masked/ignored. I have not found a way to reconfigure the PCIe core so that it does not send these aborts.
The mentioned review links to a discussion on the kernel mailing list about custom kernel handlers that ignore some EAs. That approach was rejected on the grounds that TF-A should handle these aborts and ignore those which should not be propagated back to the kernel.
If I clear SCR_EL3.EA, then all aborts (including those which should be ignored) are delivered to the kernel, and the kernel treats them as fatal. So this is not a solution.
If I do not clear SCR_EL3.EA, then I can implement a check in the TF-A board/platform code for the aborts which need to be ignored. But the remaining aborts are not delivered to the kernel, and TF-A treats them as fatal, which is not correct either.
So what I need is to route all External Aborts to TF-A, implement logic which ignores the specific PCIe aborts, and propagate all remaining aborts back to the kernel as if SCR_EL3.EA were clear.
That means implementing some form of abort injection.
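The plan above boils down to a classification step at the start of the EL3 handler. A minimal sketch in C, assuming the platform can tell from some PCIe controller state whether the abort is one of the known-benign ones; `struct ea_info`, its `pcie_abort_pending` flag and `classify_external_abort()` are hypothetical names, not TF-A APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ESR_EC_SHIFT   26
#define ESR_EC_MASK    UINT64_C(0x3F)
#define ESR_EC_SERROR  UINT64_C(0x2F)   /* Exception Class of an SError interrupt */

enum ea_action {
    EA_IGNORE,     /* known-benign PCIe abort: return to the interrupted context */
    EA_PROPAGATE,  /* anything else: hand it to the lower EL (SDEI or injection) */
};

/* Hypothetical snapshot of what the EL3 handler can inspect. */
struct ea_info {
    uint64_t esr;             /* syndrome (ESR_EL3) captured on entry to EL3 */
    bool pcie_abort_pending;  /* hypothetical: PCIe core reports it raised the abort */
};

enum ea_action classify_external_abort(const struct ea_info *info)
{
    assert(info != NULL);
    uint64_t ec = (info->esr >> ESR_EC_SHIFT) & ESR_EC_MASK;

    /* Only an SError that the PCIe core admits to raising is swallowed. */
    if (ec == ESR_EC_SERROR && info->pcie_abort_pending)
        return EA_IGNORE;
    return EA_PROPAGATE;
}
```

A real handler would also have to acknowledge the condition in the PCIe controller before returning, and the `EA_PROPAGATE` path still needs one of the delivery mechanisms discussed in this thread.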
On Tuesday 25 May 2021 11:00:09 James Morse wrote:
Hi Guys,
Does this platform need External Aborts to be routed to EL3? If not, you can clear SCR_EL3.EA and be done with it. This allows the EL2 OS/hypervisor to take control of the routing of these exceptions (which sounds like what you want).
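Clearing SCR_EL3.EA is a single bit (bit 3 of SCR_EL3). The sketch below only manipulates a plain integer; in real firmware the value lives in the saved non-secure context, and upstream TF-A exposes the choice through the `HANDLE_EA_EL3_FIRST` build option (check the build-options documentation of your TF-A version). The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* SCR_EL3.EA (bit 3): when set, External Aborts and SErrors are taken to EL3. */
#define SCR_EL3_EA_BIT (UINT64_C(1) << 3)

/* Kernel-first: the lower ELs (e.g. via HCR_EL2.AMO) decide where SErrors go. */
uint64_t scr_el3_kernel_first(uint64_t scr)
{
    return scr & ~SCR_EL3_EA_BIT;
}

/* Firmware-first: every External Abort/SError is trapped to EL3. */
uint64_t scr_el3_firmware_first(uint64_t scr)
{
    return scr | SCR_EL3_EA_BIT;
}

int scr_el3_routes_ea_to_el3(uint64_t scr)
{
    return (scr & SCR_EL3_EA_BIT) != 0;
}
```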
Otherwise, as Soby describes, the choices are SDEI or emulating the exception according to the Arm ARM pseudocode, as if EL3 were not implemented. The latter is best avoided as it's difficult to get right: you have to create a new PSTATE for the target exception level, and read the routing controls to work out which exception level that is.
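Working out the target EL "as if EL3 were not implemented" means re-reading the routing controls that SCR_EL3.EA normally overrides. A simplified sketch for physical SErrors, assuming an AArch64 lower EL; it ignores finer points of the architecture pseudocode (Secure EL2 details, the HCR_EL2.{E2H,TGE} host case), and `serror_target_el()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCR_EL2_AMO (UINT64_C(1) << 5)   /* route physical SErrors to EL2 */
#define HCR_EL2_TGE (UINT64_C(1) << 27)  /* trap general exceptions to EL2 */

/*
 * Simplified: the EL a physical SError would be taken to if SCR_EL3.EA were 0.
 *   from_el     - EL that was running when EL3 took the SError (0, 1 or 2)
 *   el2_enabled - EL2 implemented and enabled for the interrupted security state
 *   hcr_el2     - current HCR_EL2 value
 */
unsigned int serror_target_el(unsigned int from_el, bool el2_enabled,
                              uint64_t hcr_el2)
{
    assert(from_el <= 2);
    assert(from_el != 2 || el2_enabled);

    if (from_el == 2)
        return 2;   /* taken at EL2, stays at EL2 */
    if (el2_enabled && (hcr_el2 & (HCR_EL2_AMO | HCR_EL2_TGE)) != 0)
        return 2;   /* the hypervisor has claimed physical SErrors */
    return 1;       /* otherwise the OS at EL1 gets it */
}
```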
As Achin says, emulating the exception isn't always possible, as asynchronous exceptions can be masked. The hardware does this automatically when it takes an exception (e.g. an IRQ); Linux unmasks them again once it has read the CPU state.
This can leave you holding what may be an imprecise asynchronous abort in EL3, unable either to emulate the exception or to proceed without letting a RAS error become uncontained.
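The masking problem can at least be detected from the interrupted context's PSTATE, which EL3 holds in SPSR_EL3 (A is bit 8, the EL is M[3:2]). A simplified sketch, assuming an AArch64 lower EL and ignoring the HCR_EL2.{E2H,TGE} == {1,1} host case; `serror_masked_for_target()` is a hypothetical helper. When it returns true, EL3 cannot inject and has to hold on to or handle the error itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPSR_A_BIT      (UINT64_C(1) << 8)  /* PSTATE.A: SError mask */
#define SPSR_M_AARCH32  (UINT64_C(1) << 4)  /* M[4]: AArch32 context */

/* EL of an interrupted AArch64 context, from SPSR_EL3.M[3:2]. */
unsigned int spsr_el(uint64_t spsr)
{
    return (unsigned int)((spsr >> 2) & 0x3);
}

/*
 * Would an SError emulated towards target_el be masked right now?
 * Simplified rule: PSTATE.A masks an SError aimed at the current EL, or at
 * EL1 while running at EL0; one aimed at a higher EL is taken regardless.
 */
bool serror_masked_for_target(uint64_t spsr_el3, unsigned int target_el)
{
    assert((spsr_el3 & SPSR_M_AARCH32) == 0);
    assert(target_el == 1 || target_el == 2);

    unsigned int from_el = spsr_el(spsr_el3);
    assert(from_el <= target_el);

    bool a_applies = (from_el == target_el) || (from_el == 0 && target_el == 1);
    return a_applies && (spsr_el3 & SPSR_A_BIT) != 0;
}
```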
If you can't inject the emulated exception, the error still has to be handled at EL3. If this is an ACPI system you can do a soft restart of the normal-world and present the error via ACPI's BERT (boot error record table) which describes an error that happened in a previous life.
If your platform is ACPI firmware-first, using SDEI will make life easier. You still need to handle the 'SDEI masked' case, but it is a lot less likely to happen: Linux only masks SDEI around power-management events that (may) disable the MMU.
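The resulting decision between SDEI, injection and handling at EL3 can be summarised as below. This is a sketch under assumed inputs: the `struct lower_el_state` fields are hypothetical stand-ins for whatever the firmware actually knows (SDEI registration and mask state, PSTATE.A for the target EL), and the fallback order is one reasonable choice, not a TF-A policy:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum serror_delivery {
    DELIVER_SDEI,    /* signal an SDEI event the OS registered for */
    DELIVER_INJECT,  /* emulate the SError exception at the lower EL */
    DELIVER_AT_EL3,  /* no safe way to notify: contain/handle it at EL3 */
};

/* Hypothetical summary of the lower EL's state at the time of the error. */
struct lower_el_state {
    bool sdei_event_registered; /* OS has a handler for the error event */
    bool sdei_masked;           /* OS masked SDEI (e.g. around PM events) */
    bool serror_masked;         /* PSTATE.A would mask an emulated SError */
};

enum serror_delivery choose_serror_delivery(const struct lower_el_state *s)
{
    assert(s != NULL);
    if (s->sdei_event_registered)
        /* SDEI contract in place: use it, unless the OS has it masked. */
        return s->sdei_masked ? DELIVER_AT_EL3 : DELIVER_SDEI;
    /* No SDEI client: injection is only safe if the SError is not masked. */
    return s->serror_masked ? DELIVER_AT_EL3 : DELIVER_INJECT;
}
```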
(EL2 doesn't have any of these problems as errors are almost always contained by stage2, and it has hardware features for injecting asynchronous exceptions, which cope with the masking and deferring)
Thanks,
James
On 25/05/2021 10:08, Achin Gupta wrote:
Hi,
The last time I checked, injecting an SError from a higher to a lower EL was a bad idea, since the latter could be running with SErrors masked.
EL3 could check this before injecting but then there is no consistent contract with the lower EL about reporting of these errors. SDEI does not suffer from the same problem.
+James who knows more from the OS/Hypervisor perspective.
cheers, Achin
*From:* TF-A tf-a-bounces@lists.trustedfirmware.org on behalf of Soby Mathew via TF-A tf-a@lists.trustedfirmware.org *Sent:* 25 May 2021 09:59 *To:* Pali Rohár pali@kernel.org *Cc:* kabel@kernel.org; tf-a@lists.trustedfirmware.org *Subject:* Re: [TF-A] Rethrow SError from EL3 to kernel on arm64
[+tf-a list] Hi Pali, There are two philosophies for handling SErrors in the system: kernel-first and firmware-first. Assuming you want to stick with firmware-first handling (i.e. SCR_EL3.EA is set to 1), then, as you mentioned, there are two ways to notify the kernel and delegate the error handling: SDEI, and SError injection back to the kernel. Upstream TF-A only supports SDEI at the moment.
For SError injection back to the lower EL, software at the higher EL has to set up the hardware state so that it appears as if the fault was taken to the exception vector of the lower exception level. The pseudocode function AArch64.TakeException() in the Arm ARM shows the behaviour when the PE takes an exception to an Exception level using AArch64 in Non-debug state. This behaviour has to be replicated: the higher EL sets up PSTATE correctly, writes the values of the lower EL's other registers (SPSR, ELR and the fault syndrome registers), and jumps to the right offset in the vector table pointed to by the lower EL's VBAR_ELx. To the lower EL, it appears as if an SError was taken at its exception vector, and it can proceed with the fault handling.
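A heavily simplified sketch of the register side of AArch64.TakeException() for an SError, operating on hypothetical register snapshots (`struct el3_regs`, `struct target_el_regs`, `inject_serror()`) rather than real system registers. It saves the interrupted state into SPSR_ELx/ELR_ELx, builds an ESR_ELx with EC 0x2F (SError), and points the EL3 return at the SError entry of the target vector table (offsets 0x180/0x380/0x580/0x780). It deliberately omits PSTATE details such as PAN, SSBS, BTYPE and single-step state, which the real pseudocode also updates, and leaves the ISS encoding to the caller:

```c
#include <assert.h>
#include <stdint.h>

#define SPSR_M_AARCH32  (UINT64_C(1) << 4)   /* M[4]: interrupted context was AArch32 */
#define SPSR_M_SP_ELX   UINT64_C(1)          /* M[0]: SP_ELx selected ("h" modes) */
#define SPSR_M_EL1H     UINT64_C(0x5)        /* M[3:0] = EL1h */
#define SPSR_M_EL2H     UINT64_C(0x9)        /* M[3:0] = EL2h */
#define SPSR_DAIF       (UINT64_C(0xF) << 6) /* D, A, I and F mask bits */
#define ESR_EC_SERROR   UINT64_C(0x2F)
#define ESR_IL          (UINT64_C(1) << 25)
#define ESR_ISS_MASK    UINT64_C(0x1FFFFFF)

/* Hypothetical snapshots; real firmware would use its saved CPU context. */
struct el3_regs {
    uint64_t spsr_el3;  /* PSTATE of the interrupted lower-EL context */
    uint64_t elr_el3;   /* where that context was interrupted */
};

struct target_el_regs {
    uint64_t vbar;      /* VBAR_ELx of the target EL (read-only here) */
    uint64_t spsr;      /* SPSR_ELx, written by the emulation */
    uint64_t elr;       /* ELR_ELx, written by the emulation */
    uint64_t esr;       /* ESR_ELx, written by the emulation */
};

/* Offset of the SError entry in the target EL's vector table. */
uint64_t serror_vector_offset(uint64_t spsr_el3, unsigned int target_el)
{
    if (spsr_el3 & SPSR_M_AARCH32)
        return 0x780;                            /* from a lower EL using AArch32 */
    unsigned int from_el = (unsigned int)((spsr_el3 >> 2) & 0x3);
    assert(from_el <= target_el);
    if (from_el < target_el)
        return 0x580;                            /* from a lower EL using AArch64 */
    return (spsr_el3 & SPSR_M_SP_ELX) ? 0x380    /* current EL with SP_ELx */
                                      : 0x180;   /* current EL with SP_EL0 */
}

void inject_serror(struct el3_regs *el3, struct target_el_regs *tgt,
                   unsigned int target_el, uint64_t iss)
{
    assert(target_el == 1 || target_el == 2);

    /* What the hardware would have saved on a real exception entry. */
    tgt->spsr = el3->spsr_el3;
    tgt->elr  = el3->elr_el3;
    tgt->esr  = (ESR_EC_SERROR << 26) | ESR_IL | (iss & ESR_ISS_MASK);

    /* ERET from EL3 then "takes" the exception: vector entry, DAIF all masked. */
    el3->elr_el3  = tgt->vbar + serror_vector_offset(el3->spsr_el3, target_el);
    el3->spsr_el3 = SPSR_DAIF | (target_el == 2 ? SPSR_M_EL2H : SPSR_M_EL1H);
}
```

EL3 would then write the `tgt` values to the target EL's SPSR_ELx, ELR_ELx and ESR_ELx, load ELR_EL3/SPSR_EL3 from `el3`, and ERET.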
Best Regards Soby Mathew
-----Original Message----- From: Pali Rohár pali@kernel.org Sent: Monday, May 24, 2021 6:07 PM To: Soby Mathew Soby.Mathew@arm.com Subject: Rethrow SError from EL3 to kernel on arm64
Hello Soby!
I have found the following discussion about the Armada 3720 PCIe SError issue: https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541/comment/ca882427_d142bde2/
TF-A on Armada 3720 routes all SErrors to EL3 and panics in the TF-A handler.
You wrote in that discussion:
Ideally you need to signal the SError back to kernel from EL3 using SDEI or inject the SError to the lower EL and the kernel can decide to die or not.
I would like to ask whether you could help me implement this SError rethrow functionality. I have absolutely no idea how to do it, and catching all SErrors in EL3 causes issues because some of them could be handled and recovered from by the kernel.
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a