Hello,
On 28/05/2021 13:10, Ming Huang wrote:
> I ran a test that reports an SDEI event to the kernel with fatal severity in APEI / CPER when EL3 receives an SEA (SCR_EL3.EA = 1). The kernel panics and prints a calltrace, but the calltrace does not show the position where the error occurred (in other words, where the SEA was raised); instead it shows a calltrace in ghes.c.
This used to work in the cases where it was possible, but the stack tracing code has changed a little in recent months.
You didn't mention a kernel version.
> How can the SDEI solution let the kernel print the calltrace at the right position?
The error was fatal. If the physical address was memory, then it's just chance that process-A was using it and not process-B.
That said, the SDEI entry code in the kernel does try to set the records up to allow the stack tracer to walk onto another stack, but this isn't always possible. The stack tracer needs the frame records to be present and correct: if you took an exception, 'x29' needs to be the current frame pointer, but this is only guaranteed at function boundaries. If you took an exception during the function's prologue or epilogue, the values seen by the stack tracer will be inconsistent.
For arm64, Linux can't provide a 'reliable' stack trace over an exception, but it does provide a best effort.
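To illustrate why a wrong 'x29' matters, here is a minimal user-space sketch of a frame-pointer walk. It is not the kernel's unwinder: struct frame_record, struct stack_bounds, walk_frames() and frame_walk_demo() are hypothetical names made up for this example, and the "stack" is just a small array. The only thing taken from the architecture's procedure call standard is that a frame record is a {previous x29, saved x30} pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AArch64 frame record: x29 points at a {previous x29, saved x30} pair. */
struct frame_record {
    const struct frame_record *prev_fp; /* caller's frame record, NULL at the end */
    uintptr_t lr;                       /* saved return address */
};

/* Hypothetical: the single stack region the walker is allowed to read. */
struct stack_bounds {
    uintptr_t low;
    uintptr_t high; /* one past the last valid byte */
};

static int record_in_bounds(const struct frame_record *fp,
                            const struct stack_bounds *b)
{
    uintptr_t addr = (uintptr_t)fp;

    return addr >= b->low && addr < b->high &&
           b->high - addr >= sizeof(*fp);
}

/* Follow the chain starting at fp, storing return addresses in out[]. */
static size_t walk_frames(const struct frame_record *fp,
                          const struct stack_bounds *b,
                          uintptr_t *out, size_t max)
{
    size_t n = 0;

    while (fp != NULL && n < max) {
        if (!record_in_bounds(fp, b))
            break; /* x29 did not point at a frame record on this stack */
        out[n++] = fp->lr;
        /* The stack grows down, so a caller's record sits at a higher address. */
        if (fp->prev_fp != NULL &&
            (uintptr_t)fp->prev_fp <= (uintptr_t)fp)
            break;
        fp = fp->prev_fp;
    }
    return n;
}

/* Self-checking demo: three nested calls, innermost record at stack[0]. */
static void frame_walk_demo(void)
{
    struct frame_record stack[3];
    struct stack_bounds b;
    uintptr_t trace[8];

    stack[0].prev_fp = &stack[1]; stack[0].lr = 0x1000;
    stack[1].prev_fp = &stack[2]; stack[1].lr = 0x2000;
    stack[2].prev_fp = NULL;      stack[2].lr = 0x3000;
    b.low = (uintptr_t)&stack[0];
    b.high = (uintptr_t)&stack[3];

    /* x29 is current: every frame is found. */
    assert(walk_frames(&stack[0], &b, trace, 8) == 3);
    assert(trace[0] == 0x1000 && trace[2] == 0x3000);

    /* Exception in a prologue: x29 still holds the caller's record. */
    assert(walk_frames(&stack[1], &b, trace, 8) == 2);
    assert(trace[0] == 0x2000);

    /* x29 points outside the stack: nothing can be walked. */
    assert(walk_frames((const struct frame_record *)(b.high + 64),
                       &b, trace, 8) == 0);
}
```

Calling frame_walk_demo() from a main() runs the three cases. The second case is the prologue/epilogue problem in miniature: the walk still succeeds, but it silently starts one record too late, so the resulting trace is plausible and wrong.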
> For issue analysis, a calltrace at the right position is very useful. For ACPI firmware-first, we set SCR_EL3.EA = 1. Although the solution of rethrowing the EA back to the kernel suffers from some problems, it lets the kernel print the calltrace at the right position.
If you see a complete stacktrace for a synchronous external abort delivered directly to the kernel, but not for one routed via EL3 and back in through SDEI, it's likely a problem the kernel has stepping between stacks. SDEI always has to do this; external aborts never do.
Which kernel version do you see this on? (A report to linux-arm-kernel@lists.infradead.org would help too.)
Thanks,
James