On 6/2/21 10:42 PM, James Morse wrote:
The kernel version, 5.8.0, is not a kernel.org stable kernel.
Works for me:
| Hardware name: Foundation-v8A (DT)
| Call trace:
| dump_backtrace+0x0/0x1b0
| show_stack+0x18/0x24
| dump_stack+0xc0/0x11c
| ghes_in_nmi_queue_one_entry+0x138/0x2f0
| ghes_sdei_normal_callback+0x30/0x6c
| sdei_event_handler+0x60/0x1d4
| __sdei_handler+0xc4/0x220
| __sdei_asm_handler+0xbc/0x168
| arch_cpu_idle+0xc/0x20
| cpu_startup_entry+0x24/0x80
| secondary_start_kernel+0x138/0x184
| root@localhost:/sys/kernel/debug# uname -r
| 5.8.0-00007-g0db1a507bbda-dirty
How did you get the call trace printed correctly? In my test, I can't get a correct call trace when the error is reported via CPER + SDEI.
Linux version 5.10.0
[root@m1rootfs /]$devmem 0
ERROR: Excepton received on 0x81000000, spsr_el3:60001000,reason:0 esr_el3:0xbe000011
[ 129.772039] sdei: sdei_event_handler this handler event: 806
[ 129.772519] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 3
[ 129.772768] {2}[Hardware Error]: event severity: fatal
[ 129.772962] {2}[Hardware Error]: Error 0, type: fatal
[ 129.773154] {2}[Hardware Error]: section_type: general processor error
[ 129.773395] {2}[Hardware Error]: processor_type: 2, ARM
[ 129.773503] {2}[Hardware Error]: processor_isa: 4, ARM A64
[ 129.773605] {2}[Hardware Error]: error_type: 0x00
[ 129.773725] {2}[Hardware Error]: operation: 0, unknown or generic
[ 129.773833] {2}[Hardware Error]: processor_id: 0x0000000000000000
[ 129.773943] Kernel panic - not syncing: Fatal hardware error!
[ 129.774060] CPU: 0 PID: 201 Comm: devmem Not tainted 5.10.0+ #25
[ 129.774195] Hardware name: Default Default/Default, BIOS 1.2.M1.AL.E.050.00 06/02/2021
[ 129.774273] Call trace:
[ 129.774378] dump_backtrace+0x0/0x1f0
[ 129.774459] show_stack+0x24/0x70
[ 129.774543] dump_stack+0xbc/0x114
[ 129.774621] panic+0x158/0x364
[ 129.774723] __raw_spin_lock_irqsave.constprop.0+0x0/0xa0
[ 129.774820] ghes_in_nmi_queue_one_entry+0x204/0x2fc
[ 129.774917] ghes_sdei_normal_callback+0x58/0xc0
[ 129.775005] sdei_event_handler+0x50/0xe8
[ 129.775090] _sdei_handler+0x8c/0x160
[ 129.775167] __sdei_handler+0x28/0x50
[ 129.775265] __sdei_asm_handler+0xbc/0x174
[ 129.779672] SMP: stopping secondary CPUs
[ 129.779766] Kernel Offset: disabled
Best Regards, Ming Huang
In my view, ELR_EL3 and x29 should be reported to the kernel via APEI/CPER, so that the kernel can print a call trace from the position where the error occurred.
That is not possible.
The kernel can only start a stacktrace from the current location. If your event is synchronous, the kernel should be able to chain one stack onto another, as it does above. If your event is asynchronous, the stack trace is meaningless.
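To illustrate the chaining described above, here is a minimal user-space sketch (not kernel code) of AArch64 frame-record unwinding. It assumes the standard AAPCS64 frame record, where x29 points at a pair of 64-bit words holding the caller's x29 and the saved x30; the names `frame_record`, `unwind` and `link_record` are hypothetical. The walk can only go backwards from the record it starts at, and it reaches the interrupted context only if the handler's outermost record is linked to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * AArch64 frame record as created by a typical function prologue
 * ("stp x29, x30, [sp, #-16]!; mov x29, sp"): x29 points at a pair of
 * 64-bit words holding the caller's x29 and the return address (x30).
 */
struct frame_record {
    uint64_t fp; /* saved x29: address of the caller's record, 0 ends the chain */
    uint64_t lr; /* saved x30: return address into the caller */
};

/*
 * Hypothetical unwinder: starting from the record x29 points at *now*,
 * follow the saved x29 values and collect up to 'max' return addresses.
 * It cannot discover any context that is not linked into this chain.
 */
static size_t unwind(const struct frame_record *fp, uint64_t *pcs, size_t max)
{
    size_t n = 0;

    while (fp != NULL && n < max) {
        pcs[n++] = fp->lr;
        fp = (const struct frame_record *)(uintptr_t)fp->fp;
    }
    return n;
}

/* Simulation helper (hypothetical): make 'r' point at its caller's record. */
static void link_record(struct frame_record *r, const struct frame_record *caller,
                        uint64_t lr)
{
    r->fp = (uint64_t)(uintptr_t)caller;
    r->lr = lr;
}
```

For a synchronous event, linking the handler's outermost record to the interrupted frame yields one continuous trace; clearing that link leaves only the handler's own frames, which is all an unwinder can see when there is no meaningful frame to chain onto.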
But there is no suitable table in APEI/CPER to report this.
See the UEFI spec (where CPER is defined): Table N-26, "ARMv8 AArch64 GPRs (Type 4)". But all you can hope to get from populating it is the kernel printing it out. The kernel isn't going to do anything with it.
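As a rough sketch of what the Type 4 context carries, the C below parses one ARM processor context information block from a byte buffer. It assumes the header layout used by Linux's `struct cper_arm_ctx_info` (16-bit version, 16-bit register context type, 32-bit register array size) followed by 32 64-bit registers, X0..X30 then SP, and a little-endian host to match CPER's encoding. The names `ctx_info_hdr` and `parse_aarch64_gprs` are hypothetical, and table numbering may differ between UEFI spec versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CTX_TYPE_AARCH64_GPRS 4u  /* "AArch64 GPRs (Type 4)" */
#define AARCH64_GPR_COUNT     32u /* X0..X30, then SP */

/* Context information header, mirroring Linux's struct cper_arm_ctx_info. */
struct ctx_info_hdr {
    uint16_t version;
    uint16_t type; /* register context type */
    uint32_t size; /* size of the register array in bytes */
};

/*
 * Hypothetical helper: if 'buf' holds a Type 4 context, copy X0..X30 and
 * SP into 'regs' and return 0; otherwise return -1. Assumes a
 * little-endian host, so the raw bytes can be copied directly.
 */
static int parse_aarch64_gprs(const uint8_t *buf, size_t len,
                              uint64_t regs[AARCH64_GPR_COUNT])
{
    struct ctx_info_hdr hdr;
    size_t want = AARCH64_GPR_COUNT * sizeof(uint64_t);

    if (len < sizeof(hdr))
        return -1;
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.type != CTX_TYPE_AARCH64_GPRS || hdr.size < want ||
        len - sizeof(hdr) < want)
        return -1;
    memcpy(regs, buf + sizeof(hdr), want);
    return 0;
}
```

With this layout, x29 is simply `regs[29]`. Recovering it does not give the kernel a stack to walk; in Linux the register array in this structure is only dumped to the log.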
Thanks,
James