Hello Maintainers,
We are observing a reproducible runtime regression on the ZynqMP (Cortex-A53) platform. It appears after enabling LTO (ENABLE_LTO=1) and merging the changes from the topic NUMA_AWARE_PER_CPU (https://github.com/ARM-software/arm-trusted-firmware/commit/7303319b3823e9e33748d963e9173f3678aba4da) into our integration branch.
Summary of the issue
- Observed behavior
  - Platform: ZynqMP (Cortex-A53)
  - Configuration: ENABLE_LTO=1
  - Without NUMA_AWARE_PER_CPU: Linux boots and runs stably
  - With NUMA_AWARE_PER_CPU merged: Linux boots but hangs at runtime
  - During the hang, CPUs unexpectedly re-enter EL3
    - Re-entry into EL3 should not occur during normal Linux runtime execution. It strongly suggests corruption or mismanagement of PSCI and/or per-CPU state (see lib/per_cpu/aarch64/per_cpu_asm.S on the master branch of ARM-software/arm-trusted-firmware)
- Reverting the NUMA_AWARE_PER_CPU changes restores stable Linux execution
- The issue reproduces only when NUMA_AWARE_PER_CPU is present
- This points to NUMA_AWARE_PER_CPU, in combination with LTO, as the source of the regression
- Suspected interaction with LTO
  - With NUMA_AWARE_PER_CPU enabled, we suspect that LTO breaks the per-CPU base calculation
  - BL31 contains hand-written assembly that relies on linker-script symbols (e.g., per-CPU section boundaries)
  - Under LTO, section placement and symbol retention are not guaranteed to match a non-LTO link, which could lead to an incorrect per-CPU base computation
  - This would corrupt per-CPU data and would explain the subsequent erroneous PSCI suspend behavior (EL3 re-entry)
- CPU idle dependency
  - The following kernel configuration options are enabled:
    - CONFIG_CPU_IDLE=y
    - CONFIG_CPU_IDLE_MULTIPLE_DRIVERS=y
    - CONFIG_CPU_IDLE_GOV_MENU=y
    - CONFIG_DT_IDLE_STATES=y
  - This further suggests that the issue is triggered in the CPU idle / suspend-resume paths, where correct per-CPU state handling is critical
Based on the above:
- The issue appears specific to NUMA_AWARE_PER_CPU combined with LTO
- The failure mode points to the per-CPU base calculation and PSCI state corruption
- Reverting NUMA_AWARE_PER_CPU fully restores stability on ZynqMP
We wanted to report this issue upstream and seek guidance on:
- Whether NUMA_AWARE_PER_CPU is expected to be LTO-safe on platforms relying on linker-defined per-CPU sections
- Whether additional constraints or fixes are required for platforms such as ZynqMP
We are happy to provide further logs or configuration details, or to help develop and test fixes.
Regards,
Prasad Kummari