Hello Maintainers,
We are observing a reproducible runtime regression on the ZynqMP (Cortex-A53) platform. It appears after enabling LTO (ENABLE_LTO=1) and merging the changes from the NUMA_AWARE_PER_CPU topic into our integration branch (https://github.com/ARM-software/arm-trusted-firmware/commit/7303319b3823e9e3...).
Summary of the issue
1. Baseline behavior
   * Platform: ZynqMP (Cortex-A53)
   * Configuration: ENABLE_LTO=1
   * Without NUMA_AWARE_PER_CPU: Linux boots and runs stably.
   * After merging NUMA_AWARE_PER_CPU: Linux boots but hangs at runtime.
   * During the hang, CPUs are observed to unexpectedly re-enter EL3. Such unexpected EL3 entry should not occur during normal Linux runtime execution. It strongly suggests corruption or mismanagement of PSCI and/or per-CPU state (see lib/per_cpu/aarch64/per_cpu_asm.S: https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/per_cpu/aarch64/per_cpu_asm.S#L28).
   * Reverting the NUMA_AWARE_PER_CPU changes restores stable Linux execution.
   * The issue reproduces only when NUMA_AWARE_PER_CPU is present.
   * This identifies NUMA_AWARE_PER_CPU as the source of the regression.
2. Suspected interaction with LTO
   * We suspect that, with NUMA_AWARE_PER_CPU enabled, LTO breaks the per-CPU base calculation.
   * BL31 contains hand-written assembly that relies on linker-script symbols (e.g., per-CPU section boundaries).
   * Under LTO, symbol placement and retention are not guaranteed in the same way as in a non-LTO link. This could lead to an incorrect per-CPU base computation.
   * That would result in corrupted per-CPU data and the subsequent erroneous PSCI suspend behavior (EL3 re-entry).
3. CPU idle dependency
   * The following kernel configuration options are enabled:
     * CONFIG_CPU_IDLE=y
     * CONFIG_CPU_IDLE_MULTIPLE_DRIVERS=y
     * CONFIG_CPU_IDLE_GOV_MENU=y
     * CONFIG_DT_IDLE_STATES=y
   * This further suggests that the issue is triggered in the CPU idle / suspend-resume paths, where correct per-CPU state handling is critical.
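The following minimal C sketch illustrates the suspected failure mode. It shows how a per-CPU base is commonly derived from section bounds, in both a flat and a NUMA-aware variant. It is a model only, not the TF-A code. All names (the hyp_* prefix), sizes, and the CPU-to-node mapping are hypothetical. The actual logic in per_cpu_asm.S may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for illustration; not taken from TF-A or the ZynqMP port. */
#define HYP_CPU_COUNT     4U
#define HYP_UNIT_SIZE     64U  /* bytes of per-CPU data for one CPU */
#define HYP_CPUS_PER_NODE 2U
#define HYP_NODE_COUNT    2U

/* Flat layout (hypothetical). 'start' and 'end' stand in for the
 * linker-script symbols that bound the per-CPU section. The per-CPU unit
 * size is derived from them, so an error in either bound shifts the base
 * of every CPU except CPU 0. */
static uintptr_t hyp_per_cpu_base(uintptr_t start, uintptr_t end,
                                  unsigned int cpu_count, unsigned int cpu)
{
    assert(end >= start && cpu_count > 0U && cpu < cpu_count);
    uintptr_t unit = (end - start) / cpu_count;
    return start + (uintptr_t)cpu * unit;
}

/* NUMA-aware layout (hypothetical). Each node owns a separate area, and a
 * CPU's base is its node's base plus its index within that node. */
static uint8_t hyp_node0_area[HYP_CPUS_PER_NODE * HYP_UNIT_SIZE];
static uint8_t hyp_node1_area[HYP_CPUS_PER_NODE * HYP_UNIT_SIZE];
static uint8_t *const hyp_node_base[HYP_NODE_COUNT] = {
    hyp_node0_area, hyp_node1_area
};

static uintptr_t hyp_numa_per_cpu_base(unsigned int cpu)
{
    unsigned int node = cpu / HYP_CPUS_PER_NODE;  /* assumed CPU-to-node map */
    unsigned int local = cpu % HYP_CPUS_PER_NODE;
    assert(node < HYP_NODE_COUNT);
    return (uintptr_t)hyp_node_base[node] + (uintptr_t)local * HYP_UNIT_SIZE;
}
```

Consider four CPUs with 64-byte units. If the end bound is short by one unit (for example, because data the bounds were meant to enclose was dropped or moved), the derived stride becomes 48 bytes. CPU 1's base then lands inside CPU 0's region. Any state one CPU saves on the suspend path would then overwrite another CPU's data.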
Based on the above:
* The issue appears to be specific to NUMA_AWARE_PER_CPU combined with LTO.
* The failure mode points to the per-CPU base calculation and PSCI state corruption.
* Reverting NUMA_AWARE_PER_CPU fully restores stability on ZynqMP.
We wanted to report this issue upstream and seek guidance on:
* Whether NUMA_AWARE_PER_CPU is expected to be LTO-safe on platforms that rely on linker-defined per-CPU sections, or
* Whether additional constraints or fixes are required for platforms such as ZynqMP.
We are happy to provide further logs or configuration details, or to help with fixes.
Regards,
Prasad Kummari