Hello everyone,
There is a class of errata on a few CPUs where, if a PE initiates a power down request that gets denied, attempting to power down again can end in a deadlock. In essence, after the PE's power down sequence, which ends on a WFI, but before the actual power down, there is a small window in which an external event can interrupt the power down and cause the PE to continue past the WFI. Attempting to power down again after that can result in a deadlock. The affected CPUs are the Neoverse N2, Makalu ELP (Cortex X3), and Cortex A710.
The SDEN [1] suggests setting the chicken bit CPUACTLR2_EL1[36] before the power down sequence and clearing it after coming out of the WFI (on anything other than RESET). The mitigations [2] set the bit in the `core_pwr_dwn` of each CPU but never clear it. This is because in the generic TF-A code path the WFI ends up being called in an infinite loop, and the only way out of it is RESET. Most platforms with a custom `pwr_domain_pwr_down_wfi` end up in the same loop or in unrelated hardware reset mechanisms that avoid the errata. However, a few platforms could continue running as normal without going through a hardware reset, which would require special treatment.
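To make the set/clear sequence from the SDEN concrete, here is a minimal host-side sketch. It is not TF-A code: the register is modelled as a plain variable, and `fake_cpuactlr2_el1`, `fake_wfi` and `power_down_attempt` are hypothetical names invented for illustration. On real hardware a successful power down never returns from the WFI; here that case is simply reported as `false`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 36 of CPUACTLR2_EL1, the chicken bit named in the SDEN. */
#define CPUACTLR2_EL1_BIT_36 (UINT64_C(1) << 36)

/* ASSUMPTION: a plain variable stands in for the system register so that
 * the sketch can run on a host. Real code would use MRS/MSR accessors. */
uint64_t fake_cpuactlr2_el1;

/* Hypothetical stand-in for WFI. Returns true when the PE continues after
 * the WFI (the power down was denied), which is the case the errata are
 * about. Returns false to model a power down that actually happened. */
bool fake_wfi(bool wakeup_pending)
{
    return wakeup_pending;
}

/* One power down attempt following the SDEN wording: set the bit before
 * the power down sequence, clear it if execution continues past the WFI.
 * Returns true if the PE is still running afterwards. */
bool power_down_attempt(bool wakeup_pending)
{
    fake_cpuactlr2_el1 |= CPUACTLR2_EL1_BIT_36;

    if (fake_wfi(wakeup_pending)) {
        /* Came out of the WFI without a RESET: undo the workaround. */
        fake_cpuactlr2_el1 &= ~CPUACTLR2_EL1_BIT_36;
        return true;
    }

    /* Powered down: in this model the bit simply stays set, since only
     * a RESET would bring the PE back. */
    return false;
}
```

Calling `power_down_attempt(true)` models the denied request and leaves the bit clear again, whereas the mitigations in [2] correspond to the branch that never clears it because the PE is expected to stay in the WFI loop until RESET.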
The four problematic platforms are:
* amlogic gxl and g12a: they fake a reset by manually calling the reset entrypoint on their primary CPUs only. This will leave the chicken bit set after reset.
* socionext uniphier: same as amlogic but on all CPUs.
* nxp (common code): I hope I understand what the platform is trying to do but there are 2 paths that raise an eyebrow: `_psci_sys_pwrdn_wfi` which has a single non-looped wfi (which could return as above) and `_psci_cpu_off_wfi` which seems to accept waking up as normal behaviour. The former path is a simple fix but the latter is the same case as amlogic and socionext. Due to its complexity I have not proposed any modification on either path.
Finally, nvidia tegra and renesas (common code) have acceptable behaviour as far as the errata are concerned; however, they end up in the wfi loop only after a panic sequence. Although not problematic, this stands out.
For all six platforms above there are a few options on how to proceed, the preferred one being to bring them in line with what everyone else does. Alternatively, ignoring the errata would be ok if these platforms never intend to use these CPUs. It must be noted, however, that this appears to be a family of errata, and the CPUs listed here may not be the only ones affected.
[1]: the wording is identical for all 3 cores. For Neoverse N2: https://developer.arm.com/documentation/SDEN1982442/latest/
[2]: https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/17157/1
Regards,
Boyan