Alignment fault checking in EL3 - TF-A

7 Nov 2020


      Hi,
I just debugged a TF-A boot crash that turned out to be caused by an
alignment fault in platform code. Someone had defined some static
storage space as a uint8_t array, and then accessed it by
dereferencing uint16_t pointers.
Of course this is ultimately a bug in the platform code that should be
fixed, but I am still wondering why we choose to set the SCTLR_EL3.A
(Alignment fault checking) flag in TF-A? In an ideal world, maybe we
could say that code which can generate alignment faults should not
exist -- but, unfortunately, people make mistakes, and this kind of
mistake may linger unnoticed for a long time in the codebase before
randomly getting triggered due to subtle shifts in the binary's memory
layout. (Worse, in some situations this could get affected by SMC
parameters passed in from lower exception levels, so it would only be
noticeable and could possibly be intentionally triggered if the lower
exception level passes in just the right values.)
For that reason, most other environments I know (e.g. Linux) always
keep that flag cleared. There's no harm in that -- as far as I'm aware
all aarch64 cores are required to support unaligned accesses to cached
memory types, and the worst that would happen is a slight performance
penalty for the access. I think that flag is mostly meant as a
debugging feature to be able to shake out accidental unaligned
accesses from your code? If our goal is to be stable and reliable
firmware, shouldn't we disable it to reduce the chance of unexpected
crashes?