Hi Julius,
Talking to the team here its has always been felt it is best practise to forbid unaligned data accesses in Arm's embedded projects. That's the reason TF-A also enforces the build options: -mno-unaligned-access (AArch32) and -mstrict-align (AArch64). In the TF-A documentation [1] SCTLR_EL3.A and other settings are listed in the Architectural Initialisation section.
There is thought to be only a few cases where unaligned access are correct and intentional and most are defects that should be caught early and not in production and as such it is better to detect unaligned data access conditions early in a platform porting, rather than in the field especially if there is no firmware update capability. Alignment faults should be treated as fatal as they should never happen in production when components have been designed as such from the beginning. Maintaining these settings in production is also advised as best practice as it is known any such defects can possibly play a part in allowing an actor to take advantage of the defect as part of a vulnerability and memory related defects in particular can be taken advantage of. So it seems prudent to guard against them.
We can of course provide better rationalisation based on the above in our documentation to provide platforms porting efforts better guidance. It is after all up to partners in their platform ports to make appropriate decisions for their ports where settings could be changed however the upstreamed reference code should follow the settings we have which are felt to be best practices for TF-A.
It is known other projects follow other practices and that is fine if that works for them. Once a project enables unaligned accesses, it's very hard to go back again since it's likely that code that does unaligned accesses will slowly get added to the project. However for TF-A maintaining the current settings is felt to be the best approach when weighing up the costs and benefits.
Saying all this for those few cases where unaligned accesses are correct and intentional we could look to define better ways to handle these and provide that in reference code as an option to try and get the best for robustness and debuggability which seemed a big part of the concern in your post. Team members in Arm have ideas on how that could be provided but it needs broader discussion on the implications for security and performance before taking forward.
Thanks
Joanna
[1] https://trustedfirmware-a.readthedocs.io/en/latest/design/firmware-design.ht...
On 09/11/2020, 17:47, "TF-A on behalf of Raghu Krishnamurthy via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Hi Julius,
I tend to agree with your argument about not using SCTLR.A bit but I think the unexpected crashes or instability is due buggy code and insufficient validation of invariants such as aligned pointers, irrespective of whether SCTRLR.A is set or unset. Even if we allow unaligned accesses, we could have buggy code that access registers that typically have to be size aligned and we wouldn’t catch those bugs with SCTLR.A. Worse, some hardware implementations have undefined/impdef behavior when there are unaligned access to MMIO registers for ex, in which case, I would rather take an alignment fault at the core than allow triggering of undefined behavior.
So I don’t think stability/reliability and the use of SCTLR.A bit are related. If TF-A's position is that we want to allow only aligned accesses in EL3(for whatever reason, I can only think of efficiency), it is the code's responsibility to enforce this invariants using asserts or explicit checks.
>> I am still wondering why we choose to set the SCTLR_EL3.A I think this is the relevant question. If there are good security reasons(which I don’t know about), I would say we should keep it. If it is for efficiency, given the way recent ARM64 cores are performing, I wouldn't have a problem with SCTLR.A=0.
Thanks Raghu
-----Original Message----- From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Julius Werner via TF-A Sent: Friday, November 6, 2020 6:38 PM To: tf-a tf-a@lists.trustedfirmware.org Subject: [TF-A] Alignment fault checking in EL3
Hi,
I just debugged a TF-A boot crash that turned out to be caused by an alignment fault in platform code. Someone had defined some static storage space as a uint8_t array, and then accessed it by dereferencing uint16_t pointers.
Of course this is ultimately a bug in the platform code that should be fixed, but I am still wondering why we choose to set the SCTLR_EL3.A (Alignment fault checking) flag in TF-A? In an ideal world, maybe we could say that code which can generate alignment faults should not exist -- but, unfortunately, people make mistakes, and this kind of mistake may linger unnoticed for a long time in the codebase before randomly getting triggered due to subtle shifts in the binary's memory layout. (Worse, in some situations this could get affected by SMC parameters passed in from lower exception levels, so it would only be noticeable and could possibly be intentionally triggered if the lower exception level passes in just the right values.)
For that reason, most other environments I know (e.g. Linux) always keep that flag cleared. There's no harm in that -- as far as I'm aware all aarch64 cores are required to support unaligned accesses to cached memory types, and the worst that would happen is a slight performance penalty for the access. I think that flag is mostly meant as a debugging feature to be able to shake out accidental unaligned accesses from your code? If our goal is to be stable and reliable firmware, shouldn't we disable it to reduce the chance of unexpected crashes?
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Hi Joanna,
Thank you for your detailed response. I was mostly wondering if this was a deliberate choice or just something someone wrote this way once and nobody ever thought about again. Since it sounds like it is intentional, I'll try to understand your rationale better.
Maintaining these settings in production is also advised as best practice as it is known any such defects can possibly play a part in allowing an actor to take advantage of the defect as part of a vulnerability and memory related defects in particular can be taken advantage of. So it seems prudent to guard against them.
I'm curious, can you elaborate on that? I can't really think of a scenario where lack of alignment checking can really make code behave in an unintentional way. (I don't think Raghu's scenario of accessing MMIO registers works because those registers should be mapped with a Device memory type anyway, and that memory type enforces aligned accesses regardless of the SCTLR.A flag.) If there truly are security concerns with this I can understand your approach much better.
Saying all this for those few cases where unaligned accesses are correct and intentional we could look to define better ways to handle these and provide that in reference code as an option to try and get the best for robustness and debuggability which seemed a big part of the concern in your post. Team members in Arm have ideas on how that could be provided but it needs broader discussion on the implications for security and performance before taking forward.
There are scenarios where unaligned accesses can be intentional, but I am not too worried about those -- there are ways to work around that, and if we make it policy that those accesses always need to be written that way I'm okay with that. What I am worried about is mistakes and oversights that slip through testing and then cause random or even attacker-controllable crashes in production. If the main reason you're enabling this flag is to help with early detection of coding mistakes, I wonder if the best approach would be to enable it for DEBUG=1 and disable it for DEBUG=0 (of course, if there are really security issues associated with it like you mentioned above, that wouldn't make sense).
Hi Julius,
. What I am worried about is mistakes and oversights that slip through testing and then cause random or even attacker-controllable crashes in production
Why would this not be the case even with SCTLR.A bit set? I'm not seeing the relation between the crash and SCTLR.A bit. If there are oversights that slip through testing and an attacker can cause a crash, there is vulnerability/bug. If there is something that slipped through, that causes a data abort(perhaps address not mapped in the translation tables), would we turn off the MMU ?
I'm still curious as to why SCTLR.A=1 would be considered security best practice or provide robustness.
Thanks Raghu
-----Original Message----- From: Julius Werner jwerner@google.com Sent: Monday, November 9, 2020 2:42 PM To: Joanna Farley Joanna.Farley@arm.com Cc: raghu.ncstate@icloud.com; tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Alignment fault checking in EL3
Hi Joanna,
Thank you for your detailed response. I was mostly wondering if this was a deliberate choice or just something someone wrote this way once and nobody ever thought about again. Since it sounds like it is intentional, I'll try to understand your rationale better.
Maintaining these settings in production is also advised as best practice as it is known any such defects can possibly play a part in allowing an actor to take advantage of the defect as part of a vulnerability and memory related defects in particular can be taken advantage of. So it seems prudent to guard against them.
I'm curious, can you elaborate on that? I can't really think of a scenario where lack of alignment checking can really make code behave in an unintentional way. (I don't think Raghu's scenario of accessing MMIO registers works because those registers should be mapped with a Device memory type anyway, and that memory type enforces aligned accesses regardless of the SCTLR.A flag.) If there truly are security concerns with this I can understand your approach much better.
Saying all this for those few cases where unaligned accesses are correct and intentional we could look to define better ways to handle these and provide that in reference code as an option to try and get the best for robustness and debuggability which seemed a big part of the concern in your post. Team members in Arm have ideas on how that could be provided but it needs broader discussion on the implications for security and performance before taking forward.
There are scenarios where unaligned accesses can be intentional, but I am not too worried about those -- there are ways to work around that, and if we make it policy that those accesses always need to be written that way I'm okay with that. What I am worried about is mistakes and oversights that slip through testing and then cause random or even attacker-controllable crashes in production. If the main reason you're enabling this flag is to help with early detection of coding mistakes, I wonder if the best approach would be to enable it for DEBUG=1 and disable it for DEBUG=0 (of course, if there are really security issues associated with it like you mentioned above, that wouldn't make sense).
tf-a@lists.trustedfirmware.org