Hi Julius, The TF-A build sets the `-mstrict-align` compiler option, that means that the compiler will generate only aligned accesses as default. So disabling the check at runtime, seems `unaligned` with the compiler option IMO. The programmer can induce unaligned access though, although I am struggling to understand why the programmer would want to do that. The case you have pointed out seems like an error which needs to be corrected in platform code IMO.
Well, my assumption is that performing an unaligned access cannot be a vulnerability.
TF-A has linker symbol references and non-trivial linker section manipulations at runtime and these have previously triggered alignment errors when binary layout changed due to code issues. Disabling the alignment check at runtime means these errors would have remained undetected and would have resulted in more obscure crash at runtime or even a vulnerablility. In some cases, only the RELEASE builds (DEBUG=0) triggered the alignment fault.
There are scenarios where unaligned accesses can be intentional, but I am not too worried about those -- there are ways to work around that, and if we make it policy that those accesses always need to be written that way I'm okay with that. What I am worried about is mistakes and oversights that slip through testing and then cause random or even attacker-controllable crashes in production. If the main reason you're enabling this flag is to help with early detection of coding mistakes, I wonder if the best approach would be to enable it for DEBUG=1 and disable it for DEBUG=0 (of course, if there are really security issues associated with it like you mentioned above, that wouldn't make sense).
Agree. But I am failing to understand why the unaligned accesses came to be there in the first place. My worry is, if this was not intended by the programmer, this might as well be a bug that will remain undetected if the alignment check is disabled.
Best Regards Soby Mathew
-----Original Message----- From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Julius Werner via TF-A Sent: 10 November 2020 02:31 To: raghu.ncstate@icloud.com Cc: tf-a tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Alignment fault checking in EL3
. What I am worried about is mistakes and oversights that slip through testing and then cause random or even attacker-controllable crashes in production
Why would this not be the case even with SCTLR.A bit set? I'm not seeing
the relation between the crash and SCTLR.A bit. If there are oversights that slip through testing and an attacker can cause a crash, there is vulnerability/bug.
If there is something that slipped through, that causes a data
abort(perhaps address not mapped in the translation tables), would we turn off the MMU ?
Well, my assumption is that performing an unaligned access cannot be a vulnerability. If you have examples to the contrary please let me know, but as far as I am aware letting an unaligned access through should always work and will always result in the behavior the programmer intended (other than maybe contrived cases where device address space is mapped with a non- Device memory type, which is already very wrong for plenty of other reasons and should never happen). The only negative consequence is that the access may take slightly longer than if it were aligned. I do understand the desire to be able to shake out unaligned accesses during development and testing
I don't think the comparison to a data abort works because obviously with a data abort the program usually can't just continue and still assume its internal state is valid.
Hi Soby,
The programmer can induce unaligned access though, although I am struggling to understand why the programmer would want to do that. The case you have pointed out seems like an error which needs to be corrected in platform code IMO.
Yes, it's an error, no question. My concern is just that those errors can be subtle and you're not guaranteed to find them, even with SCTLR.A set. In the concrete case that I had here, the code was just something like
static uint8_t alloc_buffer[512]; static int alloc_offset;
if (alloc_offset < sizeof(alloc_buffer)) { uint16_t *p = &alloc_buffer[alloc_offset]; alloc_offset += sizeof(uint16_t); ... }
And that is bad code that can easily be fixed, no question. But we had been testing this for over a year before it suddenly started crashing at an inopportune time after we already thought our code was stable. If SCTLR.A was unset, we would've never found this, but it also wouldn't really have hurt anyone in practice.
I think the same can happen with much more subtle situations, too. Consider code like this:
... smc_handler(uint32_t smc_fid, u_register_t base_address, u_register_t size, ...) { ...range check base_address, size... mmap_surrounding_page(base_address, size); struct somestruct *s = (void *)base_address; ... }
This code is wrong, too, but would reviewers really spot that? You're extremely unlikely to ever get an alignment fault in practice here because a kernel allocating a data structure to pass through an SMC would almost certainly allocate it aligned. But on the odd chance it doesn't (or if an attacker intentionally forces that), you suddenly have an EL3 panic that you never knew was there. If SCTLR.A was unset, the code would still work fine in that situation (with maybe a miniscule performance penalty).
I think this combination of "it's extremely hard to 100% prevent alignment fault risks throughout the code base" and "on the odd chance misaligned accesses do happen, they don't really cause a notable problem (when SCTLR.A is off)" is the reason that most other code bases I've seen (e.g. Linux) chose to treat avoiding misaligned accesses as a best-effort thing and leave SCTLR.A off. Accepting the occasional misaligned access doesn't actually hurt as much as crashing the whole system through an alignment fault. (After all, that's also how some other architectures like x86 work by default.)
But if you're really concerned about keeping unaligned accesses out of the code base at all costs, I understand that that flag is useful and that just enabling it for DEBUG builds may not be enough. I'll consider making this change downstream for the platform port I'm concerned about, then.
Hi Julius, Yes, I see the problem, thanks for the explanation. It is a dilemma and from a purist point of view, all alignment issues should be resolved and all SMC parameters must be strenuously verified which may not be the ground reality. For the SMC parameter case, the incoming buffers should be checked for proper alignment but I agree this is something that can be missed during review.
IMO, disabling alignment check for DEBUG=0 is not good as due to the nature of the problem, some of these issues will trigger only when image layout is different. As you suggest, clearing SCTLR.A within the platform code seems to be the easiest way to go about this. Perhaps we could have a new build option which platforms can set as part of their production build to leave SCTLR.A as 0. The rationale for the build parameter is not to allow unaligned access in TF-A as a policy, but for the reasons that we have discussed.
One thing which we should do is to make TF-A policy clearer on this aspect and also have more guidance for SMC parameter validation in the documentation.
Best Regards Soby Mathew
-----Original Message----- From: Julius Werner jwerner@google.com Sent: 10 November 2020 21:57 To: Soby Mathew Soby.Mathew@arm.com Cc: raghu.ncstate@icloud.com; tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Alignment fault checking in EL3
Hi Soby,
The programmer can induce unaligned access though, although I am
struggling to understand why the programmer would want to do that. The case you have pointed out seems like an error which needs to be corrected in platform code IMO.
Yes, it's an error, no question. My concern is just that those errors can be subtle and you're not guaranteed to find them, even with SCTLR.A set. In the concrete case that I had here, the code was just something like
static uint8_t alloc_buffer[512]; static int alloc_offset;
if (alloc_offset < sizeof(alloc_buffer)) { uint16_t *p = &alloc_buffer[alloc_offset]; alloc_offset += sizeof(uint16_t); ... }
And that is bad code that can easily be fixed, no question. But we had been testing this for over a year before it suddenly started crashing at an inopportune time after we already thought our code was stable. If SCTLR.A was unset, we would've never found this, but it also wouldn't really have hurt anyone in practice.
I think the same can happen with much more subtle situations, too. Consider code like this:
... smc_handler(uint32_t smc_fid, u_register_t base_address, u_register_t size, ...) { ...range check base_address, size... mmap_surrounding_page(base_address, size); struct somestruct *s = (void *)base_address; ... }
This code is wrong, too, but would reviewers really spot that? You're extremely unlikely to ever get an alignment fault in practice here because a kernel allocating a data structure to pass through an SMC would almost certainly allocate it aligned. But on the odd chance it doesn't (or if an attacker intentionally forces that), you suddenly have an EL3 panic that you never knew was there. If SCTLR.A was unset, the code would still work fine in that situation (with maybe a miniscule performance penalty).
I think this combination of "it's extremely hard to 100% prevent alignment fault risks throughout the code base" and "on the odd chance misaligned accesses do happen, they don't really cause a notable problem (when SCTLR.A is off)" is the reason that most other code bases I've seen (e.g. Linux) chose to treat avoiding misaligned accesses as a best-effort thing and leave SCTLR.A off. Accepting the occasional misaligned access doesn't actually hurt as much as crashing the whole system through an alignment fault. (After all, that's also how some other architectures like x86 work by default.)
But if you're really concerned about keeping unaligned accesses out of the code base at all costs, I understand that that flag is useful and that just enabling it for DEBUG builds may not be enough. I'll consider making this change downstream for the platform port I'm concerned about, then.
tf-a@lists.trustedfirmware.org