Hi Julius, Yes, I see the problem, thanks for the explanation. It is a dilemma and from a purist point of view, all alignment issues should be resolved and all SMC parameters must be strenuously verified which may not be the ground reality. For the SMC parameter case, the incoming buffers should be checked for proper alignment but I agree this is something that can be missed during review.
IMO, disabling alignment check for DEBUG=0 is not good as due to the nature of the problem, some of these issues will trigger only when image layout is different. As you suggest, clearing SCTLR.A within the platform code seems to be the easiest way to go about this. Perhaps we could have a new build option which platforms can set as part of their production build to leave SCTLR.A as 0. The rationale for the build parameter is not to allow unaligned access in TF-A as a policy, but for the reasons that we have discussed.
One thing which we should do is to make TF-A policy clearer on this aspect and also have more guidance for SMC parameter validation in the documentation.
Best Regards Soby Mathew
-----Original Message----- From: Julius Werner jwerner@google.com Sent: 10 November 2020 21:57 To: Soby Mathew Soby.Mathew@arm.com Cc: raghu.ncstate@icloud.com; tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Alignment fault checking in EL3
Hi Soby,
The programmer can induce unaligned access though, although I am
struggling to understand why the programmer would want to do that. The case you have pointed out seems like an error which needs to be corrected in platform code IMO.
Yes, it's an error, no question. My concern is just that those errors can be subtle and you're not guaranteed to find them, even with SCTLR.A set. In the concrete case that I had here, the code was just something like
static uint8_t alloc_buffer[512]; static int alloc_offset;
if (alloc_offset < sizeof(alloc_buffer)) { uint16_t *p = &alloc_buffer[alloc_offset]; alloc_offset += sizeof(uint16_t); ... }
And that is bad code that can easily be fixed, no question. But we had been testing this for over a year before it suddenly started crashing at an inopportune time after we already thought our code was stable. If SCTLR.A was unset, we would've never found this, but it also wouldn't really have hurt anyone in practice.
I think the same can happen with much more subtle situations, too. Consider code like this:
... smc_handler(uint32_t smc_fid, u_register_t base_address, u_register_t size, ...) { ...range check base_address, size... mmap_surrounding_page(base_address, size); struct somestruct *s = (void *)base_address; ... }
This code is wrong, too, but would reviewers really spot that? You're extremely unlikely to ever get an alignment fault in practice here because a kernel allocating a data structure to pass through an SMC would almost certainly allocate it aligned. But on the odd chance it doesn't (or if an attacker intentionally forces that), you suddenly have an EL3 panic that you never knew was there. If SCTLR.A was unset, the code would still work fine in that situation (with maybe a miniscule performance penalty).
I think this combination of "it's extremely hard to 100% prevent alignment fault risks throughout the code base" and "on the odd chance misaligned accesses do happen, they don't really cause a notable problem (when SCTLR.A is off)" is the reason that most other code bases I've seen (e.g. Linux) chose to treat avoiding misaligned accesses as a best-effort thing and leave SCTLR.A off. Accepting the occasional misaligned access doesn't actually hurt as much as crashing the whole system through an alignment fault. (After all, that's also how some other architectures like x86 work by default.)
But if you're really concerned about keeping unaligned accesses out of the code base at all costs, I understand that that flag is useful and that just enabling it for DEBUG builds may not be enough. I'll consider making this change downstream for the platform port I'm concerned about, then.