On 13/12/2019 22:04, Julius Werner via TF-A wrote:
On Fri, Dec 13, 2019 at 6:20 AM Joanna Farley <Joanna.Farley@arm.com> wrote:
On the subject of DebugFS's purpose: it was envisaged, and is today, as Sandrine describes, a debug-build-only capability. That said, there have been some early thoughts that it could evolve into a Secure Debug feature, where this type of capability, or something like it, is always on and requires debug certificates for authenticated access. This is very much a possible future evolution and is not in the patches available today. We would welcome any thoughts on such an evolution in this space.
I guess this gets into a bit of a philosophy discussion and becomes a matter of opinion, so there's probably no one right answer. Personally, adding authentication on top of this doesn't really resolve my concerns and adds yet more on top. I'm a strong proponent of the concept of a minimal Trusted Computing Base, i.e. keeping the amount of code executing at the highest privilege level as small and low-complexity as possible. Any code can have bugs, so the idea is that the more complicated the code you run in EL3 is (and the more complicated APIs it exposes), the more likely it becomes that you accidentally have an exploitable vulnerability in there. Like a p9 filesystem driver, a certificate-based authentication system (especially if it's based on x509/ASN.1 which are notoriously hard to implement safely) is a pretty complex piece of code with a pretty large attack surface that I'd rather not have in my EL3 firmware if I can avoid it. I understand that for certain use cases you may need something like this (if you really want a very extensive and extensible debugging API that must be restricted to a few authenticated actors), but in my use case I really just need the ability to trigger one small debugging feature and that feature itself doesn't need to be restricted, so a minimal SMC interface would work much better for that case.
Hi Julius,
Just trying to understand: if TF-A were to expose a crash-inducing SMC, would this still be restricted to special builds for your test runs? This would not make it to production for Chromebooks, right?
I agree a 9p filesystem is not desirable in an EL3 runtime firmware. We could enhance it to use a tighter data structure, if there is a desire in that direction.
If that is the case, leaving aside the 9p filesystem issues, can DebugFS serve this requirement (we can remove the limitation that it is restricted to Debug builds only)?
The intention is that DebugFS can prove useful at least in the verification/testing space, and if there is more we can do to get there, it would be good to know.
On 13/12/2019, 13:01, "TF-A on behalf of Sandrine Bailleux via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Going back to the SMC-based solution then, I am not quite convinced SYSTEM_RESET2 is the right interface for intentionally triggering a panic in TF-A. I think the semantics do not quite match. If anything, a firmware crash seems more like a shutdown operation to me rather than a reset (we don't recover from a firmware crash). I am not even sure we should look into the PSCI SMC range, as it's not a power-management operation.
Crash recovery behavior is platform dependent (via plat_panic_handler()). On all the platforms we use in Chrome OS we have that implemented as a system reboot. I think for most systems (whether it's a Chromebook, a server or some embedded device) that's probably what you want for random runtime crashes (at least in a production environment), but I agree that TF doesn't enforce any standard behavior so it's hard to clearly match it to one or the other SMC.
So it sounds like it's not the first time that you hit this issue, is it? Do you have any other example of Normal World OS feature you would have liked to expose through a generic SMC interface? I am wondering whether this could help choosing the right SMC range, if we can identify some common criteria among a set of such features.
No, it's the first time I've really run into this. But I think we might quickly come up with more uses for a "non-secure OS" SMC range if we had one. We often see roughly the same SMC again on different platforms, because fundamentally they usually need to do the same kinds of things -- for example, most platforms have some kind of DDR frequency scaling which always needs part of it implemented in EL3, so they all need some kind of SMC to switch to a new DDR frequency. Many also need some kind of "write value to secure register" SMC that just allows the non-secure OS to write a few whitelisted registers that are only accessible in EL3 for some reason. If we could standardize these interfaces in a non-vendor-specific SMC range, we might be able to reduce some code duplication both on the TF and the Linux side.
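As a rough illustration of the "write value to secure register" idea, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the names `secure_reg_write`, `secure_reg_peek`, `allowed_regs`, the return codes, and the two register addresses are hypothetical and do not come from TF-A or any platform port. A small array stands in for the hardware so the logic can be exercised on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_WRITE_OK      0
#define REG_WRITE_DENIED  (-1)

/* HYPOTHETICAL allow-list: addresses of the few EL3-only registers the
 * non-secure OS may write. Real platforms would list their own registers. */
static const uintptr_t allowed_regs[] = { 0x1000u, 0x1004u };
#define NUM_ALLOWED (sizeof(allowed_regs) / sizeof(allowed_regs[0]))

/* Host-side stand-in for the hardware; real firmware would perform an
 * MMIO write to the register instead of storing into this array. */
static uint32_t shadow_regs[NUM_ALLOWED];

/* Return the allow-list index for addr, or -1 if addr is not listed. */
static int find_allowed(uintptr_t addr)
{
    for (size_t i = 0; i < NUM_ALLOWED; i++) {
        if (allowed_regs[i] == addr) {
            return (int)i;
        }
    }
    return -1;
}

/* Body of the hypothetical SMC: write only if the address is allow-listed. */
int secure_reg_write(uintptr_t addr, uint32_t value)
{
    int idx = find_allowed(addr);

    if (idx < 0) {
        return REG_WRITE_DENIED;
    }
    shadow_regs[idx] = value;
    return REG_WRITE_OK;
}

/* Test helper: read back what the stand-in "hardware" currently holds. */
int secure_reg_peek(uintptr_t addr, uint32_t *value_out)
{
    int idx = find_allowed(addr);

    assert(value_out != NULL);
    if (idx < 0) {
        return REG_WRITE_DENIED;
    }
    *value_out = shadow_regs[idx];
    return REG_WRITE_OK;
}
```

The point of the sketch is only that the handler rejects every address not on the list, so the interface stays small enough to review by inspection.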
I guess none of these things are really Linux-specific, now that I think of it. So really, I guess the problem is that it would be great to have a range of "generic" SMC IDs that can be easily and unbureaucratically allocated to try out new features, without having to ask Arm to write a big specification document about it every time. It's sort of a development velocity issue.
We have utilized the ARM SiP range for some "generic" purposes in the past (see PMF and the execution state switch SMCs). This could be a direction for some of the use cases. But if the SMCs are meant to be truly generic and to be relied on by generic normal world software components, they would need to be properly specified, I would think.
For dynamically modifying some EL3 registers, it would be good to get these requirements out. Perhaps there is scope for architecting some of them as an ARM specification. If not, we could revert to a TF-A standard if there is enough pull for them (perhaps utilizing the ARM SiP range).
Best Regards,
Soby Mathew
Just trying to understand: if TF-A were to expose a crash-inducing SMC, would this still be restricted to special builds for your test runs? This would not make it to production for Chromebooks, right?
No, I would expose this in production Chromebooks (because our test facilities aren't really set up to work with special debug firmware builds). Assuming that it's a simple, straightforward interface that can only induce a crash and nothing else, this would not be a concern for our threat model.
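To make "can only induce a crash and nothing else" concrete, a handler of that shape might look roughly like the sketch below. This is an assumption-laden illustration, not TF-A code: the function ID `HYP_FID_TRIGGER_CRASH` is an invented number, `crash_smc_handler` is a hypothetical name, and the panic routine is passed in as a function pointer so the logic can be tried on a host, where a recording stub stands in for the firmware's real, non-returning panic path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL function ID, chosen only for this sketch. */
#define HYP_FID_TRIGGER_CRASH  0x8200F000u

/* SMCCC convention: unknown function IDs return -1. */
#define SMC_UNK  (-1)

/* Host-side stand-in for the firmware panic path. In real firmware the
 * panic routine would not return; here it only counts invocations. */
static int panic_count;

static void record_panic(void)
{
    panic_count++;
}

/* The handler takes no arguments beyond the function ID, so there is
 * nothing for the caller to control except whether the crash happens. */
int64_t crash_smc_handler(uint32_t fid, void (*do_panic)(void))
{
    assert(do_panic != NULL);

    if (fid == HYP_FID_TRIGGER_CRASH) {
        do_panic();
        return 0; /* only reachable with a stub that returns */
    }
    return SMC_UNK;
}
```

On a host, calling `crash_smc_handler(HYP_FID_TRIGGER_CRASH, record_panic)` bumps `panic_count`, while any other function ID returns `SMC_UNK` without side effects.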
If that is the case, leaving aside the 9p filesystem issues, can DebugFS serve this requirement (we can remove the limitation that it is restricted to Debug builds only)?
I don't really care that much how it is exposed to userspace on the kernel side, I think debugfs is perfectly fine for that (as long as write access is restricted to root, like /proc/sysrq-trigger). I just care about how the kernel transforms the debugfs accesses into SMCs and how those SMCs get handled in TF.
We have utilized the ARM SiP range for some "generic" purposes in the past (see PMF and the execution state switch SMCs). This could be a direction for some of the use cases. But if the SMCs are meant to be truly generic and to be relied on by generic normal world software components, they would need to be properly specified, I would think.
For dynamically modifying some EL3 registers, it would be good to get these requirements out. Perhaps there is scope for architecting some of them as an ARM specification. If not, we could revert to a TF-A standard if there is enough pull for them (perhaps utilizing the ARM SiP range).
Hmmm... well, I think one problem with that is that it's really hard to change these interfaces after the fact. I guess that's the reason why you want to be careful with handing them out too quickly, but it also makes it very hard to unify multiple implementations after the fact. Once each vendor has implemented a custom interface for such a common use case in their SiP space, they will have kernel code using that custom SMC and they'll have products shipped with frozen firmware that only supports that SMC, so the kernel will need to continue supporting that for a long time anyway. If we then later come and say "we've identified that many platforms need to do this same common thing, so we've specified this new standardized API for that", it will be hard to get anyone to switch over to that. They're not gaining much from that and they'll still need to continue supporting their old method on the kernel side, so switching to a new one would just be extra hassle.
That's why I'm wondering if it would help to provide a "generic TF-A" SMC range where new SMCs can be allocated with little friction by just uploading a patch and going through normal Gerrit review. The only inclusion criteria should be that this is an API which looks like it might be useful for multiple vendors. Then we could keep an eye on what kind of new SiP SMCs vendors add on their platforms and nudge them to just immediately grab an ID from the generic space instead if we think it may become useful for other vendors in the future. That doesn't mean it needs to immediately be implemented in common code, the code consolidation could wait until later. This would probably lead to a lot of "dead" or vendor-specific-after-all SMCs, but there are a lot of IDs in 32 bits so I'm not sure that would really be a problem (they wouldn't really be a maintenance burden other than making sure the number doesn't get reused).
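For context on what "grabbing an ID" from a range means, the sketch below builds and decodes fast-call function IDs using the field layout from the SMC Calling Convention (bit 31 call type, bit 30 calling convention, bits 29:24 owning entity number, bits 15:0 function number). The helper names `smc_fid`, `smc_fid_oen` and `smc_fid_num` are made up for this example, and nothing here defines the "generic TF-A" range discussed above, since no such allocation exists in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define SMC_TYPE_FAST   1u
#define SMC_32          0u
#define SMC_64          1u

/* Owning Entity Numbers as defined by the SMC Calling Convention. */
#define OEN_ARM_ARCH    0u
#define OEN_CPU         1u
#define OEN_SIP         2u   /* silicon-partner (vendor) services */
#define OEN_OEM         3u
#define OEN_STD_SECURE  4u   /* standard secure services, e.g. PSCI */

/* Compose a fast-call function ID from its fields. */
static inline uint32_t smc_fid(uint32_t type, uint32_t cc,
                               uint32_t oen, uint32_t func_num)
{
    assert(type <= 1u && cc <= 1u);
    assert(oen <= 0x3Fu);
    assert(func_num <= 0xFFFFu);
    return (type << 31) | (cc << 30) | (oen << 24) | func_num;
}

/* Extract the owning entity number (bits 29:24). */
static inline uint32_t smc_fid_oen(uint32_t fid)
{
    return (fid >> 24) & 0x3Fu;
}

/* Extract the function number (bits 15:0). */
static inline uint32_t smc_fid_num(uint32_t fid)
{
    return fid & 0xFFFFu;
}
```

For example, `smc_fid(SMC_TYPE_FAST, SMC_32, OEN_STD_SECURE, 0x12)` yields `0x84000012`, the SMC32 ID of PSCI SYSTEM_RESET2, and the same call with `OEN_SIP` and function number 0 yields `0x82000000`, the start of the SiP fast-call range. With 16 bits of function number per owning entity, leaving some allocated IDs unused costs little.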