On Fri, Dec 13, 2019 at 6:20 AM Joanna Farley Joanna.Farley@arm.com wrote:
On the subject of DebugFS's purpose: it was envisaged, and is today, as Sandrine describes, a debug-build-only capability. That said, there have been some early thoughts that it could evolve into a Secure Debug feature, where this type of capability or something like it is always on and requires debug certificates for authenticated access. This is very much a possible future evolution and is not in the patches available today. We would welcome any thoughts on such an evolution in this space.
I guess this gets into a bit of a philosophy discussion and becomes a matter of opinion, so there's probably no one right answer. Personally, adding authentication on top of this doesn't really resolve my concerns and just adds yet more complexity. I'm a strong proponent of the concept of a minimal Trusted Computing Base, i.e. keeping the amount of code executing at the highest privilege level as small and low-complexity as possible. Any code can have bugs, so the idea is that the more complicated the code you run in EL3 is (and the more complicated the APIs it exposes), the more likely it becomes that you accidentally have an exploitable vulnerability in there. Like a 9p filesystem driver, a certificate-based authentication system (especially if it's based on X.509/ASN.1, which are notoriously hard to implement safely) is a pretty complex piece of code with a pretty large attack surface that I'd rather not have in my EL3 firmware if I can avoid it. I understand that for certain use cases you may need something like this (if you really want a very extensive and extensible debugging API that must be restricted to a few authenticated actors), but in my use case I really just need the ability to trigger one small debugging feature, and that feature itself doesn't need to be restricted, so a minimal SMC interface would work much better for that case.
On 13/12/2019, 13:01, "TF-A on behalf of Sandrine Bailleux via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Going back to the SMC-based solution then, I am not quite convinced SYSTEM_RESET2 is the right interface for intentionally triggering a panic in TF-A. I think the semantics do not quite match. If anything, a firmware crash seems more like a shutdown operation to me rather than a reset (we don't recover from a firmware crash). I am not even sure we should look into the PSCI SMC range, as it's not a power-management operation.
Crash recovery behavior is platform dependent (via plat_panic_handler()). On all the platforms we use in Chrome OS we have that implemented as a system reboot. I think for most systems (whether it's a Chromebook, a server or some embedded device) that's probably what you want for random runtime crashes (at least in a production environment), but I agree that TF doesn't enforce any standard behavior, so it's hard to clearly match it to one SMC or the other.
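To make the "minimal SMC interface" idea a bit more concrete, here is a rough, host-runnable sketch of what I have in mind: one function ID that does nothing except enter the platform's panic path. Everything named example_* is made up for illustration (including the function ID value), and the function pointer only stands in for plat_panic_handler() so the snippet can run outside firmware. It is not how TF-A is actually wired up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical function ID and return code, chosen only for this sketch. */
#define EXAMPLE_SMC_TRIGGER_PANIC  0xC200FF00u
#define EXAMPLE_SMC_UNKNOWN        (-1)

/* Stand-in for the platform hook. In real firmware the panic handler is
 * not expected to return; here it only records that it was called. */
typedef void (*example_panic_hook_t)(void);

static bool example_reset_requested;

/* Example platform policy: treat a firmware panic as a system reboot. */
static void example_plat_reboot_on_panic(void)
{
    example_reset_requested = true;
}

static example_panic_hook_t example_panic_hook = example_plat_reboot_on_panic;

/* The whole non-secure-facing interface: a single ID, no arguments,
 * nothing to parse, nothing to authenticate. */
int64_t example_debug_smc(uint32_t fid)
{
    if (fid == EXAMPLE_SMC_TRIGGER_PANIC) {
        example_panic_hook();
        return 0; /* only reachable in this host-side simulation */
    }
    return EXAMPLE_SMC_UNKNOWN;
}
```

The point is just the size of the attack surface: the handler compares one integer and calls one hook, and whatever "panic" means (reboot, halt, crash dump) stays entirely a platform decision.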
So it sounds like it's not the first time that you hit this issue, is it? Do you have any other examples of Normal World OS features you would have liked to expose through a generic SMC interface? I am wondering whether this could help in choosing the right SMC range, if we can identify some common criteria among a set of such features.
No, it's the first time I've really run into this. But I think we might quickly come up with more uses for a "non-secure OS" SMC range if we had one. We often see roughly the same SMCs reimplemented on different platforms, because fundamentally they usually need to do the same kinds of things -- for example, most platforms have some kind of DDR frequency scaling that always needs part of it implemented in EL3, so they all need some kind of SMC to switch to a new DDR frequency. Many also need some kind of "write value to secure register" SMC that just allows the non-secure OS to write a few whitelisted registers that are only accessible in EL3 for some reason. If we could standardize these interfaces in a non-vendor-specific SMC range, we might be able to reduce some code duplication on both the TF and the Linux side.
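As an illustration of the "write value to secure register" pattern, a handler along these lines is roughly what each platform ends up writing for itself. This is only a sketch under assumptions: the function ID, return codes, addresses and masks are all invented, and an array stands in for the real MMIO registers so the snippet can run on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function ID and return codes, chosen only for this sketch. */
#define EXAMPLE_SMC_WRITE_SECURE_REG  0xC200FF01u
#define EXAMPLE_SMC_OK                0
#define EXAMPLE_SMC_DENIED            (-1)
#define EXAMPLE_SMC_UNKNOWN           (-2)

struct example_reg_rule {
    uint64_t addr;        /* register the non-secure OS may write */
    uint32_t write_mask;  /* bits the caller is allowed to change */
};

/* Made-up whitelist: only these addresses, and only these bits. */
static const struct example_reg_rule example_allowed_regs[] = {
    { 0x10000000u, 0x0000000Fu },
    { 0x10000004u, 0xFFFFFFFFu },
};

#define EXAMPLE_NUM_REGS \
    (sizeof(example_allowed_regs) / sizeof(example_allowed_regs[0]))

/* Simulated register contents; real firmware would do MMIO writes. */
static uint32_t example_fake_regs[EXAMPLE_NUM_REGS];

int64_t example_secure_reg_smc(uint32_t fid, uint64_t addr, uint32_t value)
{
    if (fid != EXAMPLE_SMC_WRITE_SECURE_REG)
        return EXAMPLE_SMC_UNKNOWN;

    for (size_t i = 0; i < EXAMPLE_NUM_REGS; i++) {
        if (example_allowed_regs[i].addr == addr) {
            uint32_t mask = example_allowed_regs[i].write_mask;
            uint32_t *reg = &example_fake_regs[i];

            /* Update only whitelisted bits; preserve the rest. */
            *reg = (*reg & ~mask) | (value & mask);
            return EXAMPLE_SMC_OK;
        }
    }

    /* Any address not in the table is rejected. */
    return EXAMPLE_SMC_DENIED;
}
```

Only the table differs from one platform to the next, which is why a shared function ID and calling convention for this would let both the firmware side and the OS side be written once.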
I guess none of these things are really Linux-specific, now that I think of it. So really, I guess the problem is that it would be great to have a range of "generic" SMC IDs that can be easily and unbureaucratically allocated to try out new features, without having to ask Arm to write a big specification document about it every time. It's sort of a development velocity issue.