Just trying to understand: if TF-A were to expose a crash-inducing SMC, would this still be restricted to special builds for your test runs? This would not make it to production for Chromebooks, right?
No, I would expose this in production Chromebooks (because our test facilities aren't really set up to work with special debug firmware builds). Assuming that it's a simple, straightforward interface that can only induce a crash and nothing else, this would not be a concern for our threat model.
If that is the case, leaving aside the 9p filesystem issues, can DebugFS serve this requirement (we can remove the limitation that restricts it to debug builds only)?
I don't really care that much how it is exposed to userspace on the kernel side, I think debugfs is perfectly fine for that (as long as write access is restricted to root, like /proc/sysrq-trigger). I just care about how the kernel transforms the debugfs accesses into SMCs and how those SMCs get handled in TF.
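To make the "can only induce a crash and nothing else" property concrete, here is a minimal sketch of what such a handler could look like on the firmware side. It is not TF-A code: the function ID 0x8200F000, the handler name and the crash hook are all hypothetical and chosen only for illustration. The only things taken from the SMC Calling Convention are that 0x82000000-0x8200FFFF is the SiP fast-call SMC32 range and that an unrecognized function ID returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SMCCC: an unrecognized function ID returns -1 ("unknown function"). */
#define SMC_UNKNOWN_FUNCTION (-1L)

/*
 * HYPOTHETICAL function ID, picked from the SiP fast-call SMC32 range
 * (0x82000000-0x8200FFFF). No real platform is known to use this value.
 */
#define HYPOTHETICAL_SMC_INDUCE_CRASH UINT32_C(0x8200F000)

/* Hook for whatever the platform uses to crash (hypothetical). */
typedef void (*crash_fn)(void);

/* Counting stand-in for a real crash routine, so the sketch runs on a host. */
int crash_count;
void counting_crash_stub(void)
{
    crash_count++;
}

/*
 * The handler takes no arguments besides the function ID, reads no caller
 * supplied data and exposes no state: it either crashes or reports that
 * the function is unknown.
 */
long handle_smc(uint32_t fid, crash_fn crash)
{
    assert(crash != NULL);

    if (fid == HYPOTHETICAL_SMC_INDUCE_CRASH) {
        crash();   /* real firmware would not return from here */
        return 0;  /* only reachable with a stub such as the one above */
    }
    return SMC_UNKNOWN_FUNCTION;
}
```

On the kernel side, a root-only debugfs write handler would then only need to issue this one SMC with no parameters, which keeps the attack surface limited to "the system crashes".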
We have utilized the ARM SiP range for some "generic" purposes in the past (see PMF and the execution state switch SMCs). This could be a direction for some of the use-cases. But if the SMCs are meant to be truly generic and to be relied on by generic normal world software components, they would need to be properly specified, I would think.
For dynamically modifying some EL3 registers, it would be good to get these requirements out. Perhaps there is scope for architecting some of them as an ARM specification. If not, we could revert to a TF-A standard if there is enough pull for them (perhaps utilizing the ARM SiP range).
Hmmm... well, I think one problem with that is that it's really hard to change these interfaces after the fact. I guess that's the reason why you want to be careful with handing them out too quickly, but it also makes it very hard to unify multiple implementations after the fact. Once each vendor has implemented a custom interface for such a common use case in their SiP space, they will have kernel code using that custom SMC and they'll have products shipped with frozen firmware that only supports that SMC, so the kernel will need to continue supporting that for a long time anyway. If we then later come and say "we've identified that many platforms need to do this same common thing, so we've specified this new standardized API for that", it will be hard to get anyone to switch over to that. They're not gaining much from that and they'll still need to continue supporting their old method on the kernel side, so switching to a new one would just be extra hassle.
That's why I'm wondering if it would help to provide a "generic TF-A" SMC range where new SMCs can be allocated with little friction by just uploading a patch and going through normal Gerrit review. The only inclusion criteria should be that this is an API which looks like it might be useful for multiple vendors. Then we could keep an eye on what kind of new SiP SMCs vendors add on their platforms and nudge them to just immediately grab an ID from the generic space instead if we think it may become useful for other vendors in the future. That doesn't mean it needs to immediately be implemented in common code, the code consolidation could wait until later. This would probably lead to a lot of "dead" or vendor-specific-after-all SMCs, but there are a lot of IDs in 32 bits so I'm not sure that would really be a problem (they wouldn't really be a maintenance burden other than making sure the number doesn't get reused).
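For reference, a rough sketch of how IDs in such a range could be composed. The bit layout follows the SMC Calling Convention for fast calls (bit 31: fast call, bit 30: SMC64, bits 29:24: owning entity number, bits 15:0: function number; owning entity 2 is SiP). The "generic" sub-range base 0xF000 and the helper names are purely hypothetical, and carving the range out of SiP space is just one of the options mentioned above. Note that within one owning entity the function number is 16 bits, so each convention (SMC32/SMC64) has 65536 IDs to hand out.

```c
#include <assert.h>
#include <stdint.h>

#define SMC_TYPE_FAST UINT32_C(1)   /* bit 31 set: fast call */
#define SMC_CC_32     UINT32_C(0)   /* bit 30 clear: SMC32 convention */
#define SMC_CC_64     UINT32_C(1)   /* bit 30 set: SMC64 convention */
#define SMC_OEN_SIP   UINT32_C(2)   /* owning entity number for SiP calls */

/* Compose an SMCCC function ID from its fields. */
uint32_t smc_function_id(uint32_t type, uint32_t cc, uint32_t oen,
                         uint32_t func_num)
{
    assert(type <= 1 && cc <= 1);
    assert(oen <= 0x3f);        /* 6-bit owning entity number */
    assert(func_num <= 0xffff); /* 16-bit function number */

    return (type << 31) | (cc << 30) | (oen << 24) | func_num;
}

/*
 * HYPOTHETICAL: reserve function numbers 0xF000 and up inside the SiP
 * range for "generic TF-A" calls. Each new API takes the next free index,
 * and an index is never reused even if its API ends up dead.
 */
#define HYPOTHETICAL_GENERIC_BASE UINT32_C(0xF000)

uint32_t hypothetical_generic_smc_id(uint32_t index)
{
    return smc_function_id(SMC_TYPE_FAST, SMC_CC_32, SMC_OEN_SIP,
                           HYPOTHETICAL_GENERIC_BASE + index);
}
```

With this scheme, allocating a new generic SMC in a Gerrit patch amounts to adding one line that claims the next index, which is about as low-friction as it gets.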