On Wed, Apr 02, 2025 at 09:42:39AM +0100, Marc Zyngier wrote:
On Wed, 02 Apr 2025 03:58:48 +0100, Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
On Tue, Apr 01, 2025 at 07:13:26PM +0100, Marc Zyngier wrote:
On Tue, 01 Apr 2025 18:05:20 +0100, Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
[...]
This implementation has been heavily inspired by Xen's OP-TEE mediator.
[...]
And I think this inspiration is the source of most of the problems in this series.
Routing Secure Calls from the guest to whatever is on the secure side should not be the kernel's job at all. It should be the VMM's job. All you need to do is to route the SMCs from the guest to userspace, and we already have all the required infrastructure for that.
Yes, this was an argument at the time of designing this solution.
It is the VMM that should:
signal the TEE of VM creation/teardown
translate between IPAs and host VAs without involving KVM
let the host TEE driver translate between VAs and PAs and deal with the pinning as required, just like it would do for any userspace (without ever using the KVM memslot interface)
proxy requests from the guest to the TEE
in general, bear the complexity of anything related to the TEE
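The IPA-to-host-VA step in the list above can be sketched in a few lines of C. This is only an illustration under stated assumptions: struct guest_region and ipa_to_hva() are hypothetical names for bookkeeping a VMM might keep about the guest RAM it has mapped; they are not KVM, QEMU or OP-TEE interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical record of one guest RAM region, as a VMM might track it
 * for the memory it mapped into its own address space. Not a KVM type.
 */
struct guest_region {
	uint64_t ipa_base;	/* start of the region in guest physical (IPA) space */
	uint64_t size;		/* length of the region in bytes */
	uint8_t *host_base;	/* where the VMM mapped the region (host VA) */
};

/*
 * Translate the guest range [ipa, ipa + len) to a host VA.
 * Returns NULL unless the whole range lies inside a single region.
 */
static void *ipa_to_hva(const struct guest_region *regions, size_t nr,
			uint64_t ipa, uint64_t len)
{
	assert(regions != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		const struct guest_region *r = &regions[i];
		uint64_t off;

		if (ipa < r->ipa_base)
			continue;

		off = ipa - r->ipa_base;
		/* Overflow-safe check that off + len stays within the region. */
		if (off >= r->size || len > r->size - off)
			continue;

		return r->host_base + off;
	}

	return NULL;
}
```

With a lookup of this kind, the VMM would hand plain userspace pointers to the host TEE driver, which then pins and translates them as it would for any other process.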
The major reasons why I went with placing the implementation inside the kernel are:
- OP-TEE userspace lib (client) does not support sending SMCs for VM events and needs modification.
- QEMU (or every other VMM) will have to be modified.
Sure. And what? New feature, new API, new code. And what will happen once someone wants to use something other than OP-TEE? Or one of the many forks of OP-TEE that have a completely different ABI (cue the Android forks -- yes, plural)?
If something other than OP-TEE has to be supported, a specific mediator (such as drivers/tee/optee/optee_mediator.c) has to be constructed with handlers hooked via tee_mediator_register_ops().
But yes, the ABI might change and the implementor has the freedom to mediate it as required.
- The OP-TEE driver is in the kernel anyway. A mediator will just be an addition and not a completely new entity.
Of course not. The TEE can be anywhere I want. On another machine if I decide so. Just because OP-TEE has a very simplistic model doesn't mean we have to be constrained by it.
- (Potential) issues if we want to mediate requests from a VM which has private memory.
Private memory means that not even the host has access to it, as it is the case with pKVM. How would that be an issue?
The guest shares memory with OP-TEE through a buffer filled with pointers, which the mediator has to read for IPA->PA translations of all these pointers. The VMM won't be able to read these if the memory is private.
But this is a "potential" solution, and if the mediator is moved to the VMM, it is completely ruled out.
- Heavy VM exits if the guest makes frequent TOS calls.
Sorry, I have to completely dismiss the argument here. I'm not even remotely considering performance for something that is essentially a full context switch of the whole machine. By definition, calling into EL3, and then S-EL1/S-EL2 is going to be as fast as a dying snail, and an additional exit to userspace will hardly register for anything other than a pointless latency benchmark.
Okay, makes sense.
Hence, the thought of making changes to too many entities (libteec, VMM, etc.) was a strong reason, although arguable.
It is a *terrible* reason. By this reasoning, we would have subsumed the whole VMM into the kernel (just like Xen), because "we don't want to change userspace".
Furthermore, you are not even considering basic things such as permissions. Your approach completely circumvents any form of access control, meaning that any user that can create a VM can talk to the TEE, even if they don't have access to the TEE driver.
Well, this is a good point. OP-TEE built for NS-Virt supports handling calls from different VMs under different MMU partitions (I would need to go off track to explain this). But each VM's state and data remain isolated internally in S-EL1.
Yes, you could replicate access permission, SELinux, seccomp (and the rest of the security theater) at the KVM/TEE boundary, making the whole thing even more of a twisted mess.
Or you could simply do the right thing and let the kernel do its job the way it was intended by using the syscall interface from userspace.
In short, the VMM is just another piece of userspace using the TEE to do whatever it wants. The TEE driver on the host must obviously know about VMs, but that's about it.
Crucially, KVM should:
be completely TEE agnostic and never call into something that is TEE-specific
allow a TEE implementation entirely in userspace, especially for the machines that do not have EL3
Yes, you're right. Although I believe there are still some changes that need to be made to KVM to facilitate this. For example, kvm_smccc_get_action() would deny a TOS call.
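For context on what a "TOS call" is at the function-ID level: the SMC Calling Convention encodes the owning service in bits [29:24] of the function ID, and owners 50 to 63 are reserved for Trusted OS calls. A minimal sketch of that classification follows. The macro values match the SMCCC specification (the kernel has equivalents in include/linux/arm-smccc.h), but the helper names here are made up for illustration and are not kernel or KVM functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout from the SMC Calling Convention: owner lives in bits [29:24]. */
#define SMCCC_OWNER_SHIFT		24
#define SMCCC_OWNER_MASK		0x3Fu

/* Owner range reserved for Trusted OS calls (50..63 inclusive). */
#define SMCCC_OWNER_TRUSTED_OS		50u
#define SMCCC_OWNER_TRUSTED_OS_END	63u

/* Hypothetical helper: extract the owning-service field of a function ID. */
static unsigned int smccc_owner(uint32_t func_id)
{
	return (func_id >> SMCCC_OWNER_SHIFT) & SMCCC_OWNER_MASK;
}

/* Hypothetical helper: true if the function ID targets a Trusted OS. */
static bool smccc_is_trusted_os_call(uint32_t func_id)
{
	unsigned int owner = smccc_owner(func_id);

	return owner >= SMCCC_OWNER_TRUSTED_OS &&
	       owner <= SMCCC_OWNER_TRUSTED_OS_END;
}
```

A VMM that has SMCs forwarded to it could use a check of this shape to decide which calls to proxy to the host TEE driver; how the forwarding itself is configured is not shown here.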
If something is missing in KVM to allow routing of SMCs to userspace, I'm more than happy to entertain the change.
Okay.
So, having an implementation completely in the VMM without any change in KVM might be challenging; any potential solutions are welcome.
I've said what I have to say already, and pointed you in a direction that I see as both correct and maintainable.
Yes, I get your point on placing the mediator in the VMM. And now that I think of it, I believe I can make an improvement.
But yes, since too many entities are involved, the design of this solution has been a nightmare. Good to have been pushed this way.
Thanks,
M.
-- Jazz isn't dead. It just smells funny.
Thanks, Yuvraj Sakshith