A KVM guest running on an arm64 machine cannot interact with a trusted execution environment (which supports non-secure guests) like OP-TEE in the secure world. This is because the instructions provided by the architecture to switch control to the firmware (such as SMC) are trapped at EL2 when the guest executes them.
This series adds a feature to the kernel called the TEE mediator abstraction layer, which lets a guest interact with the secure world. Additionally, an OP-TEE-specific mediator is implemented, which hooks itself into the TEE mediator layer and intercepts guest SMCs targeted at OP-TEE.
Overview
========
Essentially, when the kernel wants to interact with OP-TEE, it loads arguments into CPU registers and executes an "smc" (secure monitor call) instruction. What these arguments consist of and how the two entities communicate can vary. A guest, however, cannot establish a connection with the secure world, because "smc" instructions executed by the guest are trapped by the hypervisor at EL2. This is done by setting the HCR_EL2.TSC bit before entering the guest.
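For illustration, a minimal sketch of such a call from the host kernel (not part of this series; it assumes the OPTEE_SMC_CALLS_UID function ID from drivers/tee/optee/optee_smc.h, and real callers go through the OP-TEE driver's SMC ABI layer):

#include <linux/arm-smccc.h>
#include <linux/printk.h>
#include "optee_smc.h"	/* drivers/tee/optee/optee_smc.h */

static void query_optee_uid(void)
{
	struct arm_smccc_res res;

	/* a0 carries the function ID, a1-a7 carry call-specific arguments */
	arm_smccc_smc(OPTEE_SMC_CALLS_UID, 0, 0, 0, 0, 0, 0, 0, &res);

	/* OP-TEE returns its API UID in a0-a3 */
	pr_info("OP-TEE API UID: %lx %lx %lx %lx\n", res.a0, res.a1, res.a2, res.a3);
}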
Hence this feature, which we may call the TEE mediator, acts as an intermediary between the guest and OP-TEE. Instead of denying the guest SMC and jumping back into the guest, the mediator forwards the request to OP-TEE.
OP-TEE supports virtualization in the normal world and expects 6 things from the NS-hypervisor:
1. Notify OP-TEE when a VM is created (a sketch of this notification follows the list).
2. Notify OP-TEE when a VM is destroyed.
3. Any SMC to OP-TEE has to carry the VMID in x7; if the hypervisor itself is the caller, the VMID is 0.
4. The hypervisor has to perform IPA->PA translation of the memory addresses sent by the guest.
5. Memory shared by the VM with OP-TEE has to remain pinned.
6. The hypervisor has to follow the OP-TEE protocol, so the guest thinks it is speaking directly to OP-TEE.
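As a rough sketch, and assuming the OPTEE_SMC_VM_CREATED ID added later in this series (patch 5), the creation notification boils down to a single fast call with the new guest's VMID in a1 and 0 in a7 (since the hypervisor is the caller):

#include <linux/arm-smccc.h>
#include "optee_smc.h"	/* for OPTEE_SMC_VM_CREATED / OPTEE_SMC_RETURN_OK */

/* Illustrative helper, not taken verbatim from the series */
static int notify_optee_vm_created(u64 vmid)
{
	struct arm_smccc_res res;

	arm_smccc_smc(OPTEE_SMC_VM_CREATED, vmid, 0, 0, 0, 0, 0, 0, &res);

	/* OP-TEE returns OPTEE_SMC_RETURN_ENOTAVAIL if it is out of resources */
	return res.a0 == OPTEE_SMC_RETURN_OK ? 0 : -EBUSY;
}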
It is important to note that if OP-TEE is built with NS-virtualization support, it can only function if there is a hypervisor with a mediator in the normal world.
This implementation has been heavily inspired by Xen's OP-TEE mediator.
Design
======
The unique design of KVM makes it quite challenging to implement such a mediator. OP-TEE is not aware of the host-guest paradigm, so the mediator treats the host as a VM with VMID 1. Guests are assigned VMIDs starting from 2 (note that these are not the VMIDs tagged in the TLB; rather, we implement our own simple indexing mechanism).
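A minimal sketch of that indexing scheme (names are illustrative and mirror the series' optee_mediator_new_vmid()):

#include <linux/atomic.h>
#include <linux/types.h>

/* VMID 0 is the hypervisor itself, 1 is the host, guests start at 2 */
static atomic_t next_vmid = ATOMIC_INIT(2);

static u64 alloc_guest_vmid(void)
{
	return (u64)atomic_fetch_inc(&next_vmid);
}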
When the host's OP-TEE driver is initialised or released, OP-TEE is notified about VM 1 being created/destroyed.
When a VMM (such as QEMU) creates a guest through KVM ioctls, a call is made to the TEE mediator layer, which in turn calls the OP-TEE mediator, which assigns a VM context, a VMID, etc. and notifies OP-TEE about guest creation. The opposite happens on guest destruction.
When the guest makes an SMC targeting OP-TEE, it is trapped by the hypervisor and the register state (kvm_vcpu) is sent to the OP-TEE mediator through the TEE layer. There are two possibilities here.
The guest may make an SMC whose arguments are simple numeric values, exchanging a UUID, version information, etc. In this case the mediator has little work to do: it attaches the VMID in x7 and passes the register state on to OP-TEE.
But when the guest passes memory addresses as arguments, the mediator has to translate these intermediate physical addresses (IPAs) into physical addresses. According to the OP-TEE protocol (as documented in optee_smc.h and optee_msg.h), the guest's OP-TEE driver shares a buffer filled with pointers, which the mediator translates.
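The translation step itself is conceptually small; a hedged sketch using the KVM helpers the series itself relies on (gfn_to_page()/page_to_phys()), with error handling and pinning omitted:

#include <linux/kvm_host.h>

static phys_addr_t guest_ipa_to_phys(struct kvm *kvm, gpa_t ipa)
{
	/* Resolve the guest frame to the backing host page... */
	struct page *page = gfn_to_page(kvm, ipa >> PAGE_SHIFT);

	if (!page)
		return 0;

	/* ...and rebuild the address with the original page offset */
	return page_to_phys(page) | (ipa & ~PAGE_MASK);
}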
The OP-TEE mediator also keeps track of active calls between each guest and OP-TEE, and pins pages that have already been shared. This prevents the host from swapping out shared pages under memory pressure. The pages are unpinned as soon as the guest's transaction with OP-TEE completes.
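For reference, a sketch of the pinning step, mirroring the series' use of pin_user_pages_unlocked() with FOLL_LONGTERM (here "hva" is the host userspace address backing the guest page, taken from the memslot):

#include <linux/mm.h>

static int pin_shared_page(unsigned long hva, struct page **page)
{
	/* Long-term pin: the page must not be migrated or swapped out */
	long pinned = pin_user_pages_unlocked(hva, 1, page, FOLL_LONGTERM);

	return pinned == 1 ? 0 : -EFAULT;
}

static void unpin_shared_page(struct page *page)
{
	/* Called once the guest's transaction with OP-TEE has completed */
	unpin_user_page(page);
}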
Testing
=======
The feature has been tested on the QEMU virt platform using "xtest" as the test suite. As of now, all 35000+ tests pass. The mediator has also been stressed under memory pressure, and all tests still pass. Any suggestions on further testing the feature are welcome.
Call for review
===============

Any insights/suggestions regarding the implementation are appreciated.
Yuvraj Sakshith (7):
  firmware: smccc: Add macros for Trusted OS/App owner check on SMC value
  tee: Add TEE Mediator module which aims to expose TEE to a KVM guest.
  KVM: Notify TEE Mediator when KVM creates and destroys guests
  KVM: arm64: Forward guest CPU state to TEE mediator on SMC trap
  tee: optee: Add OPTEE_SMC_VM_CREATED and OPTEE_SMC_VM_DESTROYED
  tee: optee: Add OP-TEE Mediator
  tee: optee: Notify TEE Mediator on OP-TEE driver initialization and release
 arch/arm64/kvm/hypercalls.c        |   15 +-
 drivers/tee/Kconfig                |    5 +
 drivers/tee/Makefile               |    1 +
 drivers/tee/optee/Kconfig          |    7 +
 drivers/tee/optee/Makefile         |    1 +
 drivers/tee/optee/core.c           |   13 +-
 drivers/tee/optee/optee_mediator.c | 1319 ++++++++++++++++++++++++++++
 drivers/tee/optee/optee_mediator.h |  103 +++
 drivers/tee/optee/optee_smc.h      |   53 ++
 drivers/tee/optee/smc_abi.c        |    6 +
 drivers/tee/tee_mediator.c         |  145 +++
 include/linux/arm-smccc.h          |    8 +
 include/linux/tee_mediator.h       |   39 +
 virt/kvm/kvm_main.c                |   11 +-
 14 files changed, 1721 insertions(+), 5 deletions(-)
 create mode 100644 drivers/tee/optee/optee_mediator.c
 create mode 100644 drivers/tee/optee/optee_mediator.h
 create mode 100644 drivers/tee/tee_mediator.c
 create mode 100644 include/linux/tee_mediator.h
This patch adds the ARM_SMCCC_IS_OWNER_TRUSTED_APP() and ARM_SMCCC_IS_OWNER_TRUSTED_OS() macros. These can be used to identify whether an SMC is targeted at a Trusted OS/App in the secure world.
Signed-off-by: Yuvraj Sakshith <yuvraj.kernel@gmail.com>
---
 include/linux/arm-smccc.h | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index f19be5754090..da2b4565d5b3 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -56,6 +56,14 @@
 #define ARM_SMCCC_OWNER_TRUSTED_OS	50
 #define ARM_SMCCC_OWNER_TRUSTED_OS_END	63
+#define ARM_SMCCC_IS_OWNER_TRUSTED_APP(smc_val) \
+	((ARM_SMCCC_OWNER_NUM(smc_val) >= ARM_SMCCC_OWNER_TRUSTED_APP) && \
+	 (ARM_SMCCC_OWNER_NUM(smc_val) <= ARM_SMCCC_OWNER_TRUSTED_APP_END))
+
+#define ARM_SMCCC_IS_OWNER_TRUSTED_OS(smc_val) \
+	((ARM_SMCCC_OWNER_NUM(smc_val) >= ARM_SMCCC_OWNER_TRUSTED_OS) && \
+	 (ARM_SMCCC_OWNER_NUM(smc_val) <= ARM_SMCCC_OWNER_TRUSTED_OS_END))
+
 #define ARM_SMCCC_FUNC_QUERY_CALL_UID 0xff01
#define ARM_SMCCC_QUIRK_NONE 0
The TEE Mediator module is an upper abstraction layer which lets KVM guests interact with a trusted execution environment.
TEE-specific subsystems (such as OP-TEE) can register a set of handlers with the TEE Mediator through tee_mediator_register_ops(); these handlers are called by the kernel when required.
Given this module, architecture-specific TEE drivers can implement handler functions to act on these events if necessary. In most implementations, a special instruction (such as SMC on arm64) switches control towards the TEE. These instructions are usually trapped by the hypervisor when executed by a guest.
This module makes use of these trapped instructions to mediate requests between the guest and the TEE.
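A hedged sketch of how a TEE-specific driver would hook in, using the ops structure this patch introduces (the handlers below are illustrative stubs):

#include <linux/init.h>
#include <linux/tee_mediator.h>

static int my_tee_is_active(void)
{
	return 1;
}

static void my_tee_forward(struct kvm_vcpu *vcpu)
{
	/* translate arguments, issue the SMC, write results back to vcpu */
}

static struct tee_mediator_ops my_tee_ops = {
	.is_active	 = my_tee_is_active,
	.forward_request = my_tee_forward,
};

static int __init my_tee_mediator_init(void)
{
	return tee_mediator_register_ops(&my_tee_ops);
}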
Signed-off-by: Yuvraj Sakshith <yuvraj.kernel@gmail.com>
---
 drivers/tee/Kconfig          |   5 ++
 drivers/tee/Makefile         |   1 +
 drivers/tee/tee_mediator.c   | 145 +++++++++++++++++++++++++++++++++++
 include/linux/tee_mediator.h |  39 ++++++++++
 4 files changed, 190 insertions(+)
 create mode 100644 drivers/tee/tee_mediator.c
 create mode 100644 include/linux/tee_mediator.h
diff --git a/drivers/tee/Kconfig b/drivers/tee/Kconfig
index 61b507c18780..dc446c9746ee 100644
--- a/drivers/tee/Kconfig
+++ b/drivers/tee/Kconfig
@@ -11,6 +11,11 @@ menuconfig TEE
 	  This implements a generic interface towards a Trusted Execution
 	  Environment (TEE).
+config TEE_MEDIATOR
+	bool "Trusted Execution Environment Mediator support"
+	depends on KVM
+	help
+	  Provides an abstraction layer for TEE drivers to mediate KVM guest requests to the TEE.
 
 if TEE
source "drivers/tee/optee/Kconfig" diff --git a/drivers/tee/Makefile b/drivers/tee/Makefile index 5488cba30bd2..46c44e59dd0b 100644 --- a/drivers/tee/Makefile +++ b/drivers/tee/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_TEE) += tee.o +obj-$(CONFIG_TEE_MEDIATOR) += tee_mediator.o tee-objs += tee_core.o tee-objs += tee_shm.o tee-objs += tee_shm_pool.o diff --git a/drivers/tee/tee_mediator.c b/drivers/tee/tee_mediator.c new file mode 100644 index 000000000000..d1ae7f4cb994 --- /dev/null +++ b/drivers/tee/tee_mediator.c @@ -0,0 +1,145 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * TEE Mediator for the Linux Kernel + * + * This module enables a KVM guest to interact with a + * Trusted Execution Environment in the secure processing + * state provided by the architecture. + * + * Author: + * Yuvraj Sakshith yuvraj.kernel@gmail.com + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/tee_mediator.h> + +static struct tee_mediator *mediator; + +int tee_mediator_register_ops(struct tee_mediator_ops *ops) +{ + + int ret = 0; + + if (!ops) { + ret = -EINVAL; + goto out; + } + + if (!mediator) { + ret = -EOPNOTSUPP; + goto out; + } + + mediator->ops = ops; + +out: + return ret; +} + +int tee_mediator_is_active(void) +{ + return (mediator != NULL && + mediator->ops != NULL && mediator->ops->is_active()); +} + +int tee_mediator_create_host(void) +{ + int ret = 0; + + if (!tee_mediator_is_active() || !mediator->ops->create_host) { + ret = -ENODEV; + goto out; + } + + ret = mediator->ops->create_host(); + +out: + return ret; +} + +int tee_mediator_destroy_host(void) +{ + int ret = 0; + + if (!tee_mediator_is_active() || !mediator->ops->destroy_host) { + ret = -ENODEV; + goto out; + } + + ret = mediator->ops->destroy_host(); +out: + return ret; +} + +int tee_mediator_create_vm(struct kvm *kvm) +{ + int ret = 0; + + if (!kvm) { + ret = -EINVAL; + goto out; + } + + if (!tee_mediator_is_active() || !mediator->ops->create_vm) { + ret = -ENODEV; + goto out; + } + + ret = mediator->ops->create_vm(kvm); + +out: + return ret; +} + +int tee_mediator_destroy_vm(struct kvm *kvm) +{ + int ret = 0; + + if (!kvm) { + ret = -EINVAL; + goto out; + } + + if (!tee_mediator_is_active() || !mediator->ops->destroy_vm) { + ret = -ENODEV; + goto out; + } + + ret = mediator->ops->destroy_vm(kvm); + +out: + return ret; +} + +void tee_mediator_forward_request(struct kvm_vcpu *vcpu) +{ + if (!vcpu || !tee_mediator_is_active() || !mediator->ops->forward_request) + return; + + mediator->ops->forward_request(vcpu); +} + +static int __init tee_mediator_init(void) +{ + int ret = 0; + + mediator = kzalloc(sizeof(*mediator), GFP_KERNEL); + if (!mediator) { + ret = -ENOMEM; + goto out; + } + + pr_info("mediator initialised\n"); +out: + return ret; +} +module_init(tee_mediator_init); + +static void __exit tee_mediator_exit(void) +{ + kfree(mediator); + + pr_info("mediator exiting\n"); +} +module_exit(tee_mediator_exit); diff --git a/include/linux/tee_mediator.h b/include/linux/tee_mediator.h new file mode 100644 index 000000000000..4a971de158ec --- /dev/null +++ b/include/linux/tee_mediator.h @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * TEE Mediator for the Linux Kernel + * + * This module enables a KVM guest to interact with a + * Trusted Execution Environment in the secure processing + * state provided by the architecture. 
+ * + * Author: + * Yuvraj Sakshith yuvraj.kernel@gmail.com + */ + +#ifndef __TEE_MEDIATOR_H +#define __TEE_MEDIATOR_H + +#include <linux/kvm_host.h> + +struct tee_mediator_ops { + int (*create_host)(void); + int (*destroy_host)(void); + int (*create_vm)(struct kvm *kvm); + int (*destroy_vm)(struct kvm *kvm); + void (*forward_request)(struct kvm_vcpu *vcpu); + int (*is_active)(void); +}; + +struct tee_mediator { + struct tee_mediator_ops *ops; +}; + +int tee_mediator_create_host(void); +int tee_mediator_destroy_host(void); +int tee_mediator_create_vm(struct kvm *kvm); +int tee_mediator_destroy_vm(struct kvm *kvm); +void tee_mediator_forward_request(struct kvm_vcpu *vcpu); +int tee_mediator_is_active(void); +int tee_mediator_register_ops(struct tee_mediator_ops *ops); + +#endif
TEEs supporting virtualization in the rich execution environment would want to know about guest creation and destruction by the hypervisor.
This change notifies the TEE mediator of these events (if it is active).
Signed-off-by: Yuvraj Sakshith <yuvraj.kernel@gmail.com>
---
 virt/kvm/kvm_main.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ba0327e2d0d3..65f1f5075fdd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -49,6 +49,7 @@
 #include <linux/lockdep.h>
 #include <linux/kthread.h>
 #include <linux/suspend.h>
+#include <linux/tee_mediator.h>
 #include <asm/processor.h>
 #include <asm/ioctl.h>
@@ -1250,7 +1251,10 @@ static void kvm_destroy_vm(struct kvm *kvm)
 {
 	int i;
 	struct mm_struct *mm = kvm->mm;
-
+#ifdef CONFIG_TEE_MEDIATOR
+	if (tee_mediator_is_active())
+		(void) tee_mediator_destroy_vm(kvm);
+#endif
 	kvm_destroy_pm_notifier(kvm);
 	kvm_uevent_notify_change(KVM_EVENT_DESTROY_VM, kvm);
 	kvm_destroy_vm_debugfs(kvm);
@@ -5407,7 +5411,10 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 	 * care of doing kvm_put_kvm(kvm).
 	 */
 	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
-
+#ifdef CONFIG_TEE_MEDIATOR
+	if (tee_mediator_is_active())
+		(void) tee_mediator_create_vm(kvm);
+#endif
 	fd_install(fd, file);
 
 	return fd;
When the guest makes an SMC, the call is denied by the hypervisor and not handled (ignored). With the TEE Mediator module present, the SMC from the guest is forwarded, along with its vCPU register state, through tee_mediator_forward_request().
Signed-off-by: Yuvraj Sakshith <yuvraj.kernel@gmail.com>
---
 arch/arm64/kvm/hypercalls.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 569941eeb3fe..cb34bb87188c 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -3,6 +3,7 @@
 #include <linux/arm-smccc.h>
 #include <linux/kvm_host.h>
+#include <linux/tee_mediator.h>
#include <asm/kvm_emulate.h>
@@ -90,7 +91,10 @@ static bool kvm_smccc_default_allowed(u32 func_id)
 		 */
 		if (func_id >= KVM_PSCI_FN(0) && func_id <= KVM_PSCI_FN(3))
 			return true;
-
+#ifdef CONFIG_TEE_MEDIATOR
+		if (ARM_SMCCC_IS_OWNER_TRUSTED_APP(func_id) || ARM_SMCCC_IS_OWNER_TRUSTED_OS(func_id))
+			return true;
+#endif
 		return false;
 	}
 }
@@ -284,7 +288,14 @@ int kvm_smccc_call_handler(struct kvm_vcpu *vcpu)
 		WARN_RATELIMIT(1, "Unhandled SMCCC filter action: %d\n", action);
 		goto out;
 	}
-
+#ifdef CONFIG_TEE_MEDIATOR
+	if (ARM_SMCCC_IS_OWNER_TRUSTED_APP(func_id) || ARM_SMCCC_IS_OWNER_TRUSTED_OS(func_id)) {
+		if (tee_mediator_is_active()) {
+			tee_mediator_forward_request(vcpu);
+			return 1;
+		}
+	}
+#endif
 	switch (func_id) {
 	case ARM_SMCCC_VERSION_FUNC_ID:
 		val[0] = ARM_SMCCC_VERSION_1_1;
OP-TEE, when compiled with NS-virtualization support, expects the NS-hypervisor to notify it of events such as guest creation and destruction through SMCs.
This change adds two macros, OPTEE_SMC_VM_CREATED and OPTEE_SMC_VM_DESTROYED.
Signed-off-by: Yuvraj Sakshith <yuvraj.kernel@gmail.com>
---
 drivers/tee/optee/optee_smc.h | 53 +++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
diff --git a/drivers/tee/optee/optee_smc.h b/drivers/tee/optee/optee_smc.h index 879426300821..988539b2407b 100644 --- a/drivers/tee/optee/optee_smc.h +++ b/drivers/tee/optee/optee_smc.h @@ -452,6 +452,59 @@ struct optee_smc_disable_shm_cache_result { /* See OPTEE_SMC_CALL_WITH_REGD_ARG above */ #define OPTEE_SMC_FUNCID_CALL_WITH_REGD_ARG 19
+/* + * Inform OP-TEE about a new virtual machine + * + * Hypervisor issues this call during virtual machine (guest) creation. + * OP-TEE records client id of new virtual machine and prepares + * to receive requests from it. This call is available only if OP-TEE + * was built with virtualization support. + * + * Call requests usage: + * a0 SMC Function ID, OPTEE_SMC_VM_CREATED + * a1 Hypervisor Client ID of newly created virtual machine + * a2-6 Not used + * a7 Hypervisor Client ID register. Must be 0, because only hypervisor + * can issue this call + * + * Normal return register usage: + * a0 OPTEE_SMC_RETURN_OK + * a1-7 Preserved + * + * Error return: + * a0 OPTEE_SMC_RETURN_ENOTAVAIL OP-TEE have no resources for + * another VM + * a1-7 Preserved + * + */ +#define OPTEE_SMC_FUNCID_VM_CREATED 13 +#define OPTEE_SMC_VM_CREATED \ + OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_VM_CREATED) + +/* + * Inform OP-TEE about shutdown of a virtual machine + * + * Hypervisor issues this call during virtual machine (guest) destruction. + * OP-TEE will clean up all resources associated with this VM. This call is + * available only if OP-TEE was built with virtualization support. + * + * Call requests usage: + * a0 SMC Function ID, OPTEE_SMC_VM_DESTROYED + * a1 Hypervisor Client ID of virtual machine being shut down + * a2-6 Not used + * a7 Hypervisor Client ID register. Must be 0, because only hypervisor + * can issue this call + * + * Normal return register usage: + * a0 OPTEE_SMC_RETURN_OK + * a1-7 Preserved + * + */ + +#define OPTEE_SMC_FUNCID_VM_DESTROYED 14 +#define OPTEE_SMC_VM_DESTROYED \ + OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_VM_DESTROYED) + /* * Resume from RPC (for example after processing a foreign interrupt) *
The OP-TEE mediator is a software entity responsible for bridging the gap between a KVM guest and OP-TEE in the secure world.
The guest's OP-TEE driver issues an SMC instruction after populating its CPU registers with appropriate arguments. This SMC is trapped by the hypervisor and control is passed to the OP-TEE mediator with the vCPU state.
The mediator is responsible for manipulating the vCPU state accordingly and keeping track of active transactions between the guest and the TEE.
This implementation adds event handlers that hook into the TEE Mediator layer and are called on events such as guest creation/destruction, mediator status checks, and guest SMC traps.
Important routines implemented:
- optee_mediator_{create|destroy}_vm(): Sends an SMC to OP-TEE notifying KVM guest creation or destruction.
- optee_mediator_{create|destroy}_host(): Sends an SMC to OP-TEE notifying host OP-TEE driver initialization/release (OP-TEE treats all NS-EL1 entities as guests and is not aware of host privilege).
- optee_mediator_forward_smc(): Changes the vCPU register state as required by OP-TEE and keeps track of standard calls and memory shared by the guest.
The OP-TEE mediator is implemented in such a way that the guest/VMM can remain unmodified. The guest interacts with OP-TEE just as it would if it were running natively.
Signed-off-by: Yuvraj Sakshith <yuvraj.kernel@gmail.com>
---
 drivers/tee/optee/Kconfig          |    7 +
 drivers/tee/optee/Makefile         |    1 +
 drivers/tee/optee/optee_mediator.c | 1319 ++++++++++++++++++++++++++++
 drivers/tee/optee/optee_mediator.h |  103 +++
 4 files changed, 1430 insertions(+)
 create mode 100644 drivers/tee/optee/optee_mediator.c
 create mode 100644 drivers/tee/optee/optee_mediator.h
diff --git a/drivers/tee/optee/Kconfig b/drivers/tee/optee/Kconfig index 7bb7990d0b07..ef41d6d1793e 100644 --- a/drivers/tee/optee/Kconfig +++ b/drivers/tee/optee/Kconfig @@ -25,3 +25,10 @@ config OPTEE_INSECURE_LOAD_IMAGE
Additional documentation on kernel security risks are at Documentation/tee/op-tee.rst. + +config OPTEE_MEDIATOR + bool "OP-TEE Mediator support" + depends on TEE_MEDIATOR && OPTEE && ARM64 && KVM + help + This enables a KVM guest equipped with an OP-TEE driver to interact with OP-TEE + in the secure world. diff --git a/drivers/tee/optee/Makefile b/drivers/tee/optee/Makefile index a6eff388d300..4a777940e0df 100644 --- a/drivers/tee/optee/Makefile +++ b/drivers/tee/optee/Makefile @@ -1,4 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_OPTEE_MEDIATOR) += optee_mediator.o obj-$(CONFIG_OPTEE) += optee.o optee-objs += core.o optee-objs += call.o diff --git a/drivers/tee/optee/optee_mediator.c b/drivers/tee/optee/optee_mediator.c new file mode 100644 index 000000000000..d164eae570a9 --- /dev/null +++ b/drivers/tee/optee/optee_mediator.c @@ -0,0 +1,1319 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * OP-TEE Mediator for the Linux Kernel + * + * This module enables a KVM guest to interact with OP-TEE + * in the secure world by hooking event handlers with + * the TEE Mediator layer. + * + * Author: + * Yuvraj Sakshith yuvraj.kernel@gmail.com + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include "optee_mediator.h" +#include "optee_smc.h" +#include "optee_msg.h" +#include "optee_private.h" +#include "optee_rpc_cmd.h" + +#include <linux/tee_mediator.h> +#include <linux/kvm_host.h> +#include <linux/arm-smccc.h> +#include <linux/types.h> +#include <linux/list.h> +#include <linux/spinlock.h> +#include <linux/mm_types.h> +#include <linux/minmax.h> + +#include <asm/kvm_emulate.h> + +#define OPTEE_KNOWN_NSEC_CAPS OPTEE_SMC_NSEC_CAP_UNIPROCESSOR +#define OPTEE_KNOWN_SEC_CAPS (OPTEE_SMC_SEC_CAP_HAVE_RESERVED_SHM | \ + OPTEE_SMC_SEC_CAP_UNREGISTERED_SHM | \ + OPTEE_SMC_SEC_CAP_DYNAMIC_SHM | \ + OPTEE_SMC_SEC_CAP_MEMREF_NULL) + +static struct optee_mediator *mediator; +static spinlock_t mediator_lock; +static u32 optee_thread_limit; + +static void copy_regs_from_vcpu(struct kvm_vcpu *vcpu, struct guest_regs *regs) +{ + if (!vcpu || !regs) + return; + + regs->a0 = vcpu_get_reg(vcpu, 0); + regs->a1 = vcpu_get_reg(vcpu, 1); + regs->a2 = vcpu_get_reg(vcpu, 2); + regs->a3 = vcpu_get_reg(vcpu, 3); + regs->a4 = vcpu_get_reg(vcpu, 4); + regs->a5 = vcpu_get_reg(vcpu, 5); + regs->a6 = vcpu_get_reg(vcpu, 6); + regs->a7 = vcpu_get_reg(vcpu, 7); +} + +static void copy_smccc_res_to_vcpu(struct kvm_vcpu *vcpu, struct arm_smccc_res *res) +{ + + vcpu_set_reg(vcpu, 0, res->a0); + vcpu_set_reg(vcpu, 1, res->a1); + vcpu_set_reg(vcpu, 2, res->a2); + vcpu_set_reg(vcpu, 3, res->a3); +} + +static void optee_mediator_smccc_smc(struct guest_regs *regs, struct arm_smccc_res *res) +{ + + arm_smccc_smc(regs->a0, regs->a1, regs->a2, regs->a3, + regs->a4, regs->a5, regs->a6, regs->a7, res); +} + +static int optee_mediator_pin_guest_page(struct kvm *kvm, gpa_t gpa) +{ + + int ret = 0; + + gfn_t gfn = gpa >> PAGE_SHIFT; + + struct kvm_memory_slot *memslot = gfn_to_memslot(kvm, gfn); + + if (!memslot) { + ret = -EAGAIN; + goto out; + } + + struct page *pages; + + if (!pin_user_pages_unlocked(memslot->userspace_addr, + 1, + &pages, + FOLL_LONGTERM)) { + ret = -EAGAIN; + goto out; + } + +out: + return ret; +} + +static void optee_mediator_unpin_guest_page(struct kvm *kvm, gpa_t gpa) +{ + + gfn_t gfn = gpa >> PAGE_SHIFT; + + struct page *page = gfn_to_page(kvm, gfn); + + if (!page) + goto out; + + unpin_user_page(page); + +out: + return; +} + +static struct optee_vm_context *optee_mediator_find_vm_context(struct kvm *kvm) +{ 
+ + struct optee_vm_context *vm_context, *tmp; + int found = 0; + + if (!kvm) + goto out; + + mutex_lock(&mediator->vm_list_lock); + + list_for_each_entry_safe(vm_context, tmp, &mediator->vm_list, list) { + if (vm_context->kvm == kvm) { + found = 1; + break; + } + } + + mutex_unlock(&mediator->vm_list_lock); + +out: + if (!found) + return NULL; + + return vm_context; +} + +static void optee_mediator_add_vm_context(struct optee_vm_context *vm_context) +{ + + if (!vm_context) + goto out; + + mutex_lock(&mediator->vm_list_lock); + list_add_tail(&vm_context->list, &mediator->vm_list); + mutex_unlock(&mediator->vm_list_lock); + +out: + return; +} + +static void optee_mediator_delete_vm_context(struct optee_vm_context *vm_context) +{ + + struct optee_vm_context *cursor_vm_context, *tmp; + struct optee_std_call *call, *tmp_call; + struct optee_shm_rpc *shm_rpc, *tmp_shm_rpc; + struct optee_shm_buf *shm_buf, *tmp_shm_buf; + + if (!vm_context) + goto out; + + mutex_lock(&vm_context->lock); + + list_for_each_entry_safe(call, tmp_call, &vm_context->std_call_list, list) { + if (call) { + optee_mediator_unpin_guest_page(vm_context->kvm, (gpa_t) call->guest_arg_gpa); + + list_del(&call->list); + kfree(call->shadow_arg); + kfree(call); + } + } + + + list_for_each_entry_safe(shm_buf, tmp_shm_buf, &vm_context->shm_buf_list, list) { + if (shm_buf) { + + for (int j = 0; j < shm_buf->num_pages; j++) + optee_mediator_unpin_guest_page(vm_context->kvm, (gpa_t) shm_buf->guest_page_list[j]); + + list_del(&shm_buf->list); + kfree(shm_buf->shadow_buffer_list); + kfree(shm_buf->guest_page_list); + kfree(shm_buf); + } + } + + list_for_each_entry_safe(shm_rpc, tmp_shm_rpc, &vm_context->shm_rpc_list, list) { + if (shm_rpc) { + optee_mediator_unpin_guest_page(vm_context->kvm, (gpa_t) shm_rpc->rpc_arg_gpa); + list_del(&shm_rpc->list); + kfree(shm_rpc); + } + } + + + mutex_unlock(&vm_context->lock); + + mutex_lock(&mediator->vm_list_lock); + + list_for_each_entry_safe(cursor_vm_context, tmp, &mediator->vm_list, list) { + if (cursor_vm_context == vm_context) { + list_del(&cursor_vm_context->list); + kfree(cursor_vm_context); + + goto out_unlock; + } + } + +out_unlock: + mutex_unlock(&mediator->vm_list_lock); +out: + return; +} + +static struct optee_std_call *optee_mediator_new_std_call(void) +{ + struct optee_std_call *call = kzalloc(sizeof(*call), GFP_KERNEL); + + if (!call) + return NULL; + + return call; +} + +static void optee_mediator_del_std_call(struct optee_std_call *call) +{ + if (!call) + return; + + kfree(call); +} + +static void optee_mediator_enlist_std_call(struct optee_vm_context *vm_context, struct optee_std_call *call) +{ + mutex_lock(&vm_context->lock); + list_add_tail(&call->list, &vm_context->std_call_list); + vm_context->call_count++; + mutex_unlock(&vm_context->lock); + + optee_mediator_pin_guest_page(vm_context->kvm, (gpa_t) call->guest_arg_gpa); +} + +static void optee_mediator_delist_std_call(struct optee_vm_context *vm_context, struct optee_std_call *call) +{ + mutex_lock(&vm_context->lock); + list_del(&call->list); + vm_context->call_count--; + mutex_unlock(&vm_context->lock); + + optee_mediator_unpin_guest_page(vm_context->kvm, (gpa_t) call->guest_arg_gpa); +} + +static struct optee_std_call *optee_mediator_find_std_call(struct optee_vm_context *vm_context, u32 thread_id) +{ + struct optee_std_call *call; + int found = 0; + + mutex_lock(&vm_context->lock); + list_for_each_entry(call, &vm_context->std_call_list, list) { + if (call->thread_id == thread_id) { + found = 1; + break; + } + } + 
mutex_unlock(&vm_context->lock); + + if (!found) + return NULL; + + return call; +} + +static struct optee_shm_buf *optee_mediator_new_shm_buf(void) +{ + struct optee_shm_buf *shm_buf = kzalloc(sizeof(*shm_buf), GFP_KERNEL); + + return shm_buf; +} + +static void optee_mediator_enlist_shm_buf(struct optee_vm_context *vm_context, struct optee_shm_buf *shm_buf) +{ + mutex_lock(&vm_context->lock); + list_add_tail(&shm_buf->list, &vm_context->shm_buf_list); + vm_context->shm_buf_page_count += shm_buf->num_pages; + mutex_unlock(&vm_context->lock); + + for (int i = 0; i < shm_buf->num_pages; i++) + optee_mediator_pin_guest_page(vm_context->kvm, (gpa_t) shm_buf->guest_page_list[i]); +} + +static void optee_mediator_free_shm_buf(struct optee_vm_context *vm_context, u64 cookie) +{ + + struct optee_shm_buf *shm_buf, *tmp; + + mutex_lock(&vm_context->lock); + list_for_each_entry_safe(shm_buf, tmp, &vm_context->shm_buf_list, list) { + if (shm_buf->cookie == cookie) { + for (int buf = 0; buf < shm_buf->num_buffers; buf++) + kfree(shm_buf->shadow_buffer_list[buf]); + + for (int buf = 0; buf < shm_buf->num_pages; buf++) + optee_mediator_unpin_guest_page(vm_context->kvm, (gpa_t) shm_buf->guest_page_list[buf]); + + vm_context->shm_buf_page_count -= shm_buf->num_pages; + + list_del(&shm_buf->list); + + kfree(shm_buf->shadow_buffer_list); + kfree(shm_buf->guest_page_list); + kfree(shm_buf); + break; + } + } + mutex_unlock(&vm_context->lock); +} + +static void optee_mediator_free_all_buffers(struct optee_vm_context *vm_context, struct optee_std_call *call) +{ + + for (int i = 0; i < call->shadow_arg->num_params; i++) { + u64 attr = call->shadow_arg->params[i].attr; + + switch (attr & OPTEE_MSG_ATTR_TYPE_MASK) { + + case OPTEE_MSG_ATTR_TYPE_TMEM_INPUT: + case OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT: + case OPTEE_MSG_ATTR_TYPE_TMEM_INOUT: + optee_mediator_free_shm_buf(vm_context, call->shadow_arg->params[i].u.tmem.shm_ref); + break; + default: + break; + + } + } +} + +static void optee_mediator_free_shm_buf_page_list(struct optee_vm_context *vm_context, u64 cookie) +{ + mutex_lock(&vm_context->lock); + + struct optee_shm_buf *shm_buf; + + list_for_each_entry(shm_buf, &vm_context->shm_buf_list, list) { + if (shm_buf->cookie == cookie) { + for (int entry = 0; entry < shm_buf->num_buffers; entry++) { + kfree(shm_buf->shadow_buffer_list[entry]); + shm_buf->shadow_buffer_list[entry] = NULL; + } + break; + } + } + + mutex_unlock(&vm_context->lock); +} + +static struct optee_shm_rpc *optee_mediator_new_shm_rpc(void) +{ + struct optee_shm_rpc *shm_rpc = kzalloc(sizeof(*shm_rpc), GFP_KERNEL); + + return shm_rpc; +} + +static void optee_mediator_enlist_shm_rpc(struct optee_vm_context *vm_context, struct optee_shm_rpc *shm_rpc) +{ + mutex_lock(&vm_context->lock); + list_add_tail(&shm_rpc->list, &vm_context->shm_rpc_list); + mutex_unlock(&vm_context->lock); + + optee_mediator_pin_guest_page(vm_context->kvm, (gpa_t) shm_rpc->rpc_arg_gpa); +} + +static struct optee_shm_rpc *optee_mediator_find_shm_rpc(struct optee_vm_context *vm_context, u64 cookie) +{ + + struct optee_shm_rpc *shm_rpc; + int found = 0; + + mutex_lock(&vm_context->lock); + list_for_each_entry(shm_rpc, &vm_context->shm_rpc_list, list) { + if (shm_rpc->cookie == cookie) { + found = 1; + break; + } + } + mutex_unlock(&vm_context->lock); + + if (!found) + return NULL; + + return shm_rpc; +} + +static void optee_mediator_free_shm_rpc(struct optee_vm_context *vm_context, u64 cookie) +{ + + struct optee_shm_rpc *shm_rpc, *tmp; + + mutex_lock(&vm_context->lock); + + 
list_for_each_entry_safe(shm_rpc, tmp, &vm_context->shm_rpc_list, list) { + if (shm_rpc->cookie == cookie) { + + optee_mediator_unpin_guest_page(vm_context->kvm, (gpa_t) shm_rpc->rpc_arg_gpa); + + list_del(&shm_rpc->list); + kfree(shm_rpc); + break; + } + } + + mutex_unlock(&vm_context->lock); +} + +static hva_t optee_mediator_gpa_to_hva(struct kvm *kvm, gpa_t gpa) +{ + gfn_t gfn = gpa >> PAGE_SHIFT; + + struct page *page = gfn_to_page(kvm, gfn); + + if (!page) + return 0; + + hva_t hva = (hva_t) page_to_virt(page); + return hva; +} + +static hva_t optee_mediator_gpa_to_phys(struct kvm *kvm, gpa_t gpa) +{ + gfn_t gfn = gpa >> PAGE_SHIFT; + + struct page *page = gfn_to_page(kvm, gfn); + + if (!page) + return 0; + + phys_addr_t phys = (phys_addr_t) page_to_phys(page); + return phys; +} + + +static int optee_mediator_shadow_msg_arg(struct kvm *kvm, struct optee_std_call *call) +{ + + int ret = 0; + + call->shadow_arg = kzalloc(OPTEE_MSG_NONCONTIG_PAGE_SIZE, GFP_KERNEL); + + if (!call->shadow_arg) { + ret = OPTEE_SMC_RETURN_ENOMEM; + goto out; + } + + ret = kvm_read_guest(kvm, (gpa_t)call->guest_arg_gpa, (void *) call->shadow_arg, OPTEE_MSG_NONCONTIG_PAGE_SIZE); + +out: + + return ret; +} + +static void optee_mediator_shadow_arg_sync(struct optee_std_call *call) +{ + + + + call->guest_arg_hva->ret = call->shadow_arg->ret; + call->guest_arg_hva->ret_origin = call->shadow_arg->ret_origin; + call->guest_arg_hva->session = call->shadow_arg->session; + + for (int i = 0; i < call->shadow_arg->num_params; i++) { + u32 attr = call->shadow_arg->params[i].attr; + + switch (attr & OPTEE_MSG_ATTR_TYPE_MASK) { + + case OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT: + case OPTEE_MSG_ATTR_TYPE_TMEM_INOUT: + call->guest_arg_hva->params[i].u.tmem.size = + call->shadow_arg->params[i].u.tmem.size; + continue; + case OPTEE_MSG_ATTR_TYPE_RMEM_OUTPUT: + case OPTEE_MSG_ATTR_TYPE_RMEM_INOUT: + call->guest_arg_hva->params[i].u.rmem.size = + call->shadow_arg->params[i].u.rmem.size; + continue; + case OPTEE_MSG_ATTR_TYPE_VALUE_OUTPUT: + case OPTEE_MSG_ATTR_TYPE_VALUE_INOUT: + call->guest_arg_hva->params[i].u.value.a = + call->shadow_arg->params[i].u.value.a; + call->guest_arg_hva->params[i].u.value.b = + call->shadow_arg->params[i].u.value.b; + call->guest_arg_hva->params[i].u.value.c = + call->shadow_arg->params[i].u.value.c; + continue; + case OPTEE_MSG_ATTR_TYPE_NONE: + case OPTEE_MSG_ATTR_TYPE_RMEM_INPUT: + case OPTEE_MSG_ATTR_TYPE_TMEM_INPUT: + continue; + + } + } +} + +static int optee_mediator_resolve_noncontig(struct optee_vm_context *vm_context, struct optee_msg_param *param) +{ + + int ret = 0; + + if (!param->u.tmem.buf_ptr) + goto out; + + struct kvm *kvm = vm_context->kvm; + + struct page_data *guest_buffer_gpa = (struct page_data *) param->u.tmem.buf_ptr; + struct page_data *guest_buffer_hva = (struct page_data *) optee_mediator_gpa_to_hva(kvm, (gpa_t) guest_buffer_gpa); + + if (!guest_buffer_hva) { + ret = -EINVAL; + goto out; + } + + u64 guest_buffer_size = param->u.tmem.size; + u64 guest_buffer_offset = param->u.tmem.buf_ptr & (OPTEE_MSG_NONCONTIG_PAGE_SIZE - 1); + u64 num_entries = DIV_ROUND_UP(guest_buffer_size + guest_buffer_offset, OPTEE_MSG_NONCONTIG_PAGE_SIZE); + + mutex_lock(&vm_context->lock); + if (vm_context->shm_buf_page_count + num_entries > OPTEE_MAX_SHM_BUFFER_PAGES) { + ret = -ENOMEM; + mutex_unlock(&vm_context->lock); + goto out; + } + mutex_unlock(&vm_context->lock); + + u64 num_buffers = DIV_ROUND_UP(num_entries, OPTEE_BUFFER_ENTRIES); + + struct page_data **shadow_buffer_list = kzalloc(num_buffers 
* sizeof(struct page_data *), GFP_KERNEL); + + if (!shadow_buffer_list) { + ret = -ENOMEM; + goto out; + } + + gpa_t *guest_page_list = kzalloc(num_entries * sizeof(gpa_t), GFP_KERNEL); + + if (!guest_page_list) { + ret = -ENOMEM; + goto out_free_shadow_buffer_list; + } + + u32 guest_page_num = 0; + + for (int i = 0; i < num_buffers; i++) { + struct page_data *shadow_buffer = kzalloc(sizeof(struct page_data), GFP_KERNEL); + + if (!shadow_buffer) { + ret = -ENOMEM; + goto out_free_guest_page_list; + } + + for (int entry = 0; entry < MIN(num_entries, OPTEE_BUFFER_ENTRIES); entry++) { + gpa_t buffer_entry_gpa = guest_buffer_hva->pages_list[entry]; + + guest_page_list[guest_page_num++] = buffer_entry_gpa; + + phys_addr_t buffer_entry_phys = optee_mediator_gpa_to_phys(kvm, buffer_entry_gpa); + + shadow_buffer->pages_list[entry] = (u64) buffer_entry_phys; + } + + shadow_buffer_list[i] = shadow_buffer; + if (i > 0) + shadow_buffer_list[i-1]->next_page_data = (u64) virt_to_phys(shadow_buffer_list[i]); + + guest_buffer_hva = (struct page_data *) optee_mediator_gpa_to_hva(kvm, (gpa_t) guest_buffer_hva->next_page_data); + if (!guest_buffer_hva && (i != num_buffers - 1)) { + ret = -EINVAL; + goto out_free_guest_page_list; + } + + } + + struct optee_shm_buf *shm_buf = optee_mediator_new_shm_buf(); + + if (!shm_buf) { + ret = -ENOMEM; + goto out_free_guest_page_list; + } + + shm_buf->shadow_buffer_list = shadow_buffer_list; + shm_buf->num_buffers = num_buffers; + shm_buf->guest_page_list = guest_page_list; + shm_buf->num_pages = num_entries; + shm_buf->cookie = param->u.tmem.shm_ref; + + optee_mediator_enlist_shm_buf(vm_context, shm_buf); + + param->u.tmem.buf_ptr = (u64) virt_to_phys(shadow_buffer_list[0]) | guest_buffer_offset; + + return ret; + +out_free_guest_page_list: + kfree(guest_page_list); +out_free_shadow_buffer_list: + for (int i = 0; i < num_buffers; i++) + kfree(shadow_buffer_list[i]); + + kfree(shadow_buffer_list); +out: + return ret; +} + +static int optee_mediator_resolve_params(struct optee_vm_context *vm_context, struct optee_std_call *call) +{ + + int ret = 0; + + for (int i = 0; i < call->shadow_arg->num_params; i++) { + u32 attr = call->shadow_arg->params[i].attr; + + switch (attr & OPTEE_MSG_ATTR_TYPE_MASK) { + + case OPTEE_MSG_ATTR_TYPE_TMEM_INPUT: + case OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT: + case OPTEE_MSG_ATTR_TYPE_TMEM_INOUT: + if (attr & OPTEE_MSG_ATTR_NONCONTIG) { + ret = optee_mediator_resolve_noncontig(vm_context, call->shadow_arg->params + i); + + if (ret == -ENOMEM) { + call->shadow_arg->ret_origin = TEEC_ORIGIN_COMMS; + call->shadow_arg->ret = TEEC_ERROR_OUT_OF_MEMORY; + goto out; + } + if (ret == -EINVAL) { + call->shadow_arg->ret_origin = TEEC_ORIGIN_COMMS; + call->shadow_arg->ret = TEEC_ERROR_BAD_PARAMETERS; + goto out; + } + } else { + if (call->shadow_arg->params[i].u.tmem.buf_ptr) { + call->shadow_arg->ret_origin = TEEC_ORIGIN_COMMS; + call->shadow_arg->ret = TEEC_ERROR_BAD_PARAMETERS; + ret = -EINVAL; + goto out; + } + } + default: + continue; + + } + } +out: + + return ret; +} + + +static int optee_mediator_new_vmid(u64 *vmid_out) +{ + + int ret = 0; + + u64 vmid = atomic_read(&mediator->next_vmid); + + atomic_inc(&mediator->next_vmid); + + *vmid_out = vmid; + + return ret; +} + +static int optee_mediator_create_host(void) +{ + + int ret = 0; + + struct arm_smccc_res res; + + arm_smccc_smc(OPTEE_SMC_VM_CREATED, OPTEE_HOST_VMID, 0, 0, 0, 0, 0, 0, &res); + + if (res.a0 == OPTEE_SMC_RETURN_ENOTAVAIL) { + ret = -EBUSY; + goto out; + } + +out: + return ret; +} + +static 
int optee_mediator_destroy_host(void) +{ + + int ret = 0; + + struct arm_smccc_res res; + + arm_smccc_smc(OPTEE_SMC_VM_DESTROYED, OPTEE_HOST_VMID, 0, 0, 0, 0, 0, 0, &res); + + return ret; +} + +static int optee_mediator_create_vm(struct kvm *kvm) +{ + + int ret = 0; + struct arm_smccc_res res; + + if (!kvm) { + ret = -EINVAL; + goto out; + } + + struct optee_vm_context *vm_context = kzalloc(sizeof(*vm_context), GFP_KERNEL); + + if (!vm_context) { + ret = -ENOMEM; + goto out; + } + + ret = optee_mediator_new_vmid(&vm_context->vmid); + if (ret < 0) + goto out_context_free; + + INIT_LIST_HEAD(&vm_context->std_call_list); + INIT_LIST_HEAD(&vm_context->shm_buf_list); + INIT_LIST_HEAD(&vm_context->shm_rpc_list); + + mutex_init(&vm_context->lock); + + vm_context->kvm = kvm; + + arm_smccc_smc(OPTEE_SMC_VM_CREATED, vm_context->vmid, 0, 0, 0, 0, 0, 0, &res); + + if (res.a0 == OPTEE_SMC_RETURN_ENOTAVAIL) { + ret = -EBUSY; + goto out_context_free; + } + + optee_mediator_add_vm_context(vm_context); + +out: + return ret; +out_context_free: + kfree(vm_context); + return ret; +} + +static int optee_mediator_destroy_vm(struct kvm *kvm) +{ + + int ret = 0; + struct arm_smccc_res res; + + if (!kvm) { + ret = -EINVAL; + goto out; + } + + struct optee_vm_context *vm_context = optee_mediator_find_vm_context(kvm); + + if (!vm_context) { + ret = -EINVAL; + goto out; + } + + arm_smccc_smc(OPTEE_SMC_VM_DESTROYED, vm_context->vmid, 0, 0, 0, 0, 0, 0, &res); + + optee_mediator_delete_vm_context(vm_context); + +out: + return ret; +} + +static void optee_mediator_handle_fast_call(struct kvm_vcpu *vcpu, struct guest_regs *regs) +{ + + struct arm_smccc_res res; + struct kvm *kvm = vcpu->kvm; + + struct optee_vm_context *vm_context = optee_mediator_find_vm_context(kvm); + + if (!vm_context) { + res.a0 = OPTEE_SMC_RETURN_ENOTAVAIL; + goto out; + } + + regs->a7 = vm_context->vmid; + + optee_mediator_smccc_smc(regs, &res); + + switch (ARM_SMCCC_FUNC_NUM(regs->a0)) { + + case OPTEE_SMC_FUNCID_GET_THREAD_COUNT: + optee_thread_limit = 0; + if (res.a0 != OPTEE_SMC_RETURN_UNKNOWN_FUNCTION) + optee_thread_limit = res.a1; + break; + + case OPTEE_SMC_FUNCID_DISABLE_SHM_CACHE: + if (res.a0 == OPTEE_SMC_RETURN_OK) { + u64 cookie = (u64) reg_pair_to_ptr(res.a1, res.a2); + + optee_mediator_free_shm_buf(vm_context, cookie); + } + break; + + default: + break; + + } + + copy_smccc_res_to_vcpu(vcpu, &res); +out: + return; +} + +static int optee_mediator_handle_rpc_return(struct optee_vm_context *vm_context, + struct optee_std_call *call, + struct guest_regs *regs, + struct arm_smccc_res *res) +{ + + int ret = 0; + + call->rpc_state.a0 = res->a0; + call->rpc_state.a1 = res->a1; + call->rpc_state.a2 = res->a2; + call->rpc_state.a3 = res->a3; + + call->rpc_func = OPTEE_SMC_RETURN_GET_RPC_FUNC(res->a0); + call->thread_id = res->a3; + + if (call->rpc_func == OPTEE_SMC_RPC_FUNC_FREE) { + u64 cookie = (u64) reg_pair_to_ptr(res->a1, res->a2); + + optee_mediator_free_shm_rpc(vm_context, cookie); + } + + if (call->rpc_func == OPTEE_SMC_RPC_FUNC_CMD) { + u64 cookie = (u64) reg_pair_to_ptr(res->a1, res->a2); + struct optee_shm_rpc *shm_rpc = optee_mediator_find_shm_rpc(vm_context, cookie); + + if (!shm_rpc) { + ret = -ERESTART; + goto out; + } + if (shm_rpc->rpc_arg_hva->cmd == OPTEE_RPC_CMD_SHM_FREE) + optee_mediator_free_shm_buf(vm_context, shm_rpc->rpc_arg_hva->params[0].u.value.b); + } + +out: + return ret; +} + +static void optee_mediator_do_call_with_arg(struct optee_vm_context *vm_context, + struct optee_std_call *call, + struct guest_regs *regs, 
+ struct arm_smccc_res *res) +{ + + regs->a7 = vm_context->vmid; + + optee_mediator_smccc_smc(regs, res); + + if (OPTEE_SMC_RETURN_IS_RPC(res->a0)) { + while (optee_mediator_handle_rpc_return(vm_context, call, regs, res) == -ERESTART) { + optee_mediator_smccc_smc(regs, res); + if (!OPTEE_SMC_RETURN_IS_RPC(res->a0)) + break; + } + } else { + + u32 cmd = call->shadow_arg->cmd; + u32 call_ret = call->shadow_arg->ret; + + switch (cmd) { + + case OPTEE_MSG_CMD_REGISTER_SHM: + if (call_ret == 0) + optee_mediator_free_shm_buf_page_list(vm_context, (u64) call->shadow_arg->params[0].u.tmem.shm_ref); + else + optee_mediator_free_shm_buf(vm_context, (u64) call->shadow_arg->params[0].u.tmem.shm_ref); + + break; + + case OPTEE_MSG_CMD_UNREGISTER_SHM: + if (call_ret == 0) + optee_mediator_free_shm_buf(vm_context, (u64) call->shadow_arg->params[0].u.rmem.shm_ref); + break; + + default: + optee_mediator_free_all_buffers(vm_context, call); + break; + + } + } +} + +static void optee_mediator_handle_std_call(struct kvm_vcpu *vcpu, struct guest_regs *regs) +{ + + struct arm_smccc_res res; + struct kvm *kvm = vcpu->kvm; + int ret; + + struct optee_vm_context *vm_context = optee_mediator_find_vm_context(kvm); + + if (!vm_context) { + res.a0 = OPTEE_SMC_RETURN_ENOTAVAIL; + goto out_copy; + } + + struct optee_std_call *call = optee_mediator_new_std_call(); + + if (!call) { + res.a0 = OPTEE_SMC_RETURN_ENOMEM; + goto out_copy; + } + + call->thread_id = 0xffffffff; + call->guest_arg_gpa = (struct optee_msg_arg *) reg_pair_to_ptr(regs->a1, regs->a2); + call->guest_arg_hva = (struct optee_msg_arg *) optee_mediator_gpa_to_hva(kvm, (gpa_t) call->guest_arg_gpa); + + if (!call->guest_arg_hva) { + res.a0 = OPTEE_SMC_RETURN_EBADADDR; + goto out_call_free; + } + + mutex_lock(&vm_context->lock); + + if (vm_context->call_count >= optee_thread_limit) { + res.a0 = OPTEE_SMC_RETURN_ETHREAD_LIMIT; + mutex_unlock(&vm_context->lock); + goto out_call_free; + } + + mutex_unlock(&vm_context->lock); + + INIT_LIST_HEAD(&call->list); + + ret = optee_mediator_shadow_msg_arg(kvm, call); + if (ret) { + res.a0 = OPTEE_SMC_RETURN_EBADADDR; + goto out_call_free; + } + + optee_mediator_enlist_std_call(vm_context, call); + + if (OPTEE_MSG_GET_ARG_SIZE(call->shadow_arg->num_params) > OPTEE_MSG_NONCONTIG_PAGE_SIZE) { + call->shadow_arg->ret = TEEC_ERROR_BAD_PARAMETERS; + call->shadow_arg->ret_origin = TEEC_ORIGIN_COMMS; + call->shadow_arg->num_params = 0; + + optee_mediator_shadow_arg_sync(call); + goto out_delist_std_call; + } + + + u32 cmd = call->shadow_arg->cmd; + + + switch (cmd) { + + case OPTEE_MSG_CMD_OPEN_SESSION: + case OPTEE_MSG_CMD_CLOSE_SESSION: + case OPTEE_MSG_CMD_INVOKE_COMMAND: + case OPTEE_MSG_CMD_CANCEL: + case OPTEE_MSG_CMD_REGISTER_SHM: + case OPTEE_MSG_CMD_UNREGISTER_SHM: + ret = optee_mediator_resolve_params(vm_context, call); + if (ret) { + res.a0 = OPTEE_SMC_RETURN_OK; + optee_mediator_shadow_arg_sync(call); + goto out_delist_std_call; + } + break; + default: + res.a0 = OPTEE_SMC_RETURN_EBADCMD; + goto out_delist_std_call; + } + + reg_pair_from_64(®s->a1, ®s->a2, (u64) virt_to_phys(call->shadow_arg)); + regs->a3 = OPTEE_SMC_SHM_CACHED; + + optee_mediator_do_call_with_arg(vm_context, call, regs, &res); + optee_mediator_shadow_arg_sync(call); + + if (OPTEE_SMC_RETURN_IS_RPC(res.a0)) + goto out_copy; + +out_delist_std_call: + optee_mediator_delist_std_call(vm_context, call); +out_call_free: + optee_mediator_del_std_call(call); +out_copy: + copy_smccc_res_to_vcpu(vcpu, &res); +} + +static void 
optee_mediator_handle_rpc_alloc(struct optee_vm_context *vm_context, struct guest_regs *regs) +{ + + u64 ptr = (u64) reg_pair_to_ptr(regs->a1, regs->a2); + u64 cookie = (u64) reg_pair_to_ptr(regs->a4, regs->a5); + + struct optee_shm_rpc *shm_rpc = optee_mediator_new_shm_rpc(); + + if (!shm_rpc) + goto out_err; + + struct optee_shm_rpc *temp_shm_rpc = optee_mediator_find_shm_rpc(vm_context, cookie); + + if (temp_shm_rpc) { // guest is trying to reuse cookie + goto out_err; + } + + shm_rpc->rpc_arg_gpa = (struct optee_msg_arg *) ptr; + shm_rpc->rpc_arg_hva = (struct optee_msg_arg *) optee_mediator_gpa_to_hva(vm_context->kvm, (gpa_t) shm_rpc->rpc_arg_gpa); + + if (!shm_rpc->rpc_arg_hva) { + ptr = 0; + goto out_err_free_rpc; + } + + shm_rpc->cookie = cookie; + + optee_mediator_enlist_shm_rpc(vm_context, shm_rpc); + + ptr = optee_mediator_gpa_to_phys(vm_context->kvm, (gpa_t) shm_rpc->rpc_arg_gpa); + + reg_pair_from_64(®s->a1, ®s->a2, ptr); + return; + +out_err_free_rpc: + kfree(shm_rpc); +out_err: + reg_pair_from_64(®s->a1, ®s->a2, 0); +} + +static int optee_mediator_handle_rpc_cmd(struct optee_vm_context *vm_context, struct guest_regs *regs) +{ + int ret = 0; + u64 cookie = (u64) reg_pair_to_ptr(regs->a1, regs->a2); + + struct optee_shm_rpc *shm_rpc = optee_mediator_find_shm_rpc(vm_context, cookie); + + if (!shm_rpc) { + ret = -EINVAL; + goto out; + } + + if (OPTEE_MSG_GET_ARG_SIZE(shm_rpc->rpc_arg_hva->num_params) > OPTEE_MSG_NONCONTIG_PAGE_SIZE) { + shm_rpc->rpc_arg_hva->ret = TEEC_ERROR_BAD_PARAMETERS; + goto out; + } + + switch (shm_rpc->rpc_arg_hva->cmd) { + case OPTEE_RPC_CMD_SHM_ALLOC: + ret = optee_mediator_resolve_noncontig(vm_context, shm_rpc->rpc_arg_hva->params + 0); + break; + + case OPTEE_RPC_CMD_SHM_FREE: + optee_mediator_free_shm_buf(vm_context, shm_rpc->rpc_arg_hva->params[0].u.value.b); + break; + } + +out: + return ret; +} + +static void optee_mediator_handle_rpc_call(struct kvm_vcpu *vcpu, struct guest_regs *regs) +{ + + int ret = 0; + struct arm_smccc_res res; + struct optee_std_call *call; + u32 thread_id = regs->a3; + + struct optee_vm_context *vm_context = optee_mediator_find_vm_context(vcpu->kvm); + + if (!vm_context) { + res.a0 = OPTEE_SMC_RETURN_ENOTAVAIL; + goto out_copy; + } + + call = optee_mediator_find_std_call(vm_context, thread_id); + if (!call) { + res.a0 = OPTEE_SMC_RETURN_ERESUME; + goto out_copy; + } + + + call->thread_id = 0xffffffff; + + switch (call->rpc_func) { + + case OPTEE_SMC_RPC_FUNC_ALLOC: + optee_mediator_handle_rpc_alloc(vm_context, regs); + break; + case OPTEE_SMC_RPC_FUNC_FOREIGN_INTR: + break; + case OPTEE_SMC_RPC_FUNC_CMD: + ret = optee_mediator_handle_rpc_cmd(vm_context, regs); + + if (ret < 0) + goto out; + break; + } + + + + optee_mediator_do_call_with_arg(vm_context, call, regs, &res); + + optee_mediator_shadow_arg_sync(call); + + if (OPTEE_SMC_RETURN_IS_RPC(res.a0) || res.a0 == OPTEE_SMC_RETURN_ERESUME) + goto out_copy; + + optee_mediator_delist_std_call(vm_context, call); + optee_mediator_del_std_call(call); +out_copy: + copy_smccc_res_to_vcpu(vcpu, &res); +out: + return; +} + +static void optee_mediator_handle_exchange_cap(struct kvm_vcpu *vcpu, struct guest_regs *regs) +{ + + struct arm_smccc_res res; + struct kvm *kvm = vcpu->kvm; + + struct optee_vm_context *vm_context = optee_mediator_find_vm_context(kvm); + + if (!vm_context) { + res.a0 = OPTEE_SMC_RETURN_ENOTAVAIL; + goto out_copy; + } + + regs->a1 &= OPTEE_KNOWN_NSEC_CAPS; + regs->a7 = vm_context->vmid; + + optee_mediator_smccc_smc(regs, &res); + if (res.a0 != 
OPTEE_SMC_RETURN_OK) + goto out_copy; + + res.a1 &= OPTEE_KNOWN_SEC_CAPS; + res.a1 &= ~OPTEE_SMC_SEC_CAP_HAVE_RESERVED_SHM; + + if (!(res.a1 & OPTEE_SMC_SEC_CAP_DYNAMIC_SHM)) { + res.a0 = OPTEE_SMC_RETURN_ENOTAVAIL; + goto out_copy; + } + +out_copy: + copy_smccc_res_to_vcpu(vcpu, &res); +} + +static void optee_mediator_forward_smc(struct kvm_vcpu *vcpu) +{ + + if (!vcpu) + goto out; + + struct guest_regs regs; + + copy_regs_from_vcpu(vcpu, ®s); + + switch (ARM_SMCCC_FUNC_NUM(regs.a0)) { + + case OPTEE_SMC_FUNCID_CALLS_COUNT: + case OPTEE_SMC_FUNCID_CALLS_UID: + case OPTEE_SMC_FUNCID_CALLS_REVISION: + case OPTEE_SMC_FUNCID_GET_OS_UUID: + case OPTEE_SMC_FUNCID_GET_OS_REVISION: + case OPTEE_SMC_FUNCID_GET_THREAD_COUNT: + case OPTEE_SMC_FUNCID_ENABLE_ASYNC_NOTIF: + case OPTEE_SMC_FUNCID_ENABLE_SHM_CACHE: + case OPTEE_SMC_FUNCID_GET_ASYNC_NOTIF_VALUE: + case OPTEE_SMC_FUNCID_DISABLE_SHM_CACHE: + optee_mediator_handle_fast_call(vcpu, ®s); + break; + + case OPTEE_SMC_FUNCID_EXCHANGE_CAPABILITIES: + optee_mediator_handle_exchange_cap(vcpu, ®s); + break; + + case OPTEE_SMC_FUNCID_CALL_WITH_ARG: + optee_mediator_handle_std_call(vcpu, ®s); + break; + + case OPTEE_SMC_FUNCID_RETURN_FROM_RPC: + optee_mediator_handle_rpc_call(vcpu, ®s); + break; + + default: + vcpu_set_reg(vcpu, 0, OPTEE_SMC_RETURN_UNKNOWN_FUNCTION); + break; + } + +out: + return; +} + +static int optee_mediator_is_active(void) +{ + + int ret = 1; + + spin_lock(&mediator_lock); + + if (!mediator) + ret = 0; + + spin_unlock(&mediator_lock); + + return ret; +} + +struct tee_mediator_ops optee_mediator_ops = { + .create_host = optee_mediator_create_host, + .destroy_host = optee_mediator_destroy_host, + .create_vm = optee_mediator_create_vm, + .destroy_vm = optee_mediator_destroy_vm, + .forward_request = optee_mediator_forward_smc, + .is_active = optee_mediator_is_active, +}; + +static int optee_check_virtualization(void) +{ + + int ret = 0; + + struct arm_smccc_res res; + + arm_smccc_smc(OPTEE_SMC_VM_DESTROYED, 0, 0, 0, 0, 0, 0, 0, &res); + + if (res.a0 == OPTEE_SMC_RETURN_UNKNOWN_FUNCTION) { + ret = -ENODEV; + goto out; + } + +out: + return ret; +} + +static int optee_check_page_size(void) +{ + if (OPTEE_MSG_NONCONTIG_PAGE_SIZE > PAGE_SIZE) + return -EINVAL; + + return 0; +} + +static int __init optee_mediator_init(void) +{ + + int ret; + + ret = optee_check_virtualization(); + if (ret < 0) { + pr_info("optee virtualization unsupported\n"); + goto out; + } + + ret = optee_check_page_size(); + if (ret < 0) { + pr_info("optee noncontig page size too large"); + goto out; + } + + mediator = kzalloc(sizeof(*mediator), GFP_KERNEL); + if (!mediator) { + ret = -ENOMEM; + goto out; + } + + ret = tee_mediator_register_ops(&optee_mediator_ops); + if (ret < 0) + goto out_free; + + atomic_set(&mediator->next_vmid, 2); // VMID 0 is reserved for the hypervisor and 1 is for host. 
+ + INIT_LIST_HEAD(&mediator->vm_list); + mutex_init(&mediator->vm_list_lock); + spin_lock_init(&mediator_lock); + + pr_info("mediator initialised\n"); + +out: + return ret; +out_free: + kfree(mediator); + return ret; +} +module_init(optee_mediator_init); + +static void __exit optee_mediator_exit(void) +{ + + struct optee_vm_context *vm_context; + struct optee_vm_context *tmp; + + list_for_each_entry_safe(vm_context, tmp, &mediator->vm_list, list) { + list_del(&vm_context->list); + kfree(vm_context); + } + + kfree(mediator); + + pr_info("mediator exiting\n"); + +} +module_exit(optee_mediator_exit); diff --git a/drivers/tee/optee/optee_mediator.h b/drivers/tee/optee/optee_mediator.h new file mode 100644 index 000000000000..d632ed437aa6 --- /dev/null +++ b/drivers/tee/optee/optee_mediator.h @@ -0,0 +1,103 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * OP-TEE Mediator for the Linux Kernel + * + * This module enables a KVM guest to interact with OP-TEE + * in the secure world by hooking event handlers with + * the TEE Mediator layer. + * + * Author: + * Yuvraj Sakshith yuvraj.kernel@gmail.com + */ + +#ifndef __OPTEE_MEDIATOR_H +#define __OPTEE_MEDIATOR_H + +#include "optee_msg.h" + +#include <linux/types.h> +#include <linux/mm_types.h> +#include <linux/kvm_types.h> +#include <linux/list.h> +#include <linux/mutex.h> + +#define OPTEE_HYP_CLIENT_ID 0 +#define OPTEE_HOST_VMID 1 +#define OPTEE_BUFFER_ENTRIES ((OPTEE_MSG_NONCONTIG_PAGE_SIZE / sizeof(u64)) - 1) +#define OPTEE_MAX_SHM_BUFFER_PAGES 512 + +struct optee_mediator { + struct list_head vm_list; + struct mutex vm_list_lock; + + atomic_t next_vmid; +}; + +struct optee_vm_context { + struct list_head list; + struct list_head std_call_list; + struct list_head shm_buf_list; + struct list_head shm_rpc_list; + + struct mutex lock; + + struct kvm *kvm; + u64 vmid; + u32 call_count; + u64 shm_buf_page_count; +}; + +struct guest_regs { + u32 a0; + u32 a1; + u32 a2; + u32 a3; + u32 a4; + u32 a5; + u32 a6; + u32 a7; +}; + +struct optee_std_call { + struct list_head list; + + struct optee_msg_arg *guest_arg_gpa; + struct optee_msg_arg *guest_arg_hva; + struct optee_msg_arg *shadow_arg; + + u32 thread_id; + + u32 rpc_func; + u64 rpc_buffer_type; + + struct guest_regs rpc_state; +}; + +struct page_data { + u64 pages_list[OPTEE_BUFFER_ENTRIES]; + u64 next_page_data; +}; + +struct optee_shm_buf { + struct list_head list; + + struct page_data **shadow_buffer_list; + u64 num_buffers; + + gpa_t *guest_page_list; + u64 num_pages; + + u64 cookie; +}; + +struct optee_shm_rpc { + struct list_head list; + + struct optee_msg_arg *rpc_arg_gpa; + struct optee_msg_arg *rpc_arg_hva; + + u64 cookie; +}; + + +#endif
When the host initializes or releases its OP-TEE driver through optee_core_init()/optee_core_exit(), notify OP-TEE in the secure world of the change.

If OP-TEE is built with NS-virtualization support, it will treat SMCs coming from the host as if they were coming from a VM (as OP-TEE does not understand the KVM paradigm).

Hence, OPTEE_SMC_VM_CREATED and OPTEE_SMC_VM_DESTROYED SMCs have to be issued on the host's behalf for OP-TEE's internal book-keeping.
Signed-off-by: Yuvraj Sakshith <yuvraj.kernel@gmail.com>
---
 drivers/tee/optee/core.c    | 13 ++++++++++++-
 drivers/tee/optee/smc_abi.c |  6 ++++++
 2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c index c75fddc83576..5f2ab0ee0893 100644 --- a/drivers/tee/optee/core.c +++ b/drivers/tee/optee/core.c @@ -14,6 +14,7 @@ #include <linux/slab.h> #include <linux/string.h> #include <linux/tee_core.h> +#include <linux/tee_mediator.h> #include <linux/types.h> #include "optee_private.h"
@@ -195,7 +196,13 @@ static bool intf_is_regged; static int __init optee_core_init(void) { int rc; - +#ifdef CONFIG_TEE_MEDIATOR + if (tee_mediator_is_active()) { + rc = tee_mediator_create_host(); + if (rc < 0) + return rc; + } +#endif /* * The kernel may have crashed at the same time that all available * secure world threads were suspended and we cannot reschedule the @@ -240,6 +247,10 @@ static void __exit optee_core_exit(void) optee_smc_abi_unregister(); if (!ffa_abi_rc) optee_ffa_abi_unregister(); +#ifdef CONFIG_TEE_MEDIATOR + if (tee_mediator_is_active()) + tee_mediator_destroy_host(); +#endif } module_exit(optee_core_exit);
diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c index f0c3ac1103bb..a930ca8cde23 100644 --- a/drivers/tee/optee/smc_abi.c +++ b/drivers/tee/optee/smc_abi.c @@ -25,8 +25,10 @@ #include <linux/slab.h> #include <linux/string.h> #include <linux/tee_core.h> +#include <linux/tee_mediator.h> #include <linux/types.h> #include <linux/workqueue.h> +#include "optee_mediator.h" #include "optee_private.h" #include "optee_smc.h" #include "optee_rpc_cmd.h" @@ -1396,6 +1398,10 @@ static void optee_smccc_smc(unsigned long a0, unsigned long a1, unsigned long a6, unsigned long a7, struct arm_smccc_res *res) { +#ifdef CONFIG_TEE_MEDIATOR + if (tee_mediator_is_active()) + a7 = OPTEE_HOST_VMID; +#endif arm_smccc_smc(a0, a1, a2, a3, a4, a5, a6, a7, res); }
On Tue, 01 Apr 2025 18:05:20 +0100, Yuvraj Sakshith <yuvraj.kernel@gmail.com> wrote:
> A KVM guest running on an arm64 machine cannot interact with a trusted execution environment (which supports non-secure guests) like OP-TEE in the secure world. This is because the instructions provided by the architecture to switch control to the firmware (such as SMC) are trapped at EL2 when the guest executes them.
>
> This series adds a feature to the kernel called the TEE mediator abstraction layer, which lets a guest interact with the secure world. Additionally, an OP-TEE-specific mediator is implemented, which hooks itself into the TEE mediator layer and intercepts guest SMCs targeted at OP-TEE.
>
> Overview
> ========
>
> Essentially, when the kernel wants to interact with OP-TEE, it loads arguments into CPU registers and executes an "smc" (secure monitor call) instruction. What these arguments consist of and how the two entities communicate can vary. A guest, however, cannot establish a connection with the secure world, because "smc" instructions executed by the guest are trapped by the hypervisor at EL2. This is done by setting the HCR_EL2.TSC bit before entering the guest.
>
> Hence this feature, which we may call the TEE mediator, acts as an intermediary between the guest and OP-TEE. Instead of denying the guest SMC and jumping back into the guest, the mediator forwards the request to OP-TEE.
>
> OP-TEE supports virtualization in the normal world and expects 6 things from the NS-hypervisor:
>
> - Notify OP-TEE when a VM is created.
> - Notify OP-TEE when a VM is destroyed.
> - Any SMC to OP-TEE has to carry the VMID in x7; if the hypervisor itself is the caller, the VMID is 0.
> - The hypervisor has to perform IPA->PA translation of the memory addresses sent by the guest.
> - Memory shared by the VM with OP-TEE has to remain pinned.
> - The hypervisor has to follow the OP-TEE protocol, so the guest thinks it is speaking directly to OP-TEE.
>
> It is important to note that if OP-TEE is built with NS-virtualization support, it can only function if there is a hypervisor with a mediator in the normal world.
>
> This implementation has been heavily inspired by Xen's OP-TEE mediator.
>
> [...]
And I think this inspiration is the source of most of the problems in this series.
Routing Secure Calls from the guest to whatever is on the secure side should not be the kernel's job at all. It should be the VMM's job. All you need to do is to route the SMCs from the guest to userspace, and we already have all the required infrastructure for that.
It is the VMM that should:
- signal the TEE of VM creation/teardown
- translate between IPAs and host VAs without involving KVM
- let the host TEE driver translate between VAs and PAs and deal with the pinning as required, just like it would do for any userspace (without ever using the KVM memslot interface)
- proxy requests from the guest to the TEE
- in general, bear the complexity of anything related to the TEE
In short, the VMM is just another piece of userspace using the TEE to do whatever it wants. The TEE driver on the host must obviously know about VMs, but that's about it.
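As a concrete illustration of that point (a sketch, not something from this series): the VMM could hand a chunk of guest RAM, already mapped into its own address space, to the host TEE driver through the existing /dev/tee UAPI, and the driver then does the VA-to-PA translation and pinning exactly as it would for any other client. tee_fd, hva and len are placeholders.

#include <stdint.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/tee.h>

/* Register a guest buffer (mapped at 'hva' in the VMM) with the host TEE driver. */
static int register_guest_shm(int tee_fd, void *hva, size_t len, int *shm_id)
{
	struct tee_ioctl_shm_register_data data = {
		.addr = (uintptr_t)hva,
		.length = len,
	};
	int shm_fd;

	/* The driver pins the pages and resolves VA->PA itself; KVM is never involved. */
	shm_fd = ioctl(tee_fd, TEE_IOC_SHM_REGISTER, &data);
	if (shm_fd < 0)
		return -1;

	*shm_id = data.id;	/* identifies this shared memory object to the TEE */
	return shm_fd;		/* keep open for as long as the guest shares the buffer */
}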
Crucially, KVM should:
- be completely TEE agnostic and never call into something that is TEE-specific
- allow a TEE implementation entirely in userspace, specially for the machines that do not have EL3
As it stands, your design looks completely upside-down. Most of this code should be userspace code and live in (or close to) the VMM, with the host kernel only providing the basic primitives, most of which should already be there.
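For what it's worth, the existing infrastructure being referred to is presumably the arm64 SMCCC filter (KVM_CAP_ARM_SMCCC_FILTER), which already lets a VMM have ranges of guest SMCs forwarded to userspace. A rough sketch, with an illustrative function-ID range and the capability check omitted:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Ask KVM to forward guest SMCs in the SMC32 fast-call Trusted OS range
 * (owner 50, used by OP-TEE) to userspace instead of handling them in the
 * kernel.  The range is illustrative; a real VMM would cover exactly the
 * function IDs its TEE protocol uses (yielding and fast calls alike).
 */
static int forward_trusted_os_calls(int vm_fd)
{
	struct kvm_smccc_filter filter = {
		.base         = 0xb2000000,
		.nr_functions = 0x10000,
		.action       = KVM_SMCCC_FILTER_FWD_TO_USER,
	};
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VM_SMCCC_CTRL,
		.attr  = KVM_ARM_VM_SMCCC_FILTER,
		.addr  = (uintptr_t)&filter,
	};

	/* Filtered calls then reach the VMM as KVM_EXIT_HYPERCALL exits. */
	return ioctl(vm_fd, KVM_SET_DEVICE_ATTR, &attr);
}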
Thanks,
M.
On Tue, Apr 01, 2025 at 07:13:26PM +0100, Marc Zyngier wrote:
On Tue, 01 Apr 2025 18:05:20 +0100, Yuvraj Sakshith yuvraj.kernel@gmail.com wrote:
[...]
This implementation has been heavily inspired by Xen's OP-TEE mediator.
[...]
And I think this inspiration is the source of most of the problems in this series.
Routing Secure Calls from the guest to whatever is on the secure side should not be the kernel's job at all. It should be the VMM's job. All you need to do is to route the SMCs from the guest to userspace, and we already have all the required infrastructure for that.
Yes, this was an argument at the time of designing this solution.
It is the VMM that should:
- signal the TEE of VM creation/teardown
- translate between IPAs and host VAs without involving KVM
- let the host TEE driver translate between VAs and PAs and deal with the pinning as required, just like it would do for any userspace (without ever using the KVM memslot interface)
- proxy requests from the guest to the TEE
- in general, bear the complexity of anything related to the TEE
Major reason why I went with placing the implementation inside the kernel is:
- OP-TEE userspace lib (client) does not support sending SMCs for VM events and needs modification.
- QEMU (or every other VMM) will have to be modified.
- OP-TEE driver is anyways in the kernel. A mediator will just be an addition and not a completely new entity.
- (Potential) issues if we would want to mediate requests from VM which has private mem.
- Heavy VM exits if guest makes frequent TOS calls.
Hence, the thought of making changes to too many entities (libteec, VMM, etc.) was a strong reason, although arguable.
In short, the VMM is just another piece of userspace using the TEE to do whatever it wants. The TEE driver on the host must obviously know about VMs, but that's about it.
Crucially, KVM should:
- be completely TEE agnostic and never call into something that is TEE-specific
- allow a TEE implementation entirely in userspace, specially for the machines that do not have EL3
Yes, you're right. Although I believe there are still some changes that need to be made to KVM to facilitate this. For example, kvm_smccc_get_action() would deny a TOS call.
So, having an implementation completely in the VMM without any change to KVM might be challenging; any potential solutions are welcome.
As it stands, your design looks completely upside-down. Most of this code should be userspace code and live in (or close to) the VMM, with the host kernel only providing the basic primitives, most of which should already be there.
Thanks,
M.
--
Jazz isn't dead. It just smells funny.
On Wed, 02 Apr 2025 03:58:48 +0100, Yuvraj Sakshith yuvraj.kernel@gmail.com wrote:
On Tue, Apr 01, 2025 at 07:13:26PM +0100, Marc Zyngier wrote:
On Tue, 01 Apr 2025 18:05:20 +0100, Yuvraj Sakshith yuvraj.kernel@gmail.com wrote:
[...]
This implementation has been heavily inspired by Xen's OP-TEE mediator.
[...]
And I think this inspiration is the source of most of the problems in this series.
Routing Secure Calls from the guest to whatever is on the secure side should not be the kernel's job at all. It should be the VMM's job. All you need to do is to route the SMCs from the guest to userspace, and we already have all the required infrastructure for that.
Yes, this was an argument at the time of designing this solution.
It is the VMM that should:
- signal the TEE of VM creation/teardown
- translate between IPAs and host VAs without involving KVM
- let the host TEE driver translate between VAs and PAs and deal with the pinning as required, just like it would do for any userspace (without ever using the KVM memslot interface)
- proxy requests from the guest to the TEE
- in general, bear the complexity of anything related to the TEE
Major reason why I went with placing the implementation inside the kernel is,
- OP-TEE userspace lib (client) does not support sending SMCs for VM events and needs modification.
- QEMU (or every other VMM) will have to be modified.
Sure. And what? New feature, new API, new code. And what will happen once someone wants to use something other than OP-TEE? Or one of the many forks of OP-TEE that have a completely different ABI (cue the Android forks -- yes, plural)?
- OP-TEE driver is anyways in the kernel. A mediator will just be an addition and not a completely new entity.
Of course not. The TEE can be anywhere I want. On another machine if I decide so. Just because OP-TEE has a very simplistic model doesn't mean we have to be constrained by it.
- (Potential) issues if we would want to mediate requests from VM which has private mem.
Private memory means that not even the host has access to it, as it is the case with pKVM. How would that be an issue?
- Heavy VM exits if guest makes frequent TOS calls.
Sorry, I have to completely dismiss the argument here. I'm not even remotely considering performance for something that is essentially a full context switch of the whole machine. By definition, calling into EL3, and then S-EL1/S-EL2 is going to be as fast as a dying snail, and an additional exit to userspace will hardly register for anything other than a pointless latency benchmark.
Hence, the thought of making changes to too many entities (libteec, VMM, etc.) was a strong reason, although arguable.
It is a *terrible* reason. By this reasoning, we would have subsumed the whole VMM into the kernel (just like Xen), because "we don't want to change userspace".
Furthermore, you are not even considering basic things such as permissions. Your approach completely circumvents any form of access control, meaning that if any user that can create a VM can talk to the TEE, even if they don't have access to the TEE driver.
Yes, you could replicate access permission, SE-Linux, seccomp (and the rest of the security theater) at the KVM/TEE boundary, making the whole thing even more of a twisted mess.
Or you could simply do the right thing and let the kernel do its job the way it was intended by using the syscall interface from userspace.
In short, the VMM is just another piece of userspace using the TEE to do whatever it wants. The TEE driver on the host must obviously know about VMs, but that's about it.
Crucially, KVM should:
- be completely TEE agnostic and never call into something that is TEE-specific
- allow a TEE implementation entirely in userspace, specially for the machines that do not have EL3
Yes, you're right. Although I believe there are still some changes that need to be made to KVM to facilitate this. For example, kvm_smccc_get_action() would deny a TOS call.
If something is missing in KVM to allow routing of SMCs to userspace, I'm more than happy to entertain the change.
So, having an implementation completely in the VMM without any change to KVM might be challenging; any potential solutions are welcome.
I've said what I have to say already, and pointed you in a direction that I see as both correct and maintainable.
Thanks,
M.
On Wed, Apr 02, 2025 at 09:42:39AM +0100, Marc Zyngier wrote:
On Wed, 02 Apr 2025 03:58:48 +0100, Yuvraj Sakshith yuvraj.kernel@gmail.com wrote:
On Tue, Apr 01, 2025 at 07:13:26PM +0100, Marc Zyngier wrote:
On Tue, 01 Apr 2025 18:05:20 +0100, Yuvraj Sakshith yuvraj.kernel@gmail.com wrote:
[...]
This implementation has been heavily inspired by Xen's OP-TEE mediator.
[...]
And I think this inspiration is the source of most of the problems in this series.
Routing Secure Calls from the guest to whatever is on the secure side should not be the kernel's job at all. It should be the VMM's job. All you need to do is to route the SMCs from the guest to userspace, and we already have all the required infrastructure for that.
Yes, this was an argument at the time of designing this solution.
It is the VMM that should:
signal the TEE of VM creation/teardown
translate between IPAs and host VAs without involving KVM
let the host TEE driver translate between VAs and PAs and deal with the pinning as required, just like it would do for any userspace (without ever using the KVM memslot interface)
proxy requests from the guest to the TEE
in general, bear the complexity of anything related to the TEE
Major reason why I went with placing the implementation inside the kernel is,
- OP-TEE userspace lib (client) does not support sending SMCs for VM events and needs modification.
- QEMU (or every other VMM) will have to be modified.
Sure. And what? New feature, new API, new code. And what will happen once someone wants to use something other than OP-TEE? Or one of the many forks of OP-TEE that have a completely different ABI (cue the Android forks -- yes, plural)?
If something other than OP-TEE has to be supported, a specific mediator (such as drivers/tee/optee/optee_mediator.c) has to be constructed with handlers hooked via tee_mediator_register_ops().
But yes, the ABI might change and the implementor has the freedom to mediate it as required.
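To make that concrete, the skeleton below shows roughly what hooking another TEE into the abstraction layer could look like. tee_mediator_register_ops() is the hook named above, but the ops structure, its field names and the callback signatures are not quoted in this thread, so everything else here is hypothetical.

#include <linux/init.h>
#include <linux/kvm_host.h>
#include <linux/tee_mediator.h>	/* from this series */

/* Hypothetical callbacks for some other TEE's mediator. */
static int my_tee_create_vm(struct kvm *kvm)		{ return 0; }	/* notify the TEE */
static void my_tee_destroy_vm(struct kvm *kvm)		{ }		/* notify the TEE */
static bool my_tee_handle_smc(struct kvm_vcpu *vcpu)	{ return true; }	/* mediate a trapped SMC */

/* Field names are illustrative, not the series' actual definition. */
static const struct tee_mediator_ops my_tee_ops = {
	.create_vm  = my_tee_create_vm,
	.destroy_vm = my_tee_destroy_vm,
	.handle_smc = my_tee_handle_smc,
};

static int __init my_tee_mediator_init(void)
{
	return tee_mediator_register_ops(&my_tee_ops);
}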
- OP-TEE driver is anyways in the kernel. A mediator will just be an addition and not a completely new entity.
Of course not. The TEE can be anywhere I want. On another machine if I decide so. Just because OP-TEE has a very simplistic model doesn't mean we have to be constrained by it.
- (Potential) issues if we would want to mediate requests from VM which has private mem.
Private memory means that not even the host has access to it, as it is the case with pKVM. How would that be an issue?
The guest shares memory with OP-TEE through a buffer filled with pointers, which the mediator has to read in order to translate each of them from IPA to PA. The VMM won't be able to read these if the memory is private.
But this is a "potential" solution; if the mediator is moved to the VMM, it is completely ruled out.
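For what it's worth, in the VMM-based model that walk would look roughly like the sketch below. ipa_to_hva() stands in for whatever the VMM uses to resolve guest physical addresses through its own RAM mappings, and the flat array stands in for the richer layout in optee_msg.h; all names here are hypothetical.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical: resolve a guest IPA through the VMM's guest-RAM mappings. */
void *ipa_to_hva(uint64_t ipa);

/* Simplified stand-in for the descriptor the guest fills with page addresses. */
struct guest_page_list {
	uint64_t nr_pages;
	uint64_t ipa[];		/* guest physical addresses of the shared pages */
};

/*
 * Resolve each entry to a VMM virtual address, ready to be handed to the
 * host TEE driver (e.g. via TEE_IOC_SHM_REGISTER), which then performs the
 * VA->PA translation and pinning.
 */
static int resolve_guest_pages(const struct guest_page_list *list, void **hva)
{
	for (uint64_t i = 0; i < list->nr_pages; i++) {
		hva[i] = ipa_to_hva(list->ipa[i]);
		if (!hva[i])
			return -1;	/* not backed by guest RAM */
	}
	return 0;
}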
- Heavy VM exits if guest makes frequent TOS calls.
Sorry, I have to completely dismiss the argument here. I'm not even remotely considering performance for something that is essentially a full context switch of the whole machine. By definition, calling into EL3, and then S-EL1/S-EL2 is going to be as fast as a dying snail, and an additional exit to userspace will hardly register for anything other than a pointless latency benchmark.
Okay, makes sense.
Hence, the thought of making changes to too many entities (libteec, VMM, etc.) was a strong reason, although arguable.
It is a *terrible* reason. By this reasoning, we would have subsumed the whole VMM into the kernel (just like Xen), because "we don't want to change userspace".
Furthermore, you are not even considering basic things such as permissions. Your approach completely circumvents any form of access control, meaning that if any user that can create a VM can talk to the TEE, even if they don't have access to the TEE driver.
Well, this is a good point. OP-TEE built for NS-Virt handles calls from different VMs under different MMU partitions (explaining this in detail would take us off track). But each VM's state and data remain isolated internally in S-EL1.
Yes, you could replicate access permission, SE-Linux, seccomp (and the rest of the security theater) at the KVM/TEE boundary, making the whole thing even more of a twisted mess.
Or you could simply do the right thing and let the kernel do its job the way it was intended by using the syscall interface from userspace.
In short, the VMM is just another piece of userspace using the TEE to do whatever it wants. The TEE driver on the host must obviously know about VMs, but that's about it.
Crucially, KVM should:
- be completely TEE agnostic and never call into something that is TEE-specific
- allow a TEE implementation entirely in userspace, specially for the machines that do not have EL3
Yes, you're right. Although I believe there are still some changes that need to be made to KVM to facilitate this. For example, kvm_smccc_get_action() would deny a TOS call.
If something is missing in KVM to allow routing of SMCs to userspace, I'm more than happy to entertain the change.
Okay.
So, having an implementation completely in the VMM without any change to KVM might be challenging; any potential solutions are welcome.
I've said what I have to say already, and pointed you in a direction that I see as both correct and maintainable.
Yes, I get your point about placing the mediator in the VMM. And now that I think about it, I believe I can make an improvement.
But yes, since too many entities are involved, the design of this solution has been a nightmare. Good to have been pushed this way.
Thanks,
M.
--
Jazz isn't dead. It just smells funny.
Thanks,
Yuvraj Sakshith