Hi,
This patch set allocates the restricted DMA-bufs via the TEE subsystem.
The TEE subsystem handles the DMA-buf allocations since it is the TEE (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose how to allocate the restricted physical memory.
In addition to size and flags parameters, TEE_IOC_RSTMEM_ALLOC also takes a use-case parameter. The backend TEE driver uses it to decide on the allocation policy and which devices should be able to access the memory.
Three use-cases (Secure Video Playback, Trusted UI, and Secure Video Recording) have been identified so far to serve as examples of what can be expected. More use-cases can be added in the userspace ABI, but it's up to the backend TEE drivers to provide the implementation.
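For illustration, a minimal userspace call could look like the sketch below (assuming a TEE device already opened as tee_fd; error handling omitted):

    struct tee_ioctl_rstmem_alloc_data data = {
            .size = 8 * 1024 * 1024,
            .flags = 0,
            .use_case = TEE_IOC_UC_SECURE_VIDEO_PLAY,
    };
    int dmabuf_fd;

    /* On success a dma-buf file descriptor is returned */
    dmabuf_fd = ioctl(tee_fd, TEE_IOC_RSTMEM_ALLOC, &data);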
Each use-case has its own restricted memory pool since different use-cases require isolation from different parts of the system. A restricted memory pool can be based on a static carveout instantiated while probing the TEE backend driver, or dynamically allocated from CMA and made restricted as needed by the TEE.
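For the static carveout case, a backend driver would create the pool with the tee_rstmem_gen_pool_alloc() helper added in this series. A rough sketch, with hypothetical driver code (the carveout base and size constants are made up for illustration):

    /* Hypothetical: wrap a firewalled carveout in a restricted pool */
    static struct tee_shm_pool *rstmem_pool;

    static int mydrv_init_rstmem(void)
    {
            /* MYDRV_CARVEOUT_BASE/SIZE are illustrative constants */
            rstmem_pool = tee_rstmem_gen_pool_alloc(MYDRV_CARVEOUT_BASE,
                                                    MYDRV_CARVEOUT_SIZE);
            if (IS_ERR(rstmem_pool))
                    return PTR_ERR(rstmem_pool);
            return 0;
    }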
This can be tested on QEMU with the following steps:
repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \
        -b prototype/sdp-v4
repo sync -j8
cd build
make toolchains -j$(nproc)
make SPMC_AT_EL=1 all -j$(nproc)
make SPMC_AT_EL=1 run-only
# login and at the prompt:
xtest --sdp-basic
The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at S-EL1 inside OP-TEE. The parameter can be changed to SPMC_AT_EL=n to test without FF-A, using the original SMC ABI instead. Please remember to do
rm -rf ../trusted-firmware-a/build/qemu
for TF-A to be rebuilt properly using the new configuration.
https://optee.readthedocs.io/en/latest/building/prerequisites.html lists the dependencies needed to build the above.
The tests are pretty basic, mostly checking that a Trusted Application in the secure world can access and manipulate the memory. There are also some negative tests for out-of-bounds buffers, etc.
Thanks,
Jens
Changes since V3:
* Make the use_case and flags field in struct tee_shm u32's instead of u16's
* Add more description for TEE_IOC_RSTMEM_ALLOC in the header file
* Import namespace DMA_BUF in module tee, reported by lkp@intel.com
* Added a note in the commit message for "optee: account for direction while
  converting parameters" explaining why it's needed
* Factor out dynamic restricted memory allocation from "optee: support
  restricted memory allocation" into two new commits "optee: FF-A: dynamic
  restricted memory allocation" and "optee: smc abi: dynamic restricted
  memory allocation"
* Guard CMA usage with #ifdef CONFIG_CMA, effectively disabling dynamic
  restricted memory allocation if CMA isn't configured
Changes since the V2 RFC:
* Based on v6.12
* Replaced the flags for SVP and Trusted UI memory with a u32 field holding
  a unique id for each use case
* Added dynamic allocation of restricted memory pools
* Added OP-TEE ABI both with and without FF-A for dynamic restricted memory
* Added support for FF-A with FFA_LEND
Changes since the V1 RFC:
* Based on v6.11
* Complete rewrite, replacing the restricted heap with TEE_IOC_RSTMEM_ALLOC
Changes since Olivier's post [2]:
* Based on Yong Wu's post [1] where much of the dma-buf handling is done in
  the generic restricted heap
* Simplifications and cleanup
* New commit message for "dma-buf: heaps: add Linaro restricted dmabuf heap
  support"
* Replaced the word "secure" with "restricted" where applicable
Jens Wiklander (6):
  tee: add restricted memory allocation
  optee: account for direction while converting parameters
  optee: sync secure world ABI headers
  optee: support restricted memory allocation
  optee: FF-A: dynamic restricted memory allocation
  optee: smc abi: dynamic restricted memory allocation
 drivers/tee/Makefile              |   1 +
 drivers/tee/optee/Makefile        |   1 +
 drivers/tee/optee/call.c          |  10 +-
 drivers/tee/optee/core.c          |   1 +
 drivers/tee/optee/ffa_abi.c       | 178 +++++++++++++-
 drivers/tee/optee/optee_ffa.h     |  27 ++-
 drivers/tee/optee/optee_msg.h     |  65 ++++-
 drivers/tee/optee/optee_private.h |  75 ++++--
 drivers/tee/optee/optee_smc.h     |  71 +++++-
 drivers/tee/optee/rpc.c           |  31 ++-
 drivers/tee/optee/rstmem.c        | 388 ++++++++++++++++++++++++++++++
 drivers/tee/optee/smc_abi.c       | 213 ++++++++++++++--
 drivers/tee/tee_core.c            |  38 ++-
 drivers/tee/tee_private.h         |   2 +
 drivers/tee/tee_rstmem.c          | 201 ++++++++++++++++
 drivers/tee/tee_shm.c             |   2 +
 drivers/tee/tee_shm_pool.c        |  69 +++++-
 include/linux/tee_core.h          |  15 ++
 include/linux/tee_drv.h           |   2 +
 include/uapi/linux/tee.h          |  44 +++-
 20 files changed, 1358 insertions(+), 76 deletions(-)
 create mode 100644 drivers/tee/optee/rstmem.c
 create mode 100644 drivers/tee/tee_rstmem.c
base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
Add restricted memory allocation to the TEE subsystem.
Restricted memory refers to memory buffers behind a hardware enforced firewall. It is not accessible to the kernel under normal circumstances, but rather only accessible to certain hardware IPs or CPUs executing in a higher privileged mode than the kernel itself. This interface allows allocating and managing such restricted memory buffers via interaction with a TEE implementation.
A new ioctl TEE_IOC_RSTMEM_ALLOC is added to allocate these restricted memory buffers.
The restricted memory is allocated for a specific use-case, like Secure Video Playback, Trusted UI, or Secure Video Recording, where certain hardware devices can access the memory.
More use-cases can be added in the userspace ABI, but it's up to the backend drivers to provide the implementation.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
 drivers/tee/Makefile       |   1 +
 drivers/tee/tee_core.c     |  38 ++++++-
 drivers/tee/tee_private.h  |   2 +
 drivers/tee/tee_rstmem.c   | 201 +++++++++++++++++++++++++++++++++++++
 drivers/tee/tee_shm.c      |   2 +
 drivers/tee/tee_shm_pool.c |  69 ++++++++++++-
 include/linux/tee_core.h   |  15 +++
 include/linux/tee_drv.h    |   2 +
 include/uapi/linux/tee.h   |  44 +++++++-
 9 files changed, 370 insertions(+), 4 deletions(-)
 create mode 100644 drivers/tee/tee_rstmem.c
diff --git a/drivers/tee/Makefile b/drivers/tee/Makefile
index 5488cba30bd2..a4c6b55444b9 100644
--- a/drivers/tee/Makefile
+++ b/drivers/tee/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_TEE) += tee.o
 tee-objs += tee_core.o
 tee-objs += tee_shm.o
 tee-objs += tee_shm_pool.o
+tee-objs += tee_rstmem.o
 obj-$(CONFIG_OPTEE) += optee/
 obj-$(CONFIG_AMDTEE) += amdtee/
 obj-$(CONFIG_ARM_TSTEE) += tstee/
diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
index d113679b1e2d..f4a45b77753b 100644
--- a/drivers/tee/tee_core.c
+++ b/drivers/tee/tee_core.c
@@ -1,12 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Copyright (c) 2015-2016, Linaro Limited
+ * Copyright (c) 2015-2022, 2024, Linaro Limited
  */

 #define pr_fmt(fmt) "%s: " fmt, __func__

 #include <linux/cdev.h>
 #include <linux/cred.h>
+#include <linux/dma-buf.h>
 #include <linux/fs.h>
 #include <linux/idr.h>
 #include <linux/module.h>
@@ -815,6 +816,38 @@ static int tee_ioctl_supp_send(struct tee_context *ctx,
 	return rc;
 }

+static int
+tee_ioctl_rstmem_alloc(struct tee_context *ctx,
+		       struct tee_ioctl_rstmem_alloc_data __user *udata)
+{
+	struct tee_ioctl_rstmem_alloc_data data;
+	struct dma_buf *dmabuf;
+	int id;
+	int fd;
+
+	if (copy_from_user(&data, udata, sizeof(data)))
+		return -EFAULT;
+
+	if (data.use_case == TEE_IOC_UC_RESERVED)
+		return -EINVAL;
+
+	dmabuf = tee_rstmem_alloc(ctx, data.flags, data.use_case, data.size,
+				  &id);
+	if (IS_ERR(dmabuf))
+		return PTR_ERR(dmabuf);
+	if (put_user(id, &udata->id)) {
+		fd = -EFAULT;
+		goto err;
+	}
+	fd = dma_buf_fd(dmabuf, O_CLOEXEC);
+	if (fd < 0)
+		goto err;
+	return fd;
+err:
+	dma_buf_put(dmabuf);
+	return fd;
+}
+
 static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	struct tee_context *ctx = filp->private_data;
@@ -839,6 +872,8 @@ static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		return tee_ioctl_supp_recv(ctx, uarg);
 	case TEE_IOC_SUPPL_SEND:
 		return tee_ioctl_supp_send(ctx, uarg);
+	case TEE_IOC_RSTMEM_ALLOC:
+		return tee_ioctl_rstmem_alloc(ctx, uarg);
 	default:
 		return -EINVAL;
 	}
@@ -1286,3 +1321,4 @@ MODULE_AUTHOR("Linaro");
 MODULE_DESCRIPTION("TEE Driver");
 MODULE_VERSION("1.0");
 MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS("DMA_BUF");
diff --git a/drivers/tee/tee_private.h b/drivers/tee/tee_private.h
index 9bc50605227c..bf97796909c0 100644
--- a/drivers/tee/tee_private.h
+++ b/drivers/tee/tee_private.h
@@ -23,5 +23,7 @@ void teedev_ctx_put(struct tee_context *ctx);
 struct tee_shm *tee_shm_alloc_user_buf(struct tee_context *ctx, size_t size);
 struct tee_shm *tee_shm_register_user_buf(struct tee_context *ctx,
 					  unsigned long addr, size_t length);
+struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
+				 u32 use_case, size_t size, int *shm_id);

 #endif /*TEE_PRIVATE_H*/
diff --git a/drivers/tee/tee_rstmem.c b/drivers/tee/tee_rstmem.c
new file mode 100644
index 000000000000..536bca2901e2
--- /dev/null
+++ b/drivers/tee/tee_rstmem.c
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024 Linaro Limited
+ */
+#include <linux/device.h>
+#include <linux/dma-buf.h>
+#include <linux/genalloc.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/tee_core.h>
+#include "tee_private.h"
+
+struct tee_rstmem_attachment {
+	struct sg_table table;
+	struct device *dev;
+};
+
+static int rstmem_dma_attach(struct dma_buf *dmabuf,
+			     struct dma_buf_attachment *attachment)
+{
+	struct tee_shm *shm = dmabuf->priv;
+	struct tee_rstmem_attachment *a;
+	int rc;
+
+	a = kzalloc(sizeof(*a), GFP_KERNEL);
+	if (!a)
+		return -ENOMEM;
+
+	if (shm->pages) {
+		rc = sg_alloc_table_from_pages(&a->table, shm->pages,
+					       shm->num_pages, 0,
+					       shm->num_pages * PAGE_SIZE,
+					       GFP_KERNEL);
+		if (rc)
+			goto err;
+	} else {
+		rc = sg_alloc_table(&a->table, 1, GFP_KERNEL);
+		if (rc)
+			goto err;
+		sg_set_page(a->table.sgl, phys_to_page(shm->paddr), shm->size,
+			    0);
+	}
+
+	a->dev = attachment->dev;
+	attachment->priv = a;
+
+	return 0;
+err:
+	kfree(a);
+	return rc;
+}
+
+static void rstmem_dma_detach(struct dma_buf *dmabuf,
+			      struct dma_buf_attachment *attachment)
+{
+	struct tee_rstmem_attachment *a = attachment->priv;
+
+	sg_free_table(&a->table);
+	kfree(a);
+}
+
+static struct sg_table *
+rstmem_dma_map_dma_buf(struct dma_buf_attachment *attachment,
+		       enum dma_data_direction direction)
+{
+	struct tee_rstmem_attachment *a = attachment->priv;
+	int ret;
+
+	ret = dma_map_sgtable(attachment->dev, &a->table, direction,
+			      DMA_ATTR_SKIP_CPU_SYNC);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return &a->table;
+}
+
+static void rstmem_dma_unmap_dma_buf(struct dma_buf_attachment *attachment,
+				     struct sg_table *table,
+				     enum dma_data_direction direction)
+{
+	struct tee_rstmem_attachment *a = attachment->priv;
+
+	WARN_ON(&a->table != table);
+
+	dma_unmap_sgtable(attachment->dev, table, direction,
+			  DMA_ATTR_SKIP_CPU_SYNC);
+}
+
+static int rstmem_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
+					   enum dma_data_direction direction)
+{
+	return -EPERM;
+}
+
+static int rstmem_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
+					 enum dma_data_direction direction)
+{
+	return -EPERM;
+}
+
+static int rstmem_dma_buf_mmap(struct dma_buf *dmabuf,
+			       struct vm_area_struct *vma)
+{
+	return -EPERM;
+}
+
+static void rstmem_dma_buf_free(struct dma_buf *dmabuf)
+{
+	struct tee_shm *shm = dmabuf->priv;
+
+	tee_shm_put(shm);
+}
+
+static const struct dma_buf_ops rstmem_generic_buf_ops = {
+	.attach = rstmem_dma_attach,
+	.detach = rstmem_dma_detach,
+	.map_dma_buf = rstmem_dma_map_dma_buf,
+	.unmap_dma_buf = rstmem_dma_unmap_dma_buf,
+	.begin_cpu_access = rstmem_dma_buf_begin_cpu_access,
+	.end_cpu_access = rstmem_dma_buf_end_cpu_access,
+	.mmap = rstmem_dma_buf_mmap,
+	.release = rstmem_dma_buf_free,
+};
+
+struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
+				 u32 use_case, size_t size, int *shm_id)
+{
+	struct tee_device *teedev = ctx->teedev;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct dma_buf *dmabuf;
+	struct tee_shm *shm;
+	void *ret;
+	int rc;
+
+	if (!tee_device_get(teedev))
+		return ERR_PTR(-EINVAL);
+
+	if (!teedev->desc->ops->rstmem_alloc ||
+	    !teedev->desc->ops->rstmem_free) {
+		dmabuf = ERR_PTR(-EINVAL);
+		goto err;
+	}
+
+	shm = kzalloc(sizeof(*shm), GFP_KERNEL);
+	if (!shm) {
+		dmabuf = ERR_PTR(-ENOMEM);
+		goto err;
+	}
+
+	refcount_set(&shm->refcount, 1);
+	shm->flags = TEE_SHM_RESTRICTED;
+	shm->use_case = use_case;
+	shm->ctx = ctx;
+
+	mutex_lock(&teedev->mutex);
+	shm->id = idr_alloc(&teedev->idr, NULL, 1, 0, GFP_KERNEL);
+	mutex_unlock(&teedev->mutex);
+	if (shm->id < 0) {
+		dmabuf = ERR_PTR(shm->id);
+		goto err_kfree;
+	}
+
+	rc = teedev->desc->ops->rstmem_alloc(ctx, shm, flags, use_case, size);
+	if (rc) {
+		dmabuf = ERR_PTR(rc);
+		goto err_idr_remove;
+	}
+
+	mutex_lock(&teedev->mutex);
+	ret = idr_replace(&teedev->idr, shm, shm->id);
+	mutex_unlock(&teedev->mutex);
+	if (IS_ERR(ret)) {
+		dmabuf = ret;
+		goto err_rstmem_free;
+	}
+	teedev_ctx_get(ctx);
+
+	exp_info.ops = &rstmem_generic_buf_ops;
+	exp_info.size = shm->size;
+	exp_info.priv = shm;
+	dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(dmabuf)) {
+		tee_shm_put(shm);
+		return dmabuf;
+	}
+
+	*shm_id = shm->id;
+	return dmabuf;
+
+err_rstmem_free:
+	teedev->desc->ops->rstmem_free(ctx, shm);
+err_idr_remove:
+	mutex_lock(&teedev->mutex);
+	idr_remove(&teedev->idr, shm->id);
+	mutex_unlock(&teedev->mutex);
+err_kfree:
+	kfree(shm);
+err:
+	tee_device_put(teedev);
+	return dmabuf;
+}
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index daf6e5cfd59a..416f7f25d885 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -55,6 +55,8 @@ static void tee_shm_release(struct tee_device *teedev, struct tee_shm *shm)
 				"unregister shm %p failed: %d", shm, rc);

 		release_registered_pages(shm);
+	} else if (shm->flags & TEE_SHM_RESTRICTED) {
+		teedev->desc->ops->rstmem_free(shm->ctx, shm);
 	}

 	teedev_ctx_put(shm->ctx);
diff --git a/drivers/tee/tee_shm_pool.c b/drivers/tee/tee_shm_pool.c
index 80004b55628d..ee57ef157a77 100644
--- a/drivers/tee/tee_shm_pool.c
+++ b/drivers/tee/tee_shm_pool.c
@@ -1,9 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Copyright (c) 2015, 2017, 2022 Linaro Limited
+ * Copyright (c) 2015, 2017, 2022, 2024 Linaro Limited
  */
 #include <linux/device.h>
-#include <linux/dma-buf.h>
 #include <linux/genalloc.h>
 #include <linux/slab.h>
 #include <linux/tee_core.h>
@@ -90,3 +89,69 @@ struct tee_shm_pool *tee_shm_pool_alloc_res_mem(unsigned long vaddr,
 	return ERR_PTR(rc);
 }
 EXPORT_SYMBOL_GPL(tee_shm_pool_alloc_res_mem);
+
+static int rstmem_pool_op_gen_alloc(struct tee_shm_pool *pool,
+				    struct tee_shm *shm, size_t size,
+				    size_t align)
+{
+	size_t sz = ALIGN(size, PAGE_SIZE);
+	phys_addr_t pa;
+
+	pa = gen_pool_alloc(pool->private_data, sz);
+	if (!pa)
+		return -ENOMEM;
+
+	shm->size = sz;
+	shm->paddr = pa;
+
+	return 0;
+}
+
+static void rstmem_pool_op_gen_free(struct tee_shm_pool *pool,
+				    struct tee_shm *shm)
+{
+	gen_pool_free(pool->private_data, shm->paddr, shm->size);
+	shm->paddr = 0;
+}
+
+static struct tee_shm_pool_ops rstmem_pool_ops_generic = {
+	.alloc = rstmem_pool_op_gen_alloc,
+	.free = rstmem_pool_op_gen_free,
+	.destroy_pool = pool_op_gen_destroy_pool,
+};
+
+struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size)
+{
+	const size_t page_mask = PAGE_SIZE - 1;
+	struct tee_shm_pool *pool;
+	int rc;
+
+	/* Check it's page aligned */
+	if ((paddr | size) & page_mask)
+		return ERR_PTR(-EINVAL);
+
+	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool)
+		return ERR_PTR(-ENOMEM);
+
+	pool->private_data = gen_pool_create(PAGE_SHIFT, -1);
+	if (!pool->private_data) {
+		rc = -ENOMEM;
+		goto err_free;
+	}
+
+	rc = gen_pool_add(pool->private_data, paddr, size, -1);
+	if (rc)
+		goto err_free_pool;
+
+	pool->ops = &rstmem_pool_ops_generic;
+	return pool;
+
+err_free_pool:
+	gen_pool_destroy(pool->private_data);
+err_free:
+	kfree(pool);
+
+	return ERR_PTR(rc);
+}
+EXPORT_SYMBOL_GPL(tee_rstmem_gen_pool_alloc);
diff --git a/include/linux/tee_core.h b/include/linux/tee_core.h
index a38494d6b5f4..608302f494fe 100644
--- a/include/linux/tee_core.h
+++ b/include/linux/tee_core.h
@@ -26,6 +26,7 @@
 #define TEE_SHM_USER_MAPPED	BIT(1)  /* Memory mapped in user space */
 #define TEE_SHM_POOL		BIT(2)  /* Memory allocated from pool */
 #define TEE_SHM_PRIV		BIT(3)  /* Memory private to TEE driver */
+#define TEE_SHM_RESTRICTED	BIT(4)  /* Restricted memory */

 #define TEE_DEVICE_FLAG_REGISTERED	0x1
 #define TEE_MAX_DEV_NAME_LEN		32
@@ -76,6 +77,8 @@ struct tee_device {
  * @supp_send:		called for supplicant to send a response
  * @shm_register:	register shared memory buffer in TEE
  * @shm_unregister:	unregister shared memory buffer in TEE
+ * @rstmem_alloc:	allocate restricted memory
+ * @rstmem_free:	free restricted memory
  */
 struct tee_driver_ops {
 	void (*get_version)(struct tee_device *teedev,
@@ -99,6 +102,9 @@ struct tee_driver_ops {
 			    struct page **pages, size_t num_pages,
 			    unsigned long start);
 	int (*shm_unregister)(struct tee_context *ctx, struct tee_shm *shm);
+	int (*rstmem_alloc)(struct tee_context *ctx, struct tee_shm *shm,
+			    u32 flags, u32 use_case, size_t size);
+	void (*rstmem_free)(struct tee_context *ctx, struct tee_shm *shm);
 };

 /**
@@ -229,6 +235,15 @@ static inline void tee_shm_pool_free(struct tee_shm_pool *pool)
 	pool->ops->destroy_pool(pool);
 }

+/**
+ * tee_rstmem_gen_pool_alloc() - Create a restricted memory manager
+ * @paddr:	Physical address of start of pool
+ * @size:	Size in bytes of the pool
+ *
+ * @returns pointer to a 'struct tee_shm_pool' or an ERR_PTR on failure.
+ */
+struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size);
+
 /**
  * tee_get_drvdata() - Return driver_data pointer
  * @returns the driver_data pointer supplied to tee_register().
diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
index a54c203000ed..cba067715d14 100644
--- a/include/linux/tee_drv.h
+++ b/include/linux/tee_drv.h
@@ -55,6 +55,7 @@ struct tee_context {
  * @pages:	locked pages from userspace
  * @num_pages:	number of locked pages
  * @refcount:	reference counter
+ * @use_case:	defined by TEE_IOC_UC_* in tee.h
  * @flags:	defined by TEE_SHM_* in tee_core.h
  * @id:		unique id of a shared memory object on this device, shared
  *		with user space
@@ -71,6 +72,7 @@ struct tee_shm {
 	struct page **pages;
 	size_t num_pages;
 	refcount_t refcount;
+	u32 use_case;
 	u32 flags;
 	int id;
 	u64 sec_world_id;
diff --git a/include/uapi/linux/tee.h b/include/uapi/linux/tee.h
index d0430bee8292..88834448debb 100644
--- a/include/uapi/linux/tee.h
+++ b/include/uapi/linux/tee.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015-2016, Linaro Limited
+ * Copyright (c) 2015-2017, 2020, 2024, Linaro Limited
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -48,6 +48,7 @@
 #define TEE_GEN_CAP_PRIVILEGED	(1 << 1)/* Privileged device (for supplicant) */
 #define TEE_GEN_CAP_REG_MEM	(1 << 2)/* Supports registering shared memory */
 #define TEE_GEN_CAP_MEMREF_NULL	(1 << 3)/* NULL MemRef support */
+#define TEE_GEN_CAP_RSTMEM	(1 << 4)/* Supports restricted memory */

 #define TEE_MEMREF_NULL		(__u64)(-1) /* NULL MemRef Buffer */

@@ -389,6 +390,47 @@ struct tee_ioctl_shm_register_data {
  */
 #define TEE_IOC_SHM_REGISTER	_IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 9, \
 				      struct tee_ioctl_shm_register_data)
+
+#define TEE_IOC_UC_RESERVED		0
+#define TEE_IOC_UC_SECURE_VIDEO_PLAY	1
+#define TEE_IOC_UC_TRUSTED_UI		2
+#define TEE_IOC_UC_SECURE_VIDEO_RECORD	3
+
+/**
+ * struct tee_ioctl_rstmem_alloc_data - Restricted memory allocate argument
+ * @size:	[in/out] Size of restricted memory to allocate
+ * @flags:	[in/out] Flags to/from allocate
+ * @use_case:	[in] Restricted memory use case, TEE_IOC_UC_*
+ * @id:		[out] Identifier of the restricted memory
+ */
+struct tee_ioctl_rstmem_alloc_data {
+	__u64 size;
+	__u32 flags;
+	__u32 use_case;
+	__s32 id;
+};
+
+/**
+ * TEE_IOC_RSTMEM_ALLOC - allocate restricted memory
+ *
+ * Allocates restricted physical memory normally not accessible by the
+ * kernel.
+ *
+ * Restricted memory refers to memory buffers behind a hardware enforced
+ * firewall. It is not accessible to the kernel during normal circumstances
+ * but rather only accessible to certain hardware IPs or CPUs executing in
+ * higher privileged mode than the kernel itself. This interface allows to
+ * allocate and manage such restricted memory buffers via interaction with
+ * a TEE implementation.
+ *
+ * Returns a file descriptor on success or < 0 on failure
+ *
+ * The returned file descriptor is a dma-buf that can be attached and
+ * mapped for a device with permission to access the physical memory.
+ */
+#define TEE_IOC_RSTMEM_ALLOC	_IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 10, \
+				      struct tee_ioctl_rstmem_alloc_data)
+
 /*
  * Five syscalls are used when communicating with the TEE driver.
  * open(): opens the device associated with the driver
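As the commit message notes, the returned file descriptor is a dma-buf that a device with permission to access the physical memory can attach and map. A consumer-side sketch using only standard dma-buf calls (illustrative only, not part of the series; "dev" and "fd" are assumed to come from the surrounding driver):

    struct dma_buf *buf;
    struct dma_buf_attachment *att;
    struct sg_table *sgt;

    buf = dma_buf_get(fd);  /* fd from TEE_IOC_RSTMEM_ALLOC */
    if (IS_ERR(buf))
            return PTR_ERR(buf);

    att = dma_buf_attach(buf, dev);
    if (IS_ERR(att)) {
            dma_buf_put(buf);
            return PTR_ERR(att);
    }

    sgt = dma_buf_map_attachment(att, DMA_BIDIRECTIONAL);
    if (IS_ERR(sgt)) {
            dma_buf_detach(buf, att);
            dma_buf_put(buf);
            return PTR_ERR(sgt);
    }
    /* ... program the device with the addresses in sgt ... */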
On Tue, Dec 17, 2024 at 11:07:37AM +0100, Jens Wiklander wrote:
> Add restricted memory allocation to the TEE subsystem.
[...]
> +static int rstmem_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
> +					   enum dma_data_direction direction)
> +{
> +	return -EPERM;
> +}
> +
> +static int rstmem_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
> +					 enum dma_data_direction direction)
> +{
> +	return -EPERM;
> +}
> +
> +static int rstmem_dma_buf_mmap(struct dma_buf *dmabuf,
> +			       struct vm_area_struct *vma)
> +{
> +	return -EPERM;
> +}

Just an aside, you should just not implement mmap if it's not possible.
Then you also don't need the cpu access functions. Both userspace mmap
and in-kernel vmap support is intentionally optional for dma-buf.
-Sima
On Wed, Jan 8, 2025 at 5:54 PM Simona Vetter <simona.vetter@ffwll.ch> wrote:
> On Tue, Dec 17, 2024 at 11:07:37AM +0100, Jens Wiklander wrote:
> > Add restricted memory allocation to the TEE subsystem.
[...]
> > +static int rstmem_dma_buf_mmap(struct dma_buf *dmabuf,
> > +			       struct vm_area_struct *vma)
> > +{
> > +	return -EPERM;
> > +}
>
> Just an aside, you should just not implement mmap if it's not possible.
> Then you also don't need the cpu access functions. Both userspace mmap
> and in-kernel vmap support is intentionally optional for dma-buf.
> -Sima
Got it, thanks. I'll remove those in the next version.
Cheers, Jens
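For reference, dropping those callbacks would presumably shrink the ops table to something like the sketch below (an assumption about the next version, not the posted code):

    static const struct dma_buf_ops rstmem_generic_buf_ops = {
            .attach = rstmem_dma_attach,
            .detach = rstmem_dma_detach,
            .map_dma_buf = rstmem_dma_map_dma_buf,
            .unmap_dma_buf = rstmem_dma_unmap_dma_buf,
            .release = rstmem_dma_buf_free,
    };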
Hi Jens,
On Tue, 17 Dec 2024 11:07:37 +0100 Jens Wiklander <jens.wiklander@linaro.org> wrote:
Add restricted memory allocation to the TEE subsystem.
Restricted memory refers to memory buffers behind a hardware enforced firewall. It is not accessible to the kernel during normal circumstances but rather only accessible to certain hardware IPs or CPUs executing in higher privileged mode than the kernel itself. This interface allows to allocate and manage such restricted memory buffers via interaction with a TEE implementation.
A new ioctl TEE_IOC_RSTMEM_ALLOC is added to allocate these restricted memory buffers.
The restricted memory is allocated for a specific use-case, like Secure Video Playback, Trusted UI, or Secure Video Recording where certain hardware devices can access the memory.
More use-cases can be added in userspace ABI, but it's up to the backend drivers to provide the implementation.
Signed-off-by: Jens Wiklander jens.wiklander@linaro.org
drivers/tee/Makefile | 1 + drivers/tee/tee_core.c | 38 ++++++- drivers/tee/tee_private.h | 2 + drivers/tee/tee_rstmem.c | 201 +++++++++++++++++++++++++++++++++++++ drivers/tee/tee_shm.c | 2 + drivers/tee/tee_shm_pool.c | 69 ++++++++++++- include/linux/tee_core.h | 15 +++ include/linux/tee_drv.h | 2 + include/uapi/linux/tee.h | 44 +++++++- 9 files changed, 370 insertions(+), 4 deletions(-) create mode 100644 drivers/tee/tee_rstmem.c
diff --git a/drivers/tee/Makefile b/drivers/tee/Makefile index 5488cba30bd2..a4c6b55444b9 100644 --- a/drivers/tee/Makefile +++ b/drivers/tee/Makefile @@ -3,6 +3,7 @@ obj-$(CONFIG_TEE) += tee.o tee-objs += tee_core.o tee-objs += tee_shm.o tee-objs += tee_shm_pool.o +tee-objs += tee_rstmem.o obj-$(CONFIG_OPTEE) += optee/ obj-$(CONFIG_AMDTEE) += amdtee/ obj-$(CONFIG_ARM_TSTEE) += tstee/ diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c index d113679b1e2d..f4a45b77753b 100644 --- a/drivers/tee/tee_core.c +++ b/drivers/tee/tee_core.c @@ -1,12 +1,13 @@ // SPDX-License-Identifier: GPL-2.0-only /*
- Copyright (c) 2015-2016, Linaro Limited
*/
- Copyright (c) 2015-2022, 2024, Linaro Limited
#define pr_fmt(fmt) "%s: " fmt, __func__ #include <linux/cdev.h> #include <linux/cred.h> +#include <linux/dma-buf.h> #include <linux/fs.h> #include <linux/idr.h> #include <linux/module.h> @@ -815,6 +816,38 @@ static int tee_ioctl_supp_send(struct tee_context *ctx, return rc; } +static int +tee_ioctl_rstmem_alloc(struct tee_context *ctx,
struct tee_ioctl_rstmem_alloc_data __user *udata)
+{
- struct tee_ioctl_rstmem_alloc_data data;
- struct dma_buf *dmabuf;
- int id;
- int fd;
- if (copy_from_user(&data, udata, sizeof(data)))
return -EFAULT;
- if (data.use_case == TEE_IOC_UC_RESERVED)
return -EINVAL;
- dmabuf = tee_rstmem_alloc(ctx, data.flags, data.use_case, data.size,
&id);
- if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
- if (put_user(id, &udata->id)) {
fd = -EFAULT;
goto err;
- }
- fd = dma_buf_fd(dmabuf, O_CLOEXEC);
- if (fd < 0)
goto err;
- return fd;
+err:
- dma_buf_put(dmabuf);
- return fd;
+}
static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { struct tee_context *ctx = filp->private_data; @@ -839,6 +872,8 @@ static long tee_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) return tee_ioctl_supp_recv(ctx, uarg); case TEE_IOC_SUPPL_SEND: return tee_ioctl_supp_send(ctx, uarg);
- case TEE_IOC_RSTMEM_ALLOC:
default: return -EINVAL; }return tee_ioctl_rstmem_alloc(ctx, uarg);
@@ -1286,3 +1321,4 @@ MODULE_AUTHOR("Linaro");
 MODULE_DESCRIPTION("TEE Driver");
 MODULE_VERSION("1.0");
 MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS("DMA_BUF");
diff --git a/drivers/tee/tee_private.h b/drivers/tee/tee_private.h
index 9bc50605227c..bf97796909c0 100644
--- a/drivers/tee/tee_private.h
+++ b/drivers/tee/tee_private.h
@@ -23,5 +23,7 @@ void teedev_ctx_put(struct tee_context *ctx);
 struct tee_shm *tee_shm_alloc_user_buf(struct tee_context *ctx, size_t size);
 struct tee_shm *tee_shm_register_user_buf(struct tee_context *ctx,
 					   unsigned long addr, size_t length);
+struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
+				 u32 use_case, size_t size, int *shm_id);
 
 #endif /*TEE_PRIVATE_H*/
diff --git a/drivers/tee/tee_rstmem.c b/drivers/tee/tee_rstmem.c
new file mode 100644
index 000000000000..536bca2901e2
--- /dev/null
+++ b/drivers/tee/tee_rstmem.c
@@ -0,0 +1,201 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024 Linaro Limited
+ */
+
+#include <linux/device.h>
+#include <linux/dma-buf.h>
+#include <linux/genalloc.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/tee_core.h>
+#include "tee_private.h"
+
+struct tee_rstmem_attachment {
+	struct sg_table table;
+	struct device *dev;
+};
+
+static int rstmem_dma_attach(struct dma_buf *dmabuf,
+			     struct dma_buf_attachment *attachment)
+{
+	struct tee_shm *shm = dmabuf->priv;
+	struct tee_rstmem_attachment *a;
+	int rc;
+
+	a = kzalloc(sizeof(*a), GFP_KERNEL);
+	if (!a)
+		return -ENOMEM;
+
+	if (shm->pages) {
+		rc = sg_alloc_table_from_pages(&a->table, shm->pages,
+					       shm->num_pages, 0,
+					       shm->num_pages * PAGE_SIZE,
+					       GFP_KERNEL);
+		if (rc)
+			goto err;
+	} else {
+		rc = sg_alloc_table(&a->table, 1, GFP_KERNEL);
+		if (rc)
+			goto err;
+		sg_set_page(a->table.sgl, phys_to_page(shm->paddr), shm->size,
+			    0);
+	}
+
+	a->dev = attachment->dev;
+	attachment->priv = a;
+
+	return 0;
+err:
+	kfree(a);
+	return rc;
+}
+
+static void rstmem_dma_detach(struct dma_buf *dmabuf,
+			      struct dma_buf_attachment *attachment)
+{
+	struct tee_rstmem_attachment *a = attachment->priv;
+
+	sg_free_table(&a->table);
+	kfree(a);
+}
+
+static struct sg_table *
+rstmem_dma_map_dma_buf(struct dma_buf_attachment *attachment,
+		       enum dma_data_direction direction)
+{
+	struct tee_rstmem_attachment *a = attachment->priv;
+	int ret;
+
+	ret = dma_map_sgtable(attachment->dev, &a->table, direction,
+			      DMA_ATTR_SKIP_CPU_SYNC);
+	if (ret)
+		return ERR_PTR(ret);
+	return &a->table;
+}
+
+static void rstmem_dma_unmap_dma_buf(struct dma_buf_attachment *attachment,
+				     struct sg_table *table,
+				     enum dma_data_direction direction)
+{
+	struct tee_rstmem_attachment *a = attachment->priv;
+
+	WARN_ON(&a->table != table);
+	dma_unmap_sgtable(attachment->dev, table, direction,
+			  DMA_ATTR_SKIP_CPU_SYNC);
+}
+
+static int rstmem_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
+					   enum dma_data_direction direction)
+{
+	return -EPERM;
+}
+
+static int rstmem_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
+					 enum dma_data_direction direction)
+{
+	return -EPERM;
+}
+
+static int rstmem_dma_buf_mmap(struct dma_buf *dmabuf,
+			       struct vm_area_struct *vma)
+{
+	return -EPERM;
+}
+
+static void rstmem_dma_buf_free(struct dma_buf *dmabuf)
+{
+	struct tee_shm *shm = dmabuf->priv;
+
+	tee_shm_put(shm);
+}
+
+static const struct dma_buf_ops rstmem_generic_buf_ops = {
+	.attach = rstmem_dma_attach,
+	.detach = rstmem_dma_detach,
+	.map_dma_buf = rstmem_dma_map_dma_buf,
+	.unmap_dma_buf = rstmem_dma_unmap_dma_buf,
+	.begin_cpu_access = rstmem_dma_buf_begin_cpu_access,
+	.end_cpu_access = rstmem_dma_buf_end_cpu_access,
+	.mmap = rstmem_dma_buf_mmap,
+	.release = rstmem_dma_buf_free,
+};
+
+struct dma_buf *tee_rstmem_alloc(struct tee_context *ctx, u32 flags,
+				 u32 use_case, size_t size, int *shm_id)
+{
+	struct tee_device *teedev = ctx->teedev;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct dma_buf *dmabuf;
+	struct tee_shm *shm;
+	void *ret;
+	int rc;
+
+	if (!tee_device_get(teedev))
+		return ERR_PTR(-EINVAL);
+
+	if (!teedev->desc->ops->rstmem_alloc ||
+	    !teedev->desc->ops->rstmem_free) {
+		dmabuf = ERR_PTR(-EINVAL);
+		goto err;
+	}
+
+	shm = kzalloc(sizeof(*shm), GFP_KERNEL);
+	if (!shm) {
+		dmabuf = ERR_PTR(-ENOMEM);
+		goto err;
+	}
+
+	refcount_set(&shm->refcount, 1);
+	shm->flags = TEE_SHM_RESTRICTED;
+	shm->use_case = use_case;
+	shm->ctx = ctx;
+
+	mutex_lock(&teedev->mutex);
+	shm->id = idr_alloc(&teedev->idr, NULL, 1, 0, GFP_KERNEL);
+	mutex_unlock(&teedev->mutex);
+	if (shm->id < 0) {
+		dmabuf = ERR_PTR(shm->id);
+		goto err_kfree;
+	}
+
+	rc = teedev->desc->ops->rstmem_alloc(ctx, shm, flags, use_case, size);
+	if (rc) {
+		dmabuf = ERR_PTR(rc);
+		goto err_idr_remove;
+	}
+
+	mutex_lock(&teedev->mutex);
+	ret = idr_replace(&teedev->idr, shm, shm->id);
+	mutex_unlock(&teedev->mutex);
+	if (IS_ERR(ret)) {
+		dmabuf = ret;
+		goto err_rstmem_free;
+	}
+	teedev_ctx_get(ctx);
+
+	exp_info.ops = &rstmem_generic_buf_ops;
+	exp_info.size = shm->size;
+	exp_info.priv = shm;
+	dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(dmabuf)) {
+		tee_shm_put(shm);
+		return dmabuf;
+	}
+
+	*shm_id = shm->id;
+	return dmabuf;
+
+err_rstmem_free:
+	teedev->desc->ops->rstmem_free(ctx, shm);
+err_idr_remove:
+	mutex_lock(&teedev->mutex);
+	idr_remove(&teedev->idr, shm->id);
+	mutex_unlock(&teedev->mutex);
+err_kfree:
+	kfree(shm);
+err:
+	tee_device_put(teedev);
+	return dmabuf;
+}
diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index daf6e5cfd59a..416f7f25d885 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -55,6 +55,8 @@ static void tee_shm_release(struct tee_device *teedev, struct tee_shm *shm)
 				 "unregister shm %p failed: %d", shm, rc);
 
 		release_registered_pages(shm);
+	} else if (shm->flags & TEE_SHM_RESTRICTED) {
+		teedev->desc->ops->rstmem_free(shm->ctx, shm);
 	}
 
 	teedev_ctx_put(shm->ctx);
diff --git a/drivers/tee/tee_shm_pool.c b/drivers/tee/tee_shm_pool.c
index 80004b55628d..ee57ef157a77 100644
--- a/drivers/tee/tee_shm_pool.c
+++ b/drivers/tee/tee_shm_pool.c
@@ -1,9 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * Copyright (c) 2015, 2017, 2022 Linaro Limited
+ * Copyright (c) 2015, 2017, 2022, 2024 Linaro Limited
  */
 #include <linux/device.h>
-#include <linux/dma-buf.h>
 #include <linux/genalloc.h>
 #include <linux/slab.h>
 #include <linux/tee_core.h>
@@ -90,3 +89,69 @@ struct tee_shm_pool *tee_shm_pool_alloc_res_mem(unsigned long vaddr,
 	return ERR_PTR(rc);
 }
 EXPORT_SYMBOL_GPL(tee_shm_pool_alloc_res_mem);
+
+static int rstmem_pool_op_gen_alloc(struct tee_shm_pool *pool,
+				    struct tee_shm *shm, size_t size,
+				    size_t align)
+{
+	size_t sz = ALIGN(size, PAGE_SIZE);
+	phys_addr_t pa;
+
+	pa = gen_pool_alloc(pool->private_data, sz);
+	if (!pa)
+		return -ENOMEM;
+
+	shm->size = sz;
+	shm->paddr = pa;
+
+	return 0;
+}
+
+static void rstmem_pool_op_gen_free(struct tee_shm_pool *pool,
+				    struct tee_shm *shm)
+{
+	gen_pool_free(pool->private_data, shm->paddr, shm->size);
+	shm->paddr = 0;
+}
+
+static struct tee_shm_pool_ops rstmem_pool_ops_generic = {
+	.alloc = rstmem_pool_op_gen_alloc,
+	.free = rstmem_pool_op_gen_free,
+	.destroy_pool = pool_op_gen_destroy_pool,
+};
+
+struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size)
+{
+	const size_t page_mask = PAGE_SIZE - 1;
+	struct tee_shm_pool *pool;
+	int rc;
+
+	/* Check it's page aligned */
+	if ((paddr | size) & page_mask)
+		return ERR_PTR(-EINVAL);
+
+	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool)
+		return ERR_PTR(-ENOMEM);
+
+	pool->private_data = gen_pool_create(PAGE_SHIFT, -1);
+	if (!pool->private_data) {
+		rc = -ENOMEM;
+		goto err_free;
+	}
+
+	rc = gen_pool_add(pool->private_data, paddr, size, -1);
+	if (rc)
+		goto err_free_pool;
+
+	pool->ops = &rstmem_pool_ops_generic;
+
+	return pool;
+
+err_free_pool:
+	gen_pool_destroy(pool->private_data);
+err_free:
+	kfree(pool);
+
+	return ERR_PTR(rc);
+}
+EXPORT_SYMBOL_GPL(tee_rstmem_gen_pool_alloc);
diff --git a/include/linux/tee_core.h b/include/linux/tee_core.h
index a38494d6b5f4..608302f494fe 100644
--- a/include/linux/tee_core.h
+++ b/include/linux/tee_core.h
@@ -26,6 +26,7 @@
 #define TEE_SHM_USER_MAPPED	BIT(1)  /* Memory mapped in user space */
 #define TEE_SHM_POOL		BIT(2)  /* Memory allocated from pool */
 #define TEE_SHM_PRIV		BIT(3)  /* Memory private to TEE driver */
+#define TEE_SHM_RESTRICTED	BIT(4)  /* Restricted memory */
 
 #define TEE_DEVICE_FLAG_REGISTERED	0x1
 #define TEE_MAX_DEV_NAME_LEN		32
@@ -76,6 +77,8 @@ struct tee_device {
  * @supp_send:		called for supplicant to send a response
  * @shm_register:	register shared memory buffer in TEE
  * @shm_unregister:	unregister shared memory buffer in TEE
+ * @rstmem_alloc:	allocate restricted memory
+ * @rstmem_free:	free restricted memory
  */
 struct tee_driver_ops {
 	void (*get_version)(struct tee_device *teedev,
@@ -99,6 +102,9 @@ struct tee_driver_ops {
 			    struct page **pages, size_t num_pages,
 			    unsigned long start);
 	int (*shm_unregister)(struct tee_context *ctx, struct tee_shm *shm);
+	int (*rstmem_alloc)(struct tee_context *ctx, struct tee_shm *shm,
+			    u32 flags, u32 use_case, size_t size);
+	void (*rstmem_free)(struct tee_context *ctx, struct tee_shm *shm);
 };
 
 /**
@@ -229,6 +235,15 @@ static inline void tee_shm_pool_free(struct tee_shm_pool *pool)
 	pool->ops->destroy_pool(pool);
 }
 
+/**
+ * tee_rstmem_gen_pool_alloc() - Create a restricted memory manager
+ * @paddr:	Physical address of start of pool
+ * @size:	Size in bytes of the pool
+ *
+ * @returns pointer to a 'struct tee_shm_pool' or an ERR_PTR on failure.
+ */
+struct tee_shm_pool *tee_rstmem_gen_pool_alloc(phys_addr_t paddr, size_t size);
+
 /**
  * tee_get_drvdata() - Return driver_data pointer
  * @returns the driver_data pointer supplied to tee_register().
diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h
index a54c203000ed..cba067715d14 100644
--- a/include/linux/tee_drv.h
+++ b/include/linux/tee_drv.h
@@ -55,6 +55,7 @@ struct tee_context {
  * @pages:	locked pages from userspace
  * @num_pages:	number of locked pages
  * @refcount:	reference counter
+ * @use_case:	defined by TEE_IOC_UC_* in tee.h
  * @flags:	defined by TEE_SHM_* in tee_core.h
  * @id:		unique id of a shared memory object on this device, shared
  *		with user space
@@ -71,6 +72,7 @@ struct tee_shm {
 	struct page **pages;
 	size_t num_pages;
 	refcount_t refcount;
+	u32 use_case;
 	u32 flags;
 	int id;
 	u64 sec_world_id;
diff --git a/include/uapi/linux/tee.h b/include/uapi/linux/tee.h
index d0430bee8292..88834448debb 100644
--- a/include/uapi/linux/tee.h
+++ b/include/uapi/linux/tee.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015-2016, Linaro Limited
+ * Copyright (c) 2015-2017, 2020, 2024, Linaro Limited
  * All rights reserved.
  *
 * Redistribution and use in source and binary forms, with or without
@@ -48,6 +48,7 @@
 #define TEE_GEN_CAP_PRIVILEGED	(1 << 1)/* Privileged device (for supplicant) */
 #define TEE_GEN_CAP_REG_MEM	(1 << 2)/* Supports registering shared memory */
 #define TEE_GEN_CAP_MEMREF_NULL	(1 << 3)/* NULL MemRef support */
+#define TEE_GEN_CAP_RSTMEM	(1 << 4)/* Supports restricted memory */
 
 #define TEE_MEMREF_NULL		(__u64)(-1) /* NULL MemRef Buffer */
 
@@ -389,6 +390,47 @@ struct tee_ioctl_shm_register_data {
  */
 #define TEE_IOC_SHM_REGISTER	_IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 9, \
 				      struct tee_ioctl_shm_register_data)
+
+#define TEE_IOC_UC_RESERVED			0
+#define TEE_IOC_UC_SECURE_VIDEO_PLAY		1
+#define TEE_IOC_UC_TRUSTED_UI			2
+#define TEE_IOC_UC_SECURE_VIDEO_RECORD		3
+
+/**
+ * struct tee_ioctl_rstmem_alloc_data - Restricted memory allocate argument
+ * @size:	[in/out] Size of restricted memory to allocate
+ * @flags:	[in/out] Flags to/from allocation
+ * @use_case:	[in] Restricted memory use case, TEE_IOC_UC_*
+ * @id:		[out] Identifier of the restricted memory
+ */
+struct tee_ioctl_rstmem_alloc_data {
+	__u64 size;
+	__u32 flags;
+	__u32 use_case;
+	__s32 id;
+};
+
+/**
+ * TEE_IOC_RSTMEM_ALLOC - allocate restricted memory
+ *
+ * Allocates restricted physical memory normally not accessible by the
+ * kernel.
+ *
+ * Restricted memory refers to memory buffers behind a hardware enforced
+ * firewall. It is not accessible to the kernel during normal circumstances
+ * but rather only accessible to certain hardware IPs or CPUs executing in
+ * a higher privileged mode than the kernel itself. This interface allows
+ * to allocate and manage such restricted memory buffers via interaction
+ * with a TEE implementation.
+ *
+ * Returns a file descriptor on success or < 0 on failure
+ *
+ * The returned file descriptor is a dma-buf that can be attached and
+ * mapped for devices with permission to access the physical memory.
+ */
+#define TEE_IOC_RSTMEM_ALLOC	_IOWR(TEE_IOC_MAGIC, TEE_IOC_BASE + 10, \
+				      struct tee_ioctl_rstmem_alloc_data)
 
 /*
  * Five syscalls are used when communicating with the TEE driver.
 * open(): opens the device associated with the driver
The OP-TEE backend driver has two internal function pointers to convert between the subsystem type struct tee_param and the OP-TEE type struct optee_msg_param.
The conversion is done from one of the types to the other, which is then involved in some operation, and finally converted back to the original type. When converting to prepare the parameters for the operation, all fields must be taken into account, but when converting back it's enough to update only out-values and out-sizes. So an update_out parameter is added to the conversion functions to tell whether all fields or only some must be copied.
This is needed in a later patch, where it would otherwise get confusing when converting back in the from_msg_param() callback, since an allocated restricted SHM can be using the sec_world_id of the restricted memory pool it was allocated from, and that doesn't translate back well.
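To make the intended calling convention concrete, here is a condensed sketch of the round trip described above (based on the hunks below; msg_arg, params and num_params stand in for the real locals):

    /* Prepare the request: every field is copied into OPTEE_MSG form */
    rc = optee->ops->to_msg_param(optee, msg_arg->params, num_params,
				  params, false /*!update_out*/);

    /* ... enter secure world and let it process the request ... */

    /* Copy back only out-values and out-sizes */
    rc = optee->ops->from_msg_param(optee, params, num_params,
				    msg_arg->params, true /*update_out*/);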
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
 drivers/tee/optee/call.c          | 10 ++--
 drivers/tee/optee/ffa_abi.c       | 43 ++++++++++++-----
 drivers/tee/optee/optee_private.h | 42 +++++++++++------
 drivers/tee/optee/rpc.c           | 31 ++++++++-----
 drivers/tee/optee/smc_abi.c       | 76 ++++++++++++++++++++++---------
 5 files changed, 144 insertions(+), 58 deletions(-)
diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c index 16eb953e14bb..f1533b894726 100644 --- a/drivers/tee/optee/call.c +++ b/drivers/tee/optee/call.c @@ -400,7 +400,8 @@ int optee_open_session(struct tee_context *ctx, export_uuid(msg_arg->params[1].u.octets, &client_uuid);
rc = optee->ops->to_msg_param(optee, msg_arg->params + 2, - arg->num_params, param); + arg->num_params, param, + false /*!update_out*/); if (rc) goto out;
@@ -427,7 +428,8 @@ int optee_open_session(struct tee_context *ctx, }
if (optee->ops->from_msg_param(optee, param, arg->num_params, - msg_arg->params + 2)) { + msg_arg->params + 2, + true /*update_out*/)) { arg->ret = TEEC_ERROR_COMMUNICATION; arg->ret_origin = TEEC_ORIGIN_COMMS; /* Close session again to avoid leakage */ @@ -541,7 +543,7 @@ int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg, msg_arg->cancel_id = arg->cancel_id;
rc = optee->ops->to_msg_param(optee, msg_arg->params, arg->num_params, - param); + param, false /*!update_out*/); if (rc) goto out;
@@ -551,7 +553,7 @@ int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg, }
if (optee->ops->from_msg_param(optee, param, arg->num_params, - msg_arg->params)) { + msg_arg->params, true /*update_out*/)) { msg_arg->ret = TEEC_ERROR_COMMUNICATION; msg_arg->ret_origin = TEEC_ORIGIN_COMMS; } diff --git a/drivers/tee/optee/ffa_abi.c b/drivers/tee/optee/ffa_abi.c index f3af5666bb11..02e6175ac5f0 100644 --- a/drivers/tee/optee/ffa_abi.c +++ b/drivers/tee/optee/ffa_abi.c @@ -122,15 +122,21 @@ static int optee_shm_rem_ffa_handle(struct optee *optee, u64 global_id) */
static void from_msg_param_ffa_mem(struct optee *optee, struct tee_param *p, - u32 attr, const struct optee_msg_param *mp) + u32 attr, const struct optee_msg_param *mp, + bool update_out) { struct tee_shm *shm = NULL; u64 offs_high = 0; u64 offs_low = 0;
+ if (update_out) { + if (attr == OPTEE_MSG_ATTR_TYPE_FMEM_INPUT) + return; + goto out; + } + p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT + attr - OPTEE_MSG_ATTR_TYPE_FMEM_INPUT; - p->u.memref.size = mp->u.fmem.size;
if (mp->u.fmem.global_id != OPTEE_MSG_FMEM_INVALID_GLOBAL_ID) shm = optee_shm_from_ffa_handle(optee, mp->u.fmem.global_id); @@ -141,6 +147,8 @@ static void from_msg_param_ffa_mem(struct optee *optee, struct tee_param *p, offs_high = mp->u.fmem.offs_high; } p->u.memref.shm_offs = offs_low | offs_high << 32; +out: + p->u.memref.size = mp->u.fmem.size; }
/** @@ -150,12 +158,14 @@ static void from_msg_param_ffa_mem(struct optee *optee, struct tee_param *p, * @params: subsystem internal parameter representation * @num_params: number of elements in the parameter arrays * @msg_params: OPTEE_MSG parameters + * @update_out: update parameter for output only * * Returns 0 on success or <0 on failure */ static int optee_ffa_from_msg_param(struct optee *optee, struct tee_param *params, size_t num_params, - const struct optee_msg_param *msg_params) + const struct optee_msg_param *msg_params, + bool update_out) { size_t n;
@@ -166,18 +176,20 @@ static int optee_ffa_from_msg_param(struct optee *optee,
switch (attr) { case OPTEE_MSG_ATTR_TYPE_NONE: + if (update_out) + break; p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_NONE; memset(&p->u, 0, sizeof(p->u)); break; case OPTEE_MSG_ATTR_TYPE_VALUE_INPUT: case OPTEE_MSG_ATTR_TYPE_VALUE_OUTPUT: case OPTEE_MSG_ATTR_TYPE_VALUE_INOUT: - optee_from_msg_param_value(p, attr, mp); + optee_from_msg_param_value(p, attr, mp, update_out); break; case OPTEE_MSG_ATTR_TYPE_FMEM_INPUT: case OPTEE_MSG_ATTR_TYPE_FMEM_OUTPUT: case OPTEE_MSG_ATTR_TYPE_FMEM_INOUT: - from_msg_param_ffa_mem(optee, p, attr, mp); + from_msg_param_ffa_mem(optee, p, attr, mp, update_out); break; default: return -EINVAL; @@ -188,10 +200,16 @@ static int optee_ffa_from_msg_param(struct optee *optee, }
static int to_msg_param_ffa_mem(struct optee_msg_param *mp, - const struct tee_param *p) + const struct tee_param *p, bool update_out) { struct tee_shm *shm = p->u.memref.shm;
+ if (update_out) { + if (p->attr == TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT) + return 0; + goto out; + } + mp->attr = OPTEE_MSG_ATTR_TYPE_FMEM_INPUT + p->attr - TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT;
@@ -211,6 +229,7 @@ static int to_msg_param_ffa_mem(struct optee_msg_param *mp, memset(&mp->u, 0, sizeof(mp->u)); mp->u.fmem.global_id = OPTEE_MSG_FMEM_INVALID_GLOBAL_ID; } +out: mp->u.fmem.size = p->u.memref.size;
return 0; @@ -222,13 +241,15 @@ static int to_msg_param_ffa_mem(struct optee_msg_param *mp, * @optee: main service struct * @msg_params: OPTEE_MSG parameters * @num_params: number of elements in the parameter arrays - * @params: subsystem itnernal parameter representation + * @params: subsystem internal parameter representation + * @update_out: update parameter for output only * Returns 0 on success or <0 on failure */ static int optee_ffa_to_msg_param(struct optee *optee, struct optee_msg_param *msg_params, size_t num_params, - const struct tee_param *params) + const struct tee_param *params, + bool update_out) { size_t n;
@@ -238,18 +259,20 @@ static int optee_ffa_to_msg_param(struct optee *optee,
switch (p->attr) { case TEE_IOCTL_PARAM_ATTR_TYPE_NONE: + if (update_out) + break; mp->attr = TEE_IOCTL_PARAM_ATTR_TYPE_NONE; memset(&mp->u, 0, sizeof(mp->u)); break; case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT: case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT: case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT: - optee_to_msg_param_value(mp, p); + optee_to_msg_param_value(mp, p, update_out); break; case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT: case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT: case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT: - if (to_msg_param_ffa_mem(mp, p)) + if (to_msg_param_ffa_mem(mp, p, update_out)) return -EINVAL; break; default: diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h index dc0f355ef72a..20eda508dbac 100644 --- a/drivers/tee/optee/optee_private.h +++ b/drivers/tee/optee/optee_private.h @@ -185,10 +185,12 @@ struct optee_ops { bool system_thread); int (*to_msg_param)(struct optee *optee, struct optee_msg_param *msg_params, - size_t num_params, const struct tee_param *params); + size_t num_params, const struct tee_param *params, + bool update_out); int (*from_msg_param)(struct optee *optee, struct tee_param *params, size_t num_params, - const struct optee_msg_param *msg_params); + const struct optee_msg_param *msg_params, + bool update_out); };
/** @@ -316,23 +318,35 @@ void optee_release(struct tee_context *ctx); void optee_release_supp(struct tee_context *ctx);
static inline void optee_from_msg_param_value(struct tee_param *p, u32 attr, - const struct optee_msg_param *mp) + const struct optee_msg_param *mp, + bool update_out) { - p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT + - attr - OPTEE_MSG_ATTR_TYPE_VALUE_INPUT; - p->u.value.a = mp->u.value.a; - p->u.value.b = mp->u.value.b; - p->u.value.c = mp->u.value.c; + if (!update_out) + p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT + + attr - OPTEE_MSG_ATTR_TYPE_VALUE_INPUT; + + if (attr == OPTEE_MSG_ATTR_TYPE_VALUE_OUTPUT || + attr == OPTEE_MSG_ATTR_TYPE_VALUE_INOUT || !update_out) { + p->u.value.a = mp->u.value.a; + p->u.value.b = mp->u.value.b; + p->u.value.c = mp->u.value.c; + } }
static inline void optee_to_msg_param_value(struct optee_msg_param *mp, - const struct tee_param *p) + const struct tee_param *p, + bool update_out) { - mp->attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT + p->attr - - TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT; - mp->u.value.a = p->u.value.a; - mp->u.value.b = p->u.value.b; - mp->u.value.c = p->u.value.c; + if (!update_out) + mp->attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT + p->attr - + TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT; + + if (p->attr == TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT || + p->attr == TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT || !update_out) { + mp->u.value.a = p->u.value.a; + mp->u.value.b = p->u.value.b; + mp->u.value.c = p->u.value.c; + } }
void optee_cq_init(struct optee_call_queue *cq, int thread_count); diff --git a/drivers/tee/optee/rpc.c b/drivers/tee/optee/rpc.c index ebbbd42b0e3e..580e6b9b0606 100644 --- a/drivers/tee/optee/rpc.c +++ b/drivers/tee/optee/rpc.c @@ -63,7 +63,7 @@ static void handle_rpc_func_cmd_i2c_transfer(struct tee_context *ctx, }
if (optee->ops->from_msg_param(optee, params, arg->num_params, - arg->params)) + arg->params, false /*!update_out*/)) goto bad;
for (i = 0; i < arg->num_params; i++) { @@ -107,7 +107,8 @@ static void handle_rpc_func_cmd_i2c_transfer(struct tee_context *ctx, } else { params[3].u.value.a = msg.len; if (optee->ops->to_msg_param(optee, arg->params, - arg->num_params, params)) + arg->num_params, params, + true /*update_out*/)) arg->ret = TEEC_ERROR_BAD_PARAMETERS; else arg->ret = TEEC_SUCCESS; @@ -188,6 +189,7 @@ static void handle_rpc_func_cmd_wait(struct optee_msg_arg *arg) static void handle_rpc_supp_cmd(struct tee_context *ctx, struct optee *optee, struct optee_msg_arg *arg) { + bool update_out = false; struct tee_param *params;
arg->ret_origin = TEEC_ORIGIN_COMMS; @@ -200,15 +202,21 @@ static void handle_rpc_supp_cmd(struct tee_context *ctx, struct optee *optee, }
if (optee->ops->from_msg_param(optee, params, arg->num_params, - arg->params)) { + arg->params, update_out)) { arg->ret = TEEC_ERROR_BAD_PARAMETERS; goto out; }
arg->ret = optee_supp_thrd_req(ctx, arg->cmd, arg->num_params, params);
+ /* + * Special treatment for OPTEE_RPC_CMD_SHM_ALLOC since input is a + * value type, but the output is a memref type. + */ + if (arg->cmd != OPTEE_RPC_CMD_SHM_ALLOC) + update_out = true; if (optee->ops->to_msg_param(optee, arg->params, arg->num_params, - params)) + params, update_out)) arg->ret = TEEC_ERROR_BAD_PARAMETERS; out: kfree(params); @@ -270,7 +278,7 @@ static void handle_rpc_func_rpmb_probe_reset(struct tee_context *ctx,
if (arg->num_params != ARRAY_SIZE(params) || optee->ops->from_msg_param(optee, params, arg->num_params, - arg->params) || + arg->params, false /*!update_out*/) || params[0].attr != TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT) { arg->ret = TEEC_ERROR_BAD_PARAMETERS; return; @@ -280,7 +288,8 @@ static void handle_rpc_func_rpmb_probe_reset(struct tee_context *ctx, params[0].u.value.b = 0; params[0].u.value.c = 0; if (optee->ops->to_msg_param(optee, arg->params, - arg->num_params, params)) { + arg->num_params, params, + true /*update_out*/)) { arg->ret = TEEC_ERROR_BAD_PARAMETERS; return; } @@ -324,7 +333,7 @@ static void handle_rpc_func_rpmb_probe_next(struct tee_context *ctx,
if (arg->num_params != ARRAY_SIZE(params) || optee->ops->from_msg_param(optee, params, arg->num_params, - arg->params) || + arg->params, false /*!update_out*/) || params[0].attr != TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT || params[1].attr != TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT) { arg->ret = TEEC_ERROR_BAD_PARAMETERS; @@ -358,7 +367,8 @@ static void handle_rpc_func_rpmb_probe_next(struct tee_context *ctx, params[0].u.value.b = rdev->descr.capacity; params[0].u.value.c = rdev->descr.reliable_wr_count; if (optee->ops->to_msg_param(optee, arg->params, - arg->num_params, params)) { + arg->num_params, params, + true /*update_out*/)) { arg->ret = TEEC_ERROR_BAD_PARAMETERS; return; } @@ -384,7 +394,7 @@ static void handle_rpc_func_rpmb_frames(struct tee_context *ctx,
if (arg->num_params != ARRAY_SIZE(params) || optee->ops->from_msg_param(optee, params, arg->num_params, - arg->params) || + arg->params, false /*!update_out*/) || params[0].attr != TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT || params[1].attr != TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT) { arg->ret = TEEC_ERROR_BAD_PARAMETERS; @@ -401,7 +411,8 @@ static void handle_rpc_func_rpmb_frames(struct tee_context *ctx, goto out; } if (optee->ops->to_msg_param(optee, arg->params, - arg->num_params, params)) { + arg->num_params, params, + true /*update_out*/)) { arg->ret = TEEC_ERROR_BAD_PARAMETERS; goto out; } diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c index 3e33cf2af73b..788919a473d6 100644 --- a/drivers/tee/optee/smc_abi.c +++ b/drivers/tee/optee/smc_abi.c @@ -81,20 +81,26 @@ static int optee_cpuhp_disable_pcpu_irq(unsigned int cpu) */
static int from_msg_param_tmp_mem(struct tee_param *p, u32 attr, - const struct optee_msg_param *mp) + const struct optee_msg_param *mp, + bool update_out) { struct tee_shm *shm; phys_addr_t pa; int rc;
+ if (update_out) { + if (attr == OPTEE_MSG_ATTR_TYPE_TMEM_INPUT) + return 0; + goto out; + } + p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT + attr - OPTEE_MSG_ATTR_TYPE_TMEM_INPUT; - p->u.memref.size = mp->u.tmem.size; shm = (struct tee_shm *)(unsigned long)mp->u.tmem.shm_ref; if (!shm) { p->u.memref.shm_offs = 0; p->u.memref.shm = NULL; - return 0; + goto out; }
rc = tee_shm_get_pa(shm, 0, &pa); @@ -103,18 +109,25 @@ static int from_msg_param_tmp_mem(struct tee_param *p, u32 attr,
p->u.memref.shm_offs = mp->u.tmem.buf_ptr - pa; p->u.memref.shm = shm; - +out: + p->u.memref.size = mp->u.tmem.size; return 0; }
static void from_msg_param_reg_mem(struct tee_param *p, u32 attr, - const struct optee_msg_param *mp) + const struct optee_msg_param *mp, + bool update_out) { struct tee_shm *shm;
+ if (update_out) { + if (attr == OPTEE_MSG_ATTR_TYPE_RMEM_INPUT) + return; + goto out; + } + p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT + attr - OPTEE_MSG_ATTR_TYPE_RMEM_INPUT; - p->u.memref.size = mp->u.rmem.size; shm = (struct tee_shm *)(unsigned long)mp->u.rmem.shm_ref;
if (shm) { @@ -124,6 +137,8 @@ static void from_msg_param_reg_mem(struct tee_param *p, u32 attr, p->u.memref.shm_offs = 0; p->u.memref.shm = NULL; } +out: + p->u.memref.size = mp->u.rmem.size; }
/** @@ -133,11 +148,13 @@ static void from_msg_param_reg_mem(struct tee_param *p, u32 attr, * @params: subsystem internal parameter representation * @num_params: number of elements in the parameter arrays * @msg_params: OPTEE_MSG parameters + * @update_out: update parameter for output only * Returns 0 on success or <0 on failure */ static int optee_from_msg_param(struct optee *optee, struct tee_param *params, size_t num_params, - const struct optee_msg_param *msg_params) + const struct optee_msg_param *msg_params, + bool update_out) { int rc; size_t n; @@ -149,25 +166,27 @@ static int optee_from_msg_param(struct optee *optee, struct tee_param *params,
switch (attr) { case OPTEE_MSG_ATTR_TYPE_NONE: + if (update_out) + break; p->attr = TEE_IOCTL_PARAM_ATTR_TYPE_NONE; memset(&p->u, 0, sizeof(p->u)); break; case OPTEE_MSG_ATTR_TYPE_VALUE_INPUT: case OPTEE_MSG_ATTR_TYPE_VALUE_OUTPUT: case OPTEE_MSG_ATTR_TYPE_VALUE_INOUT: - optee_from_msg_param_value(p, attr, mp); + optee_from_msg_param_value(p, attr, mp, update_out); break; case OPTEE_MSG_ATTR_TYPE_TMEM_INPUT: case OPTEE_MSG_ATTR_TYPE_TMEM_OUTPUT: case OPTEE_MSG_ATTR_TYPE_TMEM_INOUT: - rc = from_msg_param_tmp_mem(p, attr, mp); + rc = from_msg_param_tmp_mem(p, attr, mp, update_out); if (rc) return rc; break; case OPTEE_MSG_ATTR_TYPE_RMEM_INPUT: case OPTEE_MSG_ATTR_TYPE_RMEM_OUTPUT: case OPTEE_MSG_ATTR_TYPE_RMEM_INOUT: - from_msg_param_reg_mem(p, attr, mp); + from_msg_param_reg_mem(p, attr, mp, update_out); break;
default: @@ -178,20 +197,25 @@ static int optee_from_msg_param(struct optee *optee, struct tee_param *params, }
static int to_msg_param_tmp_mem(struct optee_msg_param *mp, - const struct tee_param *p) + const struct tee_param *p, bool update_out) { int rc; phys_addr_t pa;
+ if (update_out) { + if (p->attr == TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT) + return 0; + goto out; + } + mp->attr = OPTEE_MSG_ATTR_TYPE_TMEM_INPUT + p->attr - TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT;
mp->u.tmem.shm_ref = (unsigned long)p->u.memref.shm; - mp->u.tmem.size = p->u.memref.size;
if (!p->u.memref.shm) { mp->u.tmem.buf_ptr = 0; - return 0; + goto out; }
rc = tee_shm_get_pa(p->u.memref.shm, p->u.memref.shm_offs, &pa); @@ -201,19 +225,27 @@ static int to_msg_param_tmp_mem(struct optee_msg_param *mp, mp->u.tmem.buf_ptr = pa; mp->attr |= OPTEE_MSG_ATTR_CACHE_PREDEFINED << OPTEE_MSG_ATTR_CACHE_SHIFT; - +out: + mp->u.tmem.size = p->u.memref.size; return 0; }
static int to_msg_param_reg_mem(struct optee_msg_param *mp, - const struct tee_param *p) + const struct tee_param *p, bool update_out) { + if (update_out) { + if (p->attr == TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT) + return 0; + goto out; + } + mp->attr = OPTEE_MSG_ATTR_TYPE_RMEM_INPUT + p->attr - TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT;
mp->u.rmem.shm_ref = (unsigned long)p->u.memref.shm; - mp->u.rmem.size = p->u.memref.size; mp->u.rmem.offs = p->u.memref.shm_offs; +out: + mp->u.rmem.size = p->u.memref.size; return 0; }
@@ -223,11 +255,13 @@ static int to_msg_param_reg_mem(struct optee_msg_param *mp, * @msg_params: OPTEE_MSG parameters * @num_params: number of elements in the parameter arrays * @params: subsystem itnernal parameter representation + * @update_out: update parameter for output only * Returns 0 on success or <0 on failure */ static int optee_to_msg_param(struct optee *optee, struct optee_msg_param *msg_params, - size_t num_params, const struct tee_param *params) + size_t num_params, const struct tee_param *params, + bool update_out) { int rc; size_t n; @@ -238,21 +272,23 @@ static int optee_to_msg_param(struct optee *optee,
switch (p->attr) { case TEE_IOCTL_PARAM_ATTR_TYPE_NONE: + if (update_out) + break; mp->attr = TEE_IOCTL_PARAM_ATTR_TYPE_NONE; memset(&mp->u, 0, sizeof(mp->u)); break; case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT: case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_OUTPUT: case TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT: - optee_to_msg_param_value(mp, p); + optee_to_msg_param_value(mp, p, update_out); break; case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INPUT: case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT: case TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_INOUT: if (tee_shm_is_dynamic(p->u.memref.shm)) - rc = to_msg_param_reg_mem(mp, p); + rc = to_msg_param_reg_mem(mp, p, update_out); else - rc = to_msg_param_tmp_mem(mp, p); + rc = to_msg_param_tmp_mem(mp, p, update_out); if (rc) return rc; break;
Update the header files describing the secure world ABI, both with and without FF-A. The ABI is extended to deal with restricted memory but is, as usual, backward compatible.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
 drivers/tee/optee/optee_ffa.h | 27 ++++++++++---
 drivers/tee/optee/optee_msg.h | 65 ++++++++++++++++++++++++++++++--
 drivers/tee/optee/optee_smc.h | 71 ++++++++++++++++++++++++++++++++++-
 3 files changed, 154 insertions(+), 9 deletions(-)
diff --git a/drivers/tee/optee/optee_ffa.h b/drivers/tee/optee/optee_ffa.h index 257735ae5b56..7bd037200343 100644 --- a/drivers/tee/optee/optee_ffa.h +++ b/drivers/tee/optee/optee_ffa.h @@ -81,7 +81,7 @@ * as the second MSG arg struct for * OPTEE_FFA_YIELDING_CALL_WITH_ARG. * Bit[31:8]: Reserved (MBZ) - * w5: Bitfield of secure world capabilities OPTEE_FFA_SEC_CAP_* below, + * w5: Bitfield of OP-TEE capabilities OPTEE_FFA_SEC_CAP_* * w6: The maximum secure world notification number * w7: Not used (MBZ) */ @@ -94,6 +94,8 @@ #define OPTEE_FFA_SEC_CAP_ASYNC_NOTIF BIT(1) /* OP-TEE supports probing for RPMB device if needed */ #define OPTEE_FFA_SEC_CAP_RPMB_PROBE BIT(2) +/* OP-TEE supports Restricted Memory for secure data path */ +#define OPTEE_FFA_SEC_CAP_RSTMEM BIT(3)
#define OPTEE_FFA_EXCHANGE_CAPABILITIES OPTEE_FFA_BLOCKING_CALL(2)
@@ -108,7 +110,7 @@ * * Return register usage: * w3: Error code, 0 on success - * w4-w7: Note used (MBZ) + * w4-w7: Not used (MBZ) */ #define OPTEE_FFA_UNREGISTER_SHM OPTEE_FFA_BLOCKING_CALL(3)
@@ -119,16 +121,31 @@ * Call register usage: * w3: Service ID, OPTEE_FFA_ENABLE_ASYNC_NOTIF * w4: Notification value to request bottom half processing, should be - * less than OPTEE_FFA_MAX_ASYNC_NOTIF_VALUE. + * less than OPTEE_FFA_MAX_ASYNC_NOTIF_VALUE * w5-w7: Not used (MBZ) * * Return register usage: * w3: Error code, 0 on success - * w4-w7: Note used (MBZ) + * w4-w7: Not used (MBZ) */ #define OPTEE_FFA_ENABLE_ASYNC_NOTIF OPTEE_FFA_BLOCKING_CALL(5)
-#define OPTEE_FFA_MAX_ASYNC_NOTIF_VALUE	64
+#define OPTEE_FFA_MAX_ASYNC_NOTIF_VALUE	64
+
+/*
+ * Release restricted memory
+ *
+ * Call register usage:
+ * w3:    Service ID, OPTEE_FFA_RELEASE_RSTMEM
+ * w4:    Shared memory handle, lower bits
+ * w5:    Shared memory handle, higher bits
+ * w6-w7: Not used (MBZ)
+ *
+ * Return register usage:
+ * w3:    Error code, 0 on success
+ * w4-w7: Not used (MBZ)
+ */
+#define OPTEE_FFA_RELEASE_RSTMEM	OPTEE_FFA_BLOCKING_CALL(8)
/* * Call with struct optee_msg_arg as argument in the supplied shared memory diff --git a/drivers/tee/optee/optee_msg.h b/drivers/tee/optee/optee_msg.h index e8840a82b983..1b558526e7d9 100644 --- a/drivers/tee/optee/optee_msg.h +++ b/drivers/tee/optee/optee_msg.h @@ -133,13 +133,13 @@ struct optee_msg_param_rmem { };
/** - * struct optee_msg_param_fmem - ffa memory reference parameter + * struct optee_msg_param_fmem - FF-A memory reference parameter * @offs_lower: Lower bits of offset into shared memory reference * @offs_upper: Upper bits of offset into shared memory reference * @internal_offs: Internal offset into the first page of shared memory * reference * @size: Size of the buffer - * @global_id: Global identifier of Shared memory + * @global_id: Global identifier of the shared memory */ struct optee_msg_param_fmem { u32 offs_low; @@ -165,7 +165,7 @@ struct optee_msg_param_value { * @attr: attributes * @tmem: parameter by temporary memory reference * @rmem: parameter by registered memory reference - * @fmem: parameter by ffa registered memory reference + * @fmem: parameter by FF-A registered memory reference * @value: parameter by opaque value * @octets: parameter by octet string * @@ -296,6 +296,18 @@ struct optee_msg_arg { */ #define OPTEE_MSG_FUNCID_GET_OS_REVISION 0x0001
+/*
+ * Values used in OPTEE_MSG_CMD_LEND_RSTMEM below
+ * OPTEE_MSG_RSTMEM_RESERVED		Reserved
+ * OPTEE_MSG_RSTMEM_SECURE_VIDEO_PLAY	Secure Video Playback
+ * OPTEE_MSG_RSTMEM_TRUSTED_UI		Trusted UI
+ * OPTEE_MSG_RSTMEM_SECURE_VIDEO_RECORD	Secure Video Recording
+ */
+#define OPTEE_MSG_RSTMEM_RESERVED		0
+#define OPTEE_MSG_RSTMEM_SECURE_VIDEO_PLAY	1
+#define OPTEE_MSG_RSTMEM_TRUSTED_UI		2
+#define OPTEE_MSG_RSTMEM_SECURE_VIDEO_RECORD	3
+
 /*
  * Do a secure call with struct optee_msg_arg as argument
  * The OPTEE_MSG_CMD_* below defines what goes in struct optee_msg_arg::cmd
@@ -337,6 +349,49 @@ struct optee_msg_arg {
  * OPTEE_MSG_CMD_STOP_ASYNC_NOTIF informs secure world that from now is
  * normal world unable to process asynchronous notifications. Typically
  * used when the driver is shut down.
+ *
+ * OPTEE_MSG_CMD_LEND_RSTMEM lends restricted memory. The passed normal
+ * physical memory is restricted from normal world access. The memory
+ * should be unmapped prior to this call since it becomes inaccessible
+ * during the request.
+ * Parameters are passed as:
+ * [in] param[0].attr			OPTEE_MSG_ATTR_TYPE_VALUE_INPUT
+ * [in] param[0].u.value.a		OPTEE_MSG_RSTMEM_* defined above
+ * [in] param[1].attr			OPTEE_MSG_ATTR_TYPE_TMEM_INPUT
+ * [in] param[1].u.tmem.buf_ptr		physical address
+ * [in] param[1].u.tmem.size		size
+ * [in] param[1].u.tmem.shm_ref		holds restricted memory reference
+ *
+ * OPTEE_MSG_CMD_RECLAIM_RSTMEM reclaims a previously lent restricted
+ * memory reference. The physical memory is accessible by the normal world
+ * after this function has returned and can be mapped again. The information
+ * is passed as:
+ * [in] param[0].attr			OPTEE_MSG_ATTR_TYPE_VALUE_INPUT
+ * [in] param[0].u.value.a		holds restricted memory cookie
+ *
+ * OPTEE_MSG_CMD_GET_RSTMEM_CONFIG gets configuration for a specific
+ * restricted memory use case. Parameters are passed as:
+ * [in] param[0].attr			OPTEE_MSG_ATTR_TYPE_VALUE_INOUT
+ * [in] param[0].value.a		OPTEE_MSG_RSTMEM_*
+ * [in] param[1].attr			OPTEE_MSG_ATTR_TYPE_{R,F}MEM_OUTPUT
+ * [in] param[1].u.{r,f}mem		Buffer or NULL
+ * [in] param[1].u.{r,f}mem.size	Provided size of buffer or 0 for query
+ * output for the restricted use case:
+ * [out] param[0].value.a		Minimal size of SDP memory
+ * [out] param[0].value.b		Required alignment of size and start of
+ *					restricted memory
+ * [out] param[1].{r,f}mem.size		Size of output data
+ * [out] param[1].{r,f}mem		If non-NULL, contains an array of
+ *					uint16_t holding endpoints that
+ *					must be included when lending
+ *					memory for this use case
+ *
+ * OPTEE_MSG_CMD_ASSIGN_RSTMEM assigns use-case to restricted memory
+ * previously lent using the FFA_LEND framework ABI. Parameters are passed
+ * as:
+ * [in] param[0].attr			OPTEE_MSG_ATTR_TYPE_VALUE_INPUT
+ * [in] param[0].u.value.a		holds restricted memory cookie
+ * [in] param[0].u.value.b		OPTEE_MSG_RSTMEM_* defined above
  */
 #define OPTEE_MSG_CMD_OPEN_SESSION	0
 #define OPTEE_MSG_CMD_INVOKE_COMMAND	1
@@ -346,6 +401,10 @@ struct optee_msg_arg {
 #define OPTEE_MSG_CMD_UNREGISTER_SHM	5
 #define OPTEE_MSG_CMD_DO_BOTTOM_HALF	6
 #define OPTEE_MSG_CMD_STOP_ASYNC_NOTIF	7
+#define OPTEE_MSG_CMD_LEND_RSTMEM	8
+#define OPTEE_MSG_CMD_RECLAIM_RSTMEM	9
+#define OPTEE_MSG_CMD_GET_RSTMEM_CONFIG	10
+#define OPTEE_MSG_CMD_ASSIGN_RSTMEM	11
 #define OPTEE_MSG_FUNCID_CALL_WITH_ARG	0x0004
#endif /* _OPTEE_MSG_H */ diff --git a/drivers/tee/optee/optee_smc.h b/drivers/tee/optee/optee_smc.h index 879426300821..abc379ce190c 100644 --- a/drivers/tee/optee/optee_smc.h +++ b/drivers/tee/optee/optee_smc.h @@ -264,7 +264,6 @@ struct optee_smc_get_shm_config_result { #define OPTEE_SMC_SEC_CAP_HAVE_RESERVED_SHM BIT(0) /* Secure world can communicate via previously unregistered shared memory */ #define OPTEE_SMC_SEC_CAP_UNREGISTERED_SHM BIT(1) - /* * Secure world supports commands "register/unregister shared memory", * secure world accepts command buffers located in any parts of non-secure RAM @@ -280,6 +279,10 @@ struct optee_smc_get_shm_config_result { #define OPTEE_SMC_SEC_CAP_RPC_ARG BIT(6) /* Secure world supports probing for RPMB device if needed */ #define OPTEE_SMC_SEC_CAP_RPMB_PROBE BIT(7) +/* Secure world supports Secure Data Path */ +#define OPTEE_SMC_SEC_CAP_SDP BIT(8) +/* Secure world supports dynamic restricted memory */ +#define OPTEE_SMC_SEC_CAP_DYNAMIC_RSTMEM BIT(9)
#define OPTEE_SMC_FUNCID_EXCHANGE_CAPABILITIES 9 #define OPTEE_SMC_EXCHANGE_CAPABILITIES \ @@ -451,6 +454,72 @@ struct optee_smc_disable_shm_cache_result {
 /* See OPTEE_SMC_CALL_WITH_REGD_ARG above */
 #define OPTEE_SMC_FUNCID_CALL_WITH_REGD_ARG	19
 
+/*
+ * Get Secure Data Path memory config
+ *
+ * Returns the Secure Data Path memory config.
+ *
+ * Call register usage:
+ * a0	SMC Function ID, OPTEE_SMC_GET_SDP_CONFIG
+ * a2-6	Not used, must be zero
+ * a7	Hypervisor Client ID register
+ *
+ * Have config return register usage:
+ * a0	OPTEE_SMC_RETURN_OK
+ * a1	Physical address of start of SDP memory
+ * a2	Size of SDP memory
+ * a3	Not used
+ * a4-7	Preserved
+ *
+ * Not available register usage:
+ * a0	OPTEE_SMC_RETURN_ENOTAVAIL
+ * a1-3	Not used
+ * a4-7	Preserved
+ */
+#define OPTEE_SMC_FUNCID_GET_SDP_CONFIG	20
+#define OPTEE_SMC_GET_SDP_CONFIG \
+	OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_GET_SDP_CONFIG)
+
+struct optee_smc_get_sdp_config_result {
+	unsigned long status;
+	unsigned long start;
+	unsigned long size;
+	unsigned long flags;
+};
+
+/*
+ * Get Secure Data Path dynamic memory config
+ *
+ * Returns the Secure Data Path dynamic memory config.
+ *
+ * Call register usage:
+ * a0	SMC Function ID, OPTEE_SMC_GET_DYN_SDP_CONFIG
+ * a2-6	Not used, must be zero
+ * a7	Hypervisor Client ID register
+ *
+ * Have config return register usage:
+ * a0	OPTEE_SMC_RETURN_OK
+ * a1	Minimal size of SDP memory
+ * a2	Required alignment of size and start of registered SDP memory
+ * a3	Not used
+ * a4-7	Preserved
+ *
+ * Not available register usage:
+ * a0	OPTEE_SMC_RETURN_ENOTAVAIL
+ * a1-3	Not used
+ * a4-7	Preserved
+ */
+
+#define OPTEE_SMC_FUNCID_GET_DYN_SDP_CONFIG	21
+#define OPTEE_SMC_GET_DYN_SDP_CONFIG \
+	OPTEE_SMC_FAST_CALL_VAL(OPTEE_SMC_FUNCID_GET_DYN_SDP_CONFIG)
+
+struct optee_smc_get_dyn_sdp_config_result {
+	unsigned long status;
+	unsigned long size;
+	unsigned long align;
+	unsigned long flags;
+};
/* * Resume from RPC (for example after processing a foreign interrupt)
Add support in the OP-TEE backend driver for restricted memory allocation. The support is limited to the SMC ABI and to secure video buffers.
OP-TEE is probed for the range of restricted physical memory, and a memory pool allocator is initialized if OP-TEE has support for such memory.
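As an aside (not part of this patch), userspace can check whether a given TEE device offers restricted memory before attempting an allocation, since the capability is reported through the existing TEE_IOC_VERSION ioctl:

    #include <sys/ioctl.h>
    #include <linux/tee.h>

    /* Returns non-zero if the TEE device supports restricted memory */
    static int supports_rstmem(int tee_fd)
    {
            struct tee_ioctl_version_data vers;

            if (ioctl(tee_fd, TEE_IOC_VERSION, &vers))
                    return 0;
            return !!(vers.gen_caps & TEE_GEN_CAP_RSTMEM);
    }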
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
 drivers/tee/optee/Makefile        |  1 +
 drivers/tee/optee/core.c          |  1 +
 drivers/tee/optee/optee_private.h | 23 ++++++++++
 drivers/tee/optee/rstmem.c        | 76 +++++++++++++++++++++++++++++++
 drivers/tee/optee/smc_abi.c       | 69 ++++++++++++++++++++++++++--
 5 files changed, 167 insertions(+), 3 deletions(-)
 create mode 100644 drivers/tee/optee/rstmem.c
diff --git a/drivers/tee/optee/Makefile b/drivers/tee/optee/Makefile index a6eff388d300..498969fb8e40 100644 --- a/drivers/tee/optee/Makefile +++ b/drivers/tee/optee/Makefile @@ -4,6 +4,7 @@ optee-objs += core.o optee-objs += call.o optee-objs += notif.o optee-objs += rpc.o +optee-objs += rstmem.o optee-objs += supp.o optee-objs += device.o optee-objs += smc_abi.o diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c index c75fddc83576..f4fa494789a4 100644 --- a/drivers/tee/optee/core.c +++ b/drivers/tee/optee/core.c @@ -182,6 +182,7 @@ void optee_remove_common(struct optee *optee) tee_device_unregister(optee->teedev);
tee_shm_pool_free(optee->pool); + optee_rstmem_pools_uninit(optee); optee_supp_uninit(&optee->supp); mutex_destroy(&optee->call_queue.mutex); rpmb_dev_put(optee->rpmb_dev); diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h index 20eda508dbac..0491889e5b0e 100644 --- a/drivers/tee/optee/optee_private.h +++ b/drivers/tee/optee/optee_private.h @@ -193,6 +193,20 @@ struct optee_ops { bool update_out); };
+/** + * struct optee_rstmem_pools - restricted memory pools + * @mutex: serializes write access to @xa when adding a new pool. + * @xa: XArray of struct tee_shm_pool where the index is the + * use case ID TEE_IOC_UC_* supplied for TEE_IOC_RSTMEM_ALLOC. + */ +struct optee_rstmem_pools { + /* + * Serializes write access to @xa when adding a new pool. + */ + struct mutex mutex; + struct xarray xa; +}; + /** * struct optee - main service struct * @supp_teedev: supplicant device @@ -206,6 +220,7 @@ struct optee_ops { * @notif: notification synchronization struct * @supp: supplicant synchronization struct for RPC to supplicant * @pool: shared memory pool + * @rstmem_pool: restricted memory pool for secure data path * @mutex: mutex protecting @rpmb_dev * @rpmb_dev: current RPMB device or NULL * @rpmb_scan_bus_done flag if device registation of RPMB dependent devices @@ -230,6 +245,7 @@ struct optee { struct optee_notif notif; struct optee_supp supp; struct tee_shm_pool *pool; + struct optee_rstmem_pools *rstmem_pools; /* Protects rpmb_dev pointer */ struct mutex rpmb_dev_mutex; struct rpmb_dev *rpmb_dev; @@ -286,6 +302,9 @@ void optee_supp_init(struct optee_supp *supp); void optee_supp_uninit(struct optee_supp *supp); void optee_supp_release(struct optee_supp *supp);
+int optee_rstmem_pools_init(struct optee *optee); +void optee_rstmem_pools_uninit(struct optee *optee); + int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params, struct tee_param *param); int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params, @@ -378,6 +397,10 @@ void optee_rpc_cmd(struct tee_context *ctx, struct optee *optee, int optee_do_bottom_half(struct tee_context *ctx); int optee_stop_async_notif(struct tee_context *ctx);
+int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm, + u32 flags, u32 use_case, size_t size); +void optee_rstmem_free(struct tee_context *ctx, struct tee_shm *shm); + /* * Small helpers */ diff --git a/drivers/tee/optee/rstmem.c b/drivers/tee/optee/rstmem.c new file mode 100644 index 000000000000..01456bc3e2f6 --- /dev/null +++ b/drivers/tee/optee/rstmem.c @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2024, Linaro Limited + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/dma-map-ops.h> +#include <linux/errno.h> +#include <linux/genalloc.h> +#include <linux/slab.h> +#include <linux/string.h> +#include <linux/tee_core.h> +#include <linux/types.h> +#include "optee_private.h" + +int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm, + u32 flags, u32 use_case, size_t size) +{ + struct optee *optee = tee_get_drvdata(ctx->teedev); + struct tee_shm_pool *pool; + + if (!optee->rstmem_pools) + return -EINVAL; + if (flags) + return -EINVAL; + + pool = xa_load(&optee->rstmem_pools->xa, use_case); + if (!pool) + return -EINVAL; + + return pool->ops->alloc(pool, shm, size, 0); +} + +void optee_rstmem_free(struct tee_context *ctx, struct tee_shm *shm) +{ + struct optee *optee = tee_get_drvdata(ctx->teedev); + struct tee_shm_pool *pool; + + pool = xa_load(&optee->rstmem_pools->xa, shm->use_case); + if (pool) + pool->ops->free(pool, shm); + else + pr_err("Can't find pool for use_case %u\n", shm->use_case); +} + +int optee_rstmem_pools_init(struct optee *optee) +{ + struct optee_rstmem_pools *pools; + + pools = kmalloc(sizeof(*pools), GFP_KERNEL); + if (!pools) + return -ENOMEM; + + mutex_init(&pools->mutex); + xa_init(&pools->xa); + optee->rstmem_pools = pools; + return 0; +} + +void optee_rstmem_pools_uninit(struct optee *optee) +{ + if (optee->rstmem_pools) { + struct tee_shm_pool *pool; + u_long idx; + + xa_for_each(&optee->rstmem_pools->xa, idx, pool) { + xa_erase(&optee->rstmem_pools->xa, idx); + pool->ops->destroy_pool(pool); + } + + xa_destroy(&optee->rstmem_pools->xa); + mutex_destroy(&optee->rstmem_pools->mutex); + kfree(optee->rstmem_pools); + optee->rstmem_pools = NULL; + } +} diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c index 788919a473d6..f5fd5f1d9a6b 100644 --- a/drivers/tee/optee/smc_abi.c +++ b/drivers/tee/optee/smc_abi.c @@ -1201,6 +1201,8 @@ static void optee_get_version(struct tee_device *teedev, v.gen_caps |= TEE_GEN_CAP_REG_MEM; if (optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_MEMREF_NULL) v.gen_caps |= TEE_GEN_CAP_MEMREF_NULL; + if (optee->rstmem_pools) + v.gen_caps |= TEE_GEN_CAP_RSTMEM; *vers = v; }
@@ -1223,6 +1225,8 @@ static const struct tee_driver_ops optee_clnt_ops = { .cancel_req = optee_cancel_req, .shm_register = optee_shm_register, .shm_unregister = optee_shm_unregister, + .rstmem_alloc = optee_rstmem_alloc, + .rstmem_free = optee_rstmem_free, };
static const struct tee_desc optee_clnt_desc = { @@ -1239,6 +1243,8 @@ static const struct tee_driver_ops optee_supp_ops = { .supp_send = optee_supp_send, .shm_register = optee_shm_register_supp, .shm_unregister = optee_shm_unregister_supp, + .rstmem_alloc = optee_rstmem_alloc, + .rstmem_free = optee_rstmem_free, };
static const struct tee_desc optee_supp_desc = { @@ -1619,6 +1625,57 @@ static inline int optee_load_fw(struct platform_device *pdev, } #endif
+static int optee_sdp_pool_init(struct optee *optee)
+{
+	bool sdp = optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_SDP;
+	struct tee_shm_pool *pool;
+	int rc;
+
+	/*
+	 * optee_sdp_pool_init() must be called if secure world has any
+	 * SDP capability. If the static carveout is available, initialize
+	 * and add a pool for that.
+	 */
+	if (!sdp)
+		return 0;
+
+	rc = optee_rstmem_pools_init(optee);
+	if (rc)
+		return rc;
+
+	if (optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_SDP) {
+		union {
+			struct arm_smccc_res smccc;
+			struct optee_smc_get_sdp_config_result result;
+		} res;
+
+		optee->smc.invoke_fn(OPTEE_SMC_GET_SDP_CONFIG, 0, 0, 0, 0, 0, 0,
+				     0, &res.smccc);
+		if (res.result.status != OPTEE_SMC_RETURN_OK) {
+			pr_err("Secure Data Path service not available\n");
+			goto err;
+		}
+
+		pool = tee_rstmem_gen_pool_alloc(res.result.start,
+						 res.result.size);
+		if (IS_ERR(pool)) {
+			rc = PTR_ERR(pool);
+			goto err;
+		}
+		rc = xa_insert(&optee->rstmem_pools->xa,
+			       TEE_IOC_UC_SECURE_VIDEO_PLAY, pool, GFP_KERNEL);
+		if (rc) {
+			pool->ops->destroy_pool(pool);
+			goto err;
+		}
+	}
+
+	return 0;
+err:
+	optee_rstmem_pools_uninit(optee);
+	return rc;
+}
+
 static int optee_probe(struct platform_device *pdev)
 {
 	optee_invoke_fn *invoke_fn;
@@ -1714,7 +1771,7 @@ static int optee_probe(struct platform_device *pdev)
 	optee = kzalloc(sizeof(*optee), GFP_KERNEL);
 	if (!optee) {
 		rc = -ENOMEM;
-		goto err_free_pool;
+		goto err_free_shm_pool;
 	}
optee->ops = &optee_ops; @@ -1726,10 +1783,14 @@ static int optee_probe(struct platform_device *pdev) (sec_caps & OPTEE_SMC_SEC_CAP_RPMB_PROBE)) optee->in_kernel_rpmb_routing = true;
+ rc = optee_sdp_pool_init(optee); + if (rc) + goto err_free_optee; + teedev = tee_device_alloc(&optee_clnt_desc, NULL, pool, optee); if (IS_ERR(teedev)) { rc = PTR_ERR(teedev); - goto err_free_optee; + goto err_rstmem_pools_uninit; } optee->teedev = teedev;
@@ -1836,9 +1897,11 @@ static int optee_probe(struct platform_device *pdev) tee_device_unregister(optee->supp_teedev); err_unreg_teedev: tee_device_unregister(optee->teedev); +err_rstmem_pools_uninit: + optee_rstmem_pools_uninit(optee); err_free_optee: kfree(optee); -err_free_pool: +err_free_shm_pool: tee_shm_pool_free(pool); if (memremaped_shm) memunmap(memremaped_shm);
Add support in the OP-TEE backend driver for dynamic restricted memory allocation with FF-A.
The restricted memory pools for dynamically allocated restricted memory are instantiated when requested by user-space. This instantiation can fail if OP-TEE doesn't support the requested restricted memory use-case.
Restricted memory pools based on a static carveout or dynamic allocation can coexist for different use-cases. We use only dynamic allocation with FF-A.
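To make the pool lifetime easier to follow, here is a standalone sketch of the lazy init/teardown pattern the CMA-backed pool below uses. backing_init() and backing_release() are hypothetical stand-ins for the cma_alloc()+lend_rstmem() and reclaim_rstmem()+cma_release() steps, and unlike the real code (which uses refcount_inc_not_zero() for a lock-free fast path) this simplified version always takes the mutex:

    #include <pthread.h>
    #include <stdbool.h>

    struct lazy_pool {
            pthread_mutex_t mutex;
            unsigned int refcount;
            bool initialized;
    };

    /* Hypothetical stand-ins for lending/reclaiming the carveout */
    int backing_init(struct lazy_pool *p);
    void backing_release(struct lazy_pool *p);

    /* First user initializes the backing resource, later users only
     * take a reference; mirrors get_cma_rstmem() below. */
    static int pool_get(struct lazy_pool *p)
    {
            int rc = 0;

            pthread_mutex_lock(&p->mutex);
            if (!p->initialized) {
                    rc = backing_init(p);
                    p->initialized = (rc == 0);
            }
            if (!rc)
                    p->refcount++;
            pthread_mutex_unlock(&p->mutex);
            return rc;
    }

    /* Last user tears the backing resource down; mirrors put_cma_rstmem(). */
    static void pool_put(struct lazy_pool *p)
    {
            pthread_mutex_lock(&p->mutex);
            if (--p->refcount == 0) {
                    backing_release(p);
                    p->initialized = false;
            }
            pthread_mutex_unlock(&p->mutex);
    }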
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
 drivers/tee/optee/ffa_abi.c       | 135 ++++++++++++-
 drivers/tee/optee/optee_private.h |  10 +-
 drivers/tee/optee/rstmem.c        | 316 +++++++++++++++++++++++++++++-
 3 files changed, 457 insertions(+), 4 deletions(-)
diff --git a/drivers/tee/optee/ffa_abi.c b/drivers/tee/optee/ffa_abi.c index 02e6175ac5f0..f500cf101c8d 100644 --- a/drivers/tee/optee/ffa_abi.c +++ b/drivers/tee/optee/ffa_abi.c @@ -672,6 +672,122 @@ static int optee_ffa_do_call_with_arg(struct tee_context *ctx, return optee_ffa_yielding_call(ctx, &data, rpc_arg, system_thread); }
+static int do_call_lend_rstmem(struct optee *optee, u64 cookie, u32 use_case)
+{
+	struct optee_shm_arg_entry *entry;
+	struct optee_msg_arg *msg_arg;
+	struct tee_shm *shm;
+	u_int offs;
+	int rc;
+
+	msg_arg = optee_get_msg_arg(optee->ctx, 1, &entry, &shm, &offs);
+	if (IS_ERR(msg_arg))
+		return PTR_ERR(msg_arg);
+
+	msg_arg->cmd = OPTEE_MSG_CMD_ASSIGN_RSTMEM;
+	msg_arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT;
+	msg_arg->params[0].u.value.a = cookie;
+	msg_arg->params[0].u.value.b = use_case;
+
+	rc = optee->ops->do_call_with_arg(optee->ctx, shm, offs, false);
+	if (rc)
+		goto out;
+	if (msg_arg->ret != TEEC_SUCCESS) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+out:
+	optee_free_msg_arg(optee->ctx, entry, offs);
+	return rc;
+}
+
+static int optee_ffa_lend_rstmem(struct optee *optee, struct tee_shm *rstmem,
+				 u16 *end_points, unsigned int ep_count)
+{
+	struct ffa_device *ffa_dev = optee->ffa.ffa_dev;
+	const struct ffa_mem_ops *mem_ops = ffa_dev->ops->mem_ops;
+	const struct ffa_msg_ops *msg_ops = ffa_dev->ops->msg_ops;
+	struct ffa_send_direct_data data;
+	struct ffa_mem_region_attributes *mem_attr;
+	struct ffa_mem_ops_args args = {
+		.use_txbuf = true,
+		.tag = rstmem->use_case,
+	};
+	struct page *page;
+	struct scatterlist sgl;
+	unsigned int n;
+	int rc;
+
+	mem_attr = kcalloc(ep_count, sizeof(*mem_attr), GFP_KERNEL);
+	if (!mem_attr)
+		return -ENOMEM;
+	for (n = 0; n < ep_count; n++) {
+		mem_attr[n].receiver = end_points[n];
+		mem_attr[n].attrs = FFA_MEM_RW;
+	}
+	args.attrs = mem_attr;
+	args.nattrs = ep_count;
+
+	page = phys_to_page(rstmem->paddr);
+	sg_init_table(&sgl, 1);
+	sg_set_page(&sgl, page, rstmem->size, 0);
+
+	args.sg = &sgl;
+	rc = mem_ops->memory_lend(&args);
+	kfree(mem_attr);
+	if (rc)
+		return rc;
+
+	rc = do_call_lend_rstmem(optee, args.g_handle, rstmem->use_case);
+	if (rc)
+		goto err_reclaim;
+
+	rc = optee_shm_add_ffa_handle(optee, rstmem, args.g_handle);
+	if (rc)
+		goto err_unreg;
+
+	rstmem->sec_world_id = args.g_handle;
+
+	return 0;
+
+err_unreg:
+	data = (struct ffa_send_direct_data){
+		.data0 = OPTEE_FFA_RELEASE_RSTMEM,
+		.data1 = (u32)args.g_handle,
+		.data2 = (u32)(args.g_handle >> 32),
+	};
+	msg_ops->sync_send_receive(ffa_dev, &data);
+err_reclaim:
+	mem_ops->memory_reclaim(args.g_handle, 0);
+	return rc;
+}
+
+static int optee_ffa_reclaim_rstmem(struct optee *optee, struct tee_shm *rstmem)
+{
+	struct ffa_device *ffa_dev = optee->ffa.ffa_dev;
+	const struct ffa_msg_ops *msg_ops = ffa_dev->ops->msg_ops;
+	const struct ffa_mem_ops *mem_ops = ffa_dev->ops->mem_ops;
+	u64 global_handle = rstmem->sec_world_id;
+	struct ffa_send_direct_data data = {
+		.data0 = OPTEE_FFA_RELEASE_RSTMEM,
+		.data1 = (u32)global_handle,
+		.data2 = (u32)(global_handle >> 32)
+	};
+	int rc;
+
+	optee_shm_rem_ffa_handle(optee, global_handle);
+	rstmem->sec_world_id = 0;
+
+	rc = msg_ops->sync_send_receive(ffa_dev, &data);
+	if (rc)
+		pr_err("Release SHM id 0x%llx rc %d\n", global_handle, rc);
+
+	rc = mem_ops->memory_reclaim(global_handle, 0);
+	if (rc)
+		pr_err("mem_reclaim: 0x%llx %d", global_handle, rc);
+
+	return rc;
+}
+
 /*
  * 6. Driver initialization
  *
@@ -785,7 +901,10 @@ static void optee_ffa_get_version(struct tee_device *teedev,
 		.gen_caps = TEE_GEN_CAP_GP | TEE_GEN_CAP_REG_MEM |
 			    TEE_GEN_CAP_MEMREF_NULL,
 	};
+	struct optee *optee = tee_get_drvdata(teedev);

+	if (optee->rstmem_pools)
+		v.gen_caps |= TEE_GEN_CAP_RSTMEM;
 	*vers = v;
 }

@@ -804,6 +923,8 @@ static const struct tee_driver_ops optee_ffa_clnt_ops = {
 	.cancel_req = optee_cancel_req,
 	.shm_register = optee_ffa_shm_register,
 	.shm_unregister = optee_ffa_shm_unregister,
+	.rstmem_alloc = optee_rstmem_alloc,
+	.rstmem_free = optee_rstmem_free,
 };

 static const struct tee_desc optee_ffa_clnt_desc = {
@@ -820,6 +941,8 @@ static const struct tee_driver_ops optee_ffa_supp_ops = {
 	.supp_send = optee_supp_send,
 	.shm_register = optee_ffa_shm_register, /* same as for clnt ops */
 	.shm_unregister = optee_ffa_shm_unregister_supp,
+	.rstmem_alloc = optee_rstmem_alloc,
+	.rstmem_free = optee_rstmem_free,
 };

 static const struct tee_desc optee_ffa_supp_desc = {
@@ -833,6 +956,8 @@ static const struct optee_ops optee_ffa_ops = {
 	.do_call_with_arg = optee_ffa_do_call_with_arg,
 	.to_msg_param = optee_ffa_to_msg_param,
 	.from_msg_param = optee_ffa_from_msg_param,
+	.lend_rstmem = optee_ffa_lend_rstmem,
+	.reclaim_rstmem = optee_ffa_reclaim_rstmem,
 };

 static void optee_ffa_remove(struct ffa_device *ffa_dev)
@@ -937,11 +1062,17 @@ static int optee_ffa_probe(struct ffa_device *ffa_dev)
 	    (sec_caps & OPTEE_FFA_SEC_CAP_RPMB_PROBE))
 		optee->in_kernel_rpmb_routing = true;

+	if (sec_caps & OPTEE_FFA_SEC_CAP_RSTMEM) {
+		rc = optee_rstmem_pools_init(optee);
+		if (rc)
+			goto err_free_pool;
+	}
+
 	teedev = tee_device_alloc(&optee_ffa_clnt_desc, NULL, optee->pool,
 				  optee);
 	if (IS_ERR(teedev)) {
 		rc = PTR_ERR(teedev);
-		goto err_free_pool;
+		goto err_free_rstmem_pools;
 	}
 	optee->teedev = teedev;

@@ -1020,6 +1151,8 @@ static int optee_ffa_probe(struct ffa_device *ffa_dev)
 	tee_device_unregister(optee->teedev);
 err_free_pool:
 	tee_shm_pool_free(pool);
+err_free_rstmem_pools:
+	optee_rstmem_pools_uninit(optee);
 err_free_optee:
 	kfree(optee);
 	return rc;
diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
index 0491889e5b0e..dd2a3a2224bc 100644
--- a/drivers/tee/optee/optee_private.h
+++ b/drivers/tee/optee/optee_private.h
@@ -174,9 +174,14 @@ struct optee;
 * @do_call_with_arg:	enters OP-TEE in secure world
 * @to_msg_param:	converts from struct tee_param to OPTEE_MSG parameters
 * @from_msg_param:	converts from OPTEE_MSG parameters to struct tee_param
+ * @lend_rstmem:	lends physically contiguous memory as restricted
+ *			memory, inaccessible by the kernel
+ * @reclaim_rstmem:	reclaims restricted memory previously lent with
+ *			@lend_rstmem() and makes it accessible by the
+ *			kernel again
 *
 * These OPs are only supposed to be used internally in the OP-TEE driver
- * as a way of abstracting the different methogs of entering OP-TEE in
+ * as a way of abstracting the different methods of entering OP-TEE in
 * secure world.
 */
 struct optee_ops {
@@ -191,6 +196,9 @@ struct optee_ops {
 			      size_t num_params,
 			      const struct optee_msg_param *msg_params,
 			      bool update_out);
+	int (*lend_rstmem)(struct optee *optee, struct tee_shm *rstmem,
+			   u16 *end_points, unsigned int ep_count);
+	int (*reclaim_rstmem)(struct optee *optee, struct tee_shm *rstmem);
 };

 /**
diff --git a/drivers/tee/optee/rstmem.c b/drivers/tee/optee/rstmem.c
index 01456bc3e2f6..8c561d638733 100644
--- a/drivers/tee/optee/rstmem.c
+++ b/drivers/tee/optee/rstmem.c
@@ -4,6 +4,7 @@
 */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

+#include <linux/cma.h>
 #include <linux/dma-map-ops.h>
 #include <linux/errno.h>
 #include <linux/genalloc.h>
@@ -13,11 +14,313 @@
 #include <linux/types.h>
 #include "optee_private.h"

+#ifdef CONFIG_CMA
+struct optee_rstmem_cma_pool {
+	struct tee_shm_pool pool;
+	struct page *page;
+	struct optee *optee;
+	size_t page_count;
+	u16 *end_points;
+	u_int end_point_count;
+	u_int align;
+	refcount_t refcount;
+	struct tee_shm rstmem;
+	/* Protects when initializing and tearing down this struct */
+	struct mutex mutex;
+};
+
+static struct optee_rstmem_cma_pool *
+to_rstmem_cma_pool(struct tee_shm_pool *pool)
+{
+	return container_of(pool, struct optee_rstmem_cma_pool, pool);
+}
+
+static int init_cma_rstmem(struct optee_rstmem_cma_pool *rp)
+{
+	struct cma *cma = dev_get_cma_area(&rp->optee->teedev->dev);
+	int rc;
+
+	rp->page = cma_alloc(cma, rp->page_count, rp->align, true/*no_warn*/);
+	if (!rp->page)
+		return -ENOMEM;
+
+	/*
+	 * TODO unmap the memory range since the physical memory will
+	 * become inaccessible after the lend_rstmem() call.
+	 */
+
+	rp->rstmem.paddr = page_to_phys(rp->page);
+	rp->rstmem.size = rp->page_count * PAGE_SIZE;
+	rc = rp->optee->ops->lend_rstmem(rp->optee, &rp->rstmem,
+					 rp->end_points, rp->end_point_count);
+	if (rc)
+		goto err_release;
+
+	rp->pool.private_data = gen_pool_create(PAGE_SHIFT, -1);
+	if (!rp->pool.private_data) {
+		rc = -ENOMEM;
+		goto err_reclaim;
+	}
+
+	rc = gen_pool_add(rp->pool.private_data, rp->rstmem.paddr,
+			  rp->rstmem.size, -1);
+	if (rc)
+		goto err_free_pool;
+
+	refcount_set(&rp->refcount, 1);
+	return 0;
+
+err_free_pool:
+	gen_pool_destroy(rp->pool.private_data);
+err_reclaim:
+	rp->optee->ops->reclaim_rstmem(rp->optee, &rp->rstmem);
+err_release:
+	cma_release(cma, rp->page, rp->page_count);
+	rp->rstmem.paddr = 0;
+	rp->rstmem.size = 0;
+	rp->rstmem.sec_world_id = 0;
+	return rc;
+}
+
+static int get_cma_rstmem(struct optee_rstmem_cma_pool *rp)
+{
+	int rc = 0;
+
+	if (!refcount_inc_not_zero(&rp->refcount)) {
+		mutex_lock(&rp->mutex);
+		if (rp->pool.private_data) {
+			/*
+			 * Another thread has already initialized the pool
+			 * before us, or the pool was just about to be torn
+			 * down. Either way we only need to increase the
+			 * refcount and we're done.
+			 */
+			refcount_inc(&rp->refcount);
+		} else {
+			rc = init_cma_rstmem(rp);
+		}
+		mutex_unlock(&rp->mutex);
+	}
+
+	return rc;
+}
+
+static void release_cma_rstmem(struct optee_rstmem_cma_pool *rp)
+{
+	gen_pool_destroy(rp->pool.private_data);
+	rp->optee->ops->reclaim_rstmem(rp->optee, &rp->rstmem);
+	cma_release(dev_get_cma_area(&rp->optee->teedev->dev), rp->page,
+		    rp->page_count);
+
+	rp->pool.private_data = NULL;
+	rp->page = NULL;
+	rp->rstmem.paddr = 0;
+	rp->rstmem.size = 0;
+	rp->rstmem.sec_world_id = 0;
+}
+
+static void put_cma_rstmem(struct optee_rstmem_cma_pool *rp)
+{
+	if (refcount_dec_and_test(&rp->refcount)) {
+		mutex_lock(&rp->mutex);
+		if (rp->pool.private_data)
+			release_cma_rstmem(rp);
+		mutex_unlock(&rp->mutex);
+	}
+}
+
+static int rstmem_pool_op_cma_alloc(struct tee_shm_pool *pool,
+				    struct tee_shm *shm, size_t size,
+				    size_t align)
+{
+	struct optee_rstmem_cma_pool *rp = to_rstmem_cma_pool(pool);
+	size_t sz = ALIGN(size, PAGE_SIZE);
+	phys_addr_t pa;
+	int rc;
+
+	rc = get_cma_rstmem(rp);
+	if (rc)
+		return rc;
+
+	pa = gen_pool_alloc(rp->pool.private_data, sz);
+	if (!pa) {
+		put_cma_rstmem(rp);
+		return -ENOMEM;
+	}
+
+	shm->size = sz;
+	shm->paddr = pa;
+	shm->offset = pa - page_to_phys(rp->page);
+	shm->sec_world_id = rp->rstmem.sec_world_id;
+
+	return 0;
+}
+
+static void rstmem_pool_op_cma_free(struct tee_shm_pool *pool,
+				    struct tee_shm *shm)
+{
+	struct optee_rstmem_cma_pool *rp = to_rstmem_cma_pool(pool);
+
+	gen_pool_free(rp->pool.private_data, shm->paddr, shm->size);
+	shm->size = 0;
+	shm->paddr = 0;
+	shm->offset = 0;
+	shm->sec_world_id = 0;
+	put_cma_rstmem(rp);
+}
+
+static void pool_op_cma_destroy_pool(struct tee_shm_pool *pool)
+{
+	struct optee_rstmem_cma_pool *rp = to_rstmem_cma_pool(pool);
+
+	mutex_destroy(&rp->mutex);
+	kfree(rp);
+}
+
+static struct tee_shm_pool_ops rstmem_pool_ops_cma = {
+	.alloc = rstmem_pool_op_cma_alloc,
+	.free = rstmem_pool_op_cma_free,
+	.destroy_pool = pool_op_cma_destroy_pool,
+};
+
+static int get_rstmem_config(struct optee *optee, u32 use_case,
+			     size_t *min_size, u_int *min_align,
+			     u16 *end_points, u_int *ep_count)
+{
+	struct tee_param params[2] = {
+		[0] = {
+			.attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT,
+			.u.value.a = use_case,
+		},
+		[1] = {
+			.attr = TEE_IOCTL_PARAM_ATTR_TYPE_MEMREF_OUTPUT,
+		},
+	};
+	struct optee_shm_arg_entry *entry;
+	struct tee_shm *shm_param = NULL;
+	struct optee_msg_arg *msg_arg;
+	struct tee_shm *shm;
+	u_int offs;
+	int rc;
+
+	if (end_points && *ep_count) {
+		params[1].u.memref.size = *ep_count * sizeof(*end_points);
+		shm_param = tee_shm_alloc_priv_buf(optee->ctx,
+						   params[1].u.memref.size);
+		if (IS_ERR(shm_param))
+			return PTR_ERR(shm_param);
+		params[1].u.memref.shm = shm_param;
+	}
+
+	msg_arg = optee_get_msg_arg(optee->ctx, ARRAY_SIZE(params), &entry,
+				    &shm, &offs);
+	if (IS_ERR(msg_arg)) {
+		rc = PTR_ERR(msg_arg);
+		goto out_free_shm;
+	}
+	msg_arg->cmd = OPTEE_MSG_CMD_GET_RSTMEM_CONFIG;
+
+	rc = optee->ops->to_msg_param(optee, msg_arg->params,
+				      ARRAY_SIZE(params), params,
+				      false /*!update_out*/);
+	if (rc)
+		goto out_free_msg;
+
+	rc = optee->ops->do_call_with_arg(optee->ctx, shm, offs, false);
+	if (rc)
+		goto out_free_msg;
+	if (msg_arg->ret && msg_arg->ret != TEEC_ERROR_SHORT_BUFFER) {
+		rc = -EINVAL;
+		goto out_free_msg;
+	}
+
+	rc = optee->ops->from_msg_param(optee, params, ARRAY_SIZE(params),
+					msg_arg->params, true /*update_out*/);
+	if (rc)
+		goto out_free_msg;
+
+	if (!msg_arg->ret && end_points &&
+	    *ep_count < params[1].u.memref.size / sizeof(u16)) {
+		rc = -EINVAL;
+		goto out_free_msg;
+	}
+
+	*min_size = params[0].u.value.a;
+	*min_align = params[0].u.value.b;
+	*ep_count = params[1].u.memref.size / sizeof(u16);
+
+	if (msg_arg->ret == TEEC_ERROR_SHORT_BUFFER) {
+		rc = -ENOSPC;
+		goto out_free_msg;
+	}
+
+	if (end_points)
+		memcpy(end_points, tee_shm_get_va(shm_param, 0),
+		       params[1].u.memref.size);
+
+out_free_msg:
+	optee_free_msg_arg(optee->ctx, entry, offs);
+out_free_shm:
+	if (shm_param)
+		tee_shm_free(shm_param);
+	return rc;
+}
+
+static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee, u32 use_case)
+{
+	struct optee_rstmem_cma_pool *rp;
+	size_t min_size;
+	int rc;
+
+	rp = kzalloc(sizeof(*rp), GFP_KERNEL);
+	if (!rp)
+		return ERR_PTR(-ENOMEM);
+	rp->rstmem.use_case = use_case;
+
+	rc = get_rstmem_config(optee, use_case, &min_size, &rp->align, NULL,
+			       &rp->end_point_count);
+	if (rc) {
+		if (rc != -ENOSPC)
+			goto err;
+		rp->end_points = kcalloc(rp->end_point_count,
+					 sizeof(*rp->end_points), GFP_KERNEL);
+		if (!rp->end_points) {
+			rc = -ENOMEM;
+			goto err;
+		}
+		rc = get_rstmem_config(optee, use_case, &min_size, &rp->align,
+				       rp->end_points, &rp->end_point_count);
+		if (rc)
+			goto err_kfree_eps;
+	}
+
+	rp->pool.ops = &rstmem_pool_ops_cma;
+	rp->optee = optee;
+	rp->page_count = min_size / PAGE_SIZE;
+	mutex_init(&rp->mutex);
+
+	return &rp->pool;
+
+err_kfree_eps:
+	kfree(rp->end_points);
+err:
+	kfree(rp);
+	return ERR_PTR(rc);
+}
+#else /*CONFIG_CMA*/
+static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
+					      u32 use_case __unused)
+{
+	return ERR_PTR(-EINVAL);
+}
+#endif /*CONFIG_CMA*/
+
 int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm,
 		       u32 flags, u32 use_case, size_t size)
 {
 	struct optee *optee = tee_get_drvdata(ctx->teedev);
 	struct tee_shm_pool *pool;
+	int rc;

 	if (!optee->rstmem_pools)
 		return -EINVAL;
@@ -25,8 +328,17 @@ int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm,
 		return -EINVAL;

 	pool = xa_load(&optee->rstmem_pools->xa, use_case);
-	if (!pool)
-		return -EINVAL;
+	if (!pool) {
+		pool = alloc_rstmem_pool(optee, use_case);
+		if (IS_ERR(pool))
+			return PTR_ERR(pool);
+		rc = xa_insert(&optee->rstmem_pools->xa, use_case, pool,
+			       GFP_KERNEL);
+		if (rc) {
+			pool->ops->destroy_pool(pool);
+			return rc;
+		}
+	}

 	return pool->ops->alloc(pool, shm, size, 0);
 }
Hi Jens,
kernel test robot noticed the following build errors:
[auto build test ERROR on fac04efc5c793dccbd07e2d59af9f90b7fc0dca4]
url:    https://github.com/intel-lab-lkp/linux/commits/Jens-Wiklander/tee-add-restri...
base:   fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
patch link:    https://lore.kernel.org/r/20241217100809.3962439-6-jens.wiklander%40linaro.o...
patch subject: [PATCH v4 5/6] optee: FF-A: dynamic restricted memory allocation
config: arm64-randconfig-002-20241220 (https://download.01.org/0day-ci/archive/20241220/202412202214.OdgzpOke-lkp@i...)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241220/202412202214.OdgzpOke-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412202214.OdgzpOke-lkp@intel.com/
All errors (new ones prefixed by >>):
   In file included from drivers/tee/optee/rstmem.c:8:
   In file included from include/linux/dma-map-ops.h:9:
   In file included from include/linux/dma-mapping.h:8:
   In file included from include/linux/scatterlist.h:8:
   In file included from include/linux/mm.h:2223:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                ~~~~~~~~~~~ ^ ~~~
>> drivers/tee/optee/rstmem.c:311:67: error: expected ')'
     311 | static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
         |                                                                   ^
   drivers/tee/optee/rstmem.c:311:46: note: to match this '('
     311 | static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
         |                                              ^
>> drivers/tee/optee/rstmem.c:332:35: error: too many arguments to function call, expected single argument 'optee', have 2 arguments
     332 |                 pool = alloc_rstmem_pool(optee, use_case);
         |                        ~~~~~~~~~~~~~~~~~ ^~~~~~~~
   drivers/tee/optee/rstmem.c:311:29: note: 'alloc_rstmem_pool' declared here
     311 | static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
         |                             ^                 ~~~~~~~~~~~~~~~~~~~
   1 warning and 2 errors generated.
vim +311 drivers/tee/optee/rstmem.c

   268	
   269	static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee, u32 use_case)
   270	{
   271		struct optee_rstmem_cma_pool *rp;
   272		size_t min_size;
   273		int rc;
   274	
   275		rp = kzalloc(sizeof(*rp), GFP_KERNEL);
   276		if (!rp)
   277			return ERR_PTR(-ENOMEM);
   278		rp->rstmem.use_case = use_case;
   279	
   280		rc = get_rstmem_config(optee, use_case, &min_size, &rp->align, NULL,
   281				       &rp->end_point_count);
   282		if (rc) {
   283			if (rc != -ENOSPC)
   284				goto err;
   285			rp->end_points = kcalloc(rp->end_point_count,
   286						 sizeof(*rp->end_points), GFP_KERNEL);
   287			if (!rp->end_points) {
   288				rc = -ENOMEM;
   289				goto err;
   290			}
   291			rc = get_rstmem_config(optee, use_case, &min_size, &rp->align,
   292					       rp->end_points, &rp->end_point_count);
   293			if (rc)
   294				goto err_kfree_eps;
   295		}
   296	
   297		rp->pool.ops = &rstmem_pool_ops_cma;
   298		rp->optee = optee;
   299		rp->page_count = min_size / PAGE_SIZE;
   300		mutex_init(&rp->mutex);
   301	
   302		return &rp->pool;
   303	
   304	err_kfree_eps:
   305		kfree(rp->end_points);
   306	err:
   307		kfree(rp);
   308		return ERR_PTR(rc);
   309	}
   310	#else /*CONFIG_CMA*/
 > 311	static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
   312						      u32 use_case __unused)
   313	{
   314		return ERR_PTR(-EINVAL);
   315	}
   316	#endif /*CONFIG_CMA*/
   317	
   318	int optee_rstmem_alloc(struct tee_context *ctx, struct tee_shm *shm,
   319			       u32 flags, u32 use_case, size_t size)
   320	{
   321		struct optee *optee = tee_get_drvdata(ctx->teedev);
   322		struct tee_shm_pool *pool;
   323		int rc;
   324	
   325		if (!optee->rstmem_pools)
   326			return -EINVAL;
   327		if (flags)
   328			return -EINVAL;
   329	
   330		pool = xa_load(&optee->rstmem_pools->xa, use_case);
   331		if (!pool) {
 > 332			pool = alloc_rstmem_pool(optee, use_case);
   333			if (IS_ERR(pool))
   334				return PTR_ERR(pool);
   335			rc = xa_insert(&optee->rstmem_pools->xa, use_case, pool,
   336				       GFP_KERNEL);
   337			if (rc) {
   338				pool->ops->destroy_pool(pool);
   339				return rc;
   340			}
   341		}
   342	
   343		return pool->ops->alloc(pool, shm, size, 0);
   344	}
   345	
Hi Jens,
kernel test robot noticed the following build errors:
[auto build test ERROR on fac04efc5c793dccbd07e2d59af9f90b7fc0dca4]
url:    https://github.com/intel-lab-lkp/linux/commits/Jens-Wiklander/tee-add-restri...
base:   fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
patch link:    https://lore.kernel.org/r/20241217100809.3962439-6-jens.wiklander%40linaro.o...
patch subject: [PATCH v4 5/6] optee: FF-A: dynamic restricted memory allocation
config: arm64-randconfig-r112-20241221 (https://download.01.org/0day-ci/archive/20241221/202412211952.LOCC8Lbo-lkp@i...)
compiler: aarch64-linux-gcc (GCC) 14.2.0
reproduce: (https://download.01.org/0day-ci/archive/20241221/202412211952.LOCC8Lbo-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412211952.LOCC8Lbo-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/tee/optee/rstmem.c:311:67: error: expected ';', ',' or ')' before '__unused'
     311 | static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee __unused,
         |                                                                   ^~~~~~~~
   drivers/tee/optee/rstmem.c: In function 'optee_rstmem_alloc':
>> drivers/tee/optee/rstmem.c:332:24: error: implicit declaration of function 'alloc_rstmem_pool' [-Wimplicit-function-declaration]
     332 |                 pool = alloc_rstmem_pool(optee, use_case);
         |                        ^~~~~~~~~~~~~~~~~
>> drivers/tee/optee/rstmem.c:332:22: error: assignment to 'struct tee_shm_pool *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
     332 |                 pool = alloc_rstmem_pool(optee, use_case);
         |                      ^
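(Both compilers trip over `__unused` because the kernel does not define such an attribute; the kernel's annotations are `__always_unused` and `__maybe_unused` from include/linux/compiler_attributes.h. Since kernel builds do not normally warn about unused function parameters, one possible fix is simply to drop the annotation, roughly:

static struct tee_shm_pool *alloc_rstmem_pool(struct optee *optee,
					      u32 use_case)
{
	return ERR_PTR(-EINVAL);
}

Alternatively, `__always_unused` could be kept on both parameters with the same effect.)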
Add support in the OP-TEE backend driver for dynamic restricted memory allocation using the SMC ABI.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
 drivers/tee/optee/smc_abi.c | 74 +++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 3 deletions(-)
diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c
index f5fd5f1d9a6b..25b02dbbc76e 100644
--- a/drivers/tee/optee/smc_abi.c
+++ b/drivers/tee/optee/smc_abi.c
@@ -1001,6 +1001,67 @@ static int optee_smc_do_call_with_arg(struct tee_context *ctx,
 	return rc;
 }

+static int optee_smc_lend_rstmem(struct optee *optee, struct tee_shm *rstmem,
+				 u16 *end_points, unsigned int ep_count)
+{
+	struct optee_shm_arg_entry *entry;
+	struct optee_msg_arg *msg_arg;
+	struct tee_shm *shm;
+	u_int offs;
+	int rc;
+
+	msg_arg = optee_get_msg_arg(optee->ctx, 2, &entry, &shm, &offs);
+	if (IS_ERR(msg_arg))
+		return PTR_ERR(msg_arg);
+
+	msg_arg->cmd = OPTEE_MSG_CMD_LEND_RSTMEM;
+	msg_arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_VALUE_INPUT;
+	msg_arg->params[0].u.value.a = OPTEE_MSG_RSTMEM_SECURE_VIDEO_PLAY;
+	msg_arg->params[1].attr = OPTEE_MSG_ATTR_TYPE_TMEM_INPUT;
+	msg_arg->params[1].u.tmem.buf_ptr = rstmem->paddr;
+	msg_arg->params[1].u.tmem.size = rstmem->size;
+	msg_arg->params[1].u.tmem.shm_ref = (u_long)rstmem;
+
+	rc = optee->ops->do_call_with_arg(optee->ctx, shm, offs, false);
+	if (rc)
+		goto out;
+	if (msg_arg->ret != TEEC_SUCCESS) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+out:
+	optee_free_msg_arg(optee->ctx, entry, offs);
+	return rc;
+}
+
+static int optee_smc_reclaim_rstmem(struct optee *optee, struct tee_shm *rstmem)
+{
+	struct optee_shm_arg_entry *entry;
+	struct optee_msg_arg *msg_arg;
+	struct tee_shm *shm;
+	u_int offs;
+	int rc;
+
+	msg_arg = optee_get_msg_arg(optee->ctx, 1, &entry, &shm, &offs);
+	if (IS_ERR(msg_arg))
+		return PTR_ERR(msg_arg);
+
+	msg_arg->cmd = OPTEE_MSG_CMD_RECLAIM_RSTMEM;
+	msg_arg->params[0].attr = OPTEE_MSG_ATTR_TYPE_RMEM_INPUT;
+	msg_arg->params[0].u.rmem.shm_ref = (u_long)rstmem;
+
+	rc = optee->ops->do_call_with_arg(optee->ctx, shm, offs, false);
+	if (rc)
+		goto out;
+	if (msg_arg->ret != TEEC_SUCCESS)
+		rc = -EINVAL;
+
+out:
+	optee_free_msg_arg(optee->ctx, entry, offs);
+	return rc;
+}
+
 /*
  * 5. Asynchronous notification
  */
@@ -1258,6 +1319,8 @@ static const struct optee_ops optee_ops = {
 	.do_call_with_arg = optee_smc_do_call_with_arg,
 	.to_msg_param = optee_to_msg_param,
 	.from_msg_param = optee_from_msg_param,
+	.lend_rstmem = optee_smc_lend_rstmem,
+	.reclaim_rstmem = optee_smc_reclaim_rstmem,
 };

 static int enable_async_notif(optee_invoke_fn *invoke_fn)
@@ -1627,6 +1690,7 @@ static inline int optee_load_fw(struct platform_device *pdev,

 static int optee_sdp_pool_init(struct optee *optee)
 {
+	bool dyn_sdp = optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_RSTMEM;
 	bool sdp = optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_SDP;
 	struct tee_shm_pool *pool;
 	int rc;
@@ -1634,9 +1698,11 @@ static int optee_sdp_pool_init(struct optee *optee)
 	/*
 	 * optee_sdp_pools_init() must be called if secure world has any
 	 * SDP capability. If the static carveout is available initialize
-	 * and add a pool for that.
+	 * and add a pool for that. If there's an error from secure world
+	 * we complain but don't call optee_sdp_pools_uninit() unless we
+	 * know that there is no SDP capability left.
 	 */
-	if (!sdp)
+	if (!dyn_sdp && !sdp)
 		return 0;

 	rc = optee_rstmem_pools_init(optee);
@@ -1653,7 +1719,9 @@ static int optee_sdp_pool_init(struct optee *optee)
 			     0, &res.smccc);
 	if (res.result.status != OPTEE_SMC_RETURN_OK) {
 		pr_err("Secure Data Path service not available\n");
-		goto err;
+		if (!dyn_sdp)
+			goto err;
+		return 0;
 	}

 	pool = tee_rstmem_gen_pool_alloc(res.result.start,
On Tue, Dec 17, 2024 at 11:07:36AM +0100, Jens Wiklander wrote:
[...]
I think I've dropped this on earlier encrypted dma-buf discussions for TEE, but can't find one right now ...
Do we have some open source userspace for this? To my knowledge we have two implementations of encrypted/content protected dma-buf in upstream right now in the amd and intel gpu drivers, and unless I'm mistaken they both have some minimal userspace supporting EXT_protected_textures:
https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EXT...
It's not great, but it does just barely clear the bar in my opinion. I guess something in gstreamer or similar video pipeline framework would also do the job.
Especially with the context of the uapi discussion in the v1/RFC thread I think we need more than a bare-bones testcase to make sure this works in actual use.
Cheers, Sima
Hi Simona,
On Wed, 18 Dec 2024 at 16:36, Simona Vetter simona.vetter@ffwll.ch wrote:
[...]
I think I've dropped this on earlier encrypted dma-buf discussions for TEE, but can't find one right now ...
Thanks for raising this query.
Do we have some open source userspace for this? To my knowledge we have two implementations of encrypted/content protected dma-buf in upstream right now in the amd and intel gpu drivers, and unless I'm mistaken they both have some minimal userspace supporting EXT_protected_textures:
First of all, to clarify: the support Jens is adding here for restricted shared memory allocation in the TEE subsystem is meant to be generic and not specific to the secure media pipeline use-case alone. Here we not only have open source test applications but also open source firmware (OP-TEE as a Trusted OS) [1] supporting this as a core feature, where we maintain a stable and extensible ABI between the kernel and the OP-TEE core.
Restricted memory is a feature enforced by hardware specific firewalls where a particular TEE implementation governs which particular block of memory is accessible to a particular peripheral or a CPU running in a higher privileged mode than the Linux kernel. There can be numerous use-cases surrounding that, such as the following:
- Secure media pipeline where the content gets decrypted and stored in restricted buffers which are then accessible only to media display pipeline peripherals.
- Trusted user interface where a peripheral takes input from the user and stores it in a restricted buffer which then is accessible to the TEE implementation only.
- Another possible use-case can be for the TEE implementation to store key material in a restricted buffer which is only accessible to the hardware crypto accelerator.
I am sure there will be more use-cases related to this feature, but those will only be possible once we provide a stable and extensible restricted memory interface between the Linux user-space and the secure world user-space (normally referred to as Trusted Applications).
[1] https://github.com/OP-TEE/optee_os/pull/7159
https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EXT...
It's not great, but it does just barely clear the bar in my opinion. I guess something in gstreamer or similar video pipeline framework would also do the job.
Especially with the context of the uapi discussion in the v1/RFC thread I think we need more than a bare-bones testcase to make sure this works in actual use.
Currently the TEE subsystem already supports a stable ABI for the shared memory allocator between Linux user-space and secure world user-space [2]. The stable ABI for restricted memory follows the same lines and is meant to be a vendor-neutral abstraction for user-space access. The current test cases not only exercise the interface but also serve as regression tests.
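For illustration, allocation via the new ioctl would look roughly like this from user-space. The struct layout below is a hypothetical sketch for this mail; only the ioctl name, the size/flags/use-case parameters, and the returned DMA-buf fd come from the series:

	/* hypothetical layout, for illustration only */
	struct tee_ioctl_rstmem_alloc_data {
		__u64 size;
		__u32 flags;
		__u32 use_case;	/* e.g. secure video playback */
		__s32 fd;	/* out: the restricted DMA-buf fd */
	};

	int tee_fd = open("/dev/tee0", O_RDWR | O_CLOEXEC);
	struct tee_ioctl_rstmem_alloc_data data = {
		.size = buf_size,
		.use_case = use_case_id,
	};

	ioctl(tee_fd, TEE_IOC_RSTMEM_ALLOC, &data);
	/* data.fd can now be handed to the consuming device driver */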
I am also in favour of end to end open source use-cases. But I fear without progressing in a step wise manner as with this proposal we would rather force developers to upstream all the software pieces in one go which will be kind of a chicken and egg situation. I am sure once this feature lands Mediatek folks will be interested to port their secure video playback patchset [3] on top of it. Similarly other silicon vendors like NXP, Qcom etc. will be motivated to do the same.
[2] https://docs.kernel.org/userspace-api/tee.html
[3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@medi...
-Sumit
On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
Restricted memory is a feature enforced by hardware specific firewalls where a particular TEE implementation governs which particular block of memory is accessible to a particular peripheral or a CPU running in a higher privileged mode than the Linux kernel.
[...]
- Another possible use-case can be for the TEE implementation to store
key material in a restricted buffer which is only accessible to the hardware crypto accelerator.
Just a heads-up:
For RSA sign/verify operations using rsassa-pkcs1 encoding, the message to be signed/verified (which I understand could be located in restricted memory) is prepended with padding.

The crypto subsystem prepends the padding in software. The actual signature generation/verification (which is an RSA encrypt or decrypt operation) may be performed in hardware by a crypto accelerator.
Before commit 8552cb04e083 ("crypto: rsassa-pkcs1 - Copy source data for SG list"), the kernel constructed a scatterlist consisting of the padding on the one hand, and of the message to be signed/verified on the other hand. I believe this worked for use cases where the message is located in restricted memory.
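Roughly, that construction looked like this (an illustrative sketch, not the exact rsassa-pkcs1 code; `padding`, `msg_page`, etc. are placeholder names):

	struct scatterlist sg[2];

	sg_init_table(sg, 2);
	/* software-generated padding, e.g. pointing into .rodata */
	sg_set_buf(&sg[0], padding, pad_len);
	/* the message itself, which may live in a restricted buffer */
	sg_set_page(&sg[1], msg_page, msg_len, msg_offset);

The CPU never needs to dereference the second entry; only the accelerator's DMA engine does.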
However since that commit, the kernel kmalloc's a new buffer and copies the message to be signed/verified into it. The argument was that although the *kernel* may be able to access the data, the crypto accelerator may *not* be able to do so. In particular, portions of the padding are located in the kernel's .rodata section which is a valid virtual address on x86 but not on arm64 and which may be inaccessible to a crypto accelerator.
However in the case of restricted memory, the situation is exactly the opposite: The kernel may *not* be able to access the data, but the crypto accelerator can access it just fine.
I did raise a concern about this to the maintainer, but to no avail: https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
This is the alternative solution I would have preferred: https://lore.kernel.org/r/3de5d373c86dcaa5abc36f501c1398c4fbf05f2f.173286510...
I am also in favour of end to end open source use-cases. But I fear without progressing in a step wise manner as with this proposal we would rather force developers to upstream all the software pieces in one go which will be kind of a chicken and egg situation. I am sure once this feature lands Mediatek folks will be interested to port their secure video playback patchset [3] on top of it. Similarly other silicon vendors like NXP, Qcom etc. will be motivated to do the same.
The crypto use case may be easier to bring up than the video decoding use case because you don't need to implement a huge amount of user space code.
Thanks,
Lukas
On Tue, Dec 24, 2024 at 10:28:31AM +0100, Lukas Wunner wrote:
I did raise a concern about this to the maintainer, but to no avail: https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
Sorry, wrong link. This is the one I meant to copy-paste... :(
On Tue, Dec 24, 2024 at 10:32:41AM +0100, Lukas Wunner wrote:
On Tue, Dec 24, 2024 at 10:28:31AM +0100, Lukas Wunner wrote:
I did raise a concern about this to the maintainer, but to no avail: https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
Sorry, wrong link. This is the one I meant to copy-paste... :(
Herbert asked a logical question, which got no response from your side.
Hi Lukas,
On Tue, 24 Dec 2024 at 14:58, Lukas Wunner lukas@wunner.de wrote:
[...]
However in the case of restricted memory, the situation is exactly the opposite: The kernel may *not* be able to access the data, but the crypto accelerator can access it just fine.
I did raise a concern about this to the maintainer, but to no avail: https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
Herbert's point is valid that there isn't any point in mapping restricted memory into the kernel virtual address space, as any kernel access to that space can lead to platform specific hardware error scenarios. For that reason we simply disallow dma_buf_mmap() and don't support dma_buf_vmap() for DMA-bufs holding TEE restricted memory. The only consumers of those DMA-bufs will be the DMA capable peripherals granted access permissions by the TEE implementation. IOW, the kernel's role here will just be to provide the DMA-buf infrastructure for buffers set up by the TEE, and then to set up DMA addresses for peripherals to access them. The hardware crypto accelerator can be one such peripheral.
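For illustration, a consuming peripheral driver would do roughly the following (a sketch; assumes `fd` is the DMA-buf fd from TEE_IOC_RSTMEM_ALLOC and `dev` is the peripheral's struct device, error handling omitted):

	struct dma_buf *dbuf = dma_buf_get(fd);
	struct dma_buf_attachment *attach = dma_buf_attach(dbuf, dev);
	struct sg_table *sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);

	/* program the DMA addresses in sgt into the peripheral; no CPU
	 * mapping is ever created, so the kernel never touches the
	 * restricted contents */

	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
	dma_buf_detach(dbuf, attach);
	dma_buf_put(dbuf);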
This is the alternative solution I would have preferred: https://lore.kernel.org/r/3de5d373c86dcaa5abc36f501c1398c4fbf05f2f.173286510...
[...]
The crypto use case may be easier to bring up than the video decoding use case because you don't need to implement a huge amount of user space code.
Agree, if you already have such an existing hardware use-case then please feel free to build up on this patch-set.
-Sumit
On Thu, Dec 26, 2024 at 11:29:23AM +0530, Sumit Garg wrote:
On Tue, 24 Dec 2024 at 14:58, Lukas Wunner lukas@wunner.de wrote:
However in the case of restricted memory, the situation is exactly the opposite: The kernel may *not* be able to access the data, but the crypto accelerator can access it just fine.
I did raise a concern about this to the maintainer, but to no avail: https://lore.kernel.org/r/Z1Kym1-9ka8kGHrM@wunner.de/
Herbert's point is valid that there isn't any point for mapping restricted memory in the kernel virtual address space as any kernel access to that space can lead to platform specific hardware error scenarios. And for that reason we simply disallow dma_buf_mmap() and don't support dma_buf_vmap() for DMA-bufs holding TEE restricted memory.
The API for signature generation/verification (e.g. crypto_sig_sign(), crypto_sig_verify()) no longer accepts scatterlists, only buffers in virtual address space:
https://lore.kernel.org/all/ZIrnPcPj9Zbq51jK@gondor.apana.org.au/
Hence in order to use buffers in restricted memory for signature generation/verification, you'd need to map them into virtual address space first.
Thanks,
Lukas
On Thu, Dec 26, 2024 at 12:26:29PM +0100, Lukas Wunner wrote:
[...]
The API for signature generation/verification (e.g. crypto_sig_sign(), crypto_sig_verify()) no longer accepts scatterlists, only buffers in virtual address space:
https://lore.kernel.org/all/ZIrnPcPj9Zbq51jK@gondor.apana.org.au/
Hence in order to use buffers in restricted memory for signature generation/verification, you'd need to map them into virtual address space first.
Nope, you need to get that old api back. Kernel virtual address space mappings for dma-buf are very intentionally optional. -Sima
On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
[...]
I am also in favour of end to end open source use-cases. But I fear without progressing in a step wise manner as with this proposal we would rather force developers to upstream all the software pieces in one go which will be kind of a chicken and egg situation. I am sure once this feature lands Mediatek folks will be interested to port their secure video playback patchset [3] on top of it. Similarly other silicon vendors like NXP, Qcom etc. will be motivated to do the same.
[2] https://docs.kernel.org/userspace-api/tee.html
[3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@medi...
We get entire opengl/vulkan driver stacks ready before we merge new drm drivers, I really don't think this is too hard from a technical pov. And I think the mediatek patches had the same issue of lacking userspace for it, so that's not moving things forward. -Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Wed, 8 Jan 2025 at 22:27, Simona Vetter simona.vetter@ffwll.ch wrote:
On Tue, Dec 24, 2024 at 12:05:19PM +0530, Sumit Garg wrote:
Hi Simona,
On Wed, 18 Dec 2024 at 16:36, Simona Vetter simona.vetter@ffwll.ch wrote:
On Tue, Dec 17, 2024 at 11:07:36AM +0100, Jens Wiklander wrote:
Hi,
This patch set allocates the restricted DMA-bufs via the TEE subsystem.
The TEE subsystem handles the DMA-buf allocations since it is the TEE (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose how to allocate the restricted physical memory.
TEE_IOC_RSTMEM_ALLOC takes in addition to a size and flags parameters also a use-case parameter. This is used by the backend TEE driver to decide on allocation policy and which devices should be able to access the memory.
Three use-cases (Secure Video Playback, Trusted UI, and Secure Video Recording) has been identified so far to serve as examples of what can be expected. More use-cases can be added in userspace ABI, but it's up to the backend TEE drivers to provide the implementation.
Each use-case has it's own restricted memory pool since different use-cases requires isolation from different parts of the system. A restricted memory pool can be based on a static carveout instantiated while probing the TEE backend driver, or dynamically allocated from CMA and made restricted as needed by the TEE.
This can be tested on QEMU with the following steps: repo init -u https://github.com/jenswi-linaro/manifest.git -m qemu_v8.xml \ -b prototype/sdp-v4 repo sync -j8 cd build make toolchains -j$(nproc) make SPMC_AT_EL=1 all -j$(nproc) make SPMC_AT_EL=1 run-only # login and at the prompt: xtest --sdp-basic
The SPMC_AT_EL=1 parameter configures the build with FF-A and an SPMC at S-EL1 inside OP-TEE. The parameter can be changed into SPMC_AT_EL=n to test without FF-A using the original SMC ABI instead. Please remember to do %rm -rf ../trusted-firmware-a/build/qemu for TF-A to be rebuilt properly using the new configuration.
https://optee.readthedocs.io/en/latest/building/prerequisites.html list dependencies needed to build the above.
The tests are pretty basic, mostly checking that a Trusted Application in the secure world can access and manipulate the memory. There are also some negative tests for out of bounds buffers etc.
I think I've dropped this on earlier encrypted dma-buf discussions for TEE, but can't find one right now ...
Thanks for raising this query.
Do we have some open source userspace for this? To my knowledge we have two implementations of encrypted/content protected dma-buf in upstream right now in the amd and intel gpu drivers, and unless I'm mistaken they both have some minimal userspace supporting EXT_protected_textures:
First of all to clarify the support Jens is adding here for allocating restricted shared memory allocation in TEE subsystem is meant to be generic and not specific to only secure media pipeline use-case. Then here we not only have open source test applications but rather open source firmware too (OP-TEE as a Trusted OS) [1] supporting this as a core feature where we maintain a stable and extensible ABI among the kernel and the OP-TEE core.
Restricted memory is a feature enforced by hardware specific firewalls where a particular TEE implementation governs which particular block of memory is accessible to a particular peripheral or a CPU running in a higher privileged mode than the Linux kernel. There can be numeric use-cases surrounding that as follows:
- Secure media pipeline where the contents gets decrypted and stored
in a restricted buffer which are then accessible only to media display pipeline peripherals.
- Trusted user interface where a peripheral takes input from the user
and stores it in a restricted buffer which then is accessible to TEE implementation only.
- Another possible use-case can be for the TEE implementation to store
key material in a restricted buffer which is only accessible to the hardware crypto accelerator.
I am sure there will be more use-cases related to this feature but those will only be possible once we provide a stable and extensible restricted memory interface among the Linux user-space and the secure world user-space (normally referred to as Trusted Applications).
[1] https://github.com/OP-TEE/optee_os/pull/7159
https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/EXT/EXT...
It's not great, but it does just barely clear the bar in my opinion. I guess something in GStreamer or a similar video pipeline framework would also do the job.
Especially with the context of the uapi discussion in the v1/RFC thread I think we need more than a bare-bones testcase to make sure this works in actual use.
Currently the TEE subsystem already supports a stable ABI for the shared memory allocator between Linux user-space and secure world user-space here [2]. The stable ABI for restricted memory is along the same lines, meant to be a vendor-neutral abstraction for user-space access. The current test cases not only test the interface but also serve as regression tests.
I am also in favour of end-to-end open source use-cases. But I fear that without progressing in a step-wise manner, as with this proposal, we would force developers to upstream all the software pieces in one go, which would be a chicken-and-egg situation. I am sure that once this feature lands, Mediatek folks will be interested in porting their secure video playback patchset [3] on top of it. Similarly, other silicon vendors like NXP, Qcom etc. will be motivated to do the same.
[2] https://docs.kernel.org/userspace-api/tee.html [3] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@medi...
We get entire OpenGL/Vulkan driver stacks ready before we merge new DRM drivers; I really don't think this is too hard from a technical PoV. And I think the Mediatek patches had the same issue of lacking userspace for it, so that's not moving things forward. -Sima
Okay, fair enough, I think I get your point. Currently we are missing support for at least one peripheral acting as a consumer of these restricted DMA-bufs. So I discussed with Jens offline that we can try a crypto peripheral use-case first, which can simply be demonstrated using the current OP-TEE client user-space.
Also, in the crypto peripheral use-case we can target symmetric crypto first, which already has the concept of a hardware-backed symmetric key [1]. IOW, we should be able to come up with a generic symmetric crypto algorithm which can be supported by different crypto accelerators using a TEE-backed restricted key DMA buffer.
[1] https://www.youtube.com/watch?v=GbcpwUBFGDw
-Sumit
+Florent, who's working on protected-mode support in Panthor.
Hi Jens,
On Tue, 17 Dec 2024 11:07:36 +0100 Jens Wiklander jens.wiklander@linaro.org wrote:
Hi,
This patch set allocates the restricted DMA-bufs via the TEE subsystem.
We're currently working on protected-mode support for Panthor [1] and it looks like your series (and the OP-TEE implementation that goes with it) would allow us to have a fully upstream/open solution for the protected content use case we're trying to support. I need a bit more time to play with the implementation but this looks very promising (especially the lend rstmem feature, which might help us allocate our FW sections that are supposed to execute code accessing protected content).
The TEE subsystem handles the DMA-buf allocations since it is the TEE (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose how to allocate the restricted physical memory.
I'll probably have more questions soon, but here's one to start: any particular reason you didn't go for a dma-heap to expose restricted buffer allocation to userspace? I see you already have a cdev you can take ioctl()s from, but my understanding was that dma-heap was the standard solution for these device-agnostic/central allocators.
Regards,
Boris
[1]https://lwn.net/ml/all/cover.1738228114.git.florent.tomasin@arm.com/#t
Hi Boris,
On Thu, 13 Feb 2025 at 01:26, Boris Brezillon boris.brezillon@collabora.com wrote:
+Florent, who's working on protected-mode support in Panthor.
Hi Jens,
On Tue, 17 Dec 2024 11:07:36 +0100 Jens Wiklander jens.wiklander@linaro.org wrote:
Hi,
This patch set allocates the restricted DMA-bufs via the TEE subsystem.
We're currently working on protected-mode support for Panthor [1] and it looks like your series (and the OP-TEE implementation that goes with it) would allow us to have a fully upstream/open solution for the protected content use case we're trying to support. I need a bit more time to play with the implementation but this looks very promising (especially the lend rstmem feature, which might help us allocate our FW sections that are supposed to execute code accessing protected content).
Glad to hear that, if you can demonstrate an open source use case based on this series then it will help to land it. We really would love to see support for restricted DMA-buf consumers be it GPU, crypto accelerator, media pipeline etc.
The TEE subsystem handles the DMA-buf allocations since it is the TEE (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose how to allocate the restricted physical memory.
I'll probably have more questions soon, but here's one to start: any particular reason you didn't go for a dma-heap to expose restricted buffer allocation to userspace? I see you already have a cdev you can take ioctl()s from, but my understanding was that dma-heap was the standard solution for these device-agnostic/central allocators.
This series started with the DMA heap approach only here [1] but later discussions [2] led us here. To point out specifically:
- DMA heaps require reliance on DT to discover static restricted region carve-outs, whereas via the TEE implementation driver (eg. OP-TEE) those can be discovered dynamically.
- Dynamic allocation of buffers and making them restricted requires vendor-specific driver hooks with DMA heaps, whereas the TEE subsystem abstracts that out, with the underlying TEE implementation (eg. OP-TEE) managing the dynamic buffer restriction.
- The TEE subsystem already has a well-defined user-space interface for managing shared memory buffers with the TEE, and restricted DMA buffers will be yet another interface managed along similar lines.
[1] https://lore.kernel.org/lkml/mzur3odofwwrdqnystozjgf3qtvb73wqjm6g2vf5dfsqieh... [2] https://lore.kernel.org/lkml/CAFA6WYPtp3H5JhxzgH9=z2EvNL7Kdku3EmG1aDkTS-gjFt...
-Sumit
Regards,
Boris
[1]https://lwn.net/ml/all/cover.1738228114.git.florent.tomasin@arm.com/#t
Hi,
On Thu, Feb 13, 2025 at 7:42 AM Sumit Garg sumit.garg@linaro.org wrote:
Hi Boris,
On Thu, 13 Feb 2025 at 01:26, Boris Brezillon boris.brezillon@collabora.com wrote:
+Florent, who's working on protected-mode support in Panthor.
Hi Jens,
On Tue, 17 Dec 2024 11:07:36 +0100 Jens Wiklander jens.wiklander@linaro.org wrote:
Hi,
This patch set allocates the restricted DMA-bufs via the TEE subsystem.
We're currently working on protected-mode support for Panthor [1] and it looks like your series (and the OP-TEE implementation that goes with it) would allow us to have a fully upstream/open solution for the protected content use case we're trying to support. I need a bit more time to play with the implementation but this looks very promising (especially the lend rstmem feature, which might help us allocate our FW sections that are supposed to execute code accessing protected content).
Glad to hear that, if you can demonstrate an open source use case based on this series then it will help to land it. We really would love to see support for restricted DMA-buf consumers be it GPU, crypto accelerator, media pipeline etc.
I'm preparing a demo based on GStreamer to share. Having more real-world examples helps show that the APIs etc. work.
The TEE subsystem handles the DMA-buf allocations since it is the TEE (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose how to allocate the restricted physical memory.
I'll probably have more questions soon, but here's one to start: any particular reason you didn't go for a dma-heap to expose restricted buffer allocation to userspace? I see you already have a cdev you can take ioctl()s from, but my understanding was that dma-heap was the standard solution for these device-agnostic/central allocators.
This series started with the DMA heap approach only here [1] but later discussions [2] led us here. To point out specifically:
- DMA heaps require reliance on DT to discover static restricted
regions carve-outs whereas via the TEE implementation driver (eg. OP-TEE) those can be discovered dynamically.
- Dynamic allocation of buffers and making them restricted requires
vendor specific driver hooks with DMA heaps whereas the TEE subsystem abstracts that out with underlying TEE implementation (eg. OP-TEE) managing the dynamic buffer restriction.
- TEE subsystem already has a well defined user-space interface for
managing shared memory buffers with TEE and restricted DMA buffers will be yet another interface managed along similar lines.
[1] https://lore.kernel.org/lkml/mzur3odofwwrdqnystozjgf3qtvb73wqjm6g2vf5dfsqieh... [2] https://lore.kernel.org/lkml/CAFA6WYPtp3H5JhxzgH9=z2EvNL7Kdku3EmG1aDkTS-gjFt...
Thanks for the good summary. :-)
Cheers, Jens
-Sumit
Regards,
Boris
[1]https://lwn.net/ml/all/cover.1738228114.git.florent.tomasin@arm.com/#t
On Thu, 13 Feb 2025 12:11:52 +0530 Sumit Garg sumit.garg@linaro.org wrote:
Hi Boris,
On Thu, 13 Feb 2025 at 01:26, Boris Brezillon boris.brezillon@collabora.com wrote:
+Florent, who's working on protected-mode support in Panthor.
Hi Jens,
On Tue, 17 Dec 2024 11:07:36 +0100 Jens Wiklander jens.wiklander@linaro.org wrote:
Hi,
This patch set allocates the restricted DMA-bufs via the TEE subsystem.
We're currently working on protected-mode support for Panthor [1] and it looks like your series (and the OP-TEE implementation that goes with it) would allow us to have a fully upstream/open solution for the protected content use case we're trying to support. I need a bit more time to play with the implementation but this looks very promising (especially the lend rstmem feature, which might help us allocate our FW sections that are supposed to execute code accessing protected content).
Glad to hear that, if you can demonstrate an open source use case based on this series then it will help to land it. We really would love to see support for restricted DMA-buf consumers be it GPU, crypto accelerator, media pipeline etc.
The TEE subsystem handles the DMA-buf allocations since it is the TEE (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose how to allocate the restricted physical memory.
I'll probably have more questions soon, but here's one to start: any particular reason you didn't go for a dma-heap to expose restricted buffer allocation to userspace? I see you already have a cdev you can take ioctl()s from, but my understanding was that dma-heap was the standard solution for these device-agnostic/central allocators.
This series started with the DMA heap approach only here [1] but later discussions [2] led us here. To point out specifically:
- DMA heaps require reliance on DT to discover static restricted
regions carve-outs whereas via the TEE implementation driver (eg. OP-TEE) those can be discovered dynamically.
Hm, the system heap [1] doesn't rely on any DT information AFAICT. The dynamic allocation scheme, where the TEE implementation allocates a chunk of protected memory for us, would have a similar behavior, I guess.
- Dynamic allocation of buffers and making them restricted requires
vendor specific driver hooks with DMA heaps whereas the TEE subsystem abstracts that out with underlying TEE implementation (eg. OP-TEE) managing the dynamic buffer restriction.
Yeah, the lend rstmem feature is clearly something tee specific, and I think that's okay to assume the user knows the protection request should go through the tee subsystem in that case.
- TEE subsystem already has a well defined user-space interface for
managing shared memory buffers with TEE and restricted DMA buffers will be yet another interface managed along similar lines.
Okay, so the very reason I'm asking about the dma-buf heap interface is that there might be cases where the protected/restricted allocation doesn't go through the TEE (Mediatek has a TEE-free implementation for instance, but I realize vendor implementations are probably not the best selling point :-/). If we expose things as a dma-heap, we have a solution where integrators can pick the dma-heap they think is relevant for protected buffer allocations without the various drivers (GPU, video codec, ...) having to implement a dispatch function for all possible implementations. The same goes for userspace allocations, where passing a dma-heap name is simpler than supporting different ioctl()s based on the allocation backend.
[1]https://elixir.bootlin.com/linux/v6.13.2/source/drivers/dma-buf/heaps/system...
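For reference, here is what the uniform dma-heap path looks like from userspace; the same few lines work for any heap, with only the name differing. A minimal sketch using the existing DMA_HEAP_IOCTL_ALLOC uAPI (the heap name is illustrative):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/dma-heap.h>

/* Allocate a dma-buf of 'len' bytes from the heap named 'heap'.
 * Returns the dma-buf fd on success, -1 on error. */
int alloc_from_heap(const char *heap, uint64_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	char path[128];
	int heap_fd, ret;

	snprintf(path, sizeof(path), "/dev/dma_heap/%s", heap);
	heap_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (heap_fd < 0)
		return -1;

	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap_fd);
	return ret < 0 ? -1 : (int)data.fd;
}

/* e.g. alloc_from_heap("protected,secure-video", 8 * 1024 * 1024); */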
On Thu, 13 Feb 2025 at 14:06, Boris Brezillon boris.brezillon@collabora.com wrote:
On Thu, 13 Feb 2025 12:11:52 +0530 Sumit Garg sumit.garg@linaro.org wrote:
Hi Boris,
On Thu, 13 Feb 2025 at 01:26, Boris Brezillon boris.brezillon@collabora.com wrote:
+Florent, who's working on protected-mode support in Panthor.
Hi Jens,
On Tue, 17 Dec 2024 11:07:36 +0100 Jens Wiklander jens.wiklander@linaro.org wrote:
Hi,
This patch set allocates the restricted DMA-bufs via the TEE subsystem.
We're currently working on protected-mode support for Panthor [1] and it looks like your series (and the OP-TEE implementation that goes with it) would allow us to have a fully upstream/open solution for the protected content use case we're trying to support. I need a bit more time to play with the implementation but this looks very promising (especially the lend rstmem feature, which might help us allocate our FW sections that are supposed to execute code accessing protected content).
Glad to hear that, if you can demonstrate an open source use case based on this series then it will help to land it. We really would love to see support for restricted DMA-buf consumers be it GPU, crypto accelerator, media pipeline etc.
The TEE subsystem handles the DMA-buf allocations since it is the TEE (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose how to allocate the restricted physical memory.
I'll probably have more questions soon, but here's one to start: any particular reason you didn't go for a dma-heap to expose restricted buffer allocation to userspace? I see you already have a cdev you can take ioctl()s from, but my understanding was that dma-heap was the standard solution for these device-agnostic/central allocators.
This series started with the DMA heap approach only here [1] but later discussions [2] led us here. To point out specifically:
- DMA heaps require reliance on DT to discover static restricted
regions carve-outs whereas via the TEE implementation driver (eg. OP-TEE) those can be discovered dynamically.
Hm, the system heap [1] doesn't rely on any DT information AFAICT.
Yeah but all the prior vendor specific secure/restricted DMA heaps relied on DT information.
The dynamic allocation scheme, where the TEE implementation allocates a chunk of protected memory for us would have a similar behavior, I guess.
In a dynamic scheme, the allocation will still be from CMA or system heap depending on TEE implementation capabilities but the restriction will be enforced via interaction with TEE.
- Dynamic allocation of buffers and making them restricted requires
vendor specific driver hooks with DMA heaps whereas the TEE subsystem abstracts that out with underlying TEE implementation (eg. OP-TEE) managing the dynamic buffer restriction.
Yeah, the lend rstmem feature is clearly something tee specific, and I think that's okay to assume the user knows the protection request should go through the tee subsystem in that case.
Yeah, but how will the user discover that? Rather than that, it's better for the user to directly ask the TEE device to allocate restricted memory without worrying about how the memory restriction gets enforced.
- TEE subsystem already has a well defined user-space interface for
managing shared memory buffers with TEE and restricted DMA buffers will be yet another interface managed along similar lines.
Okay, so the very reason I'm asking about the dma-buf heap interface is because there might be cases where the protected/restricted allocation doesn't go through the TEE (Mediatek has a TEE-free implementation for instance, but I realize vendor implementations are probably not the best selling point :-/).
You can always have a system with memory and peripheral access permissions set up during boot (or even have pre-configured hardware as a special case) prior to booting the kernel too. But even that gets configured by a TEE implementation during boot somehow, so calling it a TEE-free implementation seems over-simplified and not a scalable solution. However, this patchset [1] from Mediatek requires runtime TEE interaction too.
[1] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@medi...
If we expose things as a dma-heap, we have a solution where integrators can pick the dma-heap they think is relevant for protected buffer allocations without the various drivers (GPU, video codec, ...) having to implement a dispatch function for all possible implementations. The same goes for userspace allocations, where passing a dma-heap name, is simpler than supporting different ioctl()s based on the allocation backend.
There have been several attempts with DMA heaps in the past, which all resulted in very vendor-specific, vertically integrated solutions. But the solution with the TEE subsystem aims to make it generic and vendor-agnostic.
[1]https://elixir.bootlin.com/linux/v6.13.2/source/drivers/dma-buf/heaps/system...
-Sumit
On Thu, 13 Feb 2025 14:46:01 +0530 Sumit Garg sumit.garg@linaro.org wrote:
On Thu, 13 Feb 2025 at 14:06, Boris Brezillon boris.brezillon@collabora.com wrote:
On Thu, 13 Feb 2025 12:11:52 +0530 Sumit Garg sumit.garg@linaro.org wrote:
Hi Boris,
On Thu, 13 Feb 2025 at 01:26, Boris Brezillon boris.brezillon@collabora.com wrote:
+Florent, who's working on protected-mode support in Panthor.
Hi Jens,
On Tue, 17 Dec 2024 11:07:36 +0100 Jens Wiklander jens.wiklander@linaro.org wrote:
Hi,
This patch set allocates the restricted DMA-bufs via the TEE subsystem.
We're currently working on protected-mode support for Panthor [1] and it looks like your series (and the OP-TEE implementation that goes with it) would allow us to have a fully upstream/open solution for the protected content use case we're trying to support. I need a bit more time to play with the implementation but this looks very promising (especially the lend rstmem feature, which might help us allocate our FW sections that are supposed to execute code accessing protected content).
Glad to hear that, if you can demonstrate an open source use case based on this series then it will help to land it. We really would love to see support for restricted DMA-buf consumers be it GPU, crypto accelerator, media pipeline etc.
The TEE subsystem handles the DMA-buf allocations since it is the TEE (OP-TEE, AMD-TEE, TS-TEE, or perhaps a future QCOMTEE) which sets up the restrictions for the memory used for the DMA-bufs.
I've added a new IOCTL, TEE_IOC_RSTMEM_ALLOC, to allocate the restricted DMA-bufs. This IOCTL reaches the backend TEE driver, allowing it to choose how to allocate the restricted physical memory.
I'll probably have more questions soon, but here's one to start: any particular reason you didn't go for a dma-heap to expose restricted buffer allocation to userspace? I see you already have a cdev you can take ioctl()s from, but my understanding was that dma-heap was the standard solution for these device-agnostic/central allocators.
This series started with the DMA heap approach only here [1] but later discussions [2] led us here. To point out specifically:
- DMA heaps require reliance on DT to discover static restricted
regions carve-outs whereas via the TEE implementation driver (eg. OP-TEE) those can be discovered dynamically.
Hm, the system heap [1] doesn't rely on any DT information AFAICT.
Yeah but all the prior vendor specific secure/restricted DMA heaps relied on DT information.
Right, but there's nothing in the DMA heap provider API forcing that.
The dynamic allocation scheme, where the TEE implementation allocates a chunk of protected memory for us would have a similar behavior, I guess.
In a dynamic scheme, the allocation will still be from CMA or system heap depending on TEE implementation capabilities but the restriction will be enforced via interaction with TEE.
Sorry, that's a wording issue. By dynamic allocation I meant the mode where allocations go through the TEE, not the lend rstmem thing. BTW, calling the lend mode dynamic-allocation is kinda confusing, because in a sense, both modes can be considered dynamic allocation from the user PoV. I get that when the TEE allocates memory, it's picking from its fixed address/size pool, hence the name, but when I first read this, I thought the dynamic mode was the other one, and the static mode was the one where you reserve a mem range from the DT, query it from the driver and pass it to the TEE to restrict access post reservation/static allocation.
- Dynamic allocation of buffers and making them restricted requires
vendor specific driver hooks with DMA heaps whereas the TEE subsystem abstracts that out with underlying TEE implementation (eg. OP-TEE) managing the dynamic buffer restriction.
Yeah, the lend rstmem feature is clearly something tee specific, and I think that's okay to assume the user knows the protection request should go through the tee subsystem in that case.
Yeah but how will the user discover that?
There's nothing to discover here. It would just be explicitly specified:
- for in-kernel users it can be a module parameter (or a DT prop if that's deemed acceptable)
- for userspace, it can be an envvar, a config file, or whatever the app/lib uses to get config options (see the sketch below)
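A trivial sketch of the userspace side of that idea; the environment variable name and the fallback heap name are invented for illustration:

#include <stdlib.h>

/* Pick the protected heap by configuration rather than discovery.
 * "PROTECTED_HEAP" and the default name are assumptions, not an
 * established convention. */
static const char *protected_heap_name(void)
{
	const char *name = getenv("PROTECTED_HEAP");

	return name ? name : "protected,secure-video";
}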
Rather than that it's better for the user to directly ask the TEE device to allocate restricted memory without worrying about how the memory restriction gets enforced.
If the consensus is that restricted/protected memory allocation should always be routed to the TEE, sure, but I had the feeling this wasn't as clear as that. OTOH, using a dma-heap to expose the TEE-SDP implementation provides the same benefits, without making potential future non-TEE based implementations a pain for users. The dma-heap ioctl being common to all implementations, it just becomes a configuration matter if we want to change the heap we rely on for protected/restricted buffer allocation. And because heaps have unique/well-known names, users can still default to (or rely solely on) the TEE-SDP implementation if they want.
- TEE subsystem already has a well defined user-space interface for
managing shared memory buffers with TEE and restricted DMA buffers will be yet another interface managed along similar lines.
Okay, so the very reason I'm asking about the dma-buf heap interface is because there might be cases where the protected/restricted allocation doesn't go through the TEE (Mediatek has a TEE-free implementation for instance, but I realize vendor implementations are probably not the best selling point :-/).
You can always have a system with memory and peripheral access permissions setup during boot (or even have a pre-configured hardware as a special case) prior to booting up the kernel too. But that even gets somehow configured by a TEE implementation during boot, so calling it a TEE-free implementation seems over-simplified and not a scalable solution. However, this patchset [1] from Mediatek requires runtime TEE interaction too.
[1] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@medi...
If we expose things as a dma-heap, we have a solution where integrators can pick the dma-heap they think is relevant for protected buffer allocations without the various drivers (GPU, video codec, ...) having to implement a dispatch function for all possible implementations. The same goes for userspace allocations, where passing a dma-heap name, is simpler than supporting different ioctl()s based on the allocation backend.
There have been several attempts with DMA heaps in the past which all resulted in a very vendor specific vertically integrated solution. But the solution with TEE subsystem aims to make it generic and vendor agnostic.
Just because all previous protected/restricted dma-heap effort failed to make it upstream, doesn't mean dma-heap is the wrong way of exposing this feature IMHO.
Regards,
Boris
Hi,
On Thu, 13 Feb 2025 at 12:40, Boris Brezillon boris.brezillon@collabora.com wrote:
On Thu, 13 Feb 2025 14:46:01 +0530 Sumit Garg sumit.garg@linaro.org wrote:
Yeah but all the prior vendor specific secure/restricted DMA heaps relied on DT information.
Right, but there's nothing in the DMA heap provider API forcing that.
Yeah. DMA heaps are just a way to allocate memory from a specific place. It allows people to settle on having a single way to do allocations from weird platform-specific places; the only weird platform-specific part userspace needs to deal with is figuring out the name to use. The rest is at least a unified API: the point of dma-heaps was exactly to have a single coherent API for userspace, not to create one API for ZONE_CMA and DT ranges and everyone else doing their own thing.
Rather than that it's better for the user to directly ask the TEE device to allocate restricted memory without worrying about how the memory restriction gets enforced.
If the consensus is that restricted/protected memory allocation should always be routed to the TEE, sure, but I had the feeling this wasn't as clear as that. OTOH, using a dma-heap to expose the TEE-SDP implementation provides the same benefits, without making potential future non-TEE based implementations a pain for users. The dma-heap ioctl being common to all implementations, it just becomes a configuration matter if we want to change the heap we rely on for protected/restricted buffer allocation. And because heaps have unique/well-known names, users can still default to (or rely solely on) the TEE-SDP implementation if they want.
There have been several attempts with DMA heaps in the past which all resulted in a very vendor specific vertically integrated solution. But the solution with TEE subsystem aims to make it generic and vendor agnostic.
Just because all previous protected/restricted dma-heap effort failed to make it upstream, doesn't mean dma-heap is the wrong way of exposing this feature IMHO.
To be fair, having a TEE implementation does give us a much better chance of having a sensible cross-vendor plan. And the fact it's already (sort of accidentally and only on one platform AFAICT) ready for a 'test' interface, where we can still exercise protected allocation paths but without having to go through all the platform-specific setup that is inaccessible to most people, is also really great! That's probably been the biggest barrier to having this tested outside of IHVs and OEMs.
But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT? How does userspace pick which TEE device to use? What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86?
I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes.
Cheers, Daniel
Hi,
On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone daniel@fooishbar.org wrote:
Hi,
On Thu, 13 Feb 2025 at 12:40, Boris Brezillon boris.brezillon@collabora.com wrote:
On Thu, 13 Feb 2025 14:46:01 +0530 Sumit Garg sumit.garg@linaro.org wrote:
Yeah but all the prior vendor specific secure/restricted DMA heaps relied on DT information.
Right, but there's nothing in the DMA heap provider API forcing that.
Yeah. DMA heaps are just a way to allocate memory from a specific place. It allows people to settle on having a single way to do allocations from weird platform-specific places; the only weird platform-specific part userspace needs to deal with is figuring out the name to use. The rest is at least a unified API: the point of dma-heaps was exactly to have a single coherent API for userspace, not to create one API for ZONE_CMA and DT ranges and everyone else doing their own thing.
Rather than that it's better for the user to directly ask the TEE device to allocate restricted memory without worrying about how the memory restriction gets enforced.
If the consensus is that restricted/protected memory allocation should always be routed to the TEE, sure, but I had the feeling this wasn't as clear as that. OTOH, using a dma-heap to expose the TEE-SDP implementation provides the same benefits, without making potential future non-TEE based implementations a pain for users. The dma-heap ioctl being common to all implementations, it just becomes a configuration matter if we want to change the heap we rely on for protected/restricted buffer allocation. And because heaps have unique/well-known names, users can still default to (or rely solely on) the TEE-SDP implementation if they want.
There have been several attempts with DMA heaps in the past which all resulted in a very vendor specific vertically integrated solution. But the solution with TEE subsystem aims to make it generic and vendor agnostic.
Just because all previous protected/restricted dma-heap effort failed to make it upstream, doesn't mean dma-heap is the wrong way of exposing this feature IMHO.
To be fair, having a TEE implementation does give us a much better chance of having a sensible cross-vendor plan. And the fact it's already (sort of accidentally and only on one platform AFAICT) ready for a 'test' interface, where we can still exercise protected allocation paths but without having to go through all the platform-specific setup that is inaccessible to most people, is also really great! That's probably been the biggest barrier to having this tested outside of IHVs and OEMs.
But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT?
The TEE may very well use a predefined range; that part is abstracted by the interface.
How does userspace pick which TEE device to use?
There's normally only one, and even if there is more than one, it should be safe to assume that only one of them should be used when allocating restricted memory (TEE_GEN_CAP_RSTMEM from TEE_IOC_VERSION).
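For completeness, a sketch of that discovery step. TEE_IOC_VERSION and struct tee_ioctl_version_data are the existing uAPI; TEE_GEN_CAP_RSTMEM is added by this series, so the bit value below is only a placeholder:

#include <sys/ioctl.h>
#include <linux/tee.h>

#ifndef TEE_GEN_CAP_RSTMEM
#define TEE_GEN_CAP_RSTMEM (1 << 7) /* placeholder bit value, assumption */
#endif

/* Returns non-zero if the TEE device can allocate restricted memory. */
int tee_supports_rstmem(int tee_fd)
{
	struct tee_ioctl_version_data vers;

	if (ioctl(tee_fd, TEE_IOC_VERSION, &vers) < 0)
		return 0;
	return !!(vers.gen_caps & TEE_GEN_CAP_RSTMEM);
}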
What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86?
I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes.
From what I've gathered from earlier discussions, it wasn't much of a problem for userspace to handle this. If the kernel were to provide it via a different ABI, how would it be easier to implement in the kernel? I think we need an example to understand your suggestion.
Cheers, Jens
Hi,
On Thu, 13 Feb 2025 at 15:57, Jens Wiklander jens.wiklander@linaro.org wrote:
On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone daniel@fooishbar.org wrote:
But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT?
The TEE may very well use a predefined range that part is abstracted with the interface.
Of course. But you can also (and this has been shipped on real devices) handle this without any per-allocation TEE needs by simply allocating from a memory range which is predefined within DT.
From the userspace point of view, why should there be one ABI to allocate memory from a predefined range which is delivered by DT to the kernel, and one ABI to allocate memory from a predefined range which is mediated by TEE?
What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86?
I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes.
From what I've gathered from earlier discussions, it wasn't much of a problem for userspace to handle this. If the kernel were to provide it via a different ABI, how would it be easier to implement in the kernel? I think we need an example to understand your suggestion.
It is a problem for userspace, because we need to expose acceptable parameters for allocation through the entire stack. If you look at the dmabuf documentation in the kernel for how buffers should be allocated and exchanged, you can see the negotiation flow for modifiers. This permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
Standardising on heaps allows us to add those in a similar way. If we have to add different allocation mechanisms, then the complexity increases, permeating not only into all the different userspace APIs, but also into the drivers which need to support every different allocation mechanism even if they have no opinion on it - e.g. Mali doesn't care in any way whether the allocation comes from a heap or TEE or ACPI or whatever, it cares only that the memory is protected.
Does that help?
Cheers, Daniel
Hi,
On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone daniel@fooishbar.org wrote:
Hi,
On Thu, 13 Feb 2025 at 15:57, Jens Wiklander jens.wiklander@linaro.org wrote:
On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone daniel@fooishbar.org wrote:
But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT?
The TEE may very well use a predefined range that part is abstracted with the interface.
Of course. But you can also (and this has been shipped on real devices) handle this without any per-allocation TEE needs by simply allocating from a memory range which is predefined within DT.
From the userspace point of view, why should there be one ABI to allocate memory from a predefined range which is delivered by DT to the kernel, and one ABI to allocate memory from a predefined range which is mediated by TEE?
We need some way to specify the protection profile (or use case as I've called it in the ABI) required for the buffer. Whether it's defined in DT seems irrelevant.
What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86?
I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes.
From what I've gathered from earlier discussions, it wasn't much of a problem for userspace to handle this. If the kernel were to provide it via a different ABI, how would it be easier to implement in the kernel? I think we need an example to understand your suggestion.
It is a problem for userspace, because we need to expose acceptable parameters for allocation through the entire stack. If you look at the dmabuf documentation in the kernel for how buffers should be allocated and exchanged, you can see the negotiation flow for modifiers. This permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
What dma-buf properties are you referring to? dma_heap_ioctl_allocate() accepts a few flags for the resulting file descriptor and no flags for the heap itself.
Standardising on heaps allows us to add those in a similar way.
How would you solve this with heaps? Would you use one heap for each protection profile (use case), add heap_flags, or do a bit of both?
If we have to add different allocation mechanisms, then the complexity increases, permeating not only into all the different userspace APIs, but also into the drivers which need to support every different allocation mechanism even if they have no opinion on it - e.g. Mali doesn't care in any way whether the allocation comes from a heap or TEE or ACPI or whatever, it cares only that the memory is protected.
Does that help?
I think you're missing the stage where an unprotected buffer is received and decrypted into a protected buffer. If you use the TEE for decryption or to configure the involved devices for the use case, it makes sense to let the TEE allocate the buffers, too. A TEE doesn't have to be an OS in the secure world, it can be an abstraction to support the use case depending on the design. So the restricted buffer is already allocated before we reach Mali in your example.
Allocating restricted buffers from the TEE subsystem saves us from maintaining proxy dma-buf heaps.
Cheers, Jens
Cheers, Daniel
On Fri, 14 Feb 2025 at 15:37, Jens Wiklander jens.wiklander@linaro.org wrote:
Hi,
On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone daniel@fooishbar.org wrote:
Hi,
On Thu, 13 Feb 2025 at 15:57, Jens Wiklander jens.wiklander@linaro.org wrote:
On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone daniel@fooishbar.org wrote:
But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT?
The TEE may very well use a predefined range that part is abstracted with the interface.
Of course. But you can also (and this has been shipped on real devices) handle this without any per-allocation TEE needs by simply allocating from a memory range which is predefined within DT.
From the userspace point of view, why should there be one ABI to allocate memory from a predefined range which is delivered by DT to the kernel, and one ABI to allocate memory from a predefined range which is mediated by TEE?
We need some way to specify the protection profile (or use case as I've called it in the ABI) required for the buffer. Whether it's defined in DT seems irrelevant.
What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86?
I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes.
From what I've gathered from earlier discussions, it wasn't much of a problem for userspace to handle this. If the kernel were to provide it via a different ABI, how would it be easier to implement in the kernel? I think we need an example to understand your suggestion.
It is a problem for userspace, because we need to expose acceptable parameters for allocation through the entire stack. If you look at the dmabuf documentation in the kernel for how buffers should be allocated and exchanged, you can see the negotiation flow for modifiers. This permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
What dma-buf properties are you referring to? dma_heap_ioctl_allocate() accepts a few flags for the resulting file descriptor and no flags for the heap itself.
Standardising on heaps allows us to add those in a similar way.
How would you solve this with heaps? Would you use one heap for each protection profile (use case), add heap_flags, or do a bit of both?
Christian gave an historical background here [1] as to why that hasn't worked in the past with DMA heaps given the scalability issues.
[1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail...
If we have to add different allocation mechanisms, then the complexity increases, permeating not only into all the different userspace APIs, but also into the drivers which need to support every different allocation mechanism even if they have no opinion on it - e.g. Mali doesn't care in any way whether the allocation comes from a heap or TEE or ACPI or whatever, it cares only that the memory is protected.
Does that help?
I think you're missing the stage where an unprotected buffer is received and decrypted into a protected buffer. If you use the TEE for decryption or to configure the involved devices for the use case, it makes sense to let the TEE allocate the buffers, too. A TEE doesn't have to be an OS in the secure world, it can be an abstraction to support the use case depending on the design. So the restricted buffer is already allocated before we reach Mali in your example.
Allocating restricted buffers from the TEE subsystem saves us from maintaining proxy dma-buf heaps.
+1
-Sumit
On Fri, 14 Feb 2025 18:37:14 +0530 Sumit Garg sumit.garg@linaro.org wrote:
On Fri, 14 Feb 2025 at 15:37, Jens Wiklander jens.wiklander@linaro.org wrote:
Hi,
On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone daniel@fooishbar.org wrote:
Hi,
On Thu, 13 Feb 2025 at 15:57, Jens Wiklander jens.wiklander@linaro.org wrote:
On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone daniel@fooishbar.org wrote:
But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT?
The TEE may very well use a predefined range that part is abstracted with the interface.
Of course. But you can also (and this has been shipped on real devices) handle this without any per-allocation TEE needs by simply allocating from a memory range which is predefined within DT.
From the userspace point of view, why should there be one ABI to allocate memory from a predefined range which is delivered by DT to the kernel, and one ABI to allocate memory from a predefined range which is mediated by TEE?
We need some way to specify the protection profile (or use case as I've called it in the ABI) required for the buffer. Whether it's defined in DT seems irrelevant.
What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86?
I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes.
From what I've gathered from earlier discussions, it wasn't much of a problem for userspace to handle this. If the kernel were to provide it via a different ABI, how would it be easier to implement in the kernel? I think we need an example to understand your suggestion.
It is a problem for userspace, because we need to expose acceptable parameters for allocation through the entire stack. If you look at the dmabuf documentation in the kernel for how buffers should be allocated and exchanged, you can see the negotiation flow for modifiers. This permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
What dma-buf properties are you referring to? dma_heap_ioctl_allocate() accepts a few flags for the resulting file descriptor and no flags for the heap itself.
Standardising on heaps allows us to add those in a similar way.
How would you solve this with heaps? Would you use one heap for each protection profile (use case), add heap_flags, or do a bit of both?
I would say one heap per-profile.
Christian gave an historical background here [1] as to why that hasn't worked in the past with DMA heaps given the scalability issues.
[1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail...
Hm, I fail to see where Christian dismisses the dma-heaps solution in this email. He even says:
If the memory is not physically attached to any device, but rather just
memory attached to the CPU or a system wide memory controller then expose the memory as DMA-heap with specific requirements (e.g. certain sized pages, contiguous, restricted, encrypted, ...).
If we have to add different allocation mechanisms, then the complexity increases, permeating not only into all the different userspace APIs, but also into the drivers which need to support every different allocation mechanism even if they have no opinion on it - e.g. Mali doesn't care in any way whether the allocation comes from a heap or TEE or ACPI or whatever, it cares only that the memory is protected.
Does that help?
I think you're missing the stage where an unprotected buffer is received and decrypted into a protected buffer. If you use the TEE for decryption or to configure the involved devices for the use case, it makes sense to let the TEE allocate the buffers, too. A TEE doesn't have to be an OS in the secure world, it can be an abstraction to support the use case depending on the design. So the restricted buffer is already allocated before we reach Mali in your example.
Allocating restricted buffers from the TEE subsystem saves us from maintaining proxy dma-buf heaps.
Honestly, when I look at dma-heap implementations, they seem to be trivial shells around existing (more complex) allocators, and the boilerplate [1] to expose a dma-heap is relatively small. The dma-buf implementation you already have, so we're talking about a hundred lines of code to maintain, which shouldn't be significantly more than what you have for the new ioctl(), to be honest. And I'll insist on what Daniel said: it's a small price to pay to have a standard interface to expose to userspace. If dma-heaps are not used for this kind of thing, I honestly wonder what they will be used for...
Regards,
Boris
[1]https://elixir.bootlin.com/linux/v6.13.2/source/drivers/dma-buf/heaps/system...
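To make the size of that shell concrete, here is a kernel-side sketch of one heap per profile bridging to a TEE backend. dma_heap_add() and struct dma_heap_ops are the real in-kernel API; tee_rstmem_alloc_dmabuf() and TEE_USE_CASE_SVP are hypothetical names standing in for the TEE glue:

#include <linux/dma-buf.h>
#include <linux/dma-heap.h>
#include <linux/err.h>
#include <linux/module.h>

static struct dma_buf *svp_heap_allocate(struct dma_heap *heap,
					 unsigned long len, u32 fd_flags,
					 u64 heap_flags)
{
	/* Forward to the TEE backend; helper and use-case id are assumed. */
	return tee_rstmem_alloc_dmabuf(len, TEE_USE_CASE_SVP);
}

static const struct dma_heap_ops svp_heap_ops = {
	.allocate = svp_heap_allocate,
};

static int __init svp_heap_init(void)
{
	struct dma_heap_export_info exp_info = {
		.name = "protected,secure-video", /* illustrative name */
		.ops = &svp_heap_ops,
	};

	return PTR_ERR_OR_ZERO(dma_heap_add(&exp_info));
}
module_init(svp_heap_init);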
On Fri, 14 Feb 2025 at 21:19, Boris Brezillon boris.brezillon@collabora.com wrote:
On Fri, 14 Feb 2025 18:37:14 +0530 Sumit Garg sumit.garg@linaro.org wrote:
On Fri, 14 Feb 2025 at 15:37, Jens Wiklander jens.wiklander@linaro.org wrote:
Hi,
On Thu, Feb 13, 2025 at 6:39 PM Daniel Stone daniel@fooishbar.org wrote:
Hi,
On Thu, 13 Feb 2025 at 15:57, Jens Wiklander jens.wiklander@linaro.org wrote:
On Thu, Feb 13, 2025 at 3:05 PM Daniel Stone daniel@fooishbar.org wrote:
But just because TEE is one good backend implementation, doesn't mean it should be the userspace ABI. Why should userspace care that TEE has mediated the allocation instead of it being a predefined range within DT?
The TEE may very well use a predefined range that part is abstracted with the interface.
Of course. But you can also (and this has been shipped on real devices) handle this without any per-allocation TEE needs by simply allocating from a memory range which is predefined within DT.
From the userspace point of view, why should there be one ABI to allocate memory from a predefined range which is delivered by DT to the kernel, and one ABI to allocate memory from a predefined range which is mediated by TEE?
We need some way to specify the protection profile (or use case as I've called it in the ABI) required for the buffer. Whether it's defined in DT seems irrelevant.
What advantage does userspace get from having to have a different codepath to get a different handle to memory? What about x86?
I think this proposal is looking at it from the wrong direction. Instead of working upwards from the implementation to userspace, start with userspace and work downwards. The interesting property to focus on is allocating memory, not that EL1 is involved behind the scenes.
From what I've gathered from earlier discussions, it wasn't much of a problem for userspace to handle this. If the kernel were to provide it via a different ABI, how would it be easier to implement in the kernel? I think we need an example to understand your suggestion.
It is a problem for userspace, because we need to expose acceptable parameters for allocation through the entire stack. If you look at the dmabuf documentation in the kernel for how buffers should be allocated and exchanged, you can see the negotiation flow for modifiers. This permeates through KMS, EGL, Vulkan, Wayland, GStreamer, and more.
What dma-buf properties are you referring to? dma_heap_ioctl_allocate() accepts a few flags for the resulting file descriptor and no flags for the heap itself.
Standardising on heaps allows us to add those in a similar way.
How would you solve this with heaps? Would you use one heap for each protection profile (use case), add heap_flags, or do a bit of both?
I would say one heap per-profile.
And then it would have a per-vendor multiplication factor, as each vendor enforces memory restriction in a platform-specific manner, which won't scale.
Christian gave an historical background here [1] as to why that hasn't worked in the past with DMA heaps given the scalability issues.
[1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail...
Hm, I fail to see where Christian dismisses the dma-heaps solution in this email. He even says:
If the memory is not physically attached to any device, but rather just
memory attached to the CPU or a system wide memory controller then expose the memory as DMA-heap with specific requirements (e.g. certain sized pages, contiguous, restricted, encrypted, ...).
I am not saying Christian dismissed DMA heaps, but rather that scalability is an issue. What we are proposing here is a generic interface via the TEE to the firmware/Trusted OS which can perform all the platform-specific memory restrictions. This solution will scale across vendors.
If we have to add different allocation mechanisms, then the complexity increases, permeating not only into all the different userspace APIs, but also into the drivers which need to support every different allocation mechanism even if they have no opinion on it - e.g. Mali doesn't care in any way whether the allocation comes from a heap or TEE or ACPI or whatever, it cares only that the memory is protected.
Does that help?
I think you're missing the stage where an unprotected buffer is received and decrypted into a protected buffer. If you use the TEE for decryption or to configure the involved devices for the use case, it makes sense to let the TEE allocate the buffers, too. A TEE doesn't have to be an OS in the secure world, it can be an abstraction to support the use case depending on the design. So the restricted buffer is already allocated before we reach Mali in your example.
Allocating restricted buffers from the TEE subsystem saves us from maintaining proxy dma-buf heaps.
Honestly, when I look at dma-heap implementations, they seem to be trivial shells around existing (more complex) allocators, and the boilerplate [1] to expose a dma-heap is relatively small. The dma-buf implementation you already have, so we're talking about a hundred lines of code to maintain, which shouldn't be significantly more than what you have for the new ioctl(), to be honest.
It will rather be redundant vendor-specific code under DMA heaps calling into the firmware/Trusted OS to enforce memory restrictions, as you can see in the Mediatek example [1]. With the TEE subsystem managing that, it won't be the case, as we will provide a common abstraction for the communication with the underlying firmware/Trusted OS.
[1] https://lore.kernel.org/linux-arm-kernel/20240515112308.10171-1-yong.wu@medi...
And I'll insist on what Daniel said: it's a small price to pay to have a standard interface to expose to userspace. If dma-heaps are not used for this kind of thing, I honestly wonder what they will be used for...
Let's try not to forcefully find a use-case for DMA heaps when there is a better alternative available. I am still failing to see why you don't consider the following as a standardised user-space interface:
"When user-space has to work with restricted memory, ask TEE device to allocate it"
-Sumit
Hi Sumit,
On Mon, 17 Feb 2025 at 06:13, Sumit Garg sumit.garg@linaro.org wrote:
On Fri, 14 Feb 2025 at 21:19, Boris Brezillon boris.brezillon@collabora.com wrote:
I would say one heap per-profile.
And then it would have a per vendor multiplication factor as each vendor enforces memory restriction in a platform specific manner which won't scale.
Yes, they do enforce it in a platform-specific manner, but so does TEE. There is no one golden set of semantics which is globally applicable between all hardware and all products in a useful manner.
So, if we define protected,secure-video + protected,secure-video-record + protected,trusted-ui heap names, we have exactly the same number of axes. The only change is from uint32_t to string.
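Concretely, the mapping between the two encodings could be as small as this; the use-case ids and heap names are invented for illustration:

/* Illustrative only: ids and names invented to show that the two
 * schemes encode the same axes. */
enum rstmem_use_case {
	UC_SECURE_VIDEO,
	UC_TRUSTED_UI,
	UC_SECURE_VIDEO_RECORD,
};

static const char * const heap_name_for_use_case[] = {
	[UC_SECURE_VIDEO]        = "protected,secure-video",
	[UC_TRUSTED_UI]          = "protected,trusted-ui",
	[UC_SECURE_VIDEO_RECORD] = "protected,secure-video-record",
};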
Christian gave an historical background here [1] as to why that hasn't worked in the past with DMA heaps given the scalability issues.
[1] https://lore.kernel.org/dri-devel/e967e382-6cca-4dee-8333-39892d532f71@gmail...
Hm, I fail to see where Christian dismiss the dma-heaps solution in this email. He even says:
If the memory is not physically attached to any device, but rather just
memory attached to the CPU or a system wide memory controller then expose the memory as DMA-heap with specific requirements (e.g. certain sized pages, contiguous, restricted, encrypted, ...).
I am not saying Christian dismissed DMA heaps, but rather that scalability is an issue. What we are proposing here is a generic interface via the TEE to the firmware/Trusted OS which can perform all the platform-specific memory restrictions. This solution will scale across vendors.
I read something completely different into Christian's mail.
What Christian is saying is that injecting generic constraint solving into the kernel doesn't scale. It's not OK to build out generic infrastructure in the kernel which queries a bunch of leaf drivers and attempts to somehow come up with something which satisfies userspace-provided constraints.
But this isn't the same thing as saying 'dma-heaps is wrong'! Again, there is no additional complexity in the kernel between a dma-heap which bridges over to TEE, and a TEE userspace interface which also bridges over to TEE. Both of them are completely fine according to what he's said.
Honestly, when I look at dma-heap implementations, they seem to be trivial shells around existing (more complex) allocators, and the boilerplate [1] to expose a dma-heap is relatively small. The dma-buf implementation you already have, so we're talking about a hundred lines of code to maintain, which shouldn't be significantly more than what you have for the new ioctl(), to be honest.
It will rather be redundant vendor-specific code under DMA heaps calling into the firmware/Trusted OS to enforce memory restrictions, as you can see in the Mediatek example [1]. With the TEE subsystem managing that, it won't be the case, as we will provide a common abstraction for the communication with the underlying firmware/Trusted OS.
Yes, it's common for everyone who uses TEE to implement SVP. It's not common for the people who do _not_ use TEE to implement SVP. Which means that userspace has to type out both, and what we're asking in this thread is: why?
Why should userspace have to support dma-heap allocation for platforms supporting SVP via a static DT-defined carveout as well as supporting TEE API allocation for platforms supporting SVP via a dynamic carveout? What benefit does it bring to have this surfaced as a completely separate uAPI?
And I'll insist on what Daniel said: it's a small price to pay to have a standard interface to expose to userspace. If dma-heaps are not used for this kind of thing, I honestly wonder what they will be used for...
Let's try not to forcefully find a use-case for DMA heaps when there is a better alternative available.
What makes it better? If you could explain very clearly the benefit userspace will gain from asking TEE to allocate $n bytes for TEE_IOC_UC_SECURE_VIDEO_PLAY, compared to asking dma-heap to allocate $n bytes for protected,secure-video, I think that would really help. Right now, I don't understand how it would be better in any way whatsoever for userspace. And I think your decision to implement it as a separate API is based on a misunderstanding of Christian's position.
I am still failing to see why you don't consider the following as a standardised user-space interface:
"When user-space has to work with restricted memory, ask TEE device to allocate it"
As far as I can tell, having userspace work with the TEE interface brings zero benefit (again, please correct me if I'm wrong and explain how it's better). The direct cost - call it a disbenefit - it brings is that we have to spend a pile of time typing out support for TEE allocation in every media/GPU/display driver/application, and when we do any kind of negotiation, we have to have one protocol definition for TEE and one for non-TEE.
dma-heaps was created to solve the problem of having too many 'allocate $n bytes from $specialplace' uAPIs. The proliferation was painful and made it difficult for userspace to do what it needed to do. Userspace doesn't _yet_ make full use of it, but the solution is to make userspace make full use of it, not to go create entirely separate allocation paths for unclear reasons.
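To make the comparison concrete: from userspace, allocating from any dma-heap is the same few lines regardless of what backs it; only the heap name changes. A sketch, assuming the "protected,secure-video" name floated above (not an upstream name):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/dma-heap.h>

    /* Returns a dma-buf fd of 'len' bytes from the named heap, or -1. */
    int alloc_from_heap(const char *heap, size_t len)
    {
            struct dma_heap_allocation_data data = {
                    .len = len,
                    .fd_flags = O_RDWR | O_CLOEXEC,
            };
            char path[128];
            int heap_fd, ret;

            snprintf(path, sizeof(path), "/dev/dma_heap/%s", heap);
            heap_fd = open(path, O_RDONLY | O_CLOEXEC);
            if (heap_fd < 0)
                    return -1;

            ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
            close(heap_fd);
            return ret < 0 ? -1 : (int)data.fd;
    }

    /* e.g. alloc_from_heap("protected,secure-video", 4 << 20); */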
Besides, I'm writing this from a platform that implements SVP not via TEE. I've worked on platforms which implement SVP without any TEE, where the TEE implementation would be at best a no-op stub, and at worst flat-out impossible.
So that's 'why not TEE as the single uAPI for SVP'. So, again, let's please turn this around: _why_ TEE? Who benefits from exposing this as completely separate to the more generic uAPI that we specifically designed to handle things like this?
Cheers, Daniel
On Tue, Feb 18, 2025 at 04:22:10PM +0000, Daniel Stone wrote:
[...]
I read something completely different into Christian's mail.
What Christian is saying is that injecting generic constraint solving into the kernel doesn't scale. It's not OK to build out generic infrastructure in the kernel which queries a bunch of leaf drivers and attempts to somehow come up with something which satisfies userspace-provided constraints.
Fully agreeing. The one thing we discussed, but haven't implemented yet, is that we'd add sysfs links from devices to the dma-heaps they support. Including allowing for priorities and different use-cases on the same device. We just haven't gotten there yet.
But even with that it's up to userspace to do the constraint solving, not the kernel.
[...]
Completely concurring on everything said above. TEE exposed through a dma-buf heap (or maybe special v4l allocation flag for secure video playback) and then we prime import that on the display side. Maybe also through drm render drivers for the EGL/VK protected content extensions. Same for any other hw means to allocate content protected buffers, TEE is not special here at all.
Anything else needs seriously good justifications why the entire dma-buf heap design is busted.
Cheers, Sima
On Tue, 18 Feb 2025 at 21:52, Daniel Stone daniel@fooishbar.org wrote:
[...]
Besides, I'm writing this from a platform that implements SVP not via TEE. I've worked on platforms which implement SVP without any TEE, where the TEE implementation would be at best a no-op stub, and at worst flat-out impossible.
Can you elaborate on the non-TEE use-case for Secure Video Path (SVP) a bit more? How does the protected/encrypted media content pipeline work? Which architecture support does your use-case require? Is there any higher-privileged firmware interaction required to perform media content decryption into restricted memory? Do you plan to upstream the corresponding support in the near future?
Let me try to elaborate on the Secure Video Path (SVP) flow requiring a TEE implementation (in general terms, a higher-privileged firmware manages the pipeline, as the kernel/user-space has no access permissions to the plain-text media content):
- Firstly, a content decryption key is securely provisioned into the TEE implementation.
- Interaction with TEE to set up access permissions of different peripherals in the media pipeline so that they can access restricted memory.
- Interaction with TEE to allocate restricted memory buffers.
- Interaction with TEE to decrypt downloaded encrypted media content from normal memory buffers into restricted memory buffers.
- Then the rest of the media pipeline is able to process the plain media content in restricted buffers and display it.
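In code, the client side of that flow is typically driven through the GlobalPlatform TEE Client API, roughly like the sketch below. The TA UUID and the command IDs are hypothetical placeholders for whatever a platform's DRM Trusted Application defines, and error checking is elided:

    #include <stddef.h>
    #include <stdint.h>
    #include <tee_client_api.h>

    /* Hypothetical TA and command IDs; a real platform defines its own. */
    static const TEEC_UUID drm_ta_uuid = { 0 };     /* placeholder UUID */
    #define CMD_SETUP_PIPELINE      1
    #define CMD_DECRYPT_TO_RSTMEM   2

    int svp_decrypt(void *enc, size_t len)
    {
            TEEC_Context ctx;
            TEEC_Session sess;
            TEEC_Operation op = { 0 };
            uint32_t origin;

            TEEC_InitializeContext(NULL, &ctx);
            TEEC_OpenSession(&ctx, &sess, &drm_ta_uuid, TEEC_LOGIN_PUBLIC,
                             NULL, NULL, &origin);

            /* Ask the TA to grant the pipeline devices access to
             * restricted memory. */
            TEEC_InvokeCommand(&sess, CMD_SETUP_PIPELINE, &op, &origin);

            /* Decrypt from a normal buffer into a restricted buffer that
             * the TA manages. */
            op.paramTypes = TEEC_PARAM_TYPES(TEEC_MEMREF_TEMP_INPUT, TEEC_NONE,
                                             TEEC_NONE, TEEC_NONE);
            op.params[0].tmpref.buffer = enc;
            op.params[0].tmpref.size = len;
            TEEC_InvokeCommand(&sess, CMD_DECRYPT_TO_RSTMEM, &op, &origin);

            TEEC_CloseSession(&sess);
            TEEC_FinalizeContext(&ctx);
            return 0;
    }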
So that's 'why not TEE as the single uAPI for SVP'.
Let's try to see if your SVP use-case really converges with TEE-based SVP such that we really need a single uAPI.
So, again, let's please turn this around: _why_ TEE? Who benefits from exposing this as completely separate to the more generic uAPI that we specifically designed to handle things like this?
The bridging between DMA heaps and TEE would still require user-space to perform an IOCTL into the TEE to register the DMA-bufs, as you can see here [1]. That leaves two handles for user-space to manage. Similarly, during restricted memory allocation/free we need another glue layer under DMA heaps to the TEE subsystem.
The reason, which has been iterated over many times in past threads, is simply:
"If user-space has to interact with a TEE device for SVP use-case then why it's not better to ask TEE to allocate restricted DMA-bufs too"
[1] https://lkml.indiana.edu/hypermail/linux/kernel/2408.3/08296.html
-Sumit
Hi Sumit,
On Fri, 21 Feb 2025 at 11:24, Sumit Garg sumit.garg@linaro.org wrote:
On Tue, 18 Feb 2025 at 21:52, Daniel Stone daniel@fooishbar.org wrote:
[...]
Besides, I'm writing this from a platform that implements SVP not via TEE. I've worked on platforms which implement SVP without any TEE, where the TEE implementation would be at best a no-op stub, and at worst flat-out impossible.
Can you elaborate on the non-TEE use-case for Secure Video Path (SVP) a bit more? How does the protected/encrypted media content pipeline work? Which architecture support does your use-case require? Is there any higher-privileged firmware interaction required to perform media content decryption into restricted memory? Do you plan to upstream the corresponding support in the near future?
You can see the MTK SVP patches on list which use the MTK SMC to mediate it.
There are TI Jacinto platforms which implement a 'secure' area configured statically by (IIRC) BL2, with static permissions defined for each AXI endpoint, e.g. CPU write + codec RW + dispc read. I've heard of another SoC vendor doing the same, but I don't think I can share those details. There is no TEE interaction.
I'm writing this message from an AMD laptop which implements restricted content paths outside of TEE. I don't have the full picture of how SVP is implemented on AMD systems, but I do know that I don't have any TEE devices exposed.
Let me try to elaborate on the Secure Video Path (SVP) flow requiring a TEE implementation (in general terms, a higher-privileged firmware manages the pipeline, as the kernel/user-space has no access permissions to the plain-text media content):
- [...]
Yeah, I totally understand the TEE usecase. I think that TEE is a good design to implement this. I think that TEE should be used for SVP where it makes sense.
Please understand that I am _not_ arguing that no-one should use TEE for SVP!
So, again, let's please turn this around: _why_ TEE? Who benefits from exposing this as completely separate to the more generic uAPI that we specifically designed to handle things like this?
The bridging between DMA heaps and TEE would still require user-space to perform an IOCTL into the TEE to register the DMA-bufs, as you can see here [1]. That leaves two handles for user-space to manage.
Yes, the decoder would need to do this. That's common though: if you want to share a buffer between V4L2 and DRM, you have three handles: the V4L2 buffer handle, the DRM GEM handle, and the dmabuf you use to bridge the two.
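For reference, that bridging in code, using only existing uAPI (error handling trimmed): the V4L2 buffer index, the dma-buf fd from VIDIOC_EXPBUF, and the GEM handle from PRIME import are exactly the three handles in question.

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>
    #include <xf86drm.h>

    /* v4l2_fd: an open video node; drm_fd: an open DRM node. */
    int bridge_v4l2_to_drm(int v4l2_fd, int drm_fd, unsigned int index,
                           uint32_t *gem_handle)
    {
            struct v4l2_exportbuffer exp;

            memset(&exp, 0, sizeof(exp));
            exp.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            exp.index = index;          /* handle #1: the V4L2 buffer */
            if (ioctl(v4l2_fd, VIDIOC_EXPBUF, &exp))
                    return -1;          /* exp.fd: handle #2, the dma-buf */

            /* handle #3: the DRM GEM handle imported from the dma-buf fd */
            return drmPrimeFDToHandle(drm_fd, exp.fd, gem_handle);
    }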
Similarly during restricted memory allocation/free we need another glue layer under DMA heaps to TEE subsystem.
Yep.
The reason, which has been iterated over many times in past threads, is simply:
"If user-space has to interact with a TEE device for SVP use-case then why it's not better to ask TEE to allocate restricted DMA-bufs too"
The first word in your proposition is load-bearing.
Build out the usecase a little more here. You have a DRMed video stream coming in, which you need to decode (involving TEE for this usecase). You get a dmabuf handle to the decoded frame. You need to pass the dmabuf across to the Wayland compositor. The compositor needs to pass it to EGL/Vulkan to import and do composition, which in turn passes it to the GPU DRM driver. The output of the composition is in turn shared between the GPU DRM driver and the separate KMS DRM driver, with the involvement of GBM.
For the platforms I'm interested in, the GPU DRM driver needs to switch into protected mode, which has no involvement at all with TEE - it's architecturally impossible to have TEE involved without moving most of the GPU driver into TEE and destroying performance. The display hardware also needs to engage protected mode, which again has no involvement with TEE and again would need to have half the driver moved into TEE for no benefit in order to do so. The Wayland compositor also has no interest in TEE: it tells the GPU DRM driver about the protected status of its buffers, and that's it.
What these components _are_ opinionated about, is the way buffers are allocated and managed. We built out dmabuf modifiers for this usecase, and we have a good negotiation protocol around that. We also really care about buffer placement in some usecases - e.g. some display/codec hardware requires buffers to be sourced from contiguous memory, other hardware needs to know that when it shares buffers with another device, it needs to place the buffers outside of inaccessible/slow local RAM. So we built out dma-heaps, so every part of the component in the stack can communicate their buffer-placement needs in the same way as we do modifiers, and negotiate an acceptable allocation.
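The discovery side is uniform too: userspace can see which placements a system offers simply by listing the heap directory, with no driver-specific probing (sketch):

    #include <dirent.h>
    #include <stdio.h>

    /* Print the name of every dma-heap the running kernel exposes. */
    int list_dma_heaps(void)
    {
            DIR *dir = opendir("/dev/dma_heap");
            struct dirent *d;

            if (!dir)
                    return -1;
            while ((d = readdir(dir)))
                    if (d->d_name[0] != '.')
                            printf("%s\n", d->d_name);
            closedir(dir);
            return 0;
    }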
That's my starting point for this discussion. We have a mechanism to deal with the fact that buffers need to be shared between different IP blocks which have their own constraints on buffer placement, avoiding the current problem of having every subsystem reinvent their own allocation uAPI which was burying us in impedance mismatch and confusion. That mechanism is dma-heaps. It seems like your starting point from this discussion is that you've implemented a TEE-centric design for SVP, and so all of userspace should bypass our existing cross-subsystem special-purpose allocation mechanism, and write specifically to one implementation. I believe that is a massive step backwards and an immediate introduction of technical debt.
Again, having an implementation of SVP via TEE makes a huge amount of sense. Having _most_ SVP implementations via TEE still makes a lot of sense. Having _all_ SVP implementations eventually be via TEE would still make sense. But even if we were at that point - which we aren't - it still doesn't justify telling userspace 'use the generic dma-heap uAPI for every device-specific allocation constraint, apart from SVP which has a completely different way to allocate some bytes'.
Cheers, Daniel
Hi Daniel,
On Fri, Feb 21, 2025 at 3:12 PM Daniel Stone daniel@fooishbar.org wrote:
[...]
Again, having an implementation of SVP via TEE makes a huge amount of sense. Having _most_ SVP implementations via TEE still makes a lot of sense. Having _all_ SVP implementations eventually be via TEE would still make sense. But even if we were at that point - which we aren't - it still doesn't justify telling userspace 'use the generic dma-heap uAPI for every device-specific allocation constraint, apart from SVP which has a completely different way to allocate some bytes'.
I must admit that I don't see how this makes a significant difference, but then I haven't hacked much in the stacks you're talking about, so I'm going to take your word for it.
I've experimented with providing a dma-heap replacing the TEE API. The implementation is more complex than I first anticipated, adding about 400 lines to the patch set. From user space, it looks like another dma-heap. I'm using the names you gave earlier: "protected,secure-video", "protected,trusted-ui", and "protected,secure-video-record". However, I wonder if we shouldn't use "restricted" instead of "protected", since we had agreed to call it restricted memory earlier.
I'll soon post this in a v6 and an updated demo.
Cheers, Jens
On Tue, Mar 04, 2025 at 08:17:23AM +0100, Jens Wiklander wrote:
[...]
I've experimented with providing a dma-heap replacing the TEE API. The implementation is more complex than I first anticipated, adding about 400 lines to the patch set.
I did anticipate this, but let's give it a try and see if DMA heaps really add any value from the user-space point of view. If they do, then it will be worth the maintenance overhead.
From user space, it looks like another dma-heap. I'm using the names you gave earlier: "protected,secure-video", "protected,trusted-ui", and "protected,secure-video-record". However, I wonder if we shouldn't use "restricted" instead of "protected", since we had agreed to call it restricted memory earlier.
Let's stick with "restricted" memory buffer references only.
-Sumit
On Tuesday 4 March 2025 at 13:15 +0530, Sumit Garg wrote:
[...]
Let's stick with "restricted" memory buffer references only.
Until now, we didn't have a standard to balance our naming choice; we simply wanted to move away from "secure", which didn't mean much, and "restricted" met our needs. I think the discussion is worth having again, now that there is a standard that decided on "protected". Matching the Khronos standard means reducing a lot of confusion.
https://docs.vulkan.org/guide/latest/protected.html
regards, Nicolas
Hi,
On Tue, Mar 18, 2025 at 7:38 PM Nicolas Dufresne nicolas@ndufresne.ca wrote:
[...]
Until now, we didn't have a standard to balance our naming choice; we simply wanted to move away from "secure", which didn't mean much, and "restricted" met our needs. I think the discussion is worth having again, now that there is a standard that decided on "protected". Matching the Khronos standard means reducing a lot of confusion.
Yeah, that's fine with me. I don't mind changing the name again as long as we progress. The latest version of the patchset is here [1]. I've published a demo and changed the patchset to provide a heap interface instead of a special interface in the TEE subsystem for memory allocations as requested. I'm interested in feedback on the patches in general, but in particular, on how the heap interface is provided.
[1] https://lore.kernel.org/lkml/20250305130634.1850178-1-jens.wiklander@linaro....
Cheers, Jens