From: Jeshwanth Kumar N K JESHWANTHKUMAR.NK@amd.com
At present, the shared memory for TEE ring buffer, command buffer and data buffer is allocated using get_free_pages(). The driver shares the physical address of these buffers with PSP so that it can be mapped by the Trusted OS.
In this patch series we have replaced get_free_pages() with dma_alloc_coherent() to allocate shared memory to cleanup the existing allocation method.
Rijo Thomas (3): crypto: ccp - Add function to allocate and free memory using DMA APIs crypto: ccp - Use psp_tee_alloc_buffer() and psp_tee_free_buffer() tee: amdtee: Use psp_tee_alloc_buffer() and psp_tee_free_buffer()
drivers/crypto/ccp/psp-dev.c | 3 + drivers/crypto/ccp/tee-dev.c | 119 ++++++++++++++++++---------- drivers/crypto/ccp/tee-dev.h | 11 +-- drivers/tee/amdtee/amdtee_private.h | 18 ++--- drivers/tee/amdtee/call.c | 74 ++++++++--------- drivers/tee/amdtee/core.c | 72 ++++++++++------- drivers/tee/amdtee/shm_pool.c | 21 ++--- include/linux/psp-tee.h | 47 +++++++++++ 8 files changed, 221 insertions(+), 144 deletions(-)
From: Rijo Thomas Rijo-john.Thomas@amd.com
As part of cleanup, memory allocation using get_free_pages() is replaced with DMA APIs.
psp_tee_alloc_buffer() and psp_tee_free_buffer() has been introduced, so that DMA address can be shared with PSP Trusted OS for buffer mapping. In the presence of IOMMU, the address will be an IOVA. So, it must be converted into a physical address before sharing it with PSP Trusted OS.
Signed-off-by: Rijo Thomas Rijo-john.Thomas@amd.com Signed-off-by: Jeshwanth Kumar JESHWANTHKUMAR.NK@amd.com --- drivers/crypto/ccp/psp-dev.c | 3 +++ drivers/crypto/ccp/tee-dev.c | 51 ++++++++++++++++++++++++++++++++++++ include/linux/psp-tee.h | 47 +++++++++++++++++++++++++++++++++ 3 files changed, 101 insertions(+)
diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c index d42d7bc62352..049954d9984b 100644 --- a/drivers/crypto/ccp/psp-dev.c +++ b/drivers/crypto/ccp/psp-dev.c @@ -163,6 +163,9 @@ int psp_dev_init(struct sp_device *sp) goto e_err; }
+ if (sp->set_psp_master_device) + sp->set_psp_master_device(sp); + psp->io_regs = sp->io_map;
ret = psp_get_capability(psp); diff --git a/drivers/crypto/ccp/tee-dev.c b/drivers/crypto/ccp/tee-dev.c index 5560bf8329a1..fa6f89572613 100644 --- a/drivers/crypto/ccp/tee-dev.c +++ b/drivers/crypto/ccp/tee-dev.c @@ -13,6 +13,8 @@ #include <linux/mutex.h> #include <linux/delay.h> #include <linux/slab.h> +#include <linux/dma-direct.h> +#include <linux/iommu.h> #include <linux/gfp.h> #include <linux/psp.h> #include <linux/psp-tee.h> @@ -22,6 +24,55 @@
static bool psp_dead;
+struct psp_tee_buffer *psp_tee_alloc_buffer(unsigned long size, gfp_t gfp) +{ + struct psp_device *psp = psp_get_master_device(); + struct psp_tee_buffer *tee_buf; + struct iommu_domain *dom; + + if (!psp || !size) + return NULL; + + tee_buf = kzalloc(sizeof(*tee_buf), GFP_KERNEL); + if (!tee_buf) + return NULL; + + tee_buf->vaddr = dma_alloc_coherent(psp->dev, size, &tee_buf->dma, gfp); + if (!tee_buf->vaddr || !tee_buf->dma) { + kfree(tee_buf); + return NULL; + } + + tee_buf->size = size; + + /* Check whether IOMMU is present. If present, translate IOVA + * to physical address, else the dma handle is the physical + * address. + */ + dom = iommu_get_domain_for_dev(psp->dev); + if (dom) + tee_buf->paddr = iommu_iova_to_phys(dom, tee_buf->dma); + else + tee_buf->paddr = tee_buf->dma; + + return tee_buf; +} +EXPORT_SYMBOL(psp_tee_alloc_buffer); + +void psp_tee_free_buffer(struct psp_tee_buffer *tee_buf) +{ + struct psp_device *psp = psp_get_master_device(); + + if (!psp || !tee_buf) + return; + + dma_free_coherent(psp->dev, tee_buf->size, + tee_buf->vaddr, tee_buf->dma); + + kfree(tee_buf); +} +EXPORT_SYMBOL(psp_tee_free_buffer); + static int tee_alloc_ring(struct psp_tee_device *tee, int ring_size) { struct ring_buf_manager *rb_mgr = &tee->rb_mgr; diff --git a/include/linux/psp-tee.h b/include/linux/psp-tee.h index cb0c95d6d76b..c3db3f33d069 100644 --- a/include/linux/psp-tee.h +++ b/include/linux/psp-tee.h @@ -40,6 +40,20 @@ enum tee_cmd_id { TEE_CMD_ID_UNMAP_SHARED_MEM, };
+/** + * struct psp_tee_buffer - Structure of a TEE buffer shared with PSP. + * @dma: DMA buffer address + * @paddr: Physical address of DMA buffer + * @vaddr: CPU virtual address of DMA buffer + * @size: Size of DMA buffer in bytes + */ +struct psp_tee_buffer { + dma_addr_t dma; + phys_addr_t paddr; + void *vaddr; + unsigned long size; +}; + #ifdef CONFIG_CRYPTO_DEV_SP_PSP /** * psp_tee_process_cmd() - Process command in Trusted Execution Environment @@ -75,6 +89,28 @@ int psp_tee_process_cmd(enum tee_cmd_id cmd_id, void *buf, size_t len, */ int psp_check_tee_status(void);
+/** + * psp_tee_alloc_buffer() - Allocates memory of requested size and flags using + * dma_alloc_coherent() API. + * + * This function can be used to allocate a shared memory region between the + * host and PSP TEE. + * + * Returns: + * non-NULL a valid pointer to struct psp_tee_buffer + * NULL on failure + */ +struct psp_tee_buffer *psp_tee_alloc_buffer(unsigned long size, gfp_t gfp); + +/** + * psp_tee_free_buffer() - Deallocates memory using dma_free_coherent() API. + * + * This function can be used to release shared memory region between host + * and PSP TEE. + * + */ +void psp_tee_free_buffer(struct psp_tee_buffer *tee_buffer); + #else /* !CONFIG_CRYPTO_DEV_SP_PSP */
static inline int psp_tee_process_cmd(enum tee_cmd_id cmd_id, void *buf, @@ -87,5 +123,16 @@ static inline int psp_check_tee_status(void) { return -ENODEV; } + +static inline +struct psp_tee_buffer *psp_tee_alloc_buffer(unsigned long size, gfp_t gfp) +{ + return NULL; +} + +static inline void psp_tee_free_buffer(struct psp_tee_buffer *tee_buffer) +{ +} + #endif /* CONFIG_CRYPTO_DEV_SP_PSP */ #endif /* __PSP_TEE_H_ */
On Wed, Oct 25, 2023 at 12:26:58PM +0530, jeshwank wrote:
- tee_buf->vaddr = dma_alloc_coherent(psp->dev, size, &tee_buf->dma, gfp);
- if (!tee_buf->vaddr || !tee_buf->dma) {
kfree(tee_buf);
return NULL;
- }
- tee_buf->size = size;
- /* Check whether IOMMU is present. If present, translate IOVA
* to physical address, else the dma handle is the physical
* address.
*/
- dom = iommu_get_domain_for_dev(psp->dev);
- if (dom)
tee_buf->paddr = iommu_iova_to_phys(dom, tee_buf->dma);
- else
No, you can't mix the DMA API and iommu API. You need to stick to one or the other.
On 27-Oct-23 10:54 AM, Christoph Hellwig wrote:
On Wed, Oct 25, 2023 at 12:26:58PM +0530, jeshwank wrote:
- tee_buf->vaddr = dma_alloc_coherent(psp->dev, size, &tee_buf->dma, gfp);
- if (!tee_buf->vaddr || !tee_buf->dma) {
kfree(tee_buf);
return NULL;
- }
- tee_buf->size = size;
- /* Check whether IOMMU is present. If present, translate IOVA
* to physical address, else the dma handle is the physical
* address.
*/
- dom = iommu_get_domain_for_dev(psp->dev);
- if (dom)
tee_buf->paddr = iommu_iova_to_phys(dom, tee_buf->dma);
- else
No, you can't mix the DMA API and iommu API. You need to stick to one or the other.
Can you please elaborate a bit more? Is it because in the presence of IOMMU, a contiguous DMA or bus-address-space "buffer" may be mapped non-contiguously into the physical address space? As a result, for buffers larger than a page size, when PSP tries to map the physical address into it's address space, the PSP Trusted OS will not be able to read back the entire buffer data.
CPU CPU Bus Virtual Physical Address Address Address Space Space Space
+-------+ +------+ +------+ | | |MMIO | Offset | | | | Virtual |Space | applied | | C +-------+ --------> B +------+ ----------> +------+ A | | mapping | | by host | | +-----+ | | | | bridge | | +--------+ | | | | +------+ | | | | | CPU | | | | RAM | | | | Device | | | | | | | | | | | +-----+ +-------+ +------+ +------+ +--------+ | | Virtual |Buffer| Mapping | | X +-------+ --------> Y +------+ <---------- +------+ Z | | mapping | RAM | by IOMMU | | | | | | | | +-------+ +------+
Reference diagram from: https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt
Regards,
Jeshwanth
On Fri, Oct 27, 2023 at 07:05:17PM +0530, NK, JESHWANTHKUMAR wrote:
Can you please elaborate a bit more?
Becasue the DMA API is a blackbox. You can pass the dma_addr_t to hardware, and you can use the kernel virtual address. That it can sometimes be implemented using the IMMU API is an implementation detail. Similarly you can only feeds iovas generated by the IOMMU API into the IOMMU API, not any random other scalar value, which is what you can get from the DMA API.
Hi Christoph,
On 30-Oct-23 7:03 PM, Christoph Hellwig wrote:
On Fri, Oct 27, 2023 at 07:05:17PM +0530, NK, JESHWANTHKUMAR wrote:
Can you please elaborate a bit more?
Becasue the DMA API is a blackbox. You can pass the dma_addr_t to hardware, and you can use the kernel virtual address. That it can sometimes be implemented using the IMMU API is an implementation detail. Similarly you can only feeds iovas generated by the IOMMU API into the IOMMU API, not any random other scalar value, which is what you can get from the DMA API.
Thanks for the explanation, I will send a new version of the patch.
Regards,
Jeshwanth
Hi Christoph,
On 27-Oct-23 10:54 AM, Christoph Hellwig wrote:
On Wed, Oct 25, 2023 at 12:26:58PM +0530, jeshwank wrote:
- tee_buf->vaddr = dma_alloc_coherent(psp->dev, size, &tee_buf->dma, gfp);
- if (!tee_buf->vaddr || !tee_buf->dma) {
kfree(tee_buf);
return NULL;
- }
- tee_buf->size = size;
- /* Check whether IOMMU is present. If present, translate IOVA
* to physical address, else the dma handle is the physical
* address.
*/
- dom = iommu_get_domain_for_dev(psp->dev);
- if (dom)
tee_buf->paddr = iommu_iova_to_phys(dom, tee_buf->dma);
- else
No, you can't mix the DMA API and iommu API. You need to stick to one or the other.
Can you please elaborate?
From: Rijo Thomas Rijo-john.Thomas@amd.com
Allocate TEE ring buffer and command buffer using psp_tee_alloc_buffer(), and free TEE ring buffer and command buffer using psp_tee_free_buffer().
As part of cleanup, memory allocation using get_free_pages() is replaced with DMA APIs.
Signed-off-by: Rijo Thomas Rijo-john.Thomas@amd.com Co-developed-by: Jeshwanth Kumar JESHWANTHKUMAR.NK@amd.com Signed-off-by: Jeshwanth Kumar JESHWANTHKUMAR.NK@amd.com --- drivers/crypto/ccp/tee-dev.c | 68 ++++++++++++++---------------------- drivers/crypto/ccp/tee-dev.h | 11 +++--- 2 files changed, 31 insertions(+), 48 deletions(-)
diff --git a/drivers/crypto/ccp/tee-dev.c b/drivers/crypto/ccp/tee-dev.c index fa6f89572613..f0a94191662d 100644 --- a/drivers/crypto/ccp/tee-dev.c +++ b/drivers/crypto/ccp/tee-dev.c @@ -76,22 +76,16 @@ EXPORT_SYMBOL(psp_tee_free_buffer); static int tee_alloc_ring(struct psp_tee_device *tee, int ring_size) { struct ring_buf_manager *rb_mgr = &tee->rb_mgr; - void *start_addr;
if (!ring_size) return -EINVAL;
- /* We need actual physical address instead of DMA address, since - * Trusted OS running on AMD Secure Processor will map this region - */ - start_addr = (void *)__get_free_pages(GFP_KERNEL, get_order(ring_size)); - if (!start_addr) + rb_mgr->ring_buf = psp_tee_alloc_buffer(ring_size, + GFP_KERNEL | __GFP_ZERO); + if (!rb_mgr->ring_buf) { + dev_err(tee->dev, "ring allocation failed\n"); return -ENOMEM; - - memset(start_addr, 0x0, ring_size); - rb_mgr->ring_start = start_addr; - rb_mgr->ring_size = ring_size; - rb_mgr->ring_pa = __psp_pa(start_addr); + } mutex_init(&rb_mgr->mutex);
return 0; @@ -101,15 +95,8 @@ static void tee_free_ring(struct psp_tee_device *tee) { struct ring_buf_manager *rb_mgr = &tee->rb_mgr;
- if (!rb_mgr->ring_start) - return; + psp_tee_free_buffer(rb_mgr->ring_buf);
- free_pages((unsigned long)rb_mgr->ring_start, - get_order(rb_mgr->ring_size)); - - rb_mgr->ring_start = NULL; - rb_mgr->ring_size = 0; - rb_mgr->ring_pa = 0; mutex_destroy(&rb_mgr->mutex); }
@@ -133,35 +120,36 @@ static int tee_wait_cmd_poll(struct psp_tee_device *tee, unsigned int timeout, return -ETIMEDOUT; }
-static -struct tee_init_ring_cmd *tee_alloc_cmd_buffer(struct psp_tee_device *tee) +struct psp_tee_buffer *tee_alloc_cmd_buffer(struct psp_tee_device *tee) { struct tee_init_ring_cmd *cmd; + struct psp_tee_buffer *cmd_buffer;
- cmd = kzalloc(sizeof(*cmd), GFP_KERNEL); - if (!cmd) + cmd_buffer = psp_tee_alloc_buffer(sizeof(*cmd), + GFP_KERNEL | __GFP_ZERO); + if (!cmd_buffer) return NULL;
- cmd->hi_addr = upper_32_bits(tee->rb_mgr.ring_pa); - cmd->low_addr = lower_32_bits(tee->rb_mgr.ring_pa); - cmd->size = tee->rb_mgr.ring_size; + cmd = (struct tee_init_ring_cmd *)cmd_buffer->vaddr; + cmd->hi_addr = upper_32_bits(tee->rb_mgr.ring_buf->paddr); + cmd->low_addr = lower_32_bits(tee->rb_mgr.ring_buf->paddr); + cmd->size = tee->rb_mgr.ring_buf->size;
dev_dbg(tee->dev, "tee: ring address: high = 0x%x low = 0x%x size = %u\n", cmd->hi_addr, cmd->low_addr, cmd->size);
- return cmd; + return cmd_buffer; }
-static inline void tee_free_cmd_buffer(struct tee_init_ring_cmd *cmd) +static inline void tee_free_cmd_buffer(struct psp_tee_buffer *cmd_buffer) { - kfree(cmd); + psp_tee_free_buffer(cmd_buffer); }
static int tee_init_ring(struct psp_tee_device *tee) { int ring_size = MAX_RING_BUFFER_ENTRIES * sizeof(struct tee_ring_cmd); - struct tee_init_ring_cmd *cmd; - phys_addr_t cmd_buffer; + struct psp_tee_buffer *cmd_buffer; unsigned int reg; int ret;
@@ -175,21 +163,19 @@ static int tee_init_ring(struct psp_tee_device *tee)
tee->rb_mgr.wptr = 0;
- cmd = tee_alloc_cmd_buffer(tee); - if (!cmd) { + cmd_buffer = tee_alloc_cmd_buffer(tee); + if (!cmd_buffer) { tee_free_ring(tee); return -ENOMEM; }
- cmd_buffer = __psp_pa((void *)cmd); - /* Send command buffer details to Trusted OS by writing to * CPU-PSP message registers */
- iowrite32(lower_32_bits(cmd_buffer), + iowrite32(lower_32_bits(cmd_buffer->paddr), tee->io_regs + tee->vdata->cmdbuff_addr_lo_reg); - iowrite32(upper_32_bits(cmd_buffer), + iowrite32(upper_32_bits(cmd_buffer->paddr), tee->io_regs + tee->vdata->cmdbuff_addr_hi_reg); iowrite32(TEE_RING_INIT_CMD, tee->io_regs + tee->vdata->cmdresp_reg); @@ -209,7 +195,7 @@ static int tee_init_ring(struct psp_tee_device *tee) }
free_buf: - tee_free_cmd_buffer(cmd); + tee_free_cmd_buffer(cmd_buffer);
return ret; } @@ -219,7 +205,7 @@ static void tee_destroy_ring(struct psp_tee_device *tee) unsigned int reg; int ret;
- if (!tee->rb_mgr.ring_start) + if (!tee->rb_mgr.ring_buf->vaddr) return;
if (psp_dead) @@ -308,7 +294,7 @@ static int tee_submit_cmd(struct psp_tee_device *tee, enum tee_cmd_id cmd_id, do { /* Get pointer to ring buffer command entry */ cmd = (struct tee_ring_cmd *) - (tee->rb_mgr.ring_start + tee->rb_mgr.wptr); + (tee->rb_mgr.ring_buf->vaddr + tee->rb_mgr.wptr);
rptr = ioread32(tee->io_regs + tee->vdata->ring_rptr_reg);
@@ -357,7 +343,7 @@ static int tee_submit_cmd(struct psp_tee_device *tee, enum tee_cmd_id cmd_id,
/* Update local copy of write pointer */ tee->rb_mgr.wptr += sizeof(struct tee_ring_cmd); - if (tee->rb_mgr.wptr >= tee->rb_mgr.ring_size) + if (tee->rb_mgr.wptr >= tee->rb_mgr.ring_buf->size) tee->rb_mgr.wptr = 0;
/* Trigger interrupt to Trusted OS */ diff --git a/drivers/crypto/ccp/tee-dev.h b/drivers/crypto/ccp/tee-dev.h index 49d26158b71e..0e4398d15f93 100644 --- a/drivers/crypto/ccp/tee-dev.h +++ b/drivers/crypto/ccp/tee-dev.h @@ -16,6 +16,7 @@
#include <linux/device.h> #include <linux/mutex.h> +#include <linux/psp-tee.h>
#define TEE_DEFAULT_TIMEOUT 10 #define MAX_BUFFER_SIZE 988 @@ -48,17 +49,13 @@ struct tee_init_ring_cmd {
/** * struct ring_buf_manager - Helper structure to manage ring buffer. - * @ring_start: starting address of ring buffer - * @ring_size: size of ring buffer in bytes - * @ring_pa: physical address of ring buffer * @wptr: index to the last written entry in ring buffer + * @ring_buf: ring buffer allocated using DMA api */ struct ring_buf_manager { - struct mutex mutex; /* synchronizes access to ring buffer */ - void *ring_start; - u32 ring_size; - phys_addr_t ring_pa; + struct mutex mutex; /* synchronizes access to ring buffer */ u32 wptr; + struct psp_tee_buffer *ring_buf; };
struct psp_tee_device {
Hi jeshwank,
kernel test robot noticed the following build warnings:
[auto build test WARNING on herbert-crypto-2.6/master] [also build test WARNING on linus/master v6.6-rc7] [cannot apply to herbert-cryptodev-2.6/master next-20231025] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/jeshwank/crypto-ccp-Add-funct... base: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git master patch link: https://lore.kernel.org/r/20231025065700.1556152-3-JESHWANTHKUMAR.NK%40amd.c... patch subject: [PATCH 2/3] crypto: ccp - Use psp_tee_alloc_buffer() and psp_tee_free_buffer() config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20231026/202310260529.GEey5HQc-lkp@i...) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231026/202310260529.GEey5HQc-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202310260529.GEey5HQc-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/crypto/ccp/tee-dev.c:123:24: warning: no previous prototype for 'tee_alloc_cmd_buffer' [-Wmissing-prototypes]
123 | struct psp_tee_buffer *tee_alloc_cmd_buffer(struct psp_tee_device *tee) | ^~~~~~~~~~~~~~~~~~~~
vim +/tee_alloc_cmd_buffer +123 drivers/crypto/ccp/tee-dev.c
122
123 struct psp_tee_buffer *tee_alloc_cmd_buffer(struct psp_tee_device *tee)
124 { 125 struct tee_init_ring_cmd *cmd; 126 struct psp_tee_buffer *cmd_buffer; 127 128 cmd_buffer = psp_tee_alloc_buffer(sizeof(*cmd), 129 GFP_KERNEL | __GFP_ZERO); 130 if (!cmd_buffer) 131 return NULL; 132 133 cmd = (struct tee_init_ring_cmd *)cmd_buffer->vaddr; 134 cmd->hi_addr = upper_32_bits(tee->rb_mgr.ring_buf->paddr); 135 cmd->low_addr = lower_32_bits(tee->rb_mgr.ring_buf->paddr); 136 cmd->size = tee->rb_mgr.ring_buf->size; 137 138 dev_dbg(tee->dev, "tee: ring address: high = 0x%x low = 0x%x size = %u\n", 139 cmd->hi_addr, cmd->low_addr, cmd->size); 140 141 return cmd_buffer; 142 } 143
From: Rijo Thomas Rijo-john.Thomas@amd.com
Allocate shared memory using psp_tee_alloc_buffer(), and free shared memory using psp_tee_free_buffer().
As part of cleanup, memory allocation using get_free_pages() is replaced with DMA APIs.
Signed-off-by: Rijo Thomas Rijo-john.Thomas@amd.com Signed-off-by: Jeshwanth Kumar JESHWANTHKUMAR.NK@amd.com --- drivers/tee/amdtee/amdtee_private.h | 18 +++---- drivers/tee/amdtee/call.c | 74 +++++++++++++---------------- drivers/tee/amdtee/core.c | 72 +++++++++++++++++----------- drivers/tee/amdtee/shm_pool.c | 21 ++------ 4 files changed, 89 insertions(+), 96 deletions(-)
diff --git a/drivers/tee/amdtee/amdtee_private.h b/drivers/tee/amdtee/amdtee_private.h index 6d0f7062bb87..e2bb0a21942c 100644 --- a/drivers/tee/amdtee/amdtee_private.h +++ b/drivers/tee/amdtee/amdtee_private.h @@ -13,6 +13,7 @@ #include <linux/kref.h> #include <linux/types.h> #include "amdtee_if.h" +#include <linux/psp-tee.h>
#define DRIVER_NAME "amdtee" #define DRIVER_AUTHOR "AMD-TEE Linux driver team" @@ -78,19 +79,14 @@ struct amdtee_driver_data { struct amdtee *amdtee; };
-struct shmem_desc { - void *kaddr; - u64 size; -}; - /** * struct amdtee_shm_data - Shared memory data - * @kaddr: Kernel virtual address of shared memory + * @shm_buf: Pointer to shared memory buffer * @buf_id: Buffer id of memory mapped by TEE_CMD_ID_MAP_SHARED_MEM */ struct amdtee_shm_data { struct list_head shm_node; - void *kaddr; + struct psp_tee_buffer *shm_buf; u32 buf_id; };
@@ -145,11 +141,11 @@ int amdtee_invoke_func(struct tee_context *ctx,
int amdtee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 session);
-int amdtee_map_shmem(struct tee_shm *shm); +int amdtee_alloc_shmem(struct tee_shm *shm);
-void amdtee_unmap_shmem(struct tee_shm *shm); +void amdtee_free_shmem(struct tee_shm *shm);
-int handle_load_ta(void *data, u32 size, +int handle_load_ta(struct psp_tee_buffer *buf, struct tee_ioctl_open_session_arg *arg);
int handle_unload_ta(u32 ta_handle); @@ -159,7 +155,7 @@ int handle_open_session(struct tee_ioctl_open_session_arg *arg, u32 *info,
int handle_close_session(u32 ta_handle, u32 info);
-int handle_map_shmem(u32 count, struct shmem_desc *start, u32 *buf_id); +int handle_map_shmem(struct psp_tee_buffer *shm_buf, u32 *buf_id);
void handle_unmap_shmem(u32 buf_id);
diff --git a/drivers/tee/amdtee/call.c b/drivers/tee/amdtee/call.c index e9b63dcb3194..031d37cc975c 100644 --- a/drivers/tee/amdtee/call.c +++ b/drivers/tee/amdtee/call.c @@ -283,53 +283,44 @@ int handle_invoke_cmd(struct tee_ioctl_invoke_arg *arg, u32 sinfo, return ret; }
-int handle_map_shmem(u32 count, struct shmem_desc *start, u32 *buf_id) +int handle_map_shmem(struct psp_tee_buffer *shm_buf, u32 *buf_id) { struct tee_cmd_map_shared_mem *cmd; - phys_addr_t paddr; - int ret, i; u32 status; + int ret;
- if (!count || !start || !buf_id) + if (!shm_buf || !shm_buf->vaddr || !buf_id) return -EINVAL;
cmd = kzalloc(sizeof(*cmd), GFP_KERNEL); if (!cmd) return -ENOMEM;
- /* Size must be page aligned */ - for (i = 0; i < count ; i++) { - if (!start[i].kaddr || (start[i].size & (PAGE_SIZE - 1))) { - ret = -EINVAL; - goto free_cmd; - } - - if ((u64)start[i].kaddr & (PAGE_SIZE - 1)) { - pr_err("map shared memory: page unaligned. addr 0x%llx", - (u64)start[i].kaddr); - ret = -EINVAL; - goto free_cmd; - } + /* Size and address must be page aligned */ + if (shm_buf->size & (PAGE_SIZE - 1)) { + ret = -EINVAL; + goto free_cmd; }
- cmd->sg_list.count = count; - - /* Create buffer list */ - for (i = 0; i < count ; i++) { - paddr = __psp_pa(start[i].kaddr); - cmd->sg_list.buf[i].hi_addr = upper_32_bits(paddr); - cmd->sg_list.buf[i].low_addr = lower_32_bits(paddr); - cmd->sg_list.buf[i].size = start[i].size; - cmd->sg_list.size += cmd->sg_list.buf[i].size; - - pr_debug("buf[%d]:hi addr = 0x%x\n", i, - cmd->sg_list.buf[i].hi_addr); - pr_debug("buf[%d]:low addr = 0x%x\n", i, - cmd->sg_list.buf[i].low_addr); - pr_debug("buf[%d]:size = 0x%x\n", i, cmd->sg_list.buf[i].size); - pr_debug("list size = 0x%x\n", cmd->sg_list.size); + if ((u64)shm_buf->vaddr & (PAGE_SIZE - 1)) { + pr_err("map shared memory: page unaligned. addr 0x%llx", + (u64)shm_buf->vaddr); + ret = -EINVAL; + goto free_cmd; }
+ /* Update sg list */ + cmd->sg_list.count = 1; + cmd->sg_list.buf[0].hi_addr = upper_32_bits(shm_buf->paddr); + cmd->sg_list.buf[0].low_addr = lower_32_bits(shm_buf->paddr); + cmd->sg_list.buf[0].size = shm_buf->size; + cmd->sg_list.size = cmd->sg_list.buf[0].size; + + pr_debug("buf: hi addr = 0x%x\n", cmd->sg_list.buf[0].hi_addr); + pr_debug("buf: low addr = 0x%x\n", cmd->sg_list.buf[0].low_addr); + pr_debug("buf: size = 0x%x\n", cmd->sg_list.buf[0].size); + pr_debug("list size = 0x%x\n", cmd->sg_list.size); + *buf_id = 0;
ret = psp_tee_process_cmd(TEE_CMD_ID_MAP_SHARED_MEM, (void *)cmd, @@ -396,25 +387,24 @@ int handle_open_session(struct tee_ioctl_open_session_arg *arg, u32 *info, return ret; }
-int handle_load_ta(void *data, u32 size, struct tee_ioctl_open_session_arg *arg) +int handle_load_ta(struct psp_tee_buffer *buf, + struct tee_ioctl_open_session_arg *arg) { struct tee_cmd_unload_ta unload_cmd = {}; struct tee_cmd_load_ta load_cmd = {}; - phys_addr_t blob; int ret;
- if (size == 0 || !data || !arg) + if (buf->size == 0 || !buf->paddr || !arg) return -EINVAL;
- blob = __psp_pa(data); - if (blob & (PAGE_SIZE - 1)) { - pr_err("load TA: page unaligned. blob 0x%llx", blob); + if (buf->dma & (PAGE_SIZE - 1)) { + pr_err("load TA: page unaligned. addr 0x%llx", buf->dma); return -EINVAL; }
- load_cmd.hi_addr = upper_32_bits(blob); - load_cmd.low_addr = lower_32_bits(blob); - load_cmd.size = size; + load_cmd.hi_addr = upper_32_bits(buf->paddr); + load_cmd.low_addr = lower_32_bits(buf->paddr); + load_cmd.size = buf->size;
mutex_lock(&ta_refcount_mutex);
diff --git a/drivers/tee/amdtee/core.c b/drivers/tee/amdtee/core.c index 3c15f6a9e91c..37784360fc10 100644 --- a/drivers/tee/amdtee/core.c +++ b/drivers/tee/amdtee/core.c @@ -17,6 +17,7 @@ #include "amdtee_private.h" #include "../tee_private.h" #include <linux/psp-tee.h> +#include <linux/psp.h>
static struct amdtee_driver_data *drv_data; static DEFINE_MUTEX(session_list_mutex); @@ -158,7 +159,8 @@ u32 get_buffer_id(struct tee_shm *shm)
mutex_lock(&ctxdata->shm_mutex); list_for_each_entry(shmdata, &ctxdata->shm_list, shm_node) - if (shmdata->kaddr == shm->kaddr) { + if (shmdata->shm_buf && + shmdata->shm_buf->vaddr == shm->kaddr) { buf_id = shmdata->buf_id; break; } @@ -168,11 +170,13 @@ u32 get_buffer_id(struct tee_shm *shm) }
static DEFINE_MUTEX(drv_mutex); -static int copy_ta_binary(struct tee_context *ctx, void *ptr, void **ta, - size_t *ta_size) +static int copy_ta_binary(struct tee_context *ctx, void *ptr, + struct psp_tee_buffer **bufp) { const struct firmware *fw; char fw_name[TA_PATH_MAX]; + struct psp_tee_buffer *buf; + unsigned long size; struct { u32 lo; u16 mid; @@ -201,15 +205,16 @@ static int copy_ta_binary(struct tee_context *ctx, void *ptr, void **ta, goto unlock; }
- *ta_size = roundup(fw->size, PAGE_SIZE); - *ta = (void *)__get_free_pages(GFP_KERNEL, get_order(*ta_size)); - if (!*ta) { - pr_err("%s: get_free_pages failed\n", __func__); + size = roundup(fw->size, PAGE_SIZE); + buf = psp_tee_alloc_buffer(size, GFP_KERNEL); + if (!buf) { + pr_err("TA binary memory allocation failed\n"); rc = -ENOMEM; goto rel_fw; } + memcpy(buf->vaddr, fw->data, fw->size); + *bufp = buf;
- memcpy(*ta, fw->data, fw->size); rel_fw: release_firmware(fw); unlock: @@ -234,24 +239,23 @@ int amdtee_open_session(struct tee_context *ctx, { struct amdtee_context_data *ctxdata = ctx->data; struct amdtee_session *sess = NULL; + struct psp_tee_buffer *buf; u32 session_info, ta_handle; - size_t ta_size; int rc, i; - void *ta;
if (arg->clnt_login != TEE_IOCTL_LOGIN_PUBLIC) { pr_err("unsupported client login method\n"); return -EINVAL; }
- rc = copy_ta_binary(ctx, &arg->uuid[0], &ta, &ta_size); + rc = copy_ta_binary(ctx, &arg->uuid[0], &buf); if (rc) { pr_err("failed to copy TA binary\n"); return rc; }
/* Load the TA binary into TEE environment */ - handle_load_ta(ta, ta_size, arg); + handle_load_ta(buf, arg); if (arg->ret != TEEC_SUCCESS) goto out;
@@ -298,7 +302,7 @@ int amdtee_open_session(struct tee_context *ctx, }
out: - free_pages((u64)ta, get_order(ta_size)); + psp_tee_free_buffer(buf); return rc; }
@@ -338,51 +342,62 @@ int amdtee_close_session(struct tee_context *ctx, u32 session) return 0; }
-int amdtee_map_shmem(struct tee_shm *shm) +int amdtee_alloc_shmem(struct tee_shm *shm) { struct amdtee_context_data *ctxdata; struct amdtee_shm_data *shmnode; - struct shmem_desc shmem; - int rc, count; + struct psp_tee_buffer *shm_buf; u32 buf_id; + int rc;
if (!shm) return -EINVAL;
- shmnode = kmalloc(sizeof(*shmnode), GFP_KERNEL); - if (!shmnode) + shm_buf = psp_tee_alloc_buffer(shm->size, GFP_KERNEL | __GFP_ZERO); + if (!shm_buf) return -ENOMEM;
- count = 1; - shmem.kaddr = shm->kaddr; - shmem.size = shm->size; + shm->kaddr = shm_buf->vaddr; + shm->paddr = __psp_pa(shm_buf->vaddr); + + shmnode = kmalloc(sizeof(*shmnode), GFP_KERNEL); + if (!shmnode) { + rc = -ENOMEM; + goto free_dmabuf; + }
/* * Send a MAP command to TEE and get the corresponding * buffer Id */ - rc = handle_map_shmem(count, &shmem, &buf_id); + rc = handle_map_shmem(shm_buf, &buf_id); if (rc) { pr_err("map_shmem failed: ret = %d\n", rc); - kfree(shmnode); - return rc; + goto free_shmnode; }
- shmnode->kaddr = shm->kaddr; + shmnode->shm_buf = shm_buf; shmnode->buf_id = buf_id; ctxdata = shm->ctx->data; mutex_lock(&ctxdata->shm_mutex); list_add(&shmnode->shm_node, &ctxdata->shm_list); mutex_unlock(&ctxdata->shm_mutex);
- pr_debug("buf_id :[%x] kaddr[%p]\n", shmnode->buf_id, shmnode->kaddr); + pr_debug("buf_id :[%x] kaddr[%p]\n", shmnode->buf_id, shm->kaddr);
return 0; + +free_shmnode: + kfree(shmnode); +free_dmabuf: + psp_tee_free_buffer(shm_buf); + return rc; }
-void amdtee_unmap_shmem(struct tee_shm *shm) +void amdtee_free_shmem(struct tee_shm *shm) { struct amdtee_context_data *ctxdata; + struct psp_tee_buffer *shm_buf = NULL; struct amdtee_shm_data *shmnode; u32 buf_id;
@@ -390,6 +405,7 @@ void amdtee_unmap_shmem(struct tee_shm *shm) return;
buf_id = get_buffer_id(shm); + /* Unmap the shared memory from TEE */ handle_unmap_shmem(buf_id);
@@ -398,10 +414,12 @@ void amdtee_unmap_shmem(struct tee_shm *shm) list_for_each_entry(shmnode, &ctxdata->shm_list, shm_node) if (buf_id == shmnode->buf_id) { list_del(&shmnode->shm_node); + shm_buf = shmnode->shm_buf; kfree(shmnode); break; } mutex_unlock(&ctxdata->shm_mutex); + psp_tee_free_buffer(shm_buf); }
int amdtee_invoke_func(struct tee_context *ctx, diff --git a/drivers/tee/amdtee/shm_pool.c b/drivers/tee/amdtee/shm_pool.c index f0303126f199..443874c82611 100644 --- a/drivers/tee/amdtee/shm_pool.c +++ b/drivers/tee/amdtee/shm_pool.c @@ -12,25 +12,13 @@ static int pool_op_alloc(struct tee_shm_pool *pool, struct tee_shm *shm, size_t size, size_t align) { unsigned int order = get_order(size); - unsigned long va; int rc;
- /* - * Ignore alignment since this is already going to be page aligned - * and there's no need for any larger alignment. - */ - va = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order); - if (!va) - return -ENOMEM; - - shm->kaddr = (void *)va; - shm->paddr = __psp_pa((void *)va); shm->size = PAGE_SIZE << order;
- /* Map the allocated memory in to TEE */ - rc = amdtee_map_shmem(shm); + /* Allocate and map memory in to TEE */ + rc = amdtee_alloc_shmem(shm); if (rc) { - free_pages(va, order); shm->kaddr = NULL; return rc; } @@ -41,8 +29,9 @@ static int pool_op_alloc(struct tee_shm_pool *pool, struct tee_shm *shm, static void pool_op_free(struct tee_shm_pool *pool, struct tee_shm *shm) { /* Unmap the shared memory from TEE */ - amdtee_unmap_shmem(shm); - free_pages((unsigned long)shm->kaddr, get_order(shm->size)); + amdtee_free_shmem(shm); + + shm->size = 0; shm->kaddr = NULL; }
Hi Jeshwank,
On Wed, 25 Oct 2023 at 12:27, jeshwank JESHWANTHKUMAR.NK@amd.com wrote:
From: Jeshwanth Kumar N K JESHWANTHKUMAR.NK@amd.com
At present, the shared memory for TEE ring buffer, command buffer and data buffer is allocated using get_free_pages(). The driver shares the physical address of these buffers with PSP so that it can be mapped by the Trusted OS.
In this patch series we have replaced get_free_pages() with dma_alloc_coherent() to allocate shared memory to cleanup the existing allocation method.
Thanks for putting this together but I can't find the reasoning behind this change neither in this commit message and nor in the patch descriptions. Care to explain why?
-Sumit
Rijo Thomas (3): crypto: ccp - Add function to allocate and free memory using DMA APIs crypto: ccp - Use psp_tee_alloc_buffer() and psp_tee_free_buffer() tee: amdtee: Use psp_tee_alloc_buffer() and psp_tee_free_buffer()
drivers/crypto/ccp/psp-dev.c | 3 + drivers/crypto/ccp/tee-dev.c | 119 ++++++++++++++++++---------- drivers/crypto/ccp/tee-dev.h | 11 +-- drivers/tee/amdtee/amdtee_private.h | 18 ++--- drivers/tee/amdtee/call.c | 74 ++++++++--------- drivers/tee/amdtee/core.c | 72 ++++++++++------- drivers/tee/amdtee/shm_pool.c | 21 ++--- include/linux/psp-tee.h | 47 +++++++++++ 8 files changed, 221 insertions(+), 144 deletions(-)
-- 2.25.1
On 25-Oct-23 7:01 PM, Sumit Garg wrote:
Hi Jeshwank,
On Wed, 25 Oct 2023 at 12:27, jeshwank JESHWANTHKUMAR.NK@amd.com wrote:
From: Jeshwanth Kumar N K JESHWANTHKUMAR.NK@amd.com
At present, the shared memory for TEE ring buffer, command buffer and data buffer is allocated using get_free_pages(). The driver shares the physical address of these buffers with PSP so that it can be mapped by the Trusted OS.
In this patch series we have replaced get_free_pages() with dma_alloc_coherent() to allocate shared memory to cleanup the existing allocation method.
Thanks for putting this together but I can't find the reasoning behind this change neither in this commit message and nor in the patch descriptions. Care to explain why?
-Sumit
Hi Sumit,
We see that there is an advantage in using dma_alloc_coherent() over get_free_pages(). The dma-ops associated with PSP PCIe device can be overridden. This capability will be helpful when we enable virtualization support. We plan to post a virtualization related patch in future.
Regards,
Jeshwanth
Rijo Thomas (3): crypto: ccp - Add function to allocate and free memory using DMA APIs crypto: ccp - Use psp_tee_alloc_buffer() and psp_tee_free_buffer() tee: amdtee: Use psp_tee_alloc_buffer() and psp_tee_free_buffer()
drivers/crypto/ccp/psp-dev.c | 3 + drivers/crypto/ccp/tee-dev.c | 119 ++++++++++++++++++---------- drivers/crypto/ccp/tee-dev.h | 11 +-- drivers/tee/amdtee/amdtee_private.h | 18 ++--- drivers/tee/amdtee/call.c | 74 ++++++++--------- drivers/tee/amdtee/core.c | 72 ++++++++++------- drivers/tee/amdtee/shm_pool.c | 21 ++--- include/linux/psp-tee.h | 47 +++++++++++ 8 files changed, 221 insertions(+), 144 deletions(-)
-- 2.25.1
On 10/26/23 05:30, NK, JESHWANTHKUMAR wrote:
On 25-Oct-23 7:01 PM, Sumit Garg wrote:
Hi Jeshwank,
On Wed, 25 Oct 2023 at 12:27, jeshwank JESHWANTHKUMAR.NK@amd.com wrote:
From: Jeshwanth Kumar N K JESHWANTHKUMAR.NK@amd.com
At present, the shared memory for TEE ring buffer, command buffer and data buffer is allocated using get_free_pages(). The driver shares the physical address of these buffers with PSP so that it can be mapped by the Trusted OS.
In this patch series we have replaced get_free_pages() with dma_alloc_coherent() to allocate shared memory to cleanup the existing allocation method.
Thanks for putting this together but I can't find the reasoning behind this change neither in this commit message and nor in the patch descriptions. Care to explain why?
-Sumit
Hi Sumit,
We see that there is an advantage in using dma_alloc_coherent() over get_free_pages(). The dma-ops associated with PSP PCIe device can be overridden. This capability will be helpful when we enable virtualization support. We plan to post a virtualization related patch in future.
To be specific, you are referring to Xen virtualization support, correct? Because I don't see how this works in a Qemu/KVM environment where you would get a GPA and not an SPA.
If that is the case, you should clearly specify that. Also, this looks like it should be introduced with the virtualization support that you submit in the future and not before.
Thanks, Tom
Regards,
Jeshwanth
Rijo Thomas (3): crypto: ccp - Add function to allocate and free memory using DMA APIs crypto: ccp - Use psp_tee_alloc_buffer() and psp_tee_free_buffer() tee: amdtee: Use psp_tee_alloc_buffer() and psp_tee_free_buffer()
drivers/crypto/ccp/psp-dev.c | 3 + drivers/crypto/ccp/tee-dev.c | 119 ++++++++++++++++++---------- drivers/crypto/ccp/tee-dev.h | 11 +-- drivers/tee/amdtee/amdtee_private.h | 18 ++--- drivers/tee/amdtee/call.c | 74 ++++++++--------- drivers/tee/amdtee/core.c | 72 ++++++++++------- drivers/tee/amdtee/shm_pool.c | 21 ++--- include/linux/psp-tee.h | 47 +++++++++++ 8 files changed, 221 insertions(+), 144 deletions(-)
-- 2.25.1
Hi Tom,
On 26-Oct-23 8:23 PM, Tom Lendacky wrote:
On 10/26/23 05:30, NK, JESHWANTHKUMAR wrote:
On 25-Oct-23 7:01 PM, Sumit Garg wrote:
Hi Jeshwank,
On Wed, 25 Oct 2023 at 12:27, jeshwank JESHWANTHKUMAR.NK@amd.com wrote:
From: Jeshwanth Kumar N K JESHWANTHKUMAR.NK@amd.com
At present, the shared memory for TEE ring buffer, command buffer and data buffer is allocated using get_free_pages(). The driver shares the physical address of these buffers with PSP so that it can be mapped by the Trusted OS.
In this patch series we have replaced get_free_pages() with dma_alloc_coherent() to allocate shared memory to cleanup the existing allocation method.
Thanks for putting this together but I can't find the reasoning behind this change neither in this commit message and nor in the patch descriptions. Care to explain why?
-Sumit
Hi Sumit,
We see that there is an advantage in using dma_alloc_coherent() over get_free_pages(). The dma-ops associated with PSP PCIe device can be overridden. This capability will be helpful when we enable virtualization support. We plan to post a virtualization related patch in future.
To be specific, you are referring to Xen virtualization support, correct? Because I don't see how this works in a Qemu/KVM environment where you would get a GPA and not an SPA.
The patch is not specific to Xen. We have verified it in Qemu/KVM and Xen PV mode. Support for Xen PVH mode will be added as a separate patch.
If that is the case, you should clearly specify that. Also, this looks like it should be introduced with the virtualization support that you submit in the future and not before.
I will update the commit message in the next version of the patch series to include these details.
Thanks, Tom
Regards,
Jeshwanth
Rijo Thomas (3): crypto: ccp - Add function to allocate and free memory using DMA APIs crypto: ccp - Use psp_tee_alloc_buffer() and psp_tee_free_buffer() tee: amdtee: Use psp_tee_alloc_buffer() and psp_tee_free_buffer()
drivers/crypto/ccp/psp-dev.c | 3 + drivers/crypto/ccp/tee-dev.c | 119 ++++++++++++++++++---------- drivers/crypto/ccp/tee-dev.h | 11 +-- drivers/tee/amdtee/amdtee_private.h | 18 ++--- drivers/tee/amdtee/call.c | 74 ++++++++--------- drivers/tee/amdtee/core.c | 72 ++++++++++------- drivers/tee/amdtee/shm_pool.c | 21 ++--- include/linux/psp-tee.h | 47 +++++++++++ 8 files changed, 221 insertions(+), 144 deletions(-)
-- 2.25.1
op-tee@lists.trustedfirmware.org