From: Dave Patel <dave.patel@riscstar.com>
Dear OP-TEE Developers,
This proposal adds context save/restore structures and dirty-state tracking for the standard scalar floating-point (F/D) and vector (V) ISA extensions on RISC-V platforms.
### 1. Architectural Alignment and Shift
Currently, the thread scheduling layer in OP-TEE OS implements a VFP model that is tightly coupled to the ARM architecture (e.g., `thread_kernel_enable_vfp`). This model relies heavily on a software-driven "lazy context trap" mechanism: the kernel leaves the FPU disabled so that the first subsequent FP instruction faults, and the context is handled in that trap.
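On RISC-V, a software-initiated lazy scheme built on execution faults is poorly suited to context tracking. With `sstatus.FS` or `sstatus.VS` set to Off, FP and vector instructions raise a generic illegal-instruction exception, so the trap handler would have to decode the faulting opcode to separate them from custom-extension encodings that may overlap the same opcode space, and every such trap carries a significant pipeline penalty. Instead, RISC-V provides hardware-maintained status fields, `sstatus.FS` (floating point) and `sstatus.VS` (vector), each of which tracks its unit as Off, Initial, Clean, or Dirty.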
To fit OP-TEE's existing `thread_*_vfp` scheduling hooks, these hardware flags are used to implement "eager-on-dirty" context saving: the kernel leaves the extensions enabled while a thread executes and, at a context switch, writes the register state to memory only if the hardware reports `Dirty`.
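The save rule above can be modelled on a host without RISC-V hardware. This is a minimal sketch, not code from the series: the field positions (FS at bits [14:13], VS at bits [10:9]) and the Off/Initial/Clean/Dirty encoding follow the privileged specification and the masks in the patches, while `switch_out()` and its save counters are hypothetical stand-ins for the real save routines.

```c
#include <assert.h>

/* sstatus field positions from the RISC-V privileged specification */
#define FS_SHIFT 13
#define VS_SHIFT 9
#define XS_FIELD_MASK 3UL

/* Encoding shared by sstatus.FS and sstatus.VS */
enum ext_state { EXT_OFF = 0, EXT_INITIAL = 1, EXT_CLEAN = 2, EXT_DIRTY = 3 };

static enum ext_state ext_field(unsigned long sstatus, unsigned int shift)
{
	return (enum ext_state)((sstatus >> shift) & XS_FIELD_MASK);
}

static unsigned long ext_set(unsigned long sstatus, unsigned int shift,
			     enum ext_state st)
{
	sstatus &= ~(XS_FIELD_MASK << shift);
	return sstatus | ((unsigned long)st << shift);
}

/*
 * Hypothetical context-switch hook: save a unit only when the hardware
 * reports Dirty, then mark it Clean so an unchanged unit is skipped next
 * time. Returns the updated sstatus; the counters record the saves.
 */
static unsigned long switch_out(unsigned long sstatus, int *fp_saves,
				int *v_saves)
{
	if (ext_field(sstatus, FS_SHIFT) == EXT_DIRTY) {
		(*fp_saves)++;	/* stands in for __asm_save_fp_state() */
		sstatus = ext_set(sstatus, FS_SHIFT, EXT_CLEAN);
	}
	if (ext_field(sstatus, VS_SHIFT) == EXT_DIRTY) {
		(*v_saves)++;	/* stands in for riscv_vector_save_internal() */
		sstatus = ext_set(sstatus, VS_SHIFT, EXT_CLEAN);
	}
	return sstatus;
}
```

A thread whose FP unit is Dirty and whose vector unit is Clean costs one FP save on its first switch and nothing on the next, as long as it does not touch FP state in between.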
### 2. Implementation Subsystem Architecture
To keep the design modular, the implementation separates scalar floating-point support from the scalable vector support:
1. Modularity and headers:
   - `core/arch/riscv/include/riscv_fp.h` defines the `sstatus.FS` bitmasks and the scalar context structure (`struct riscv_fp_state`).
   - `core/arch/riscv/include/riscv_vector.h` defines the `sstatus.VS` bitmasks and the vector state layout (`struct riscv_vector_state`). Keeping the vector definitions separate is intended to let platforms without a vector unit omit the vector code and its footprint entirely.
2. Bit-width and layout adaptability:
   - Scalar low-level assembly (`riscv_fp.S`): selects 32-bit (`fsw`/`flw`) or 64-bit (`fsd`/`fld`) register transfers at build time using the compiler's predefined RISC-V width macros.
   - Vector save/restore (`riscv_vector.c`): fully isolated from the primary thread files. Instead of transferring the 32 vector registers one at a time, it uses the whole-register group instructions `vs8r.v` and `vl8r.v`, each of which moves eight consecutive registers (8 × VLENB bytes). The full register file is therefore saved or restored in four operations, on the groups starting at `v0`, `v8`, `v16`, and `v24`.
3. Unified scheduling abstraction (`thread_vfp.c`): a single top-level implementation of the `thread_*_vfp` hooks (`thread_kernel_save_vfp`, `thread_user_enable_vfp`, etc.) routes context handling to the scalar and vector back ends.
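The four-chunk vector layout can be checked with a little arithmetic. This host-side sketch assumes the buffer holds `v0`..`v31` back to back, each VLENB bytes wide, which is what the `base + (i) * vlenb` addressing in `riscv_vector.c` implies; `vregs_size()` and `vreg_group_offset()` are hypothetical helpers, not part of the series.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VREGS 32		/* architectural vector registers v0..v31 */
#define VREGS_PER_GROUP 8	/* registers moved by one vs8r.v / vl8r.v */

/* Bytes needed for the whole vector register file at a given VLENB. */
static size_t vregs_size(size_t vlenb)
{
	return NUM_VREGS * vlenb;
}

/*
 * Byte offset of the group whose first register is v<first_reg>;
 * the first registers of the four groups are 0, 8, 16 and 24.
 */
static size_t vreg_group_offset(unsigned int first_reg, size_t vlenb)
{
	return (size_t)first_reg * vlenb;
}
```

With VLEN = 128 bits (VLENB = 16) the groups start at byte offsets 0, 128, 256 and 384, and the last group ends exactly at the 512-byte buffer size.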
### 3. Feedback Requested
We are seeking early design feedback from the community on the following points:
- Naming convention: Should the existing "vfp" naming be kept in the global `thread.h` interfaces for backward compatibility, or is an explicit upstream refactoring toward a generic name (e.g., `thread_kernel_enable_coproc_regs`) preferred?
- Vector state allocation: What is the preferred approach for managing the dynamically sized heap buffer for the vector register state (`vregs`), whose size (32 × VLENB bytes per context) depends on the `VLENB` value reported by the CPU at runtime?
- Eager context switching: Are there any reservations about the eager-on-dirty approach proposed here?
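One possible answer to the allocation question, sketched under stated assumptions: the buffer is sized once per context from VLENB (read from CSR `0xc22` on hardware, passed in here), and `calloc()` stands in for whichever allocator the port settles on. `vector_state_alloc()`, `vector_state_free()` and `VLENB_MAX` are hypothetical; the struct mirrors `struct riscv_vector_state` from this series, and the bound follows the V specification's limits (VLEN is a power of two, at most 2^16 bits).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirror of struct riscv_vector_state from riscv_vector.h */
struct riscv_vector_state {
	uint8_t *vregs;
	unsigned long vtype;
	unsigned long vl;
	unsigned long vcsr;
	unsigned long vstart;
};

#define VLENB_MAX 8192	/* VLEN <= 2^16 bits, so VLENB <= 8192 bytes */

/*
 * Hypothetical helper: allocate 32 * vlenb zeroed bytes for v0..v31.
 * Returns 0 on success, -1 on an implausible VLENB or allocation failure.
 */
static int vector_state_alloc(struct riscv_vector_state *st,
			      unsigned long vlenb)
{
	/* VLEN is a power of two, so VLENB must be one as well */
	if (!vlenb || (vlenb & (vlenb - 1)) || vlenb > VLENB_MAX)
		return -1;
	st->vregs = calloc(32, vlenb);
	return st->vregs ? 0 : -1;
}

static void vector_state_free(struct riscv_vector_state *st)
{
	free(st->vregs);
	st->vregs = NULL;
}
```

Allocating at thread initialisation keeps the context-switch path allocation-free. The alternative, a static worst-case buffer, avoids the heap but costs 32 × 8192 bytes (256 KiB) per thread.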
Looking forward to your suggestions and design critiques.
Dave Patel (3):
  Floating Point changes
  RISCV Vector changes
  Thread changes for RISCV floating point and vector changes

 core/arch/riscv/include/riscv_fp.h     |  30 +++++
 core/arch/riscv/include/riscv_vector.h |  32 +++++
 core/arch/riscv/kernel/riscv_fp.S      | 159 +++++++++++++++++++++++++
 core/arch/riscv/kernel/riscv_vector.c  |  77 ++++++++++++
 core/arch/riscv/kernel/thread_vfp.c    | 142 ++++++++++++++++++++++
 5 files changed, 440 insertions(+)
 create mode 100644 core/arch/riscv/include/riscv_fp.h
 create mode 100644 core/arch/riscv/include/riscv_vector.h
 create mode 100644 core/arch/riscv/kernel/riscv_fp.S
 create mode 100644 core/arch/riscv/kernel/riscv_vector.c
 create mode 100644 core/arch/riscv/kernel/thread_vfp.c

-- 
2.43.0
From: Dave Patel <dave.patel@riscstar.com>

Signed-off-by: Dave Patel <dave.patel@riscstar.com>
---
 core/arch/riscv/include/riscv_fp.h |  30 ++++++
 core/arch/riscv/kernel/riscv_fp.S  | 159 +++++++++++++++++++++++++++++
 2 files changed, 189 insertions(+)
 create mode 100644 core/arch/riscv/include/riscv_fp.h
 create mode 100644 core/arch/riscv/kernel/riscv_fp.S
diff --git a/core/arch/riscv/include/riscv_fp.h b/core/arch/riscv/include/riscv_fp.h
new file mode 100644
index 000000000..19faf925f
--- /dev/null
+++ b/core/arch/riscv/include/riscv_fp.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: BSD-2-Clause */
+/*
+ * Copyright (c) 2026, RISCStar Limited
+ */
+
+#ifndef __KERNEL_RISCV_FP_H
+#define __KERNEL_RISCV_FP_H
+
+#include <types_ext.h>
+#include <compiler.h>
+
+/* sstatus.FS field (bits [14:13]); unsigned long so ~MASK keeps upper bits */
+#define SSTATUS_FS_MASK		(3UL << 13)
+#define SSTATUS_FS_OFF		(0UL << 13)
+#define SSTATUS_FS_INITIAL	(1UL << 13)
+#define SSTATUS_FS_CLEAN	(2UL << 13)
+#define SSTATUS_FS_DIRTY	(3UL << 13)
+
+/* Floating-point context; register width follows FLEN, not XLEN */
+struct riscv_fp_state {
+#if __riscv_flen == 64
+	uint64_t fpregs[32];
+#elif __riscv_flen == 32
+	uint32_t fpregs[32];
+#endif
+	uint32_t fcsr;	/* stored with sw/lw at byte offset 32 * FLEN / 8 */
+};
+
+#endif /* __KERNEL_RISCV_FP_H */
+
diff --git a/core/arch/riscv/kernel/riscv_fp.S b/core/arch/riscv/kernel/riscv_fp.S
new file mode 100644
index 000000000..4fa53649e
--- /dev/null
+++ b/core/arch/riscv/kernel/riscv_fp.S
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: BSD-2-Clause */
+/*
+ * Copyright (c) 2026, RISCStar Limited
+ */
+
+#include <asm.h>
+
+/* void __asm_save_fp_state(struct riscv_fp_state *ctx), ctx in a0 */
+FUNC __asm_save_fp_state , :
+#if __riscv_flen == 64
+	fsd f0, 0(a0)
+	fsd f1, 8(a0)
+	fsd f2, 16(a0)
+	fsd f3, 24(a0)
+	fsd f4, 32(a0)
+	fsd f5, 40(a0)
+	fsd f6, 48(a0)
+	fsd f7, 56(a0)
+	fsd f8, 64(a0)
+	fsd f9, 72(a0)
+	fsd f10, 80(a0)
+	fsd f11, 88(a0)
+	fsd f12, 96(a0)
+	fsd f13, 104(a0)
+	fsd f14, 112(a0)
+	fsd f15, 120(a0)
+	fsd f16, 128(a0)
+	fsd f17, 136(a0)
+	fsd f18, 144(a0)
+	fsd f19, 152(a0)
+	fsd f20, 160(a0)
+	fsd f21, 168(a0)
+	fsd f22, 176(a0)
+	fsd f23, 184(a0)
+	fsd f24, 192(a0)
+	fsd f25, 200(a0)
+	fsd f26, 208(a0)
+	fsd f27, 216(a0)
+	fsd f28, 224(a0)
+	fsd f29, 232(a0)
+	fsd f30, 240(a0)
+	fsd f31, 248(a0)
+	csrr t0, fcsr
+	sw t0, 256(a0)
+#elif __riscv_flen == 32
+	fsw f0, 0(a0)
+	fsw f1, 4(a0)
+	fsw f2, 8(a0)
+	fsw f3, 12(a0)
+	fsw f4, 16(a0)
+	fsw f5, 20(a0)
+	fsw f6, 24(a0)
+	fsw f7, 28(a0)
+	fsw f8, 32(a0)
+	fsw f9, 36(a0)
+	fsw f10, 40(a0)
+	fsw f11, 44(a0)
+	fsw f12, 48(a0)
+	fsw f13, 52(a0)
+	fsw f14, 56(a0)
+	fsw f15, 60(a0)
+	fsw f16, 64(a0)
+	fsw f17, 68(a0)
+	fsw f18, 72(a0)
+	fsw f19, 76(a0)
+	fsw f20, 80(a0)
+	fsw f21, 84(a0)
+	fsw f22, 88(a0)
+	fsw f23, 92(a0)
+	fsw f24, 96(a0)
+	fsw f25, 100(a0)
+	fsw f26, 104(a0)
+	fsw f27, 108(a0)
+	fsw f28, 112(a0)
+	fsw f29, 116(a0)
+	fsw f30, 120(a0)
+	fsw f31, 124(a0)
+	csrr t0, fcsr
+	sw t0, 128(a0)
+#endif
+	ret
+END_FUNC __asm_save_fp_state
+
+/* void __asm_restore_fp_state(struct riscv_fp_state *ctx), ctx in a0 */
+FUNC __asm_restore_fp_state , :
+#if __riscv_flen == 64
+	lw t0, 256(a0)
+	csrw fcsr, t0
+	fld f0, 0(a0)
+	fld f1, 8(a0)
+	fld f2, 16(a0)
+	fld f3, 24(a0)
+	fld f4, 32(a0)
+	fld f5, 40(a0)
+	fld f6, 48(a0)
+	fld f7, 56(a0)
+	fld f8, 64(a0)
+	fld f9, 72(a0)
+	fld f10, 80(a0)
+	fld f11, 88(a0)
+	fld f12, 96(a0)
+	fld f13, 104(a0)
+	fld f14, 112(a0)
+	fld f15, 120(a0)
+	fld f16, 128(a0)
+	fld f17, 136(a0)
+	fld f18, 144(a0)
+	fld f19, 152(a0)
+	fld f20, 160(a0)
+	fld f21, 168(a0)
+	fld f22, 176(a0)
+	fld f23, 184(a0)
+	fld f24, 192(a0)
+	fld f25, 200(a0)
+	fld f26, 208(a0)
+	fld f27, 216(a0)
+	fld f28, 224(a0)
+	fld f29, 232(a0)
+	fld f30, 240(a0)
+	fld f31, 248(a0)
+#elif __riscv_flen == 32
+	lw t0, 128(a0)
+	csrw fcsr, t0
+	flw f0, 0(a0)
+	flw f1, 4(a0)
+	flw f2, 8(a0)
+	flw f3, 12(a0)
+	flw f4, 16(a0)
+	flw f5, 20(a0)
+	flw f6, 24(a0)
+	flw f7, 28(a0)
+	flw f8, 32(a0)
+	flw f9, 36(a0)
+	flw f10, 40(a0)
+	flw f11, 44(a0)
+	flw f12, 48(a0)
+	flw f13, 52(a0)
+	flw f14, 56(a0)
+	flw f15, 60(a0)
+	flw f16, 64(a0)
+	flw f17, 68(a0)
+	flw f18, 72(a0)
+	flw f19, 76(a0)
+	flw f20, 80(a0)
+	flw f21, 84(a0)
+	flw f22, 88(a0)
+	flw f23, 92(a0)
+	flw f24, 96(a0)
+	flw f25, 100(a0)
+	flw f26, 104(a0)
+	flw f27, 108(a0)
+	flw f28, 112(a0)
+	flw f29, 116(a0)
+	flw f30, 120(a0)
+	flw f31, 124(a0)
+#endif
+	ret
+END_FUNC __asm_restore_fp_state
+
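The assembly in this patch hardcodes byte offsets into `struct riscv_fp_state`, so a compile-time check is a cheap way to keep the C layout and the `.S` file in sync. This is a host-side sketch: the two mirror structs and `fpreg_offset()` are hypothetical, and the mirrors assume `fcsr` is stored as a 32-bit word directly after the register array. FP register width follows FLEN (32 bits with F only, 64 bits with D), which is independent of XLEN: an RV32 core with D still has 64-bit `f` registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side mirrors of struct riscv_fp_state for both register widths */
struct fp_state_d {		/* FLEN = 64 (D extension) */
	uint64_t fpregs[32];
	uint32_t fcsr;
};

struct fp_state_f {		/* FLEN = 32 (F extension only) */
	uint32_t fpregs[32];
	uint32_t fcsr;
};

/* The save/restore assembly uses these byte offsets for fcsr */
static_assert(offsetof(struct fp_state_d, fcsr) == 256, "fcsr offset, FLEN=64");
static_assert(offsetof(struct fp_state_f, fcsr) == 128, "fcsr offset, FLEN=32");

/* Byte offset used by fsd/fld (8-byte) or fsw/flw (4-byte) for f<n>. */
static size_t fpreg_offset(unsigned int n, size_t flen_bytes)
{
	return n * flen_bytes;
}
```

In a real tree the same `static_assert` lines could sit next to the struct definition and compare against the offsets used by the assembly, so a layout change breaks the build instead of silently corrupting state.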
From: Dave Patel <dave.patel@riscstar.com>

Signed-off-by: Dave Patel <dave.patel@riscstar.com>
---
 core/arch/riscv/include/riscv_vector.h | 32 +++++++++++
 core/arch/riscv/kernel/riscv_vector.c  | 77 ++++++++++++++++++++++++++
 2 files changed, 109 insertions(+)
 create mode 100644 core/arch/riscv/include/riscv_vector.h
 create mode 100644 core/arch/riscv/kernel/riscv_vector.c
diff --git a/core/arch/riscv/include/riscv_vector.h b/core/arch/riscv/include/riscv_vector.h
new file mode 100644
index 000000000..2aa3b26db
--- /dev/null
+++ b/core/arch/riscv/include/riscv_vector.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: BSD-2-Clause */
+/*
+ * Copyright (c) 2026, RISCStar Limited
+ */
+
+#ifndef __KERNEL_RISCV_VECTOR_H
+#define __KERNEL_RISCV_VECTOR_H
+
+#include <types_ext.h>
+#include <compiler.h>
+
+/* sstatus.VS field (bits [10:9]); unsigned long so ~MASK keeps upper bits */
+#define SSTATUS_VS_MASK		(3UL << 9)
+#define SSTATUS_VS_OFF		(0UL << 9)
+#define SSTATUS_VS_INITIAL	(1UL << 9)
+#define SSTATUS_VS_CLEAN	(2UL << 9)
+#define SSTATUS_VS_DIRTY	(3UL << 9)
+
+/* Vector Context Struct */
+struct riscv_vector_state {
+	uint8_t *vregs;		/* 32 * VLENB bytes, v0..v31 in order */
+	unsigned long vtype;
+	unsigned long vl;
+	unsigned long vcsr;
+	unsigned long vstart;
+};
+
+/* Internal context routers instantiated inside riscv_vector.c */
+void riscv_vector_save_internal(struct riscv_vector_state *dst);
+void riscv_vector_restore_internal(const struct riscv_vector_state *src);
+
+#endif /* __KERNEL_RISCV_VECTOR_H */
diff --git a/core/arch/riscv/kernel/riscv_vector.c b/core/arch/riscv/kernel/riscv_vector.c
new file mode 100644
index 000000000..b7f7431fe
--- /dev/null
+++ b/core/arch/riscv/kernel/riscv_vector.c
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: BSD-2-Clause */
+/*
+ * Copyright (c) 2026, RISCStar Limited
+ */
+
+#include <riscv_vector.h>
+#include <types_ext.h>
+
+void riscv_vector_save_internal(struct riscv_vector_state *dst)
+{
+	unsigned long vlenb;
+	uint8_t *base;
+
+	if (!dst)
+		return;
+
+	asm volatile("csrr %0, vtype" : "=r"(dst->vtype));
+	asm volatile("csrr %0, vl" : "=r"(dst->vl));
+	asm volatile("csrr %0, vcsr" : "=r"(dst->vcsr));
+	asm volatile("csrr %0, vstart" : "=r"(dst->vstart));
+	asm volatile("csrr %0, 0xc22" : "=r"(vlenb)); /* CSR_VLENB = 0xc22 */
+	asm volatile("csrw vstart, zero"); /* vs8r.v skips elements below vstart */
+	base = dst->vregs;
+
+#define SAVE_VREG_CHUNK(i) \
+	asm volatile( \
+	"	.option push\n\t" \
+	"	.option arch, +v\n\t" \
+	"	vs8r.v v" #i ", (%0)\n\t" \
+	"	.option pop\n\t" \
+	:: "r"(base + (i) * vlenb) : "memory")
+
+	SAVE_VREG_CHUNK(0);
+	SAVE_VREG_CHUNK(8);
+	SAVE_VREG_CHUNK(16);
+	SAVE_VREG_CHUNK(24);
+#undef SAVE_VREG_CHUNK
+}
+
+void riscv_vector_restore_internal(const struct riscv_vector_state *src)
+{
+	unsigned long vlenb;
+	const uint8_t *base;
+
+	if (!src)
+		return;
+
+	/* Re-establish vl/vtype prior to the block transfer */
+	asm volatile(
+	"	.option push\n\t"
+	"	.option arch, +v\n\t"
+	"	vsetvl zero, %0, %1\n\t"
+	"	.option pop\n\t"
+	:: "r"(src->vl), "r"(src->vtype));
+
+	asm volatile("csrr %0, 0xc22" : "=r"(vlenb)); /* CSR_VLENB = 0xc22 */
+	base = src->vregs;
+
+#define RESTORE_VREG_CHUNK(i) \
+	asm volatile( \
+	"	.option push\n\t" \
+	"	.option arch, +v\n\t" \
+	"	vl8r.v v" #i ", (%0)\n\t" \
+	"	.option pop\n\t" \
+	:: "r"(base + (i) * vlenb) : "memory")
+
+	RESTORE_VREG_CHUNK(0);
+	RESTORE_VREG_CHUNK(8);
+	RESTORE_VREG_CHUNK(16);
+	RESTORE_VREG_CHUNK(24);
+#undef RESTORE_VREG_CHUNK
+
+	/* Write vcsr and vstart last: the vector loads above use and reset vstart */
+	asm volatile("csrw vcsr, %0" :: "r"(src->vcsr));
+	asm volatile("csrw vstart, %0" :: "r"(src->vstart));
+}
+
From: Dave Patel <dave.patel@riscstar.com>

Signed-off-by: Dave Patel <dave.patel@riscstar.com>
---
 core/arch/riscv/kernel/thread_vfp.c | 142 ++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 core/arch/riscv/kernel/thread_vfp.c
diff --git a/core/arch/riscv/kernel/thread_vfp.c b/core/arch/riscv/kernel/thread_vfp.c
new file mode 100644
index 000000000..9eaf20ad1
--- /dev/null
+++ b/core/arch/riscv/kernel/thread_vfp.c
@@ -0,0 +1,142 @@
+/* SPDX-License-Identifier: BSD-2-Clause */
+/*
+ * Copyright (c) 2026, RISCStar Limited
+ */
+
+#include <kernel/thread.h>
+#include <riscv_fp.h>
+#include <riscv_vector.h>
+#include <assert.h>
+
+void __asm_save_fp_state(struct riscv_fp_state *ctx);
+void __asm_restore_fp_state(struct riscv_fp_state *ctx);
+
+#define STATE_TOKEN_FP SHIFT_U32(1, 0)
+#define STATE_TOKEN_VEC SHIFT_U32(1, 1)
+
+struct thread_riscv_ext_state {
+	struct riscv_fp_state fp_ctx;
+	struct riscv_vector_state v_ctx;
+	bool fp_saved;
+	bool vec_saved;
+};
+
+static struct thread_riscv_ext_state cpu_ctx_states[CFG_NUM_THREADS];
+
+static inline unsigned long read_sstatus(void)
+{
+	unsigned long val;
+	asm volatile("csrr %0, sstatus" : "=r"(val));
+	return val;
+}
+
+static inline void write_sstatus(unsigned long val)
+{
+	asm volatile("csrw sstatus, %0" :: "r"(val));
+}
+
+uint32_t thread_kernel_enable_vfp(void)
+{
+	unsigned long sstatus = read_sstatus();
+	uint32_t active_token = 0;
+
+	if ((sstatus & SSTATUS_FS_MASK) == SSTATUS_FS_OFF) {
+		sstatus &= ~SSTATUS_FS_MASK;
+		sstatus |= SSTATUS_FS_CLEAN;
+		active_token |= STATE_TOKEN_FP; /* was Off: disable turns it off */
+	}
+
+	if ((sstatus & SSTATUS_VS_MASK) == SSTATUS_VS_OFF) {
+		sstatus &= ~SSTATUS_VS_MASK;
+		sstatus |= SSTATUS_VS_CLEAN;
+		active_token |= STATE_TOKEN_VEC; /* was Off: disable turns it off */
+	}
+
+	write_sstatus(sstatus);
+	return active_token;
+}
+
+void thread_kernel_disable_vfp(uint32_t state_value)
+{
+	unsigned long sstatus = read_sstatus();
+
+	if (state_value & STATE_TOKEN_FP) {
+		sstatus &= ~SSTATUS_FS_MASK;
+		sstatus |= SSTATUS_FS_OFF;
+	}
+	if (state_value & STATE_TOKEN_VEC) {
+		sstatus &= ~SSTATUS_VS_MASK;
+		sstatus |= SSTATUS_VS_OFF;
+	}
+	write_sstatus(sstatus);
+}
+
+void thread_kernel_save_vfp(void)
+{
+	unsigned long sstatus = read_sstatus();
+	uint32_t tid = thread_get_id();
+	struct thread_riscv_ext_state *state = &cpu_ctx_states[tid];
+
+	if ((sstatus & SSTATUS_FS_MASK) == SSTATUS_FS_DIRTY) {
+		__asm_save_fp_state(&state->fp_ctx);
+		state->fp_saved = true;
+		sstatus = (sstatus & ~SSTATUS_FS_MASK) | SSTATUS_FS_CLEAN;
+	}
+
+	if ((sstatus & SSTATUS_VS_MASK) == SSTATUS_VS_DIRTY) {
+		assert(state->v_ctx.vregs != NULL);
+		riscv_vector_save_internal(&state->v_ctx);
+		state->vec_saved = true;
+		sstatus = (sstatus & ~SSTATUS_VS_MASK) | SSTATUS_VS_CLEAN;
+	}
+	write_sstatus(sstatus);
+}
+
+void thread_kernel_restore_vfp(void)
+{
+	unsigned long sstatus = read_sstatus();
+	uint32_t tid = thread_get_id();
+	struct thread_riscv_ext_state *state = &cpu_ctx_states[tid];
+
+	if (state->fp_saved) {
+		write_sstatus(sstatus | SSTATUS_FS_DIRTY); /* FS must be non-Off */
+		__asm_restore_fp_state(&state->fp_ctx);
+		/* fp_ctx stays valid while FS remains Clean, so keep fp_saved */
+		sstatus = (sstatus & ~SSTATUS_FS_MASK) | SSTATUS_FS_CLEAN;
+	}
+	if (state->vec_saved) {
+		write_sstatus(sstatus | SSTATUS_VS_DIRTY); /* VS must be non-Off */
+		riscv_vector_restore_internal(&state->v_ctx);
+		/* v_ctx stays valid while VS remains Clean, so keep vec_saved */
+		sstatus = (sstatus & ~SSTATUS_VS_MASK) | SSTATUS_VS_CLEAN;
+	}
+	write_sstatus(sstatus); /* the loads set Dirty; registers match the copy */
+}
+
+void thread_user_enable_vfp(struct thread_user_vfp_state *uvfp __unused)
+{
+	unsigned long sstatus = read_sstatus();
+	sstatus = (sstatus & ~SSTATUS_FS_MASK) | SSTATUS_FS_INITIAL;
+	sstatus = (sstatus & ~SSTATUS_VS_MASK) | SSTATUS_VS_INITIAL;
+	write_sstatus(sstatus);
+}
+
+void thread_user_save_vfp(struct thread_user_vfp_state *uvfp __unused)
+{
+	thread_kernel_save_vfp();
+}
+
+void thread_user_clear_vfp(struct thread_user_vfp_state *uvfp __unused)
+{
+	unsigned long sstatus = read_sstatus();
+	uint32_t tid = thread_get_id();
+	struct thread_riscv_ext_state *state = &cpu_ctx_states[tid];
+
+	state->fp_saved = false;
+	state->vec_saved = false;
+
+	sstatus &= ~(SSTATUS_FS_MASK | SSTATUS_VS_MASK);
+	sstatus |= (SSTATUS_FS_OFF | SSTATUS_VS_OFF);
+	write_sstatus(sstatus);
+}
+
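A host-side model of the token pairing between `thread_kernel_enable_vfp()` and `thread_kernel_disable_vfp()`, reduced to the FP unit for brevity. It assumes the intended contract is that a disable undoes only what the matching enable changed; `fake_sstatus`, `model_enable()` and `model_disable()` are hypothetical and exist only so the pairing can be exercised without hardware.

```c
#include <assert.h>
#include <stdint.h>

#define FS_MASK  (3UL << 13)
#define FS_OFF   (0UL << 13)
#define FS_CLEAN (2UL << 13)
#define FS_DIRTY (3UL << 13)
#define TOKEN_FP (1U << 0)

/* Simulated sstatus so the pairing can be run on a host */
static unsigned long fake_sstatus;

/* Enable FP for kernel use; record it in the token only if it was Off. */
static uint32_t model_enable(void)
{
	uint32_t token = 0;

	if ((fake_sstatus & FS_MASK) == FS_OFF) {
		fake_sstatus = (fake_sstatus & ~FS_MASK) | FS_CLEAN;
		token |= TOKEN_FP;
	}
	return token;
}

/* Undo exactly what model_enable() changed, leaving other states intact. */
static void model_disable(uint32_t token)
{
	if (token & TOKEN_FP)
		fake_sstatus = (fake_sstatus & ~FS_MASK) | FS_OFF;
}
```

Under this rule a context that was already Dirty keeps its Dirty marking across a kernel enable/disable pair, so the next context switch still saves it. Downgrading Dirty to Clean on enable would instead cause that save to be skipped.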
op-tee@lists.trustedfirmware.org