From: Dave Patel <dave.patel@riscstar.com>
Dear OP-TEE Developers,
This proposal introduces runtime context management structures and tracking mechanisms for the standard scalar floating-point (F/D) and vector (V) ISA extensions on RISC-V platforms.
### 1. Architectural Alignment and Shift
Currently, the thread scheduling layer in OP-TEE OS implements a tightly coupled VFP model specific to the ARM architecture (e.g., `thread_kernel_enable_vfp`). This model relies heavily on a software-driven "lazy context trap" mechanism, where the kernel disables the FPU so that the next floating-point instruction faults and can be caught.
On RISC-V, context tracking cannot rely on software-initiated lazy traps via execution faults, for two reasons: opcodes overlap structurally across custom extensions, and traps carry a severe pipeline penalty. Instead, RISC-V provides hardware-maintained status tracking through the `sstatus.FS` (floating-point) and `sstatus.VS` (vector) bitfields.
To map cleanly onto OP-TEE's existing `thread_*_vfp` thread scheduling interfaces, we use these hardware flags to implement "eager-on-dirty" context saving. The kernel leaves the extensions enabled during active execution and only writes the register state out at a context switch if the hardware reports a `Dirty` status.
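The "eager-on-dirty" decision can be illustrated with a small host-side sketch. It is not taken from the patches: the helper names (`ext_status_get`, `ext_status_set`, `save_if_dirty`) are hypothetical, and `sstatus` is modelled as a plain integer rather than read with a CSR instruction. Only the field positions (FS in bits 14:13, VS in bits 10:9) and the Off/Initial/Clean/Dirty encodings come from the RISC-V specifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions in sstatus; both fields are two bits wide. */
#define SSTATUS_FS_SHIFT 13u
#define SSTATUS_VS_SHIFT 9u
#define SSTATUS_XS_MASK  UINT64_C(3)

/* Encodings shared by the FS and VS fields. */
enum ext_status { EXT_OFF = 0, EXT_INITIAL = 1, EXT_CLEAN = 2, EXT_DIRTY = 3 };

/* Extract the FS or VS field from a (modelled) sstatus value. */
static inline enum ext_status ext_status_get(uint64_t sstatus, unsigned int shift)
{
	assert(shift == SSTATUS_FS_SHIFT || shift == SSTATUS_VS_SHIFT);
	return (enum ext_status)((sstatus >> shift) & SSTATUS_XS_MASK);
}

/* Return sstatus with the FS or VS field replaced by st. */
static inline uint64_t ext_status_set(uint64_t sstatus, unsigned int shift,
				      enum ext_status st)
{
	assert(shift == SSTATUS_FS_SHIFT || shift == SSTATUS_VS_SHIFT);
	sstatus &= ~(SSTATUS_XS_MASK << shift);
	return sstatus | ((uint64_t)st << shift);
}

/*
 * Hypothetical context-switch hook: save only when the hardware reports
 * Dirty, then mark the state Clean. Returns true if a save was needed.
 */
static inline bool save_if_dirty(uint64_t *sstatus, unsigned int shift)
{
	if (ext_status_get(*sstatus, shift) != EXT_DIRTY)
		return false;
	/* A real kernel would store the register file to memory here. */
	*sstatus = ext_status_set(*sstatus, shift, EXT_CLEAN);
	return true;
}
```

With this shape, a thread that never touches the FPU between two context switches stays `Clean` (or `Initial`) and costs no save at all, which is the property the lazy-trap model obtains through faults.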
### 2. Implementation Subsystem Architecture
To keep the design modular, the implementation explicitly separates scalar floating-point handling from scalable vector handling:
1. Modularity & Headers:
   - `<kernel/riscv_fp.h>` specifies bitmasks and structures (`struct riscv_fp_state`) for scalar execution.
   - `<kernel/riscv_vector.h>` encapsulates the scalable vector state layout (`struct riscv_vector_state`).

   This allows devices without a vector unit to omit the vector footprint and its dependencies entirely.
2. Bitwidth and Layout Adaptability:
   - Scalar Low-Level Assembly (`fp_asm.S`): Adapts to both 32-bit (`rv32`) and 64-bit (`rv64`) configurations via the compiler's `__riscv_xlen` preprocessor macro.
   - Vector Extension Optimization (`riscv_vector.c`): Fully isolated from the primary thread files. Instead of storing vector registers one at a time, it uses the whole-register block transfer instructions (`vs8r.v` and `vl8r.v`). Moving registers in groups of eight reduces the save/restore sequence to four transfers, starting at `v0`, `v8`, `v16`, and `v24`.
3. Unified Scheduling Abstraction (`thread_vfp.c`): Unified top-level implementations (`thread_kernel_save_vfp`, `thread_user_enable_vfp`, etc.) handle context routing.
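To make the four-group layout concrete, here is a minimal host-side sketch of how a save area for `v0` to `v31` could be sized and indexed when registers are moved eight at a time. The macro and function names are hypothetical and are not from the patches; the sketch assumes only that each vector register is `VLENB` bytes wide and that one `vs8r.v` or `vl8r.v` transfers eight consecutive registers.

```c
#include <assert.h>
#include <stddef.h>

#define RISCV_NUM_VREGS   32u /* v0..v31 */
#define RISCV_VREG_GROUP  8u  /* registers moved by one vs8r.v / vl8r.v */
#define RISCV_NUM_VGROUPS (RISCV_NUM_VREGS / RISCV_VREG_GROUP)

/* Bytes needed to hold v0..v31 when each register is vlenb bytes wide. */
static inline size_t vregs_size(size_t vlenb)
{
	assert(vlenb > 0);
	return RISCV_NUM_VREGS * vlenb;
}

/*
 * Byte offset of register group `group` (0..3) inside the save area,
 * i.e. where the blocks starting at v0, v8, v16 and v24 are placed.
 */
static inline size_t vgroup_offset(size_t vlenb, unsigned int group)
{
	assert(vlenb > 0);
	assert(group < RISCV_NUM_VGROUPS);
	return (size_t)group * RISCV_VREG_GROUP * vlenb;
}
```

For example, with `VLENB` equal to 16 (a 128-bit vector unit) the save area is 512 bytes and the four groups start at byte offsets 0, 128, 256, and 384.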
### 3. Feedback Requested
We are seeking early design feedback from the community regarding:
- Structural Convention: Should we keep the universal "vfp" naming scheme in the global `thread.h` header interfaces for backward compatibility, or is an explicit upstream refactoring toward a generic name (e.g., `thread_kernel_enable_coproc_regs`) preferred?
- Vector Bounds Memory Allocation: What is the preferred approach for safely managing the dynamic heap footprint of the vector register state (`vregs`), which scales with the CPU's runtime `VLENB` value?
- Eager Context Switching: This series proposes eager (save-on-dirty) context switching. Are there any reservations about this approach?
Looking forward to your suggestions and design critiques.
Dave Patel (3):
  Floating Point changes
  RISCV Vector changes
  Thread changes for RISCV floating point and vector changes
 core/arch/riscv/include/riscv_fp.h     |  30 +++++
 core/arch/riscv/include/riscv_vector.h |  32 +++++
 core/arch/riscv/kernel/riscv_fp.S      | 159 +++++++++++++++++++++++++
 core/arch/riscv/kernel/riscv_vector.c  |  77 ++++++++++++
 core/arch/riscv/kernel/thread_vfp.c    | 142 ++++++++++++++++++++++
 5 files changed, 440 insertions(+)
 create mode 100644 core/arch/riscv/include/riscv_fp.h
 create mode 100644 core/arch/riscv/include/riscv_vector.h
 create mode 100644 core/arch/riscv/kernel/riscv_fp.S
 create mode 100644 core/arch/riscv/kernel/riscv_vector.c
 create mode 100644 core/arch/riscv/kernel/thread_vfp.c
--
2.43.0