Hi,
On Fri, Feb 6, 2026 at 9:54 AM Jens Wiklander jens.wiklander@linaro.org wrote:
Hi Amir,
On Thu, Feb 5, 2026 at 3:13 AM Amirreza Zarrabi amirreza.zarrabi@oss.qualcomm.com wrote:
Hi Jens,
On 2/4/2026 6:46 PM, Jens Wiklander wrote:
Hi Amir,
On Tue, Feb 3, 2026 at 11:56 PM Amirreza Zarrabi amirreza.zarrabi@oss.qualcomm.com wrote:
Hi Jens,
On 2/3/2026 5:59 PM, Jens Wiklander wrote:
Hi,
On Tue, Feb 3, 2026 at 3:09 AM Amirreza Zarrabi amirreza.zarrabi@oss.qualcomm.com wrote:
Hi Jens,
On 2/2/2026 10:36 PM, Jens Wiklander wrote: > Hi Amir, > > On Thu, Jan 29, 2026 at 4:22 AM Amirreza Zarrabi > amirreza.zarrabi@oss.qualcomm.com wrote: >> >> Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the >> client wait as killable so it can be interrupted during shutdown or >> after a supplicant crash. This changes the original lifetime expectations: >> the client task can now terminate while the supplicant is still processing >> its request. >> >> If the client exits first it removes the request from its queue and >> kfree()s it, while the request ID remains in supp->idr. A subsequent >> lookup on the supplicant path then dereferences freed memory, leading to >> a use-after-free. >> >> Serialise access to the request with supp->mutex: >> >> * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while >> looking up and touching the request. >> * Let optee_supp_thrd_req() notice that the client has terminated and >> signal optee_supp_send() accordingly. >> >> With these changes the request cannot be freed while the supplicant still >> has a reference, eliminating the race. >> >> Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") >> Signed-off-by: Amirreza Zarrabi amirreza.zarrabi@oss.qualcomm.com >> --- >> Changes in v3: >> - Introduce processed flag instead of -1 for req->id. >> - Update optee_supp_release() as reported by Michael Wu. >> - Use mutex instead of guard. >> - Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.... >> >> Changes in v2: >> - Replace the static variable with a sentinel value. >> - Fix the issue with returning the popped request to the supplicant. >> - Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.... >> --- >> drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++-------------- >> 1 file changed, 86 insertions(+), 36 deletions(-) >
[snip]
>> @@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params, >> >> mutex_lock(&supp->mutex); >> req = supp_pop_req(supp, num_params, param, &num_meta); >> - mutex_unlock(&supp->mutex); >> - >> if (IS_ERR(req)) { >> + mutex_unlock(&supp->mutex); > > We need a way to tell the difference between an id not found and an id > removed because of a killed requester. > How about storing NULL for revoked requests instead of an err-pointer? >
Not sure I'm following correctly. Are you expecting supp_pop_req() to return NULL instead of an err-pointer when a request has been revoked?
I was looking at it again, and storing an err-pointer as you do in this patch has the advantage that we can tell whether the ID has been revoked or was never supplied. In the latter case, it suggests that the supplicant is doing something wrong and might as well restart in an attempt to recover. So, please keep using an err-pointer as a placeholder, but we must be able to distinguish a revoked request from other errors to make sure that the supplicant doesn't restart due to a revoked request.
Understood. What if I switch the stored err-pointer to EBADF instead of ENOENT (which seems more natural), so it doesn't overlap with other supp_pop_req() error codes and the supplicant can reliably detect it.
Any error returned by TEE_IOC_SUPPL_SEND (or TEE_IOC_SUPPL_RECV) will cause the tee-supplicant to exit. Even if we update it to ignore certain codes, we must also consider the installed base. There's not much tee-supplicant could do with this error, except logging it. But I don't think that is very useful either. Unless the tee-supplicant does anything wrong or if the device isn't working any longer, we shouldn't return an error.
The direction of data flow in optee_supp_send is from the supplicant to optee, so the only way I can return meaningful information back to the supplicant is through the return value. I suppose I could simply ignore the revoked request and return success, but it might be useful for the supplicant to know about it in case it needs to roll back something.
At this point I'm out of ideas :). Do you have any suggestions on how I can inform the supplicant about a revoked request in optee_supp_send while returning success return value?
This became a bit harder than I first thought. At this point, to fix the possible use-after-free, we have two options:
- Returning an error code: tee-supplicant will exit
- Returning OK: worst case, tee-supplicant can leak memory
During reboot, neither case is a problem. During normal operation, it's annoying if tee-supplicant exists, but you still need some privileges to kill the client. If we return an error, it's enough to update tee-supplicant to handle that error case, and we're done. The advantage is no added code to the kernel.
I think we should do what you suggested and return an error. This will not happen during normal operation. We'll fix tee-supplicant to handle the return error properly. tee-supplicant doesn't care about what error code it gets. If there's an error in TEE_IOC_SUPPL_SEND, it knows that no one will receive whatever was sent, and cleanup is needed.
Sumit and Jerome, what do you think?
Sounds good.
Thanks,