The OP-TEE supplicant is a user-space daemon, so it is possible for it to hang, crash, or be killed in the middle of processing an OP-TEE RPC call. Things get more complicated when the supplicant process and the OP-TEE client application are shut down in the wrong order, which can eventually leave the system hung waiting for the client application to close.

Allow the client process waiting in the kernel for a supplicant response to be killed, rather than waiting indefinitely in an unkillable state. This fixes issues observed during system reboot/shutdown when the supplicant hangs for some reason, or crashes or gets killed, leaving the client hung in an unkillable state. That in turn leaves the system hung, requiring a hard power off/on to recover.
Fixes: 4fb0a5eb364d ("tee: add OP-TEE driver")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Cc: stable@vger.kernel.org
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
---
Changes in v2:
- Switch to a killable wait, as suggested by Arnd, instead of a supplicant timeout. It at least allows the client to wait for the supplicant in a killable state, which in turn allows the system to reboot or shut down gracefully.
 drivers/tee/optee/supp.c | 32 +++++++-------------------------
 1 file changed, 7 insertions(+), 25 deletions(-)
diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
index 322a543b8c27..3fbfa9751931 100644
--- a/drivers/tee/optee/supp.c
+++ b/drivers/tee/optee/supp.c
@@ -80,7 +80,6 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
 	struct optee *optee = tee_get_drvdata(ctx->teedev);
 	struct optee_supp *supp = &optee->supp;
 	struct optee_supp_req *req;
-	bool interruptable;
 	u32 ret;
 
 	/*
@@ -111,36 +110,19 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
 	/*
 	 * Wait for supplicant to process and return result, once we've
 	 * returned from wait_for_completion(&req->c) successfully we have
-	 * exclusive access again.
+	 * exclusive access again. Allow the wait to be killable such that
+	 * the wait doesn't turn into an indefinite state if the supplicant
+	 * gets hung for some reason.
 	 */
-	while (wait_for_completion_interruptible(&req->c)) {
-		mutex_lock(&supp->mutex);
-		interruptable = !supp->ctx;
-		if (interruptable) {
-			/*
-			 * There's no supplicant available and since the
-			 * supp->mutex currently is held none can
-			 * become available until the mutex released
-			 * again.
-			 *
-			 * Interrupting an RPC to supplicant is only
-			 * allowed as a way of slightly improving the user
-			 * experience in case the supplicant hasn't been
-			 * started yet. During normal operation the supplicant
-			 * will serve all requests in a timely manner and
-			 * interrupting then wouldn't make sense.
-			 */
+	if (wait_for_completion_killable(&req->c)) {
+		if (!mutex_lock_killable(&supp->mutex)) {
 			if (req->in_queue) {
 				list_del(&req->link);
 				req->in_queue = false;
 			}
+			mutex_unlock(&supp->mutex);
 		}
-		mutex_unlock(&supp->mutex);
-
-		if (interruptable) {
-			req->ret = TEEC_ERROR_COMMUNICATION;
-			break;
-		}
+		req->ret = TEEC_ERROR_COMMUNICATION;
 	}
 
 	ret = req->ret;
On Mon, Feb 3, 2025, at 09:00, Sumit Garg wrote:
> The OP-TEE supplicant is a user-space daemon, so it is possible for it to hang, crash, or be killed in the middle of processing an OP-TEE RPC call. Things get more complicated when the supplicant process and the OP-TEE client application are shut down in the wrong order, which can eventually leave the system hung waiting for the client application to close.
>
> Allow the client process waiting in the kernel for a supplicant response to be killed, rather than waiting indefinitely in an unkillable state.
It would be good to mention that the existing version ends up in a busy-loop here because of the broken wait_for_completion_interruptible() loop.
A normal uninterruptible wait should not have resulted in the hung-task watchdog getting triggered, but the endless loop would.
> +	if (wait_for_completion_killable(&req->c)) {
> +		if (!mutex_lock_killable(&supp->mutex)) {
> +			if (req->in_queue) {
Using mutex_trylock() here is probably clearer than mutex_lock_killable(), since a task that is already in the process of getting killed won't ever wait for the mutex. mutex_lock_killable() does try to get the lock first though if nobody else holds it already, which is the same as trylock.
Arnd
Hi Arnd,
On Mon, 3 Feb 2025 at 13:46, Arnd Bergmann <arnd@arndb.de> wrote:
> On Mon, Feb 3, 2025, at 09:00, Sumit Garg wrote:
> > The OP-TEE supplicant is a user-space daemon, so it is possible for it to hang, crash, or be killed in the middle of processing an OP-TEE RPC call. Things get more complicated when the supplicant process and the OP-TEE client application are shut down in the wrong order, which can eventually leave the system hung waiting for the client application to close.
> >
> > Allow the client process waiting in the kernel for a supplicant response to be killed, rather than waiting indefinitely in an unkillable state.
>
> It would be good to mention that the existing version ends up in a busy-loop here because of the broken wait_for_completion_interruptible() loop.
>
> A normal uninterruptible wait should not have resulted in the hung-task watchdog getting triggered, but the endless loop would.
Sure, I will add that.
> > +	if (wait_for_completion_killable(&req->c)) {
> > +		if (!mutex_lock_killable(&supp->mutex)) {
> > +			if (req->in_queue) {
>
> Using mutex_trylock() here is probably clearer than mutex_lock_killable(), since a task that is already in the process of getting killed won't ever wait for the mutex. mutex_lock_killable() does try to get the lock first though if nobody else holds it already, which is the same as trylock.
Let's follow-up on this in the other thread.
-Sumit
Arnd
On Mon, Feb 3, 2025 at 9:00 AM Sumit Garg <sumit.garg@linaro.org> wrote:
> The OP-TEE supplicant is a user-space daemon, so it is possible for it to hang, crash, or be killed in the middle of processing an OP-TEE RPC call. Things get more complicated when the supplicant process and the OP-TEE client application are shut down in the wrong order, which can eventually leave the system hung waiting for the client application to close.
>
> Allow the client process waiting in the kernel for a supplicant response to be killed, rather than waiting indefinitely in an unkillable state. This fixes issues observed during system reboot/shutdown when the supplicant hangs for some reason, or crashes or gets killed, leaving the client hung in an unkillable state. That in turn leaves the system hung, requiring a hard power off/on to recover.
>
> Fixes: 4fb0a5eb364d ("tee: add OP-TEE driver")
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
> ---
> Changes in v2:
> - Switch to a killable wait, as suggested by Arnd, instead of a supplicant timeout. It at least allows the client to wait for the supplicant in a killable state, which in turn allows the system to reboot or shut down gracefully.
>
>  drivers/tee/optee/supp.c | 32 +++++++-------------------------
>  1 file changed, 7 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
> index 322a543b8c27..3fbfa9751931 100644
> --- a/drivers/tee/optee/supp.c
> +++ b/drivers/tee/optee/supp.c
> @@ -80,7 +80,6 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>  	struct optee *optee = tee_get_drvdata(ctx->teedev);
>  	struct optee_supp *supp = &optee->supp;
>  	struct optee_supp_req *req;
> -	bool interruptable;
>  	u32 ret;
> 
>  	/*
> @@ -111,36 +110,19 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>  	/*
>  	 * Wait for supplicant to process and return result, once we've
>  	 * returned from wait_for_completion(&req->c) successfully we have
> -	 * exclusive access again.
> +	 * exclusive access again. Allow the wait to be killable such that
> +	 * the wait doesn't turn into an indefinite state if the supplicant
> +	 * gets hung for some reason.
>  	 */
> -	while (wait_for_completion_interruptible(&req->c)) {
> -		mutex_lock(&supp->mutex);
> -		interruptable = !supp->ctx;
> -		if (interruptable) {
> -			/*
> -			 * There's no supplicant available and since the
> -			 * supp->mutex currently is held none can
> -			 * become available until the mutex released
> -			 * again.
> -			 *
> -			 * Interrupting an RPC to supplicant is only
> -			 * allowed as a way of slightly improving the user
> -			 * experience in case the supplicant hasn't been
> -			 * started yet. During normal operation the supplicant
> -			 * will serve all requests in a timely manner and
> -			 * interrupting then wouldn't make sense.
> -			 */
> +	if (wait_for_completion_killable(&req->c)) {
> +		if (!mutex_lock_killable(&supp->mutex)) {
Why not mutex_lock()? If we fail to acquire the mutex here, we will quite likely free the req list item below at the end of this function while it remains in the list.
Cheers,
Jens
>  			if (req->in_queue) {
>  				list_del(&req->link);
>  				req->in_queue = false;
>  			}
> +			mutex_unlock(&supp->mutex);
>  		}
> -		mutex_unlock(&supp->mutex);
> -
> -		if (interruptable) {
> -			req->ret = TEEC_ERROR_COMMUNICATION;
> -			break;
> -		}
> +		req->ret = TEEC_ERROR_COMMUNICATION;
>  	}
> 
>  	ret = req->ret;
> --
> 2.43.0
On Mon, Feb 3, 2025, at 10:31, Jens Wiklander wrote:
> On Mon, Feb 3, 2025 at 9:00 AM Sumit Garg <sumit.garg@linaro.org> wrote:
>
> Why not mutex_lock()? If we fail to acquire the mutex here, we will quite likely free the req list item below at the end of this function while it remains in the list.
Right, I had mentioned mutex_lock_killable() in an earlier reply, as I didn't know exactly where it hung. If we know that the wait_for_completion_interruptible() loop was causing the hang, then the normal mutex_lock() should be fine.
Arnd
On Mon, 3 Feb 2025 at 16:23, Arnd Bergmann <arnd@arndb.de> wrote:
> On Mon, Feb 3, 2025, at 10:31, Jens Wiklander wrote:
> > On Mon, Feb 3, 2025 at 9:00 AM Sumit Garg <sumit.garg@linaro.org> wrote:
> >
> > Why not mutex_lock()? If we fail to acquire the mutex here, we will quite likely free the req list item below at the end of this function while it remains in the list.
>
> Right, I had mentioned mutex_lock_killable() in an earlier reply, as I didn't know exactly where it hung. If we know that the wait_for_completion_interruptible() loop was causing the hang, then the normal mutex_lock() should be fine.
Yeah, for my current test scenario mutex_lock() works fine too, but I added mutex_lock_killable() for another theoretical corner case: getting stuck acquiring the mutex while someone is trying to kill the process. But let's avoid overcomplicating things for the time being and go with the simpler mutex_lock() approach.
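Putting the thread's conclusion together, the wait would presumably end up looking like this (a sketch of where the discussion landed, not necessarily the final merged patch):

```c
	/*
	 * Killable wait: a hung supplicant no longer leaves the client
	 * in an unkillable state, and there is no interruptible
	 * busy-loop to trip the hung-task watchdog.
	 */
	if (wait_for_completion_killable(&req->c)) {
		/*
		 * Plain mutex_lock(), per Jens's point: the dequeue must
		 * not be skipped, or req would be freed at the end of
		 * this function while still linked into supp->reqs.
		 */
		mutex_lock(&supp->mutex);
		if (req->in_queue) {
			list_del(&req->link);
			req->in_queue = false;
		}
		mutex_unlock(&supp->mutex);
		req->ret = TEEC_ERROR_COMMUNICATION;
	}
```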
-Sumit