Hi,
I'm running OP-TEE 4.5 with the PKCS#11 TA and ATF lts-v2.12.4 on an iMX8MP. When I create a new 4096-bit RSA key pair with OP-TEE, I often get
rcu_preempt detected stalls on CPUs/tasks
from Linux 6.6.90 (mainline)
Also, PID 0 is sometimes blocked for more than 30 seconds. When I create an RT task with an even higher priority, that task is also blocked for up to 2 seconds. As a test, I disabled saving/restoring the NS timer registers in ATF (arm-trusted-firmware/lib/el3_runtime/aarch64/context_mgmt.c), and this seems to get rid of the problem completely. Neither creating keys nor signing leads to any issue anymore. This hack may lead to other problems that I do not fully understand yet. I "believe" that, at least since ARMv8, the CPUs have separate timers for the secure and non-secure worlds, but I would assume that ATF already implements this correctly.
Maybe I'm completely wrong here (I assume I cannot be the first person to hit this issue on this platform). A hint in any direction would be helpful.
Regards
Thomas
Hi Thomas,
On Mon, Aug 4, 2025 at 7:09 PM Stauffer Thomas MTANA via OP-TEE op-tee@lists.trustedfirmware.org wrote:
> Also, PID 0 is sometimes blocked for more than 30 seconds. When I create an RT task with an even higher priority, that task is also blocked for up to 2 seconds. As a test, I disabled saving/restoring the NS timer registers in ATF (arm-trusted-firmware/lib/el3_runtime/aarch64/context_mgmt.c), and this seems to get rid of the problem completely. Neither creating keys nor signing leads to any issue anymore. This hack may lead to other problems that I do not fully understand yet. I "believe" that, at least since ARMv8, the CPUs have separate timers for the secure and non-secure worlds, but I would assume that ATF already implements this correctly.
I'm starting to suspect that we're setting NS_TIMER_SWITCH to 1 in services/spd/opteed/opteed.mk based on a misunderstanding. OP-TEE can use timers, but then it's using the EL3 physical timer. So OP-TEE should stay off the EL1 physical timer. Sumit, what's your view?
> Maybe I'm completely wrong here (I assume I cannot be the first person to hit this issue on this platform). A hint in any direction would be helpful.
I'm surprised we haven't seen more of this issue.
Cheers, Jens
Hi
One reason for not seeing this issue more is that the NXP SDK ships with the NS_TIMER_SWITCH=1 patch reverted.
The current SDK, for example, has this commit (it was already present in 2020 on older branches): https://github.com/nxp-imx/imx-atf/commit/c73b052c4d57a10b9bfcd9002e8730088d...
/Lars
Hi Lars,
On Wed, Aug 20, 2025 at 2:39 PM Lars Persson Lars.Persson@axis.com wrote:
> Hi
> One reason for not seeing this issue more is that the NXP SDK ships with the NS_TIMER_SWITCH=1 patch reverted.
> The current SDK, for example, has this commit (it was already present in 2020 on older branches):
> https://github.com/nxp-imx/imx-atf/commit/c73b052c4d57a10b9bfcd9002e8730088d...
Thanks, that's good to know.
Cheers, Jens
Hi Jens,
I have analyzed this a bit further since I last wrote. Here is what I "believe" at the moment:
* Linux uses the non-secure timer in arch_timer (physical/virtual) -> this is correct
* OP-TEE uses the secure timer (physical/virtual) -> this is correct
* ARM Trusted Firmware enables NS_TIMER_SWITCH=1 by default with opteed, which IMHO unnecessarily saves/restores the timer registers. Setting NS_TIMER_SWITCH=0 seems to solve the issue; my own tests and xtest have not shown any problem so far.
All of this comes with some uncertainty. I read through quite a lot of code, but I could have missed a case where something may go wrong.
I measured latencies with cyclictest. With NS_TIMER_SWITCH=1 they skyrocket (which explains all the other negative consequences); with NS_TIMER_SWITCH=0 everything is back to normal, even during "heavy" operations like creating 4096-bit RSA keys with OP-TEE.
Thomas
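The effect described in this message can be illustrated with a toy model in C. This is not TF-A code, and it rests on an assumption that the thread does not spell out: that with NS_TIMER_SWITCH=1 the timer state armed by Linux is swapped out for the secure world's (disabled) copy on entry to OP-TEE, so the Linux tick cannot interrupt a long secure-world operation. The `el1_timer` struct, the `tick_delay` function, and all numbers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the EL1 physical timer state. */
struct el1_timer {
    bool     enabled;  /* models the enable bit of the timer control register */
    uint64_t cval;     /* models the compare value (deadline) */
};

/*
 * Toy model of one call into the secure world.
 *
 * ns_timer_switch: models the build option discussed in the thread.
 * deadline:        time at which the normal-world tick should fire.
 * secure_work:     time the secure world needs if nothing interrupts it.
 *
 * Returns how late the normal world gets control back relative to
 * its deadline (0 means "on time").
 */
static uint64_t tick_delay(bool ns_timer_switch, uint64_t deadline,
                           uint64_t secure_work)
{
    struct el1_timer hw = { .enabled = true, .cval = deadline }; /* armed by Linux */
    struct el1_timer ns_saved = hw;
    const struct el1_timer secure_ctx = { .enabled = false, .cval = 0 };
    uint64_t now = 0;

    /* Entry to the secure world: assumed to swap the timer state. */
    if (ns_timer_switch) {
        ns_saved = hw;
        hw = secure_ctx;
    }

    /* Secure world runs until done or until the armed timer fires. */
    while (now < secure_work) {
        if (hw.enabled && now >= hw.cval)
            break; /* timer interrupt forces a return to the normal world */
        now++;
    }

    /* Return to the normal world: the saved state comes back. */
    if (ns_timer_switch)
        hw = ns_saved;

    return now > deadline ? now - deadline : 0;
}
```

In this model, `tick_delay(true, 10, 1000)` returns 990 while `tick_delay(false, 10, 1000)` returns 0. That matches the cyclictest observation only qualitatively; real latencies depend on details the model ignores.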
Hi Thomas,
On Wed, Aug 20, 2025 at 3:09 PM Stauffer Thomas MTANA via OP-TEE op-tee@lists.trustedfirmware.org wrote:
> Hi Jens,
> I have analyzed this a bit further since I last wrote. Here is what I "believe" at the moment:
> * Linux uses the non-secure timer in arch_timer (physical/virtual) -> this is correct
> * OP-TEE uses the secure timer (physical/virtual) -> this is correct
Thanks for confirming.
> * ARM Trusted Firmware enables NS_TIMER_SWITCH=1 by default with opteed, which IMHO unnecessarily saves/restores the timer registers. Setting NS_TIMER_SWITCH=0 seems to solve the issue; my own tests and xtest have not shown any problem so far.
> All of this comes with some uncertainty. I read through quite a lot of code, but I could have missed a case where something may go wrong.
> I measured latencies with cyclictest. With NS_TIMER_SWITCH=1 they skyrocket (which explains all the other negative consequences); with NS_TIMER_SWITCH=0 everything is back to normal, even during "heavy" operations like creating 4096-bit RSA keys with OP-TEE.
This and Lars's findings clearly indicate that we shouldn't set NS_TIMER_SWITCH=1. I'll propose a patch.
Cheers, Jens
FYI, here's the patch https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/42078
Thanks, Jens
On Wed, Aug 20, 2025 at 04:46:06PM +0200, Jens Wiklander wrote:
> FYI, here's the patch https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/42078
Thanks Jens, but as we discussed in the review of this patch, I have posted a more complete fix here [1] that keeps OP-TEE ftrace working while removing context management of the non-secure EL1 physical timer registers.
[1] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/42085
-Sumit
On 8/20/25 8:50 AM, Jens Wiklander wrote:
> This and Lars's findings clearly indicate that we shouldn't set NS_TIMER_SWITCH=1. I'll propose a patch.
We came to the same conclusion last year: we disable it for our (TI) platforms for the same reason [0], to prevent stalling Linux during OP-TEE operations.
Andrew
[0] https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/25895
On Wed, Aug 20, 2025 at 11:04:15AM -0500, Andrew Davis via OP-TEE wrote:
> We came to the same conclusion last year: we disable it for our (TI) platforms for the same reason [0], to prevent stalling Linux during OP-TEE operations.
I am unsure why folks choose to fix this problem in a platform-specific manner (upstream or downstream), since it is a generic, platform-agnostic problem. At the very least, I should have been CCed on the problem report and the proposed fix, since I added NS_TIMER_SWITCH=1 for OP-TEE in the first place. Also, this means nobody has been able to enable ftrace on NXP and TI platforms until now.
-Sumit
On Wed, Aug 20, 2025 at 01:10:07PM +0200, Jens Wiklander wrote:
> I'm starting to suspect that we're setting NS_TIMER_SWITCH to 1 in services/spd/opteed/opteed.mk based on a misunderstanding. OP-TEE can use timers, but then it's using the EL3 physical timer. So OP-TEE should stay off the EL1 physical timer. Sumit, what's your view?
I had to dig into the history to see why I added it in the first place. It was basically added to save and restore the cntkctl_el1 register, which is needed for ftrace to work correctly. Have a look here [1]. So your currently proposed patch will break ftrace.
However, as a side effect, all the EL1 physical timer registers are saved and restored as well, which is the problem reported here. So the correct fix would be to make NS_TIMER_SWITCH more granular and separate out the save and restore of the cntkctl_el1 register.
[1] https://github.com/OP-TEE/optee_os/commit/edaf8c38f534497a65a460f0348a81d3e2...
-Sumit
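The "more granular" idea above could be sketched roughly as follows in C. This is only an illustration under stated assumptions, not the patch posted for review: the `timer_ctx` and `switch_policy` types and the `ctx_save`/`ctx_restore` helpers are hypothetical, and only a subset of the registers that TF-A may handle is modeled. The point is that the cntkctl_el1 save/restore is controlled separately from the timer control and compare registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the timer-related EL1 state of one world. */
struct timer_ctx {
    uint64_t cntp_ctl;   /* EL1 physical timer control */
    uint64_t cntp_cval;  /* EL1 physical timer compare value */
    uint64_t cntkctl;    /* counter-timer kernel control (cntkctl_el1) */
};

/* Hypothetical policy: which register groups are switched per world. */
struct switch_policy {
    bool timer_regs;  /* the part reported in the thread to stall Linux */
    bool cntkctl;     /* the part the thread says ftrace needs */
};

/* Save the selected register groups from "hardware" into a context. */
static void ctx_save(const struct switch_policy *p,
                     const struct timer_ctx *hw, struct timer_ctx *saved)
{
    if (p->timer_regs) {
        saved->cntp_ctl = hw->cntp_ctl;
        saved->cntp_cval = hw->cntp_cval;
    }
    if (p->cntkctl)
        saved->cntkctl = hw->cntkctl;
}

/* Restore the selected register groups from a context into "hardware". */
static void ctx_restore(const struct switch_policy *p,
                        struct timer_ctx *hw, const struct timer_ctx *saved)
{
    if (p->timer_regs) {
        hw->cntp_ctl = saved->cntp_ctl;
        hw->cntp_cval = saved->cntp_cval;
    }
    if (p->cntkctl)
        hw->cntkctl = saved->cntkctl;
}
```

With a policy of `{ .timer_regs = false, .cntkctl = true }`, a world switch in this model leaves the timer Linux armed untouched and swaps only cntkctl, which is the separation the message argues for.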