Hi Nicola,
You are right: the critical section in the top-level function does correctly protect the lock mechanism.
I did some additional debugging and here’s what I’m observing:
This looks like the scenario we discussed in other mailing threads (related to the CONFIG_TFM_SCHEDULE_WHEN_NS_INTERRUPTED option).
However, I’m wondering if I’m missing something. In the idle partition code, I see the following comment and call:
/*
 * There could be other Partitions becoming RUNNABLE after wake up.
 * This is a dummy psa_wait to let SPM check possible scheduling.
 * It does not expect any signals.
 */
if (psa_wait(PSA_WAIT_ANY, PSA_POLL) == 0) {
My understanding from the comment is that this “dummy” psa_wait() is intended to trigger scheduling if another partition became runnable
while we were idle.
But in tfm_spm_partition_psa_wait(), the polling path appears to just return the currently asserted signals for the calling partition:
/*
 * After new signal(s) are available, the return value will be updated in
 * PendSV and blocked thread gets to run.
 */
if (timeout == PSA_BLOCK) {
    signal = backend_wait_signals(partition, signal_mask);
    if (signal == (psa_signal_t)0) {
        signal = (psa_signal_t)STATUS_NEED_SCHEDULE;
    }
} else {
    signal = partition->signals_asserted & signal_mask;
}
Since the idle partition won’t have any asserted signals, this poll call will always return 0 and never request scheduling, even if another partition has become runnable in the meantime.
This seems to explain why, after returning to the idle partition and executing these poll operations, the runnable partition is still
not being scheduled.
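
To make the two paths easier to compare, here is a minimal host-side sketch of the branch quoted above. It is a model under stated assumptions, not the TF-M implementation: the `MODEL_`-prefixed constants are placeholder values, `struct model_partition` is hypothetical, and the blocking path's call to backend_wait_signals() is replaced by a plain read of the asserted signals.

```c
#include <stdint.h>

typedef uint32_t psa_signal_t;

/* Placeholder values for this model only (assumptions, not taken from TF-M). */
#define MODEL_PSA_POLL             0x00000000u
#define MODEL_PSA_BLOCK            0x80000000u
#define MODEL_PSA_WAIT_ANY         0xFFFFFFFFu
#define MODEL_STATUS_NEED_SCHEDULE ((psa_signal_t)0xFFFFFFFEu)

/* Hypothetical stand-in for the partition state; only the field used here. */
struct model_partition {
    psa_signal_t signals_asserted;
};

/*
 * Mirrors the structure of the quoted branch: only the blocking path can
 * turn "no signal" into a scheduling request. The polling path returns the
 * masked asserted signals of the caller and looks at nothing else.
 */
static psa_signal_t model_psa_wait(const struct model_partition *partition,
                                   psa_signal_t signal_mask,
                                   uint32_t timeout)
{
    psa_signal_t signal;

    if (timeout == MODEL_PSA_BLOCK) {
        /* Assumption: stand-in for backend_wait_signals(). */
        signal = partition->signals_asserted & signal_mask;
        if (signal == (psa_signal_t)0) {
            signal = MODEL_STATUS_NEED_SCHEDULE;
        }
    } else {
        signal = partition->signals_asserted & signal_mask;
    }

    return signal;
}
```

In this model, a partition with no asserted signals gets 0 back from a poll and MODEL_STATUS_NEED_SCHEDULE from a blocking wait, which matches the observation that the idle partition's poll cannot request scheduling by itself.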
Am I interpreting this correctly? If so, what would be the recommended solution (e.g. adjusting the idle “dummy wait” behavior, or something else)?
Best regards,
Bohdan Hunko
Cypress Semiconductor Ukraine LLC
Senior Engineer
CSS ICW SW INT BFS SFW
Mobile: +380995019714
Bohdan.Hunko@infineon.com
From: Nicola Mazzucato <Nicola.Mazzucato@arm.com>
Sent: Monday, 22 December 2025 16:59
To: Hunko Bohdan (CSS ICW SW INT BFS SFW) <Bohdan.Hunko@infineon.com>
Cc: Kozemchuk Ivan (CSS ICW SW INT BFS SFW) <Ivan.Kozemchuk@infineon.com>; Kytsun Hennadiy (CSS ICW SW INT BFS SFW) <Hennadiy.Kytsun@infineon.com>; tf-m@lists.trustedfirmware.org
Subject: Re: Race condition in SPM scheduler lock logic
Thank you Bohdan,
I am still a bit confused about the setup, because that section in SPM always executes in privileged mode. If the calling partition is not privileged, the SVC handler takes over to elevate execution.
Thanks
Best regards,
Nick
From: Bohdan.Hunko@infineon.com <Bohdan.Hunko@infineon.com>
Sent: 19 December 2025 10:31
To: Nicola Mazzucato <Nicola.Mazzucato@arm.com>
Cc: Ivan.Kozemchuk@infineon.com <Ivan.Kozemchuk@infineon.com>; Hennadiy.Kytsun@infineon.com <Hennadiy.Kytsun@infineon.com>; tf-m@lists.trustedfirmware.org <tf-m@lists.trustedfirmware.org>
Subject: RE: Race condition in SPM scheduler lock logic
Hi Nicola,
We don’t do anything special: the IRQ priority is Normal, nothing unusual.
Looking into the code, one thing that comes to mind is that tfm_arch_thread_fn_call can be called from an unprivileged partition, in which case interrupt masking will not take effect.
I believe this explains the behavior described in the previous mail.
If so, then not only is this code affected, but other multithreading issues may occur in different places in tfm_arch_thread_fn_call.
Bohdan Hunko
Cypress Semiconductor Ukraine LLC
Senior Engineer
CSS ICW SW INT BFS SFW
Mobile: +380995019714
Bohdan.Hunko@infineon.com
From: Nicola Mazzucato <Nicola.Mazzucato@arm.com>
Sent: Friday, 19 December 2025 11:59
To: Hunko Bohdan (CSS ICW SW INT BFS SFW) <Bohdan.Hunko@infineon.com>
Cc: Kozemchuk Ivan (CSS ICW SW INT BFS SFW) <Ivan.Kozemchuk@infineon.com>; Kytsun Hennadiy (CSS ICW SW INT BFS SFW) <Hennadiy.Kytsun@infineon.com>; Anton Komlev via TF-M <tf-m@lists.trustedfirmware.org>
Subject: Re: Race condition in SPM scheduler lock logic
Hi Bohdan,
The sequence you provided seems reasonable; however, "backend_abi_leaving_spm" and the subsequent "arch_release_sched_lock" execute with all interrupts disabled, so no interrupt should be able to change scheduler_lock in between [1].
A pending interrupt would execute as soon as L:91, and would then correctly set the PendSV.
Can you please share a bit more about your interrupt configurations, priorities etc?
Am I missing something else?
Thanks
Best regards,
Nick
[1]
From: Nicola Mazzucato via TF-M <tf-m@lists.trustedfirmware.org>
Sent: 17 December 2025 08:37
To: tf-m@lists.trustedfirmware.org <tf-m@lists.trustedfirmware.org>; Bohdan.Hunko@infineon.com <Bohdan.Hunko@infineon.com>
Cc: Ivan.Kozemchuk@infineon.com <Ivan.Kozemchuk@infineon.com>; Hennadiy.Kytsun@infineon.com <Hennadiy.Kytsun@infineon.com>
Subject: [TF-M] Re: Race condition in SPM scheduler lock logic
Thanks Bohdan for reporting this.
Let me have a look and try to reproduce it.
Best regards,
Nick
From: Bohdan.Hunko--- via TF-M <tf-m@lists.trustedfirmware.org>
Sent: 16 December 2025 20:54
To: tf-m@lists.trustedfirmware.org <tf-m@lists.trustedfirmware.org>
Cc: Ivan.Kozemchuk@infineon.com <Ivan.Kozemchuk@infineon.com>; Hennadiy.Kytsun@infineon.com <Hennadiy.Kytsun@infineon.com>
Subject: [TF-M] Race condition in SPM scheduler lock logic
Hi all,
I have found a bug in the SPM scheduler lock logic. It is extremely hard to reproduce, as it requires precise conditions and timing, but here is a description of the bug scenario:
i. The following instructions are executed:
"ldr r1, =scheduler_lock \n"
"ldr r0, [r1, #0] \n"
ii. At this point r0 holds the value of scheduler_lock, which is SCHEDULER_LOCKED.
iii. After these instructions are executed, a FLIH interrupt arrives.
iv. Execution continues; scheduler_lock is now SCHEDULER_ATTEMPTED. But the next line of code in arch_release_sched_lock is
"movs r2, #"M2S(SCHEDULER_UNLOCKED)" \n"/* Unlock scheduler */
This effectively overwrites scheduler_lock from SCHEDULER_ATTEMPTED to SCHEDULER_UNLOCKED.
This means that the subsequent SPM scheduling logic will not trigger PendSV and will just return to idle_partition, effectively resulting in a system hang.
It looks like the solution is to wrap the lock logic in a critical section, but maybe there are other things that can be done to fix this issue better.
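
The interleaving described above can be replayed deterministically on a host with a small C model. This is a sketch under assumptions, not TF-M code: the enum values, `pendsv_pending`, and all `model_`-prefixed functions are hypothetical. It also assumes that an interrupt wanting to schedule sets SCHEDULER_ATTEMPTED while the lock is held and requests PendSV otherwise, and that the release path requests PendSV only if the value it loaded was SCHEDULER_ATTEMPTED.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder lock states for this model (values are assumptions). */
enum { SCHEDULER_UNLOCKED, SCHEDULER_LOCKED, SCHEDULER_ATTEMPTED };

static uint32_t scheduler_lock;
static int pendsv_pending;

/* Assumed behaviour of an interrupt that wants a schedule. */
static void model_attempt_schedule(void)
{
    if (scheduler_lock == SCHEDULER_LOCKED) {
        scheduler_lock = SCHEDULER_ATTEMPTED; /* defer until lock release */
    } else {
        pendsv_pending = 1;                   /* schedule right away */
    }
}

/* Load, optional interrupt in the window, then unconditional store. */
static uint32_t model_release_sched_lock(void (*irq_in_window)(void))
{
    uint32_t observed = scheduler_lock;   /* models: ldr r0, [r1, #0] */

    if (irq_in_window != NULL) {
        irq_in_window();                  /* interrupt lands between load and store */
    }
    scheduler_lock = SCHEDULER_UNLOCKED;  /* models the unlock store */
    return observed;
}

/* Returns 1 if a PendSV request survives the release, 0 if it is lost. */
static int model_leave_spm(int irq_masked)
{
    uint32_t observed;

    scheduler_lock = SCHEDULER_LOCKED;
    pendsv_pending = 0;

    observed = model_release_sched_lock(irq_masked ? NULL
                                                   : model_attempt_schedule);
    if (irq_masked) {
        model_attempt_schedule();         /* pending interrupt runs after unmasking */
    }
    if (observed == SCHEDULER_ATTEMPTED) {
        pendsv_pending = 1;
    }
    return pendsv_pending;
}
```

In this model, model_leave_spm(0) loses the scheduling request, because the interrupt's SCHEDULER_ATTEMPTED is overwritten before anyone reads it. model_leave_spm(1) keeps the request, because the deferred interrupt sees SCHEDULER_UNLOCKED and requests PendSV directly.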
Let me know if there are other details that may be helpful to fix this bug.
Bohdan Hunko
Cypress Semiconductor Ukraine LLC
Senior Engineer
CSS ICW SW INT BFS SFW
Mobile: +380995019714
Bohdan.Hunko@infineon.com