Hi Roman,

Sorry for the really long delay.

After spending more time with the issue, I now think that the fix you proposed should be the best way to solve this. I created patch in gerrit based on the diff you sent. Please feel free to review: https://review.trustedfirmware.org/c/TF-M/trusted-firmware-m/+/27396

Best Regards,
Mate

From: Mate Toth-Pal <Mate.Toth-Pal@arm.com>
Sent: 23 February 2024 16:04
To: Bohdan.Hunko@infineon.com <Bohdan.Hunko@infineon.com>; tf-m@lists.trustedfirmware.org <tf-m@lists.trustedfirmware.org>; Roman.Mazurak@infineon.com <Roman.Mazurak@infineon.com>; Mate Toth-Pal <Mate.Toth-Pal@arm.com>
Cc: nd <nd@arm.com>
Subject: Re: Scheduling bug
 
Hi Roman,

I finally managed to reproduce the issue you reported. I can confirm that the fix you provided indeed fixes this issue.

However I believe the root cause of this issue is the way how the privileged SPM Thread mode code context is not managed the same way compared to the SP's contexts. This same thing seems to be the root cause of the other issue you reported on the misconfiguration of the CONTROL.nPRIV bit when Thread mode SPM execution is interrupted.

I plan to come up with a more general fix next week.

Thank you,
Mate


From: Mate Toth-Pal via TF-M <tf-m@lists.trustedfirmware.org>
Sent: 08 February 2024 16:36
To: Bohdan.Hunko@infineon.com <Bohdan.Hunko@infineon.com>; tf-m@lists.trustedfirmware.org <tf-m@lists.trustedfirmware.org>; Roman.Mazurak@infineon.com <Roman.Mazurak@infineon.com>
Cc: nd <nd@arm.com>
Subject: [TF-M] Re: Scheduling bug
 
Hi Roman,

Thank you for reporting this issue.

We are currently looking at reproducing it on our side. I expect to come back to you next week about it.

Best Regards,
Mate


From: Roman.Mazurak--- via TF-M <tf-m@lists.trustedfirmware.org>
Sent: 02 February 2024 16:21
To: Bohdan.Hunko@infineon.com <Bohdan.Hunko@infineon.com>; Ken Liu <Ken.Liu@arm.com>; tf-m@lists.trustedfirmware.org <tf-m@lists.trustedfirmware.org>
Cc: nd <nd@arm.com>
Subject: [TF-M] Re: Scheduling bug
 

Hi Ken,

 

I tested with 2.0.0 release. There are my conclusions.

 

tfm_arch_set_context_ret_code set result (for psa_wait, …) by using SP stored in sp member of context_ctrl_t structure. It means that before calling tfm_arch_set_context_ret_code TF-M should update SP for thread (partition).

 

thrd_next update result by calling tfm_arch_set_context_ret_code, if query_state_cb (pointing to query_state implemented by backend_ipc.c) returns THRD_STATE_RET_VAL_AVAIL. For all partition except active (interrupted by PendSV) context (values in context_ctrl_t) is correct. But context is invalid for active partition, because it is updated by PendSV after exit from ipc_schedule and thrd_next.

 

As result call for psa_wait would fail if some secure IRQ will update signals_asserted after tfm_spm_partition_psa_wait and before PendSV will call thrd_next. SLIH interrupt can assert such signal (call to backend_assert_signal) in spm_handle_interrupt.

 

I made a simple fix:

diff --git a/secure_fw/spm/core/backend_ipc.c b/secure_fw/spm/core/backend_ipc.c

index 3a968a3..c8b0742 100644

--- a/secure_fw/spm/core/backend_ipc.c

+++ b/secure_fw/spm/core/backend_ipc.c

@@ -454,9 +454,14 @@

     /* Protect concurrent access to current thread/component and thread status */

     CRITICAL_SECTION_ENTER(cs);

 

-    pth_next = thrd_next();

     p_curr_ctx = (struct context_ctrl_t *)(CURRENT_THREAD->p_context_ctrl);

 

+    /* Update SP for current thread, so tfm_arch_set_context_ret_code can use up to date

+     * value of stack context to return value via R0.*/

+    p_curr_ctx->sp = __get_PSP() - sizeof(struct tfm_additional_context_t);

+

+    pth_next = thrd_next();

+

     AAPCS_DUAL_U32_SET(ctx_ctrls, (uint32_t)p_curr_ctx, (uint32_t)p_curr_ctx);

 

     p_part_curr = GET_CURRENT_COMPONENT();

 

Regards,

Roman.

 

From: Bohdan.Hunko--- via TF-M <tf-m@lists.trustedfirmware.org>
Sent: Wednesday, November 8, 2023 22:22
To: Ken.Liu@arm.com; tf-m@lists.trustedfirmware.org
Cc: nd@arm.com
Subject: [TF-M] Re: Scheduling bug

 

Caution: This e-mail originated outside Infineon Technologies. Do not click on links or open attachments unless you validate it is safe.

 

Hi Ken,

 

I was using 1.8.0 Release

 

Looks like this issue was fixed in new design although we would like to test it and confirm this when we migrate to newest release.

 

We will get back on this to you.

 

Regards,

Bohdan Hunko

 

Cypress Semiconductor Ukraine

Engineer

CSUKR CSS ICW SW FW

Mobile: +38099 50 19 714
Bohdan.Hunko@infineon.com

 

 

From: Ken Liu via TF-M <tf-m@lists.trustedfirmware.org>
Sent: Tuesday, November 7, 2023 03:58
To: tf-m@lists.trustedfirmware.org
Cc: nd <nd@arm.com>
Subject: [TF-M] Re: Scheduling bug

 

Caution: This e-mail originated outside Infineon Technologies. Do not click on links or open attachments unless you validate it is safe.

 

Hi Bohdan,

 

Can you share us the version you are working on?

 

Looks like the patch is based on a previous version of TFM, and this issue is resolved in the latest code base.

 

The reason to do such abstraction is just because of such similar issues – the logic was coupled tightly because of synching the partition status, thread status and context update which caused maintenance confusion and effort. And we refactored this part to decouple the logic:

 

SPM: Add STATUS_NEED_SCHEDULE to manage scheduler (21054) · Gerrit Code Review (trustedfirmware.org)

 

Hope this helps, thanks.

 

/Ken

 

From: Bohdan.Hunko--- via TF-M <tf-m@lists.trustedfirmware.org>
Sent: Monday, November 6, 2023 10:01 PM
To: tf-m@lists.trustedfirmware.org
Subject: [TF-M] Scheduling bug

 

Hi

 

There seem to be scheduling bug we have found in SPM.

This bug is related to handling of interrupts that arrives during SVC call and assert signed for partition.

 

Steps to reproduce:

 

  1. Call psa_wait() from partition (e.g. mailbox partition)
  2. During execution of SVC handler generate Interrupt that asserts signal of that partition (e.g. mailbox partition signal ) (adding long delay in SVC handler or adding breakpoint in SVC handler helps to easier reproduce this )
  3. Following sequence happens:
    1. Mailbox IRQ has lower priority than SVC thus SVC is not preempted.
    2. SVC sees that mailbox partition is blocked (as it is waiting for signal and no signals are pending)
    3. SVC triggers pendSV

                                                               i.      Mailbox IRQ and pendSV are both pending

    1. Mailbox IRQ has higher priority than pendSV thus Mailbox IRQ is executed
    2. Mailbox IRQ calls  spm_handle_interrupt

                                                               i.      Signal is asserted thus spm_handle_interrupt in thrd_next calls query_state_cb which returns THRD_STATE_RET_VAL_AVAIL and thus tfm_arch_set_context_ret_code is called

                                                             ii.      tfm_arch_set_context_ret_code sets return code using OLD value of partition PSP (as it was never updated, as it is updated in PendSV)

    1. Mailbox IRQ return, pendsv is started and it runs mailbox partition

                                                               i.      Mailbox partition has 0 as signal because return value was written to wrong location is stack

 

Patch I have attached to the mail solves this problem for us BUT it seems more like a workaround than a proper fix(

 

Anyways it would be nice to have this problem review by SPM experts and have proper fix (maybe we have other places with same problem…)

 

Regards,

Bohdan Hunko

 

Cypress Semiconductor Ukraine

Engineer

CSUKR CSS ICW SW FW

Mobile: +38099 50 19 714
Bohdan.Hunko@infineon.com