Hi Chris,

 

I dug into this for a while; it looks like it is caused by the scheduler locking mechanism that was applied recently.

 

After moving SPM into thread mode, PSA API calls must not be nested. A lock is therefore applied while a PSA API call is in progress. It is similar to boosting the thread priority to the highest level, while still allowing interrupt preemption. We check this lock at the very beginning of the PendSV handler, on the assumption that PendSV is used only for scheduling, but we missed that RPC messages are also processed there. With this mechanism, if the RPC message interrupt preempts an ongoing PSA API call, the PendSV handling is skipped, because the current thread is treated as having the highest priority and no scheduling is considered necessary. Hence the incoming message is delayed or missed.

 

Levels 2 and 3 use the SVC-based PSA API, so the scheduler lock is not applied there: the PSA API runs in handler mode, where no scheduling can take place.

 

The solution is to move the check of this locking flag into 'do_schedule' instead of performing it at the beginning of the PendSV handler.
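
To illustrate the ordering issue, here is a minimal host-side C sketch. It is not the actual TF-M code: every name in it (scheduler_locked, handle_rpc_messages, the two pendsv_* variants, simulate_rpc_during_psa_call) is a hypothetical stand-in, and this do_schedule is only a model that takes an extra flag for comparison. It assumes only what is described above, namely that RPC message handling and scheduling both run from PendSV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; none of these names come from the TF-M source. */
bool scheduler_locked;  /* set while a PSA API call is in progress */
int  pending_rpc;       /* RPC messages waiting to be processed    */
int  handled_rpc;       /* RPC messages processed so far           */
int  context_switches;  /* how many times the scheduler switched   */

/* Process every waiting RPC message. */
void handle_rpc_messages(void)
{
    handled_rpc += pending_rpc;
    pending_rpc = 0;
}

/* Scheduling step; optionally honours the scheduler lock. */
void do_schedule(bool check_lock)
{
    if (check_lock && scheduler_locked) {
        return;             /* locked: keep running the current thread */
    }
    context_switches++;
}

/* Problematic ordering: the lock check on entry also skips RPC handling. */
void pendsv_lock_checked_at_entry(void)
{
    if (scheduler_locked) {
        return;
    }
    handle_rpc_messages();
    do_schedule(false);
}

/* Proposed ordering: RPC is always handled; only scheduling is gated. */
void pendsv_lock_checked_in_do_schedule(void)
{
    handle_rpc_messages();
    do_schedule(true);
}

/* One RPC message arrives while a PSA API call holds the lock.
 * Returns the number of messages the given PendSV variant processed. */
int simulate_rpc_during_psa_call(void (*pendsv)(void))
{
    assert(pendsv != NULL);
    scheduler_locked = true;
    pending_rpc = 1;
    handled_rpc = 0;
    context_switches = 0;
    pendsv();
    scheduler_locked = false;
    return handled_rpc;
}
```

In this model, the entry-check variant processes no message while the lock is held. The variant that checks inside do_schedule processes the message and still performs no context switch, which is the behaviour the fix is aiming for.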

 

That said, if we could move the RPC logic into a partition so that PendSV serves only the scheduler, the mechanism we have now would be neat and efficient.

 

The patch is being prepared; I will add you as a reviewer once it is created.

 

BR.

 

/Ken

 

From: TF-M <tf-m-bounces@lists.trustedfirmware.org> On Behalf Of chris.brand--- via TF-M
Sent: Tuesday, November 9, 2021 8:54 AM
To: tf-m@lists.trustedfirmware.org
Subject: [TF-M] Another PSoC problem

 

Build command is:

cmake -S . -B output -G"Unix Makefiles" -DTFM_PLATFORM=cypress/psoc64 -DTFM_TOOLCHAIN_FILE=toolchain_GNUARM.cmake -DTEST_NS_MULTI_CORE=ON -DTFM_ISOLATION_LEVEL=1

 

The result is a hang at this point:

> Executing 'MULTI_CLIENT_CALL_HEAVY_TEST'

  Description: 'Multiple outstanding NS PSA client calls heavyweight test'

Totally 5 threads for test start

Each thread run 0x20 rounds tests

 

Some experimentation shows that:

It happens with both gcc and armclang (unable to test IAR).

It doesn’t always happen, but does seem to hang more often than it succeeds.

It doesn’t happen with TFM_ISOLATION_LEVEL=2.

 

It looks like this test was passing consistently before 5e68b11764673ee32bae0de8ecf3cde45cc55ea1, so I guess this is another scheduling issue. There’s not a lot of code that differs with TFM_LVL, so I wonder if there’s a race condition that is always present but just doesn’t happen to get hit at TFM_LVL=2…?

 

BTW, being able to build just the one test is extremely useful!

 

Chris Brand

 

Cypress Semiconductor (Canada), Inc.

Sr Prin Software Engr

CSCA CSS ICW SW PSW 1

Office: +1 778 234 0515

Chris.Brand@infineon.com