Re: [TF-A] Deadlock in SGI RAS handling - TF-A

14 Jul 2020


Hi Raghu,
The Exception Handling Framework (EHF) was designed to provide prioritization between EL3 interrupts and External Aborts (EA), although an EA will always have the highest priority since it cannot be blocked. In the case you describe, the EL3 driver that handles the RAS errors would need to serialize the events it delivers to the S-EL0 payload. This could be done either by holding the event in a queue in EL3 until EL0 has finished processing the first event, or by making the EL0 payload re-entrant so that it can manage a queue internally. In the case of MM, I suppose re-entry is not an option, and hence a holding queue needs to be implemented in the EL3 driver.
The implementation in sgi_ras.c does not do this at present, as it was a PoC to showcase the RAS flow.
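To make the holding-queue idea concrete, here is a rough sketch in C. It is not taken from sgi_ras.c or any TF-A API: every name in it (ras_queue_t, ras_submit, mm_dispatch_fn, the demo helpers) is hypothetical, and the entry into the S-EL0 payload is modelled as a plain function call. It also assumes one queue per core, so it takes no lock; a queue shared across cores would need one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: none of these exist in TF-A. */
#define RAS_QUEUE_DEPTH 8U

typedef struct ras_event {
    uint32_t event_id;  /* e.g. interrupt number or EA reason code */
    uint64_t syndrome;  /* error syndrome captured at EL3 */
} ras_event_t;

typedef struct ras_queue {
    ras_event_t slots[RAS_QUEUE_DEPTH];
    size_t head;        /* next slot to dequeue */
    size_t count;       /* number of events currently held */
    bool mm_busy;       /* true while the S-EL0 payload is processing */
} ras_queue_t;

/* Stand-in for the real entry into the S-EL0 MM payload. */
typedef void (*mm_dispatch_fn)(const ras_event_t *ev);

/*
 * Called from the interrupt or EA handler.
 * If the payload is idle, dispatch the event and then drain anything that
 * was queued while it ran. If the payload is busy, hold the event instead
 * of re-entering it. Returns 0 on success, -1 if the queue is full.
 */
int ras_submit(ras_queue_t *q, const ras_event_t *ev, mm_dispatch_fn dispatch)
{
    if (q->mm_busy) {
        if (q->count == RAS_QUEUE_DEPTH) {
            return -1;
        }
        q->slots[(q->head + q->count) % RAS_QUEUE_DEPTH] = *ev;
        q->count++;
        return 0;
    }

    q->mm_busy = true;
    dispatch(ev);

    /* Deliver held events one at a time, in arrival order. */
    while (q->count > 0U) {
        ras_event_t next = q->slots[q->head];

        q->head = (q->head + 1U) % RAS_QUEUE_DEPTH;
        q->count--;
        dispatch(&next);
    }

    q->mm_busy = false;
    return 0;
}

/* Demo state: a fake payload that records the order of delivery. */
ras_queue_t demo_q;
uint32_t demo_log[4];
size_t demo_log_len;

static void demo_dispatch(const ras_event_t *ev)
{
    demo_log[demo_log_len++] = ev->event_id;

    if (ev->event_id == 1U) {
        /* Simulate an EA arriving while the payload handles event 1. */
        ras_event_t nested = { .event_id = 2U, .syndrome = 0U };

        (void)ras_submit(&demo_q, &nested, demo_dispatch);
    }
}

/* Runs the demo and returns how many events the payload saw. */
int ras_demo(void)
{
    ras_event_t first = { .event_id = 1U, .syndrome = 0U };

    demo_q = (ras_queue_t){ 0 };
    demo_log_len = 0U;
    (void)ras_submit(&demo_q, &first, demo_dispatch);
    return (int)demo_log_len;
}
```

In the demo, the nested event raised during event 1 is held rather than dispatched, and is delivered only after the first dispatch returns, so the payload is never entered twice. A real driver would also have to decide what to do when the queue is full.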
Best Regards
Soby Mathew
-----Original Message-----
From: TF-A <tf-a-bounces@lists.trustedfirmware.org> On Behalf Of Raghu K via TF-A
Sent: 13 July 2020 22:28
To: tf-a@lists.trustedfirmware.org
Subject: [TF-A] Deadlock in SGI RAS handling
Hi All,
I was going through some code in sgi_ras.c and was wondering whether the
situation described below could cause a deadlock, or whether I'm missing
something. It seems possible to deadlock if we enter MM in S-EL0 (say,
through an MM_COMMUNICATE SMC or perhaps an initial RAS interrupt) and a
SYNC EA or ASYNC EA then occurs on the same core. sgi_ras.c appears to
register the same handler for both interrupts and aborts. While interrupts
can be blocked/masked, SYNC EAs cannot be blocked (not that I know of), and
I don't see SErrors being blocked on the path to the EA handler and entry
to MM. If this situation does occur, it seems we could deadlock when the EA
attempts to enter MM again in the interrupt handler.
Is there something that would prevent this situation from happening?
Thanks
Raghu
--
TF-A mailing list
TF-A@lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a