Olivier, can you elaborate on the problem with “Presently a Group0 interrupt traps to SEL2 and is delegated to EL3 to the same sort of SPMD platform handler. I wonder if this is a possible case in your system and leading to the same problem?”. Isn't that exactly how the Group0 SMC-based handling is designed?

From: Olivier Deprez <Olivier.Deprez@arm.com>
Date: Thursday, June 8, 2023 at 10:14 AM
To: Raghupathy Krishnamurthy <raghupathyk@nvidia.com>, Varun Wadekar <vwadekar@nvidia.com>, TF-A Mailing List <tf-a@lists.trustedfirmware.org>, Nicolas Benech <nbenech@nvidia.com>
Subject: Re: EHF and SPMD G0 interrupt handling issues


Hi Varun and Raghu,

Thanks both for the detailed replies and investigations.

I appreciate that Group0 interrupt handling, with the SPMD and an SEL2 SPMC present, is a fresh proposal that hasn't been deployed yet, so it is only now hitting real-world usage scenarios. Moreover, two-world RAS scenarios in this same configuration are neither designed nor tested in the reference software stack (I'm not aware of downstream design deployments). These are partly the reasons why we have not considered SPD=spmd with EL3_EXCEPTION_HANDLING=1 so far. The pitfall is that this combination doesn't trigger a build error but a runtime misbehaviour, as you hinted. But I agree we need to do something to support RAS in general in the coming weeks.

While we figure out the correct design, would the following change help platforms?

https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/21406

This should match your suggestion of omitting the SPMD interrupt registration when EL3_EXCEPTION_HANDLING=1.

I believe that's acceptable in the current situation and shouldn't break our test cases AFAIU.

One question remains: what is the expected behaviour for a Group0 interrupt occurring while the secure world runs? Presently a Group0 interrupt traps to SEL2 and is delegated to EL3 to the same sort of SPMD platform handler. I wonder if this is a possible case in your system and leading to the same problem?

“Coming to the compatibility and deployment concerns. There is an inherent assumption that platforms will deploy TF-A and Hafnium v2.9 at the same time”

Yes, this is known, and it comes down to the stability of the EL3/SEL2 interface. ABI additions over spec iterations are a challenge to support.

Testing mixed-and-matched SPMD/SPMC versions hasn't been investigated very far, mostly because partner deployment models differ. We agreed with our tech management that matching SPMD and SPMC versions is the most common model and the easiest one for the reference software stack to support. This also aligns with the fact that EL3 and SEL2 belong to the same TCB and are 'likely' to evolve at the same time to follow bug fixes and feature additions.

In order to understand better how we could improve, can you tell a bit more about the typical scenario?

Is the SPMD likely to be ahead of the SPMC, e.g. SPMD v2.x + SPMC v2.y with x >= y?

How much of a minor version difference would that be?

Regards,

Olivier.

From: Raghupathy Krishnamurthy <raghupathyk@nvidia.com>
Sent: 06 June 2023 21:26
To: Varun Wadekar <vwadekar@nvidia.com>; Olivier Deprez <Olivier.Deprez@arm.com>; TF-A Mailing List <tf-a@lists.trustedfirmware.org>; Nicolas Benech <nbenech@nvidia.com>
Subject: RE: EHF and SPMD G0 interrupt handling issues

I see the issue Varun has. https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/19897 introduced a change where SPMD unconditionally registers for INTR_TYPE_EL3. If we compile in both EHF and SPMD_SPM_AT_SEL2, we have an issue. This combination used to work with https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16047 because EHF could be enabled together with SPMD, but it no longer can with the latest change that adds support for Group0 interrupts.

Fundamentally, we need to disable the use of EHF (which I think Varun is saying is problematic because we use it). Because of the situation below, I had posted a comment here: https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/19897/comment/bd92bd80_7163eab8/ so that a platform could explicitly compile the handler out when it is used with EHF.

Varun, if we remove/compile out https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/19897, would it fix all 3 problems? I think it does.

From: Varun Wadekar <vwadekar@nvidia.com>
Sent: Tuesday, June 6, 2023 11:24 AM
To: Raghupathy Krishnamurthy <raghupathyk@nvidia.com>; Olivier Deprez <Olivier.Deprez@arm.com>; TF-A Mailing List <tf-a@lists.trustedfirmware.org>; Nicolas Benech <nbenech@nvidia.com>
Subject: Re: EHF and SPMD G0 interrupt handling issues

Hi,

Thanks for the links. I agree that things look good on paper, but the ground reality does not match the plan, IMO.

For platforms that enable SPMD_SPM_AT_SEL2 and EHF, the INTR_TYPE_EL3 handler is registered twice: once in ehf.c and once in spmd_main.c. I covered this in (1) from my list.

The RAS library uses EHF and its functions are not accessible to the platform to call from the 'plat_spmd_handle_group0_interrupt' handler. This interrupt handling now creates an unwanted and longer chain from the interrupt handler to the actual RAS handler in the platform port. I covered this in (2) and (3) from my list. Platforms might have to recreate something similar to EHF within their platform ports if they are asked to remove support for EHF altogether.

Coming to the compatibility and deployment concerns. There is an inherent assumption that platforms will deploy TF-A and Hafnium v2.9 at the same time. This assumption has led to the design choices in the code where we used static macros instead of runtime mechanisms to detect the availability of the support. I am not a big fan of increasing dependencies between independent SW components as it creates unwanted work for platforms and increases TTM.
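To make the contrast between static macros and runtime detection concrete, here is a minimal sketch of the runtime alternative. Everything in it is hypothetical and for illustration only: the `demo_` names, the query callback and the feature ID are assumptions, not TF-A, Hafnium or FF-A definitions. The idea is simply that the SPMD would ask its peer once at boot whether Group0 forwarding is supported and would enable the path only if the answer is yes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative feature ID; NOT a real FF-A function ID. */
#define DEMO_FEATURE_EL3_INTR_HANDLE 1u

/* Hypothetical callback that asks the peer component about one feature. */
typedef bool (*demo_feature_query_t)(uint32_t feature_id);

static bool demo_group0_forwarding;

/* Probe the peer once at boot instead of relying on a build-time macro. */
void demo_probe_peer(demo_feature_query_t query)
{
    demo_group0_forwarding =
        (query != NULL) && query(DEMO_FEATURE_EL3_INTR_HANDLE);
}

bool demo_group0_forwarding_enabled(void)
{
    return demo_group0_forwarding;
}

/* Two stand-in peers: an older one without the feature, a newer one with it. */
bool demo_old_peer(uint32_t feature_id)
{
    (void)feature_id;
    return false;
}

bool demo_new_peer(uint32_t feature_id)
{
    return feature_id == DEMO_FEATURE_EL3_INTR_HANDLE;
}
```

With a probe of this kind, the same EL3 binary could run against either peer version, which is the decoupling argued for above.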

The long-term approach should be to ensure that SPMD and EHF work in all possible combinations. The short-term approach should be to fix this issue by either reverting the change or introducing a workaround.

-Varun

From: Raghupathy Krishnamurthy <raghupathyk@nvidia.com>
Sent: Tuesday, June 6, 2023 4:45 PM
To: Olivier Deprez <Olivier.Deprez@arm.com>; TF-A Mailing List <tf-a@lists.trustedfirmware.org>; Varun Wadekar <vwadekar@nvidia.com>; Nicolas Benech <nbenech@nvidia.com>
Subject: RE: EHF and SPMD G0 interrupt handling issues

Agree with Olivier. We should align with the FF-A spec recommendation.

Varun, if there are other issues caused by this, I'm happy to sync internally. +@Nicolas Benech for visibility (Nico, FYI – this is on a public mailing list)

-Raghu

From: Olivier Deprez <Olivier.Deprez@arm.com>
Sent: Tuesday, June 6, 2023 7:53 AM
To: TF-A Mailing List <tf-a@lists.trustedfirmware.org>; Varun Wadekar <vwadekar@nvidia.com>
Cc: Raghupathy Krishnamurthy <raghupathyk@nvidia.com>
Subject: Re: EHF and SPMD G0 interrupt handling issues


Hi Varun,

“for platforms with SPMD_SPM_AT_SEL2=1. These platforms already use EHF for servicing RAS interrupts today.”

Can you please have a look at https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16047 ?

and https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16047/6/docs/getting_started/build-options.rst#464

The model, by the FF-A specification, is to permit G0 interrupts to trap to EL3 when NWd runs.

A G0 interrupt is routed to a SP through the SPMD/SPMC by the use of EL3-SP direct messages:

https://review.trustedfirmware.org/q/topic:%22el3_direct_msg%22+(status:open%20OR%20status:merged)

When SEL1/0 runs, G0 interrupts are first trapped to SEL2 and forwarded to EL3 by the FFA_EL3_INTR_HANDLE ABI.
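The two paths described above can be summarised in a small sketch. It is only a model of the routing decision as stated in this mail; the enum and function names are hypothetical and do not come from TF-A or Hafnium.

```c
/* Which world was running when the Group0 interrupt fired (hypothetical names). */
enum demo_world {
    DEMO_WORLD_NWD, /* normal world */
    DEMO_WORLD_SWD  /* secure world (SEL1/0 running under the SPMC) */
};

/* How the interrupt reaches EL3 in the model described above. */
enum demo_g0_path {
    DEMO_G0_TRAPS_TO_EL3,       /* taken directly at EL3 */
    DEMO_G0_FORWARDED_FROM_SEL2 /* trapped at SEL2, then FFA_EL3_INTR_HANDLE */
};

/* Return the path a Group0 interrupt takes for the given running world. */
enum demo_g0_path demo_group0_path(enum demo_world running)
{
    if (running == DEMO_WORLD_NWD) {
        return DEMO_G0_TRAPS_TO_EL3;
    }
    return DEMO_G0_FORWARDED_FROM_SEL2;
}
```

Either way the interrupt ends up handled at EL3; what differs is whether SEL2 sees it first.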

I appreciate that the legacy capability of letting G0 interrupts trap to EL3 while SWd runs is not possible/recommended with this design.

This might indeed break earlier implementations; would it make sense to align SW stacks with the latest FF-A spec recommendations?

I'll let Raghu chime in if need be.

Regards,

Olivier.

From: Varun Wadekar via TF-A <tf-a@lists.trustedfirmware.org>
Sent: 06 June 2023 13:12
To: TF-A Mailing List <tf-a@lists.trustedfirmware.org>
Subject: [TF-A] EHF and SPMD G0 interrupt handling issues

Hi,

We are in the process of upgrading the downstream TF-A to v2.9 for platforms with SPMD_SPM_AT_SEL2=1. These platforms already use EHF for servicing RAS interrupts today.

I noticed that v2.9 has added G0 interrupt handling support to the SPMD. But I think the SPMD support still needs some work as it does not play nicely with EHF.

I have found the following issues with the implementation.

1. SPMD and EHF both register handlers for G0 interrupts. But the interrupt management framework only allows one handler for INTR_TYPE_EL3.
2. The RAS framework still uses EHF and the SPMD interrupt handler breaks that functionality.
3. The SPMD handler calls into the platform which does not have any means to invoke the RAS interrupt handler.
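The first issue in the list above can be illustrated with a minimal model of a registry that keeps one handler slot per interrupt type. This is a simplified stand-in, not TF-A code: the `demo_` names, the type values and the error codes are assumptions chosen for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative interrupt types; the values are not taken from TF-A. */
enum { DEMO_INTR_S_EL1 = 0, DEMO_INTR_EL3 = 1, DEMO_INTR_NS = 2, DEMO_INTR_COUNT };

typedef int (*demo_handler_t)(uint32_t intr_id);

/* One slot per interrupt type: a second registration cannot be stored. */
static demo_handler_t demo_handlers[DEMO_INTR_COUNT];

int demo_register_handler(uint32_t type, demo_handler_t handler)
{
    if (type >= DEMO_INTR_COUNT || handler == NULL) {
        return -EINVAL;
    }
    if (demo_handlers[type] != NULL) {
        return -EALREADY; /* slot already owned by another component */
    }
    demo_handlers[type] = handler;
    return 0;
}

int demo_dispatch(uint32_t type, uint32_t intr_id)
{
    if (type >= DEMO_INTR_COUNT || demo_handlers[type] == NULL) {
        return -ENOENT;
    }
    return demo_handlers[type](intr_id);
}

/* Stand-ins for the two competing components. */
int demo_ehf_handler(uint32_t intr_id)  { (void)intr_id; return 1; }
int demo_spmd_handler(uint32_t intr_id) { (void)intr_id; return 2; }
```

In this model, whichever component registers first owns every Group0 interrupt and the second registration fails, so the outcome depends on initialisation order rather than on design.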

IMO, we should make SPMD a client of the EHF instead of creating yet another way for interrupt handling. For now, I register SPMD's G0 interrupt handler only if EL3_EXCEPTION_HANDLING=0, as a workaround.
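The workaround mentioned above amounts to a compile-time guard around the SPMD registration. A minimal sketch follows; only the EL3_EXCEPTION_HANDLING build option comes from this thread, while the helper name is hypothetical and the default of 0 is set here purely so the snippet builds on its own.

```c
#include <stdbool.h>

/* Normally provided by the build system; defaulted here for illustration. */
#ifndef EL3_EXCEPTION_HANDLING
#define EL3_EXCEPTION_HANDLING 0
#endif

/*
 * Hypothetical helper: reports whether the SPMD would claim the Group0
 * (INTR_TYPE_EL3) handler slot in this build.
 */
bool demo_spmd_claims_group0(void)
{
#if EL3_EXCEPTION_HANDLING
    /* EHF owns Group0 interrupts, so the SPMD registration is compiled out. */
    return false;
#else
    /* No EHF: the SPMD registers its own Group0 handler. */
    return true;
#endif
}
```

This keeps EHF and RAS working when EL3_EXCEPTION_HANDLING=1, at the cost of the SPMD's Group0 handling being unavailable in that configuration.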

Thoughts?