New subject: Query SPD/SPMD behavior with EHF

7 Oct 2020

      Hi Sandeep,
A few comments inline from a SW architecture/FF-A perspective.
...
On 5 Oct 2020, at 12:28, Sandeep Tripathy via TF-A tf-a@lists.trustedfirmware.org wrote:
Hi Olivier,
Appreciate the details. I have a different perception of G0
interrupts and their relevance to RAS/ critical events.
Comments in line.
Thanks
Sandeep
On Fri, Oct 2, 2020 at 6:47 PM Olivier Deprez Olivier.Deprez@arm.com wrote:
...
Hi Sandeep,
Here are a few more details.
The reasoning differs when considering pre-Armv8.4 platforms (1) vs Armv8.4 platforms onwards with secure virtualization enabled (2).
Case (1):
EHF framework unifies EL3 exceptions delivered via different vectors and allows them to be handled in a common way. It is also allowing exception delegation handling to lower secure ELs. This framework although primarily used for RAS, is also used for SDEI and platform EL3 interrupts. EL3's role in this case is about trapping and routing the event to appropriate the component (when the interrupt/exception is not handled solely at EL3).
The interoperability between EHF and a Trusted OS is not accurately defined apart from this guidance in EHF documentation:
"In order for S-EL1 software to handle Non-secure interrupts while having EHF enabled, the dispatcher must adopt a model where Non-secure interrupts are received at EL3, but are then synchronously handled over to S-EL1."
Until then for the specific RAS handling scenario, this was delegated to a StandaloneMM partition running at S-EL0 (through the SPM-MM implementation) and not necessarily delegated to a TOS.
Reliability is provided by the feature of G0 interrupt that it can not
be masked by lower ELs.  Such interrupt being handled at EL3 or being
delegated to other components does not impact the
reliable feature of G0 interrupt. Sure its handling must be offloaded
to other components to keep EL3 firmware light. But If it were just
about handling an interrupt then it could have been entirely handled
in each state without even requiring an EL3 interrupt type.
Reliability in RAS is a different concept. RAS error interrupts do not provide reliability. They report unreliable operation.
Routing RAS interrupts to EL3 is an implementation choice called Firmware First Handling (FFH). Indeed, the interrupts could be routed to a lower EL which is called Kernel first handling (KFH).
For e.g. an implementation could decide to handle corrected errors Kernel first. Uncorrected errors could be routed to a platform controller instead of firmware or be routed to both. There is no single solution.
With FFH, the main requirement is that an uncorrected error must be handled even if the Normal world is not in a position to do so. There are non-technical requirements too but lets not go there. So I don’t think there is a requirement that "no lower EL" should be able to mask the interrupt.
EL3/S-EL1 and EL3/S-EL2 are at the same privilege level as far as access to the physical address space is concerned. G0 interrupts could be routed to EL3 but they can be disabled by S-EL1 or S-EL2 by programming the GIC distributor.
The main point being that software in all privileged exception levels in the Secure world must be trusted to handle RAS errors in the Normal world. Routing G0 interrupts to EL3 is not a silver bullet.
When support for FFH was added to TF-A, there was no use case to put software in S-EL1. This EL is owned by TF-A which deploys a simple shim layer. The EHF was developed with this assumption in mind.
If your requirement is to put a Trusted OS in S-EL1 and continue doing RAS error handling, then the requirements of the Trusted OS w.r.t the interrupt routing model must be factored in. Hence, the question about what exactly are your requirements.
I can understand the desire to reuse EHF but it cannot come at the cost of not meeting the TOS requirements. It needs a SW architecture discussion first. It might be possible to preempt S-EL1 and route RAS errors to EL3 in some cases. A cooperative model (2) between S-EL1 and EL3 (as Olivier described) is what most Trusted OSs implement today. It would be good to understand why that would not work for RAS.
...
...
In order to better help you, we would need more information on the scenario you intend to achieve, and the environment (Arm architecture version and extensions, GIC version).
Or maybe your question was out of curiosity for the longer term approach (2) as described below?
As per sbsa level III spec: sbsa non secure watchdog WS1 (reset) must
be targeted at EL3.  The patch in review ref:
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/5495
And we would want a watchdog interrupt to preempt all execution
context. I would expect the same with any RAS or SDEI critical prio
events.
Thanks for pointing this out!
SBSA applies to the Server segment. It was reasonable to assume that Secure firmware almost entirely resides in EL3. Hence the guidance. We will look at rewording this in a future release. The intent is that since it is a Non-secure Watchdog, the WS1 signal must not be masked by the Normal world.
The BSA applies to all segments. It leaves routing of WS1 implementation defined as long as the Normal world cannot mask it. It could be routed to S-EL1 or S-EL2 if that fulfils the requirement.
...
Another misc application of our platform is to be able to forcefully
turn off/ halt/ just ping any core at any execution context (S/NS).
These motivated me to leverage EHF. But the idea of dropping EHF in
future designs makes me think now !
Our current system is pre Armv8.4. We will stick to case (1). Case
(2) ie SPMD was just my quoricity.  However, I felt PSA-FFA may
replace TOS specific SPDs someday.
Making SPMD relevant in this discussion even with pre 8.4 systems.
Because at least the TOS will have to follow one policy.
The SPMD is indeed meant to replace TOS specific SPDs. It is meant to cater for the RAS use case as well. From an FF-A perspective, a cooperative approach is simpler. I would like to understand why this would not work for RAS error interrupts as well. Reuse of EHF is an implementation level discussion and I don’t think that is off the table even with (2).
...
...
Case (2):
As a general rule, it is preferred that EL3 reduces its footprint and minimises platform specific handling code.
Agreed.  Applies to case(1) aswell and heavy lifting to be delegated
to lower ELs in either security states. My concern is on 'Taking' the
interrupt handling (mjust)can be delegated.
It would certainly be desirable to reuse the EHF. However, it is not possible to delegate the heavy lifting to preempted software in S-EL1 or S-EL2 without significantly increasing their complexity. This is not the current direction of travel of FF-A.
...
...
EHF framework would most probably not be enabled at all.
The priority logic provided by the GIC PMR register to mask NS interrupts cannot really work as before because all of trusted EL3/S-EL2 and untrusted S-EL1 SPs can manipulate this register.
This is a limitation. This can be taken care of by cooperative
software design. ie. S_EL2/S_EL1 will not set PMR out of its range.
And the platform defines what's EL3 priority range.
GIC_LOWEST_EL3_PRI.
This falls under the solution space. It would be good to understand what is it you want to run on S-EL1/S-EL2 first.
...
...
Any secure/non-secure interrupt triggered while running SEL1/SEL0 is trapped first by the S-EL2 firmware (or the so-called SPMC). This translates into SCR_EL3.FIQ/IRQ=0 in the secure world.
Group1NS interrupts are redirected to SPMD for routing to NWd.
A Group0 interrupt is possibly redirected to a platform driver into an S-EL1 secure partition (e.g. a RAS handling service).
Hence it does no longer hold true that Group0 interrupts are necessarily qualified as "EL3 interrupts".
It is still possible to redirect Group0 interrupts from S-EL2 to EL3 and be handled there, but as said, this is a less preferred approach.
Either way when NWd runs (with SCR_EL3.FIQ=1/IRQ=0), a Group1S/Group0 secure interrupt is trapped at EL3 and routed to SPMD then SPMC.
The SPMC can take the decision to resume the secure partition which registered the corresponding secure INTID.
This design does mean that SDEI interrupt handling would need SPMC and BL31 collaboration and this is something we are working on.
I understood this scheme. But it means RAS interrupts and other
critical events will always have blackout periods even with proper
software design.
RAS interrupts will have blackout periods even if a SMC is handled entirely in EL3. How is routing them to S-EL1 or S-EL2 any different?
Afaiu, the RAS architecture spec does not lay down any time limits on by when an error must be reported. All RAS errors are not critical errors. Even critical errors e.g. uncontainable errors report something that has already happened. With unrecoverable errors, ESBs ensure that the problem is contained to a particular EL or Security state.
Could you elaborate on what timing requirements you have and why a cooperative model would cause problems?
...
Whereas with the other routing model scheme the reliability of EHF
handlers can be retained with the constraint of  PMR ranges. There may
be something
I am missing.
I don’t think “reliability” is an argument here. It is about reusing the EHF in EL3. It is not off the table but we cannot overlook other evolutions in the software and hardware architecture since the EHF was written.
Let me know what you think.
Cheers,
Achin
...
...
Hope this helps.
Regards,
Olivier.

From: TF-A tf-a-bounces@lists.trustedfirmware.org on behalf of Olivier Deprez via TF-A tf-a@lists.trustedfirmware.org
Sent: 28 September 2020 14:01
To: Sandeep Tripathy; Soby Mathew
Cc: tf-a@lists.trustedfirmware.org; nd
Subject: Re: [TF-A] Query SPD/SPMD behavior with EHF
Hi Sandeep,
Your question is very valid and we're discussing options internally.
We will come back to you with a consolidated answer shortly.
Regards,
Olivier.

From: Sandeep Tripathy
Sent: Monday, September 28, 2020 05:28
To: Soby Mathew
Cc: Dan Handley; tf-a@lists.trustedfirmware.org; nd; Olivier Deprez
Subject: Re: [TF-A] Query SPD/SPMD behavior with EHF
Thanks Soby and Dan for confirmation on TSPD. I can see a few more gaps
in the related area.
"The EL3 interrupts (G0 interrupts) should be able to pre-empt Fast
SMC i.e. any execution context for that matter ".
This should apply to all SPDs including SPMD. However I learned from
@Oliver that SPMD/SPMC design traps FIQs to S_EL2.
In that case a RAS interrupt can be masked by S_EL2 software (eg:
Hafnium). Probably by design it will be ensured that S_EL2 will never
mask the physical FIQ ?
S_EL2 FIQ handler will exit to EL3/SPMD by SMC call. And depending on
the pending interrupt type either it can exit to NWd OR invoke el3 fiq
vector handler synchronously ?
Are there limitations if we trap fiq to EL3 instead ?
Thanks
Sandeep
On Fri, Sep 18, 2020 at 6:26 PM Soby Mathew Soby.Mathew@arm.com wrote:
...
Hi Sandeep
...
Except during yielding SMC  ‘disable_intr_rm_local(INTR_TYPE_NS, SECUE);’  is in effect.  Intention is to avoid NS interrupt preempt secure execution (Fast SMC).
But I think that will also disable G0 interrupt as both NS interrupt and G0 interrupt are on FIQ.
EHF already ensures this by GIC PMR adjustment. So disabling routing model seems unnecessary in this case.
This is my understanding from the code please confirm if this is correct.
The EL3 interrupts (G0 interrupts) should be able to pre-empt Fast SMC. Hence the usage of GIC PMR to mask the NS interrupts. As Dan says, the TSP_NS_INTR_ASYNC_PREEMPT predates the EHF design and it seems there is a problem as you describe.
...
EHF already ensures this by GIC PMR adjustment. So disabling routing model seems unnecessary in this case.
This is my understanding from the code please confirm if this is correct.
You are right. Routing model manipulation is not required when EL3 interrupts are present as GIC PMR manipulation should take care of the required behaviour for yielding vs atomic SMC. You also need to ensure it works as expected when EL3 interrupts are not enabled and when EHF is disabled.
Best Regards
Soby Mathew
...
-----Original Message-----
From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Sandeep
Tripathy via TF-A
Sent: 17 September 2020 16:53
To: Dan Handley Dan.Handley@arm.com
Cc: tf-a@lists.trustedfirmware.org
Subject: Re: [TF-A] Query TSPD behavior with EHF
Hi Dan,
 I am not sure if this is mentioned anywhere in any documents but I think
EHF handlers should be able to preempt all execution contexts at lower ELs
and lower ELs should never be able to mask such interrupts.
If the behavioral expectation is set the implementation can be fixed.
Thanks
Sandeep
On Thu, Sep 17, 2020 at 7:57 PM Dan Handley via TF-A <tf-
a@lists.trustedfirmware.org> wrote:
...
A correction...
...
-----Original Message-----
From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Dan
Handley via TF-A
Sent: 17 September 2020 15:14
>> 
>> I want to handle something similar in OP-TEED along with EHF
>> depending on
> what is the expected behavior.
>> 
Hmm, I thought OP-TEED was more like the
TSP_NS_INTR_ASYNC_PREEMPT=0
...
...
case, where NS interrupts are routed to S-EL1 while processing a
yielding SMC in S- EL1? Perhaps that's a better TSPD config for you to
follow?
...
...
Sorry, if EL3_EXCEPTION_HANDLING=1 then obviously NS interrupts are
routed to EL3 first, but the TSPD re-enables NS interrupts before handing
over to the TSP to handle yielding calls, via a call to
ehf_allow_ns_preemption.
...
Right, that is the case for yielding SMC handling where both NS interrupts
and EL3/G0 interrupts can preempt the S_EL1/S_EL2 context.
But I would expect the same routing model even for 'Fast SMC' unlike what is
happening in TSPD.
...
Dan.
--
TF-A mailing list
TF-A@lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
--
TF-A mailing list
TF-A@lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
--
TF-A mailing list
TF-A@lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
-- 
TF-A mailing list
TF-A@lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a