On Thu, Jul 02, 2020 at 02:15:47PM +0000, Soby Mathew wrote:
So the fix as we currently understand would involve the following sequence
:
a. On Entry to EL3, save the incoming SCTLR_EL1.M and TCR_EL1.EPDx bits and set them (ensure TCR_EL1.EPDx =1 prior to SCTLR_EL1.M =1 using isb())
b. Prior to Exit from EL3, after the target context is restored, restore the SCTLR_EL1.M and TCR_EL1.EPDx bits.
The above sequence now means that any use of AT instruction targeted at
lower EL from EL3 that require PTW will fault. So prior to use of AT,
ensure the PTW are re-enabled and disabled back again after the AT
instructions.
If the above sequence is agreed upon to resolve the errata, then we can
work on a patch for the same. I suspect current el1 register save and
restore sequence in TF-A is a bit unwieldy and we may need to analyze all
the entry points to EL3 to ensure we cover everything.
Looks good to me, but there's still one niggle that I don't know how to
solve. If EL2 has been audited not to have any executable AT instructions,
it may not have a software workaround. However, if a secure interrupt
is taken from EL2 to EL3 while EL2 is the middle of a world switch, then
there is a small window where an AT instruction present at EL3 cold be
speculatively executed before you've had a chance to mess with SCTLR_EL1.
Fun! Maybe it's worth documenting this somewhere?
Will
On 7/2/20 7:15 AM, Soby Mathew wrote:
Thanks Andrew for raising the issue and Marc and Raghu for your inputs.
The effect of stage 2 when Stage 1 is disabled is something that was overlooked in the workaround implementation. After some discussions this is what we currently understand regarding the problem.
1. EL3 does not modify the EL2 stage 1 translations and hence any speculative execution of AT execution in EL3 and resulting PTW and TLB caching should not be a problem as we always return to the same context in EL2. TF-A in EL3 need not worry about this. Also, we expect that future CPUs having v8.4 extensions for S-EL2 will have already corrected this errata.
2. Whenever the EL1/0 Stage 1 translations are switched in EL3 as part of switching the worlds, it should take Stage 2 into account. Hence disabling the Stage 1 MMU is not effective, as Marc pointed out, because speculative PTW can still take place using Stage 2 identity mapping. After discussion with James, we also realized that flipping the SCR_EL3.NS bit could take Stage 2 out of context (Stage 2 is not valid in secure world) leaving Stage 1 pointing to invalid translations.
As discussed in previous emails, on Entry to EL3, if TCR_EL1.EPDx = 1 and SCTLR_EL3.M = 1, any speculative AT PTWs for EL1 context can be prevented (including Stage 2).
So the fix as we currently understand would involve the following sequence :
a. On Entry to EL3, save the incoming SCTLR_EL1.M and TCR_EL1.EPDx bits and set them (ensure TCR_EL1.EPDx =1 prior to SCTLR_EL1.M =1 using isb())
b. Prior to Exit from EL3, after the target context is restored, restore the SCTLR_EL1.M and TCR_EL1.EPDx bits.
The above sequence now means that any use of AT instruction targeted at lower EL from EL3 that require PTW will fault. So prior to use of AT, ensure the PTW are re-enabled and disabled back again after the AT instructions.
If the above sequence is agreed upon to resolve the errata, then we can work on a patch for the same. I suspect current el1 register save and restore sequence in TF-A is a bit unwieldy and we may need to analyze all the entry points to EL3 to ensure we cover everything.
Best Regards
Soby Mathew
-----Original Message-----
From: TF-A <tf-a-bounces@lists.trustedfirmware.org> On Behalf Of Andrew
Scull via TF-A
Sent: 02 July 2020 09:20
To: Raghu K <raghu.ncstate@icloud.com>
Cc: android-kvm@google.com; willdeacon@google.com; Marc Zyngier
<mzyngier@google.com>; tf-a@lists.trustedfirmware.org
Subject: Re: [TF-A] Erroneous speculative AT workaround
On Wed, Jul 01, 2020 at 05:47:00PM -0700, Raghu K wrote:
This is interesting. It appears that there is no way on entry to EL3
to guarantee that the out-of-context(el2 and el1) translation regimes
are in a consistent state and on every entry into EL3, we have to
conservatively assume that it is in an inconsistent state. This is
because of the situation Andrew mentioned(interrupts to EL3 can occur at any
time).
If this is the case, on EL3 entry:
1) For EL1, we will need to save SCTLR_EL1, set SCTLR_EL1.M = 1,.EPDx
= 0
This would still be racing against any potential speculative execution of an AT
instruction upon the switch to EL3, IIUC. The window would be much smaller
but not entirely eliminated.
For KVM, this would be enough as KVM will have already applied this
workaround (with Marc's corrections) whenever it is going to enter an
inconsistent state. However, other EL2 software may choose to handle the
errata differently, possibly going to the lengths of ensuring that no AT
instruction is ever mapped executable.
--
TF-A mailing list
TF-A@lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a