Thanks Andrew for raising the issue and Marc and Raghu for your inputs. The effect of stage 2 when Stage 1 is disabled is something that was overlooked in the workaround implementation. After some discussions this is what we currently understand regarding the problem.
1. EL3 does not modify the EL2 stage 1 translations and hence any speculative execution of AT execution in EL3 and resulting PTW and TLB caching should not be a problem as we always return to the same context in EL2. TF-A in EL3 need not worry about this. Also, we expect that future CPUs having v8.4 extensions for S-EL2 will have already corrected this errata.
2. Whenever the EL1/0 Stage 1 translations are switched in EL3 as part of switching the worlds, it should take Stage 2 into account. Hence disabling the Stage 1 MMU is not effective, as Marc pointed out, because speculative PTW can still take place using Stage 2 identity mapping. After discussion with James, we also realized that flipping the SCR_EL3.NS bit could take Stage 2 out of context (Stage 2 is not valid in secure world) leaving Stage 1 pointing to invalid translations.
As discussed in previous emails, on Entry to EL3, if TCR_EL1.EPDx = 1 and SCTLR_EL3.M = 1, any speculative AT PTWs for EL1 context can be prevented (including Stage 2).
So the fix as we currently understand would involve the following sequence :
a. On Entry to EL3, save the incoming SCTLR_EL1.M and TCR_EL1.EPDx bits and set them (ensure TCR_EL1.EPDx =1 prior to SCTLR_EL1.M =1 using isb()) b. Prior to Exit from EL3, after the target context is restored, restore the SCTLR_EL1.M and TCR_EL1.EPDx bits.
The above sequence now means that any use of AT instruction targeted at lower EL from EL3 that require PTW will fault. So prior to use of AT, ensure the PTW are re-enabled and disabled back again after the AT instructions.
If the above sequence is agreed upon to resolve the errata, then we can work on a patch for the same. I suspect current el1 register save and restore sequence in TF-A is a bit unwieldy and we may need to analyze all the entry points to EL3 to ensure we cover everything.
Best Regards Soby Mathew
-----Original Message----- From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Andrew Scull via TF-A Sent: 02 July 2020 09:20 To: Raghu K raghu.ncstate@icloud.com Cc: android-kvm@google.com; willdeacon@google.com; Marc Zyngier mzyngier@google.com; tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Erroneous speculative AT workaround
On Wed, Jul 01, 2020 at 05:47:00PM -0700, Raghu K wrote:
This is interesting. It appears that there is no way on entry to EL3 to guarantee that the out-of-context(el2 and el1) translation regimes are in a consistent state and on every entry into EL3, we have to conservatively assume that it is in an inconsistent state. This is because of the situation Andrew mentioned(interrupts to EL3 can occur at any
time).
If this is the case, on EL3 entry:
- For EL1, we will need to save SCTLR_EL1, set SCTLR_EL1.M = 1,.EPDx
= 0
This would still be racing against any potential speculative execution of an AT instruction upon the switch to EL3, IIUC. The window would be much smaller but not entirely eliminated.
For KVM, this would be enough as KVM will have already applied this workaround (with Marc's corrections) whenever it is going to enter an inconsistent state. However, other EL2 software may choose to handle the errata differently, possibly going to the lengths of ensuring that no AT instruction is ever mapped executable. -- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a