This is interesting. It appears that there is no way on entry to EL3 to guarantee that the out-of-context (EL2 and EL1) translation regimes are in a consistent state, so on every entry into EL3 we have to conservatively assume that they are in an inconsistent state. This is because of the situation Andrew mentioned (interrupts to EL3 can occur at any time).
If this is the case, on EL3 entry:
1) For EL1, we will need to save SCTLR_EL1, set SCTLR_EL1.M = 1, .EPDx = 0
2) Set whatever bits we need to for EL2 and S2 translations to not succeed (What are these?)
3) DSB, to ensure no speculative AT can be issued until completion of DSB, so any AT that occurs will not fill the TLB with bad translations.
On exit (right before ERET), we need to restore the registers saved on entry, and have the ERET followed by a DSB so that there can be no speculative execution of AT instructions.
Thanks
Raghu
On 7/1/20 5:54 AM, Marc Zyngier via TF-A wrote:
Hi Manish,
On Wed, 1 Jul 2020 at 13:14, Manish Badarkhe <Manish.Badarkhe@arm.com> wrote:
Hi Andrew,
As per the current implementation, the “el1_sysregs_context_restore” routine does the following:
1. TCR_EL1.EPD0 = 1
2. TCR_EL1.EPD1 = 1
3. SCTLR_EL1.M = 0
4. isb
Code snippet:
    mrs x9, tcr_el1
    orr x9, x9, #TCR_EPD0_BIT
    orr x9, x9, #TCR_EPD1_BIT
    msr tcr_el1, x9
    mrs x9, sctlr_el1
    bic x9, x9, #SCTLR_M_BIT
    msr sctlr_el1, x9
    isb
This is to avoid page table walks (PTWs) while the system registers are updated at step 5.
Unfortunately, this doesn't prevent anything.
If SCTLR_EL1.M is clear, the TCR_EL1.EPDx bits don't mean much (the S1 MMU is disabled, so there is no S1 page table walk), and you can still have S2 PTWs (using an idmap for S1) that create TLB corruption if these entries alias with any S1 mapping that exists at EL1.
Which is why KVM does *set* SCTLR_EL1.M, which prevents the use of a 1:1 mapping at S1, and at which point the TCR_EL1.EPDx bits are actually useful in preventing a PTW.
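The difference between the two approaches can be sketched in C. This is only a model of the register values as plain integers, not real system register access, and it encodes the argument made in this thread rather than any architectural guarantee. The bit positions (SCTLR_EL1.M at bit 0, TCR_EL1.EPD0 at bit 7, TCR_EL1.EPD1 at bit 23) are architectural; the struct and function names are hypothetical and are not taken from TF-A or KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions. */
#define SCTLR_M_BIT   (UINT64_C(1) << 0)   /* SCTLR_EL1.M: stage 1 MMU enable */
#define TCR_EPD0_BIT  (UINT64_C(1) << 7)   /* TCR_EL1.EPD0: no walks via TTBR0_EL1 */
#define TCR_EPD1_BIT  (UINT64_C(1) << 23)  /* TCR_EL1.EPD1: no walks via TTBR1_EL1 */

/* Hypothetical container for the two register values (illustration only). */
struct el1_s1_ctrl {
    uint64_t sctlr_el1;
    uint64_t tcr_el1;
};

/*
 * Model of the argument above: the EPDx bits only matter while the S1 MMU
 * is enabled. With M clear, S1 is treated as a flat mapping and a
 * speculative AT can still lead to S2 walks, so that state is reported as
 * "not blocked" even if both EPDx bits are set.
 */
bool s1_walks_blocked(struct el1_s1_ctrl r)
{
    const uint64_t epd = TCR_EPD0_BIT | TCR_EPD1_BIT;

    return (r.sctlr_el1 & SCTLR_M_BIT) != 0 && (r.tcr_el1 & epd) == epd;
}

/* Compute the values described for KVM: M set, EPD0 and EPD1 set. */
struct el1_s1_ctrl block_s1_walks(struct el1_s1_ctrl r)
{
    r.tcr_el1 |= TCR_EPD0_BIT | TCR_EPD1_BIT;
    r.sctlr_el1 |= SCTLR_M_BIT;
    assert(s1_walks_blocked(r));
    return r;
}
```

Under this model, the state produced by the quoted snippet (M cleared, both EPDx bits set) fails s1_walks_blocked(), while the output of block_s1_walks() passes it.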
5. Restore all system registers for EL1 except SCTLR_EL1 and TCR_EL1
6. isb()
7. Restore SCTLR_EL1 and TCR_EL1
Code snippet:
    ldr x9, [x0, #CTX_SCTLR_EL1]    /* saved value from "el2_sysregs_context_save" */
    msr sctlr_el1, x9
    ldr x9, [x0, #CTX_TCR_EL1]
    msr tcr_el1, x9
As per the above steps, SCTLR_EL1 is restored to its actual settings at step 7. A similar flow is present in “el2_sysregs_context_restore” to restore the SCTLR_EL1 register.
In conclusion, this routine temporarily clears the M bit of SCTLR_EL1 to avoid speculation, but restores it to its original setting before returning to its caller. Please let us know whether this aligns with the KVM workaround for the speculative AT erratum.
It doesn't, unfortunately. I believe this code actively creates problems on a system that is affected by speculative AT execution.
I don't understand your rationale for touching SCTLR_EL2.M either if you are not context-switching the EL2 S1 state: as far as I understand no affected cores have S-EL2, so no switch should happen at this stage.
Thanks,
M.
On Thu, 2 Jul 2020 at 01:49, Raghu K <raghu.ncstate@icloud.com> wrote:
This is interesting. It appears that there is no way on entry to EL3 to guarantee that the out-of-context (EL2 and EL1) translation regimes are in a consistent state, so on every entry into EL3 we have to conservatively assume that they are in an inconsistent state. This is because of the situation Andrew mentioned (interrupts to EL3 can occur at any time).
If this is the case, on EL3 entry:
- For EL1, we will need to save SCTLR_EL1, set SCTLR_EL1.M = 1, .EPDx = 0
TCR_EL1.EPDx have to be set to *1* (you want to *disable* PTWs).
- Set whatever bits we need to for EL2 and S2 translations to not succeed (What are these?)
Why would you *ever* touch these?
An S2 translation doesn't happen independently of an S1 translation. It is always the continuation of an S1 translation (and a disabled S1 MMU counts as a translation). There is no AT S2 instruction either, so EL3 has no purpose touching HCR_EL2 at all.
As for disabling EL2 S1 translation, what purpose does it serve? Affected cores do not have a secure EL2, so there is no reason for EL3 to touch SCTLR_EL2 either. Things would be different if you had S-EL2 and had to context-switch it.
- DSB, to ensure no speculative AT can be issued until completion of DSB, so any AT that occurs will not fill the TLB with bad translations.
I really don't get what you're aiming at with this DSB, as I don't think AT is at all influenced by a DSB, at least not from an architectural perspective.
On exit (right before ERET), we need to restore the registers saved on entry, and have the ERET followed by a DSB so that there can be no speculative execution of AT instructions.
Again, I don't understand this DSB.
M.
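For illustration, the proposal with the corrections above applied (EPDx set to 1, EL2 and stage 2 controls left alone) could be modelled roughly as below. This is a hedged sketch only: the "live" registers are a plain C struct standing in for SCTLR_EL1 and TCR_EL1, all type and function names are hypothetical, and the model covers only the save/modify/restore pairing. It says nothing about barriers or about the timing of speculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural bit positions. */
#define SCTLR_M_BIT   (UINT64_C(1) << 0)   /* SCTLR_EL1.M */
#define TCR_EPD0_BIT  (UINT64_C(1) << 7)   /* TCR_EL1.EPD0 */
#define TCR_EPD1_BIT  (UINT64_C(1) << 23)  /* TCR_EL1.EPD1 */

/* Hypothetical stand-in for the EL1 registers (no real sysreg access). */
struct el1_regs {
    uint64_t sctlr_el1;
    uint64_t tcr_el1;
};

/* On EL3 entry: save the lower-EL values, then set EPDx and M. */
void el3_entry_block_walks(struct el1_regs *live, struct el1_regs *saved)
{
    assert(live != NULL && saved != NULL);

    *saved = *live;                                /* keep the original values */
    live->tcr_el1 |= TCR_EPD0_BIT | TCR_EPD1_BIT;  /* EPDx = 1, not 0 */
    live->sctlr_el1 |= SCTLR_M_BIT;                /* M = 1 */
}

/* On EL3 exit (before ERET): put back exactly what was saved on entry. */
void el3_exit_restore(struct el1_regs *live, const struct el1_regs *saved)
{
    assert(live != NULL && saved != NULL);

    *live = *saved;
}
```

A round trip through el3_entry_block_walks() and el3_exit_restore() leaves the modelled registers with their original values, whatever state the lower EL was in at entry.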
On Wed, Jul 01, 2020 at 05:47:00PM -0700, Raghu K wrote:
This is interesting. It appears that there is no way on entry to EL3 to guarantee that the out-of-context (EL2 and EL1) translation regimes are in a consistent state, so on every entry into EL3 we have to conservatively assume that they are in an inconsistent state. This is because of the situation Andrew mentioned (interrupts to EL3 can occur at any time).
If this is the case, on EL3 entry:
- For EL1, we will need to save SCTLR_EL1, set SCTLR_EL1.M = 1, .EPDx = 0
This would still be racing against any potential speculative execution of an AT instruction upon the switch to EL3, IIUC. The window would be much smaller but not entirely eliminated.
For KVM, this would be enough, as KVM will have already applied this workaround (with Marc's corrections) whenever it is going to enter an inconsistent state. However, other EL2 software may choose to handle the erratum differently, possibly going to the lengths of ensuring that no AT instruction is ever mapped executable.
tf-a@lists.trustedfirmware.org