Hi All,
In Arm, we are experimenting with running OP-TEE under Hafnium as the SPMC in S-EL2. We have been debugging a Stage 2 fault that OP-TEE runs into during a test that shares memory (xtest 1003). It seems this is due to a bug in Hafnium, but we want to be sure before posting a fix. Some thoughts below to this end. Apologies for the verbosity, but I hope you will appreciate that it is required.
The fault occurs when OP-TEE tries to access a memory region that was shared with it by the OP-TEE driver in Linux, i.e. the driver has called FFA_MEM_SHARE to share the memory, OP-TEE has called FFA_MEM_RETRIEVE_REQ to map it in its S2, and Hf has called FFA_MEM_RETRIEVE_RESP to describe the IPA range to OP-TEE. So, the S2 tables are created correctly before OP-TEE tries to use them.
The S2 fault is an L3 translation fault. The L3 descriptor in the S2 tables is NULL when the fault occurs, so this makes sense. This also implies that the translation is not cached in the TLBs.
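For anyone reproducing this, below is a minimal sketch of how the fault type and level can be read out of a syndrome value. It assumes the abort is reported in ESR_EL2 as a Data Abort from a lower Exception level; the field positions (EC in bits [31:26], DFSC in bits [5:0]) and the DFSC encodings are from the Arm ARM. The helper name is hypothetical and is not Hafnium code.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx layout: EC is bits [31:26], DFSC is ISS bits [5:0]. */
#define ESR_EC_SHIFT 26
#define ESR_EC_MASK 0x3fULL
#define ESR_EC_DATA_ABORT_LOWER_EL 0x24ULL
#define ESR_DFSC_MASK 0x3fULL

/* DFSC 0b0001LL encodes a translation fault at level LL (0..3). */
#define DFSC_TRANSLATION_FAULT_L0 0x04ULL
#define DFSC_TRANSLATION_FAULT_L3 0x07ULL

/*
 * Hypothetical helper: returns true and writes the lookup level if the
 * syndrome describes a data abort from a lower EL caused by a
 * translation fault. Returns false for any other syndrome.
 */
static bool is_translation_fault(uint64_t esr, unsigned int *level)
{
	uint64_t ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
	uint64_t dfsc = esr & ESR_DFSC_MASK;

	if (ec != ESR_EC_DATA_ABORT_LOWER_EL) {
		return false;
	}
	if (dfsc < DFSC_TRANSLATION_FAULT_L0 ||
	    dfsc > DFSC_TRANSLATION_FAULT_L3) {
		return false;
	}
	*level = (unsigned int)(dfsc - DFSC_TRANSLATION_FAULT_L0);
	return true;
}
```

A syndrome with EC = 0x24 and DFSC = 0x07 decodes to level 3, which is the case described above.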
The key thing is that the fault only occurs when cache state modelling is turned on in the FVP_Base_RevC-2xAEMv8A model we are using for development. The fault occurs both when the S2 tables are created and accessed on the same PE and when they are created and accessed on different PEs. It does not matter whether the PEs are in the same or different clusters. The fault occurs both with and without a Hypervisor (Hf) in the Normal world, so the presence of Hf in EL2 is not a factor.
We noticed that Hf marks its internal memory as outer-shareable. See [1] and [2]. It uses inner-shareable for S2 PTWs though. See [3]. This is a mismatch of memory attributes as per Page 2563 in ARM DDI 0487F.b. The start of the text is quoted below.
"The rules about mismatched attributes given in Mismatched memory attributes…"
And indeed, the fault is not seen if we mark Hf’s internal memory as inner-shareable to match the PTWs. The DSBs after creating the S2 tables in [4] are for inner-shareable access types. It seems that the inner-shareable PTW is unable to observe the outer-shareable page table write. Using the inner-shareable attribute for the internal memory makes the write observable.
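To make the mismatch concrete, here is a rough sketch of a check that compares the shareability used by the S2 table walker with the shareability of the mapping through which the tables are written. It assumes the walker attribute comes from VTCR_EL2.SH0 (bits [13:12]) and the mapping attribute from the SH[1:0] field of a Stage 1 block/page descriptor (bits [9:8]), with the architectural encodings 0b00 non-shareable, 0b10 outer-shareable and 0b11 inner-shareable. The function names are hypothetical, not taken from Hafnium, and only shareability is compared (cacheability is ignored).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural shareability encodings (shared by SH0 and descriptor SH). */
#define SH_NON_SHAREABLE 0x0ULL
#define SH_OUTER_SHAREABLE 0x2ULL
#define SH_INNER_SHAREABLE 0x3ULL
#define SH_MASK 0x3ULL

#define VTCR_EL2_SH0_SHIFT 12 /* VTCR_EL2.SH0, bits [13:12] */
#define PTE_SH_SHIFT 8        /* descriptor SH[1:0], bits [9:8] */

/* Shareability used by Stage 2 table walks. */
static uint64_t vtcr_walk_shareability(uint64_t vtcr)
{
	return (vtcr >> VTCR_EL2_SH0_SHIFT) & SH_MASK;
}

/* Shareability of the mapping used to write the page tables. */
static uint64_t pte_shareability(uint64_t pte)
{
	return (pte >> PTE_SH_SHIFT) & SH_MASK;
}

/* Hypothetical check: true when walker and table mapping agree. */
static bool walk_shareability_matches(uint64_t vtcr, uint64_t table_pte)
{
	return vtcr_walk_shareability(vtcr) == pte_shareability(table_pte);
}
```

With inner-shareable walks and an outer-shareable mapping, which is the configuration described above, the check reports a mismatch.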
Alternatively, if we change the shareability of PTWs in VTCR_EL2 to outer-shareable then the fault is no longer observed. It is not clear how the PTWs and page table writes are synchronised in this case without a DSB OSH. This is not a violation of the architecture as far as I understand.
It seems that it would be worth aligning these attributes.
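The two ways of aligning the attributes that we tried can be sketched as below. This is only an illustration on raw register and descriptor values, assuming the shareability fields are VTCR_EL2.SH0 (bits [13:12]) and the descriptor SH[1:0] field (bits [9:8]); the helper names are hypothetical and this is not the proposed patch.

```c
#include <stdint.h>

#define SH_OUTER_SHAREABLE 0x2ULL
#define SH_INNER_SHAREABLE 0x3ULL
#define VTCR_EL2_SH0_SHIFT 12 /* VTCR_EL2.SH0, bits [13:12] */
#define PTE_SH_SHIFT 8        /* descriptor SH[1:0], bits [9:8] */

/* Replace the 2-bit shareability field at 'shift' with 'sh'. */
static uint64_t set_sh(uint64_t value, unsigned int shift, uint64_t sh)
{
	return (value & ~(0x3ULL << shift)) | ((sh & 0x3ULL) << shift);
}

/* Option A: map the internal memory inner-shareable to match the PTWs. */
static uint64_t align_tables_to_walker(uint64_t pte)
{
	return set_sh(pte, PTE_SH_SHIFT, SH_INNER_SHAREABLE);
}

/* Option B: make the PTWs outer-shareable to match the internal memory. */
static uint64_t align_walker_to_tables(uint64_t vtcr)
{
	return set_sh(vtcr, VTCR_EL2_SH0_SHIFT, SH_OUTER_SHAREABLE);
}
```

Either option leaves both sides with the same shareability. Option A also matches the inner-shareable DSBs that are already issued after the tables are updated.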
The next bit is why Hf uses the outer shareable attribute for internal memory in the first place. The recommendation seems to be to use inner-shareable. See [5] and [6].
So we are wondering if this should be fixed too. Please let me know if we have misunderstood anything so far. If not, I am happy to post a patch or provide more information.
Cheers, Achin
[1] https://hafnium.googlesource.com/hafnium/+/refs/heads/master/src/arch/aarch6...
[2] https://hafnium.googlesource.com/hafnium/+/refs/heads/master/src/mm.c#1043
[3] https://hafnium.googlesource.com/hafnium/+/refs/heads/master/src/arch/aarch6...
[4] https://hafnium.googlesource.com/hafnium/+/refs/heads/master/src/arch/aarch6...
[5] "Shareable Normal memory" in Pg. 154 in ARM DDI 0487F.b
[6] https://linux-arm-kernel.infradead.narkive.com/RZHvk1cT/question-how-can-we-...