Hi All,
The next TF-A Tech Forum is scheduled for Thu 21st May 2020 17:00 - 18:00 (BST). A recurring meeting invite has been sent out to the subscribers of this TF-A mailing list. If you don’t have this, please let me know.
Agenda:
* Firmware Configuration Framework (FConf) Update - Presented by Manish Badarkhe and Madhukar Pappireddy
    * Discussion of how we have leveraged the FConf framework to make statically configured components of TF-A dynamic.
      A number of components that could potentially use the FConf framework have already been identified. Patches for
      these components are either merged or in review.
    * Design discussion on moving CoT descriptors to the device tree using the FConf framework.
* Optional TF-A Mailing List Topic Discussions
Thanks
Joanna
Hi Florian,
I can give some general answers to your questions and hopefully get to
answering your real questions with further discussion. Please feel free
to ask more questions if this does not help.
>> I noticed that TF-A is designed to load FIP Image but the U-Boot
Environment have a different format. How to access the QSPI NOR memory?
NXP specific driver?
[RK] TF-A generally provides frameworks for most things rather than
device-specific drivers, but for SPI NOR there are drivers under
drivers/mtd/nor that could potentially be used. You will likely need to
implement the appropriate platform hooks for the SPI bus itself to use
them. The TF-A IO framework has the io_mtd layer, which can be hooked into a
NOR device. drivers/st/spi/stm32_qspi.c seems to use all of this and
should be a good example (perhaps someone from ST can chime in too). If
you hook your SPI driver into the TF-A layers correctly, there should be
nothing preventing you from accessing the U-Boot environment partition.
Also, TF-A's primary/default format for firmware images is FIP, but that
definitely does not preclude a platform from using its own format.
>>The bl2 is designed to load only one FIP image, is it possible to add
an additional entry ?
[RK] Not really. BL2 is fairly generic and you should be able to hook
any image format into it. If you must use the FIP format, you can map your
images to existing image IDs to create a FIP with any image that you
have. Once again, FIP is fairly flexible and generally treats the firmware
images in the package as blobs.
>>if we want to use a secure boot in the future
[RK] As with the points above, the Trusted Board Boot framework is
flexible and you should be able to implement secure boot on FIP and
other formats. TF-A provides an implementation of the TBBR specification
for secure boot.
From what I gather from your message below, it should be possible to
implement the algorithm you describe, provided you have an appropriate
platform port.
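
For illustration only, the selection steps in the quoted message (read 'u-boot-select', validate the preferred image, fall back to the other copy) might be sketched in C as follows. Everything here is an assumption rather than TF-A or NXP code: the slot numbering, the "0"/"1" encoding of the variable, and the idea that the platform port has already reduced its validity checks to a bitmask.

```c
#include <stddef.h>

/* Hypothetical slot numbering for the two fip.bin copies in QSPI NOR. */
#define FIP_SLOT_A     0
#define FIP_SLOT_B     1
#define FIP_SLOT_NONE  (-1)

/*
 * Assumed encoding of 'u-boot-select': "1" picks slot B, anything else
 * (including a missing variable) picks slot A.
 */
static int preferred_slot(const char *select_value)
{
	if ((select_value != NULL) && (select_value[0] == '1') &&
	    (select_value[1] == '\0')) {
		return FIP_SLOT_B;
	}
	return FIP_SLOT_A;
}

/*
 * valid_mask has bit N set when the image in slot N passed whatever
 * validity check the platform uses (header check, authentication, ...).
 */
static int select_fip_slot(const char *select_value, unsigned int valid_mask)
{
	int slot = preferred_slot(select_value);

	if ((valid_mask & (1u << slot)) != 0u) {
		return slot;
	}

	/* Preferred copy is broken: try the alternative one. */
	slot = (slot == FIP_SLOT_A) ? FIP_SLOT_B : FIP_SLOT_A;
	if ((valid_mask & (1u << slot)) != 0u) {
		return slot;
	}

	return FIP_SLOT_NONE;	/* neither copy is usable */
}
```

With both copies valid, select_fip_slot("1", 0x3u) returns FIP_SLOT_B; if only slot A is valid (mask 0x1u), the same call falls back to FIP_SLOT_A. Reading the environment from NOR and validating the images are the parts a platform port would have to supply.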
Thanks
-Raghu
On 5/13/20 5:40 AM, florian.manoel--- via TF-A wrote:
>
> Hello TF-A mail list,
>
> I’m new here, so let me quickly introduce myself.
> I am Florian Manoel, working as a firmware developer at Siemens in
> Karlsruhe, Germany. We recently decided to start some development
> based on ARM processors, so TF-A is new to us.
>
> Currently, we have a custom board equipped with the NXP
> Layerscape LS1043A processor. So far, everything is working as planned: the
> pre-bootloader (TF-A) boots the bootloader (U-Boot), which boots Linux
> on top.
> However, we want to be able to boot an alternative U-Boot
> FIP image. Let me explain:
> We use a QSPI NOR memory as the boot source. This memory stores
> ‘bl2_qspi.pbl’, two copies of ‘fip.bin’, the U-Boot environment and some
> microcode.
> We want to select the U-Boot image to boot according to the value
> of a specific variable, ‘u-boot-select’, stored in the U-Boot environment.
> The algorithm is, to my eye, relatively simple:
>
> Start
> - Check the value of the U-Boot variable ‘u-boot-select’
> - Check whether the corresponding U-Boot image is valid; if not, select the
>   alternative one
> - Boot the selected U-Boot image
> End
>
> We want this functionality in case a U-Boot image is broken (e.g.
> power loss during a U-Boot update).
>
> I have already gone through the source code and it raised more questions than
> it answered. For example:
> I noticed that TF-A is designed to load a FIP image, but the U-Boot
> environment has a different format. How do we access the QSPI NOR
> memory? With an NXP-specific driver?
> BL2 is designed to load only one FIP image; is it possible to add
> an additional entry?
> On top of that comes the question of reliability and what will happen
> if we want to use secure boot in the future.
>
> I am looking for advice and support on this specific topic.
> Thanks for your support,
>
> Best regards
> Florian Manoël
Hi Raghu and Louis,
On 4/7/20 12:14 PM, Louis Mayencourt via TF-A wrote:
> I do agree with you: case 2 and 3 are similar (wrongly formed DTB) and
> should lead to the same behavior.
>
> A mandatory property miss or a hit with a structurally incorrect node
> means that the DTB doesn't follow the provided binding document. Such a
> DTB shouldn't be considered as valid and should trigger a build failure
> and/or a code panic.
That's what still confuses me... Agree on cases 2 and 3 triggering a
build failure if possible, but not a code panic. A code panic stays in a
release build. With what we've been discussing so far, it would seem
more appropriate to me to have debug assertions to catch cases 2 and 3.
These debug assertions can help catch structural problems in the DTB
during the development phase and can be eliminated for a production
build, leaving no checks whatsoever in the code.
This is the strategy we've been using so far in TF-A. For lots of
platform interfaces, the generic code includes debug assertions to check
the correct implementation of these interfaces by platform integrators.
For example, checking the range of their return values. I would say this
is deeply embedded into the threat model TF-A uses today. See
https://trustedfirmware-a.readthedocs.io/en/latest/process/coding-guideline…
On one hand, it makes sense to me. On the other hand, I take Raghu's
point that it would be unrealistic to assume that 100% of code has been
covered by tests. This is very hard to achieve in practice, especially
to cover all error cases; thus, it seems utopian to assume that all
debug assertions have been exercised during development and can be
safely removed.
TF-A does provide a way to keep debug assertions in a release build
(using the ENABLE_ASSERTIONS build flag) if platform integrators judge
they would rather keep them, but this is not the default behaviour.
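
To make the trade-off concrete, here is a minimal C sketch contrasting the two styles. It is an illustration under stated assumptions, not TF-A code: struct dt_prop, the DT_* codes and both function names are hypothetical, and the standard assert()/NDEBUG pair stands in for TF-A's own assert and the ENABLE_ASSERTIONS flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DT_OK             0
#define DT_ERR_MISSING    (-1)
#define DT_ERR_MALFORMED  (-2)

/* Hypothetical view of one DTB property: raw bytes plus length. */
struct dt_prop {
	const uint8_t *data;
	int len;
};

/* Device tree cells are stored big-endian. */
static uint32_t cell_to_u32(const uint8_t *b)
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Debug-only checking: both assert() calls vanish when NDEBUG is defined. */
static uint32_t read_cell_asserted(const struct dt_prop *p)
{
	assert((p != NULL) && (p->data != NULL));	/* mandatory property missing */
	assert(p->len == 4);				/* structurally incorrect */
	return cell_to_u32(p->data);
}

/* Always-on checking: the caller decides whether to panic or recover. */
static int read_cell_checked(const struct dt_prop *p, uint32_t *out)
{
	if ((p == NULL) || (p->data == NULL)) {
		return DT_ERR_MISSING;
	}
	if (p->len != 4) {
		return DT_ERR_MALFORMED;
	}
	*out = cell_to_u32(p->data);
	return DT_OK;
}
```

In a build with assertions disabled, read_cell_asserted() performs no checks at all, which is the behaviour described above; read_cell_checked() keeps its checks in every build at the cost of extra code and an error path the caller must handle.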
Regards,
Sandrine
Hello all,
As the trustedfirmware.org project maintenance process [1] is now live,
it would be good if we start adopting it in our development flow for TF-A.
I would like to highlight the main changes that will have an impact on
our day-to-day work on the project.
1. Patch submitters to explicitly choose their reviewers
--------------------------------------------------------
All patches should now have dedicated reviewers. The patch submitter is
responsible for adding them in the reviewers field of their Gerrit review.
Each patch should have 2 types of reviewers:
- Code owners.
- Maintainers.
There needs to be 1 code owner per module modified by the patch as well
as 1 maintainer.
The maintainers and code owners are listed here:
https://trustedfirmware-a.readthedocs.io/en/latest/about/maintainers.html
Ideally, we would have at least 2 code owners per module so that they
can review each other's patches. Unfortunately we're not there yet
(especially for platform ports), so we need a workaround for patches
submitted by the sole code owner of a module.
In this scenario, I would like to suggest we leave it to the patch
submitter to decide on a case by case basis whether they want to
nominate someone to do the detailed technical review or skip it
entirely. In any case, a maintainer will still need to review and
approve the patch.
If this proves not to be working well over time (either because it
creates unnecessary review bottlenecks or lowers the code quality too
much), we can revisit that in the future.
If you've got patches in review right now, may I please request you to
add reviewers accordingly? The sooner we start adopting this process the
better, as it will allow us to see how this works in practice and come
up with adjustments if need be.
2. Reviewers to provide feedback in a timely manner
---------------------------------------------------
If someone asks you to do a review, please try to do it in a timely
manner. There are no timeline guidelines set for TF-A just yet, but I
think a good rule of thumb would be to aim to provide feedback in a
week's time. This does not mean that the review has to be completed in a
week (complex patches might need a lot of discussion and/or rounds of
review & rework), just that there's some progress and activity at least
once per week.
If for some reason, you know you won't be able to honour a review
request, please say so on Gerrit ASAP so that the patch submitter can
choose another reviewer.
3. What's next?
---------------
In the coming weeks or months, we'd like to:
- Extend the list of code owners.
- Extend the list of maintainers.
- Come up with TF-A specific contribution guidelines that complement the
tf.org process [1]. We already have some here [2] but would like to
expand them and possibly revisit some of them. Obviously, this will be a
community effort, much like the tf.org process was, and all TF-A
contributors will have a say in defining this so that we end up with
something that works for everyone (as much as possible).
Best regards,
Sandrine
[1] https://developer.trustedfirmware.org/w/collaboration/project-maintenance-p…
[2] https://trustedfirmware-a.readthedocs.io/en/latest/process/contributing.html
Hello,
While reviewing all the compiler flags used by TF-A, we couldn't find information on the following options in the GCC manual [1]:
* --fatal-warnings (https://review.trustedfirmware.org/plugins/gitiles/TF-A/trusted-firmware-a/…)
* -fno-stack-protector (https://review.trustedfirmware.org/plugins/gitiles/TF-A/trusted-firmware-a/…)
Can someone please help me understand if these options are still valid?
Thanks.
[1] https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Option-Index.html#Option-Index
Hi all,
On 5/5/20 9:04 AM, Sandrine Bailleux via TF-A wrote:
> I've received very little feedback on version 2 of the proposal, which
> hints that we are reaching an agreement. Thus, I plan to finalize the
> proposal this week. This can then become part of our development flow
> for all trustedfirmware.org projects.
>
> Thanks again for all the inputs!
The project maintenance process is now live. The document has been moved
here (with a few minor edits to turn it from a proposal to an effective
process):
https://developer.trustedfirmware.org/w/collaboration/project-maintenance-p…
Thanks!
Regards,
Sandrine
Hi,
Please find the latest report on new defect(s) introduced to ARM-software/arm-trusted-firmware found with Coverity Scan.
1 new defect(s) introduced to ARM-software/arm-trusted-firmware found with Coverity Scan.
New defect(s) Reported-by: Coverity Scan
Showing 1 of 1 defect(s)
** CID 358027: Insecure data handling (TAINTED_SCALAR)
________________________________________________________________________________________________________
*** CID 358027: Insecure data handling (TAINTED_SCALAR)
/common/fdt_wrappers.c: 295 in fdt_get_reg_props_by_name()
289
290     index = fdt_stringlist_search(dtb, node, "reg-names", name);
291     if (index < 0) {
292         return index;
293     }
294
>>>     CID 358027: Insecure data handling (TAINTED_SCALAR)
>>>     Passing tainted variable "index" to a tainted sink.
295     return fdt_get_reg_props_by_index(dtb, node, index, base, size);
296 }
297
298 /*******************************************************************************
299  * This function gets the stdout path node.
300  * It reads the value indicated inside the device tree.
________________________________________________________________________________________________________
To view the defects in Coverity Scan visit, https://u2389337.ct.sendgrid.net/ls/click?upn=nJaKvJSIH-2FPAfmty-2BK5tYpPkl…
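
A common way to deal with this class of finding is to range-check a value derived from untrusted data at the point where it is used as an index. The following is a minimal, self-contained C sketch of that pattern. It does not use libfdt and is not the TF-A code or its fix: the reg_table structure, the error codes and all function names are hypothetical stand-ins for the "reg-names"/"reg" lookup shown in the report.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ERR_NOTFOUND  (-1)
#define ERR_BADINDEX  (-2)

/* Hypothetical stand-in for one base/size pair of a "reg" property. */
struct reg_entry {
	uintptr_t base;
	size_t size;
};

struct reg_table {
	const char *const *names;	/* stand-in for "reg-names" */
	const struct reg_entry *regs;	/* stand-in for "reg" */
	int count;			/* entries in both arrays */
};

/* Returns the index of name, or ERR_NOTFOUND (negative) if it is absent. */
static int reg_name_to_index(const struct reg_table *t, const char *name)
{
	for (int i = 0; i < t->count; i++) {
		if (strcmp(t->names[i], name) == 0) {
			return i;
		}
	}
	return ERR_NOTFOUND;
}

/* The sink: validates index itself instead of trusting its callers. */
static int reg_props_by_index(const struct reg_table *t, int index,
			      uintptr_t *base, size_t *size)
{
	if ((index < 0) || (index >= t->count)) {
		return ERR_BADINDEX;
	}
	*base = t->regs[index].base;
	*size = t->regs[index].size;
	return 0;
}

static int reg_props_by_name(const struct reg_table *t, const char *name,
			     uintptr_t *base, size_t *size)
{
	int index = reg_name_to_index(t, name);

	if (index < 0) {
		return index;
	}
	return reg_props_by_index(t, index, base, size);
}

/* Small demo table used to exercise the functions above. */
static const char *const demo_names[] = { "ctrl", "fifo" };
static const struct reg_entry demo_regs[] = {
	{ 0x1000u, 0x100u }, { 0x2000u, 0x40u }
};
static const struct reg_table demo_table = { demo_names, demo_regs, 2 };
```

Whether the reported defect is a true positive depends on what fdt_get_reg_props_by_index() already checks; the sketch only shows the general shape of an upper-bound check at the sink.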
Hello Stuart, Alexei,
Chiming-in here on Ampere's behalf...
We analysed this proposal internally, and we see a number of issues with it, some of which were already raised by Raghu in the previous threads.
Here is a summary of the main issues that we see.
* Only mbedtls is supported, and this is a fixed configuration at compile time.
    * We propose that there should be a variable for the algorithm to be used, which can be set up at initialization time.
* This solution relies on taking the hash directly from the digest as the measurement, instead of the computed hash. This is not safe, especially considering that measured boot may use a different hash bank, so the digest hash may not be correct/valid.
* Only the BL2 image is measured. Per the ARM SBSG, we must measure and log *all* images/boot phases:
    * BL31
    * BL32 (all secure partitions)
    * BL33 (UEFI or any other non-secure boot loader)
    * Once we ERET into BL33, the measured boot flow continues and is owned by that boot loader
* We only see support for PCR0; any/all unsigned config data must be logged to PCR1.
* Passing PCRs to non-secure software before logging is not compliant with TCG Static-Root-of-Trust Measurement (SRTM) requirements.
    * This was discussed before in separate conversations, especially for systems where you are talking about two different signing domains and BL33 is in a different trust/signing domain.
    * BL33 should only do hash-log-extend; there is no need for BL33 to be aware of the current PCR value (beyond what is provided in the boot event log).
* Based on comments on the mail thread, there seem to be bad assumptions/expectations around TPM accessibility from the non-secure world.
    * SPI/I2C TPMs are expected to be accessed directly from the non-secure world instead of abstracting hardware details via the TCG CRB interface (which has already been standardized as the de facto mechanism for ARM on past mobile, client, and server solutions).
    * CRB will "just work" for Aptio/EDK2/Linux/Windows/Hyper-V/VMWare.
    * NOTE: This goes back to what is a “productizable” TPM solution. We want it to be a turn-key solution for customers without having to support/develop proprietary drivers.
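
As a rough illustration of two of the points above (an algorithm selected at initialization time, and a hash-log-extend flow that computes the measurement from the image itself), here is a minimal C sketch. It makes several assumptions that are not part of any proposal on the list: the structure and function names are hypothetical, only two PCRs are modelled in memory with no TPM involved, and toy_hash is a non-cryptographic 64-bit FNV-1a placeholder standing in for a real digest such as SHA-256.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_MAX  8	/* placeholder digest size in bytes */
#define NUM_PCRS    2
#define LOG_MAX     8

typedef void (*hash_fn)(const uint8_t *data, size_t len,
			uint8_t out[DIGEST_MAX]);

/* Placeholder 64-bit FNV-1a. NOT cryptographic; for illustration only. */
static void toy_hash(const uint8_t *data, size_t len, uint8_t out[DIGEST_MAX])
{
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 0x100000001b3ULL;
	}
	for (int i = 0; i < DIGEST_MAX; i++) {
		out[i] = (uint8_t)(h >> (8 * i));
	}
}

struct event {
	uint32_t pcr_index;
	uint8_t digest[DIGEST_MAX];
};

struct measured_boot {
	hash_fn hash;			/* chosen at init, not at compile time */
	uint8_t pcr[NUM_PCRS][DIGEST_MAX];
	struct event log[LOG_MAX];
	size_t log_count;
};

static void mb_init(struct measured_boot *mb, hash_fn hash)
{
	memset(mb, 0, sizeof(*mb));
	mb->hash = hash;
}

/* Hash the image, log the digest, then extend: PCR = H(PCR || digest). */
static int mb_measure(struct measured_boot *mb, uint32_t pcr_index,
		      const uint8_t *image, size_t len)
{
	uint8_t buf[2 * DIGEST_MAX];
	struct event *ev;

	if ((pcr_index >= NUM_PCRS) || (mb->log_count >= LOG_MAX)) {
		return -1;
	}

	ev = &mb->log[mb->log_count];
	ev->pcr_index = pcr_index;
	mb->hash(image, len, ev->digest);		/* hash */
	mb->log_count++;				/* log */

	memcpy(buf, mb->pcr[pcr_index], DIGEST_MAX);
	memcpy(buf + DIGEST_MAX, ev->digest, DIGEST_MAX);
	mb->hash(buf, sizeof(buf), mb->pcr[pcr_index]);	/* extend */
	return 0;
}
```

A boot stage would call mb_init() once with the chosen algorithm and then mb_measure() for each image it loads; a later stage only needs the event log to replay the extends, not the PCR values themselves.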
-Vivek/Harb
Hi All,
The next TF-A Tech Forum is scheduled for Thu 7th May 2020 17:00 - 18:00 (BST). A recurring meeting invite has been sent out to the subscribers of this TF-A mailing list. If you don’t have this, please let me know.
This meeting will be chaired by Bipin Ravi.
Agenda:
* Overview of a new TF-A Build System based on CMake by Javier Almansa Sobrino
* Optional TF-A Mailing List Topic Discussions
Thanks
Joanna
> -----Original Message-----
> From: TF-A <tf-a-bounces(a)lists.trustedfirmware.org> On Behalf Of Raghu
> Krishnamurthy via TF-A
> Sent: 30 April 2020 02:33
> To: Manish Badarkhe <Manish.Badarkhe(a)arm.com>; tf-
> a(a)lists.trustedfirmware.org
> Cc: nd <nd(a)arm.com>
> Subject: Re: [TF-A] Need input on Errata implementation
>
> Hi Manish,
>
> Really appreciate you for taking time to respond to my concerns/questions.
>
> What about this situation? NS-EL2 makes an SMC call to EL3 to get some basic
> information like GET_SOC_INFORMATION. This is a simple SMC and there is no
> call to context save or context restore. During the SMC call, if there is a
> speculative AT instruction on a lower EL(say NS-EL2), there could be a bad
> cached translation. Do you not need to apply the errata in this situation ? If
> not, why?
>
> >>We can't simply apply this errata on reset and just leave the system.
>
> [RK] Totally agree. See CPU_E_HANDLER_FUNC. cpu_ops are not necessarily
> called only during reset and power down; CPU_E_HANDLER_FUNC is called at
> runtime due to EAs.
>
> >>We thought of taking different approach for this errata implementation
> >>where anybody disable this workaround using macro as this errata is
> >>applicable for most of the CPUs (by default enabled) and can't be
> >>placed in cpu_ops.
>
> [RK] This is a poor approach in my view. Most CPUs is not all CPUs. The reason
> the errata framework exists is to apply CPU-specific errata by checking for
> them dynamically. Different steppings of the same CPU may or may not have
> the errata, and typically you check the MIDR to know whether the errata
> applies. Linux does not apply the errata to all CPUs just because "most" CPUs
> have the issue. It checks for its existence at runtime and only then applies
> it. TF-A should not hold itself to a lower standard.
Hi Raghu
I guess this depends on what the errata workaround involves. Since this workaround sets bits in an out-of-context register, it was not expected to affect EL3 execution performance (or that of the lower EL, because the bits are restored on return). It was also thought that searching through the list of compiled-in CPUs and checking whether the workaround is applicable might be more detrimental than unilaterally applying the workaround in this case (assuming no extra barriers are added, since the code path it is inserted in already has them later in the sequence).
But I agree it is more elegant to have this coupled into the CPU_OPS framework. I think Manish has some ideas for this.
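
For reference, the kind of runtime MIDR check Raghu describes can be sketched in plain C as below. This is a hedged illustration, not the TF-A cpu_ops implementation: the erratum_range structure and erratum_applies() are hypothetical names, the MIDR value is passed in as a parameter rather than read from MIDR_EL1, and the revision range in demo_a76_erratum is a placeholder that must be taken from the relevant errata notice.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 field extraction. */
#define MIDR_IMPLEMENTER(m)  (((m) >> 24) & 0xffu)
#define MIDR_VARIANT(m)      (((m) >> 20) & 0xfu)
#define MIDR_PARTNUM(m)      (((m) >> 4) & 0xfffu)
#define MIDR_REVISION(m)     ((m) & 0xfu)

#define IMPLEMENTER_ARM      0x41u
#define PARTNUM_CORTEX_A76   0xd0bu

/* rXpY packed as (X << 4) | Y so that revisions compare numerically. */
#define RXPY(x, y)           ((uint32_t)(((x) << 4) | (y)))

struct erratum_range {
	uint32_t implementer;
	uint32_t partnum;
	uint32_t first_rxpy;	/* first affected revision, inclusive */
	uint32_t last_rxpy;	/* last affected revision, inclusive */
};

/* True when midr identifies a CPU inside the affected revision range. */
static bool erratum_applies(uint32_t midr, const struct erratum_range *e)
{
	uint32_t rxpy = RXPY(MIDR_VARIANT(midr), MIDR_REVISION(midr));

	return (MIDR_IMPLEMENTER(midr) == e->implementer) &&
	       (MIDR_PARTNUM(midr) == e->partnum) &&
	       (rxpy >= e->first_rxpy) && (rxpy <= e->last_rxpy);
}

/* Placeholder range: check the errata notice for the real revisions. */
static const struct erratum_range demo_a76_erratum = {
	IMPLEMENTER_ARM, PARTNUM_CORTEX_A76, RXPY(0, 0), RXPY(3, 0)
};
```

A check like this costs a register read and a few comparisons per CPU, so the performance concern above mostly comes down to where in the entry/exit path the check is made and whether its result can be cached.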
Best Regards
Soby Mathew
>
> -Raghu
>
> On 4/29/20 1:35 AM, Manish Badarkhe wrote:
> > Hi Raghu
> >
> > Just to add to (and correct) my previous emails: the proposed errata
> > workaround is applied to both normal and secure world switches to EL3.
> >
> > Thanks
> > Manish Badarkhe
> >
> > On 29/04/2020, 12:25, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-
> bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org>
> wrote:
> >
> > Hi Raghu
> >
> > On 29/04/2020, 02:00, "Raghu K" <raghu.ncstate(a)icloud.com> wrote:
> >
> > Hi Manish,
> >
> > Thanks.
> >
> > >> we don’t have any AT instances in minimum execution window after
> context switching from S-EL(1/2)
> > >> to EL3 and before updating TCR register.
> >
> > 1) What is the minimum execution window? Does that not change based
> > on micro-architecture?
> > Not sure about the exact minimum execution window. IMO, it really depends
> > on when "context_save" gets called after entering EL3 from S-EL1/2.
> > It may change with the micro-architecture. We need some expert comments here.
> >
> > 2) Do we know that the "execution window" is exactly the same for all
> > the CPUs this errata applies to?
> > It may be, but we should not worry about that if we don’t have any AT
> > instruction execution in that window.
> >
> > Also, it appears we are only talking about switching from S-EL1/2 to EL3.
> > The same issue can happen when you go from NS-EL1/EL2 to EL3 as well. There
> > also seems to be an assumption in the patch you submitted that this errata
> > happens only during a so-called context switch. From my reading, the
> > Cortex-Ax errata notices don't limit the errata to occurring only during
> > "context switches" in the "conditions" section, and it can occur while
> > executing ANY code, although the workaround section does muddy the waters
> > a bit.
> >
> > In Linux, at NS-EL2, this workaround is already in place. Hence we only
> > considered the cases on the Secure EL side for this workaround.
> > Yes, the errata is not limited by the "conditions" section, but this
> > particular errata is not as straightforward as the other errata currently
> > in the code. We can't simply apply this errata on reset and just leave the
> > system.
> >
> > Back to the problem: speculative execution of an AT instruction using an
> > out-of-context regime results in a page table walk and generates an
> > incorrect translation, which is cached in the TLB. To avoid this issue we
> > thought of disabling PTW for that particular EL.
> > For example, if an AT instruction for EL1 is present in EL3, then we have to
> > make sure that speculative execution of this AT does not result in an
> > incorrect translation being cached in the TLB. If the system is always in EL3
> > (if we loop in EL3 without going back and forth to/from a lower EL), then
> > there is no need for this workaround.
> > Hence we thought of putting this workaround at the boundary of
> > context switches. When "context save" (close to EL3 entry) happens, we
> > save all EL system registers (S-EL1/S-EL2) with PTW disabled and
> > continue EL3 execution with PTW disabled, ensuring we do not cache any
> > incorrect translation for S-EL1/S-EL2. During "context restore" (i.e. close
> > to EL3 exit) we again disable PTW, restore all system registers for the EL
> > (S-EL1/S-EL2) except TCR, and then restore TCR.
> >
> > 3) Has there been any work done to actually reproduce this issue and
> > also to see that this actually fixes the issue?
> > No, this issue is hard to reproduce.
> >
> > 4) Has the CPU errata framework (cpu_ops etc.) been considered to
> > possibly implement the errata? Sprinkling errata through common framework
> > code does not seem like a good idea.
> > We thought of taking a different approach for this errata implementation,
> > where anybody can disable this workaround using a macro, as this errata is
> > applicable to most of the CPUs (enabled by default) and can't be placed in
> > cpu_ops.
> >
> > Thanks
> > Raghu
> >
> > Thanks
> > Manish Badarkhe
> >
> > On 4/28/20 1:44 AM, Manish Badarkhe wrote:
> > > Hi Raghu
> > >
> > > Please see my replies inline.
> > >
> > > Regards
> > > Manish Badarkhe
> > >
> > > On 28/04/2020, 11:29, "Raghu Krishnamurthy"
> <raghu.ncstate(a)icloud.com> wrote:
> > >
> > > Hi Manish,
> > >
> > > Understood.
> > >
> > > >>Hence before entering in EL3, we ensured that PTW is disabled
> (at
> > > context save)
> > >
> > > The context save and restore functions are executed in EL3. So how
> are
> > > you disabling PTW before entering EL3 ?
> > >
> > > Yes, I put it wrongly. We thought "context_save/restore" is best place
> to disable PTW without much affecting the
> > > code because we don’t have any AT instances in minimum execution
> window after context switching from S-EL(1/2)
> > > to EL3 and before updating TCR register.
> > >
> > > -Raghu
> > >
> > > Thanks
> > > Manish Badarkhe
> > >
> > > On 4/27/20 10:53 PM, Manish Badarkhe wrote:
> > > > Hi Raghu
> > > >
> > > > This workaround is specifically need for speculative AT instruction
> behaviour in out of context regime.
> > > > That means executing AT instruction for lower ELs (S-EL1/S-EL2) in
> higher EL i.e. EL3.
> > > >
> > > > Behaviour of AT instruction is unaltered when it get executed in
> same regime (when AT instruction executed for same EL
> > > > where it is executing) and there is no possibility to execute AT
> instruction for higher EL in lower EL.
> > > >
> > > > Hence before entering in EL3, we ensured that PTW is disabled (at
> context save) and restore PTW back during
> > > > exit of EL3. (at context restore).
> > > >
> > > > Thanks
> > > > Manish Badarkhe
> > > >
> > > > On 28/04/2020, 01:23, "Raghu K" <raghu.ncstate(a)icloud.com>
> wrote:
> > > >
> > > > Hi Manish,
> > > >
> > > > >>Hence proposed solution will work as it is
> > > >
> > > > [RK]If you are sure go ahead. I'm not convinced, but that may
> be because
> > > > i don't understand the errata fully/correctly.
> > > >
> > > > >>This workaround is very specific during context switching
> > > >
> > > > [RK] Context switching has many meanings depending on the
> context(OS,
> > > > hypervisor, TF-A world switch etc). The errata document i saw
> does not
> > > > elaborate on this. Perhaps clarifying this will help understand
> why the
> > > > solution you proposed will work.
> > > >
> > > > The solution below in points 2 and 3 have the same problem on
> entry and
> > > > exit, mentioned in my first email. Before you call
> > > > el1_sysregs_context_save, an AT instruction could have
> speculatively
> > > > executed through speculation of branches that occur BEFORE
> you call this
> > > > function, when TCR still has the enable bit set. The fact that you
> don't
> > > > have an AT instruction in the context save routine or any
> routine for
> > > > that matter, does not guarantee that the hardware did not
> speculate
> > > > through some other means to reach an AT instruction. The
> same applies to
> > > > the context_restore routines. There is no guarantee right after
> you
> > > > finish the restore routing(and hence TCR has the enable bit set),
> that
> > > > the CPU cannot speculate to an AT instruction.
> > > > So i'm not clear how you can say for certain that there was no
> > > > speculative AT instruction with the proposal below.
> > > >
> > > > Thanks
> > > > Raghu
> > > >
> > > > On 4/27/20 10:08 AM, Manish Badarkhe wrote:
> > > > > Hi All,
> > > > >
> > > > > Just update/correct details.
> > > > >
> > > > > Thanks
> > > > > Manish Badarkhe
> > > > >
> > > > > On 27/04/2020, 22:13, "TF-A on behalf of Manish Badarkhe
> via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-
> a(a)lists.trustedfirmware.org> wrote:
> > > > >
> > > > > Hi Raghu
> > > > >
> > > > > Please ignore my answer on question 2.
> > > > >
> > > > > With internal discussion came to below conclusion:
> > > > > 1. This workaround is very specific during context
> switching.
> > > > > 2 . If you check in context save routine
> (el1_sysregs_context_save or el2_sysregs_context_save),
> > > > > As per proposed solution, First step performed is to
> disable page table walk and we don’t have
> > > > > any AT instruction execution in context save routine.
> > > > > This ensures that there will be no possibility of
> speculative AT instruction execution without TCR update.
> > > > > 3. If you check in context restore routine
> (el1_sysregs_context_restore or el2_sysregs_context_restore),
> > > > > As per proposed solution, first step performed is to
> disable page table walk and we don’t have any
> > > > > AT instruction execution in context restore routine.
> > > > > This ensures that there will be no possibility of
> speculative AT instruction execution without TCR update.
> > > > >
> > > > > Hence proposed solution will work as it is ensuring no
> caching of translations in TLB while speculative AT instruction execution.
> > > > >
> > > > > Thanks
> > > > > Manish Badarkhe
> > > > >
> > > > > On 27/04/2020, 13:38, "TF-A on behalf of Manish Badarkhe
> via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-
> a(a)lists.trustedfirmware.org> wrote:
> > > > >
> > > > > Hi Raghu
> > > > >
> > > > > Please see my answers inline
> > > > >
> > > > > On 25/04/2020, 06:38, "TF-A on behalf of Raghu K via TF-
> A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-
> a(a)lists.trustedfirmware.org> wrote:
> > > > >
> > > > > Hi Manish,
> > > > >
> > > > > Before I agree or disagree with the suggested fix, the
> following would
> > > > > be interesting to know/discuss. Please feel free to
> correct me if i've
> > > > > misunderstood something.
> > > > >
> > > > > 1) Are "speculative" AT instructions subject to TCR_ELx
> control bits for
> > > > > all the listed CPU's? I imagine the answer is yes but
> would be good to
> > > > > get confirmation. I could not find any evidence in the
> instruction
> > > > > description or psuedocode in the ARMv8 ARM. It is
> possible to play many
> > > > > tricks on speculative execution of instructions such as
> skipping checks
> > > > > and doing them only when the CPU knows the
> instruction will be
> > > > > committed. If this is the case, changing TCR_ELx bits
> may not work. The
> > > > > errata document is vague about how to fix it.
> > > > >
> > > > > The speculative AT instruction may behave as you
> mentioned. We need more
> > > > > opinion on this.
> > > > > Proposed fix I mentioned by referring linux workaround
> for the same errata.
> > > > > Linux workaround is available in mainline kernel as
> below:
> > > > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
> v5.7-rc3&id=bd227553ad5077f21ddb382dcd910ba46181805a
> > > > >
> > > > > 2) Assuming the answer to question 1 is yes, your
> proposal may not work
> > > > > as is. In the worst case, as soon as you enter EL3, the
> very first thing
> > > > > that may happen, before you ever operate/write to
> TCR_ELx, is a
> > > > > speculative AT instruction that caches a bad translation
> in the TLB's.
> > > > > The same thing can happen on the exit path. As soon as
> you restore the
> > > > > TCR_ELx register, the first thing that can happen is a
> speculative AT
> > > > > that caches a bad translation. However, the el3_exit
> path does have DSB
> > > > > before ERET, so we will not speculate to an AT
> instruction if there are
> > > > > no branches between the instruction that sets TCR_ELx
> and the ERET.
> > > > > Somewhere in between, it looks like we will need a
> TLBI NSH to be
> > > > > certain there are no bad translation cached. This
> obviously has a
> > > > > potential performance cost on the lower EL's. Every
> entry into EL3
> > > > > flushes the TLB for lower EL's.
> > > > >
> > > > > Yes, this seems to be valid case during entry and exit path.
> > > > > I am not quite sure in that case where we need to avoid
> PTW.
> > > > > Also "TLBI NSH" works but it may cause performance
> issue.
> > > > > Need some more opinion/thoughts on this.
> > > > >
> > > > > Just thinking, can sequence mentioned for context save
> does not ensure that
> > > > > PTW is disabled?
> > > > > Something as below as last step in ELx(1/2) context save
> (elaborated more):
> > > > > > ·Save TCR register with PTW enable (EPD=0). (Just to
> enable PTW during
> > > > > > restore context). Do not operate TCR_EL1x register
> here just save its value to restore.
> > > > > > This ensures that during entry in EL3 there will be no
> chance of PTW
> > > > > >. while executing AT instruction.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Raghu
> > > > >
> > > > > Thanks
> > > > > Manish Badarkhe
> > > > >
> > > > > On 4/24/20 2:56 AM, Manish Badarkhe via TF-A wrote:
> > > > > >
> > > > > > Hi All
> > > > > >
> > > > > > We are trying to implement errata which is applicable
> for below CPUs:
> > > > > >
> > > > > > <CPUs> : <Errata No.>
> > > > > >
> > > > > > Cortex-A53: 1530924
> > > > > >
> > > > > > Cortex-A76: 1165522
> > > > > > Cortex-A72: 1319367
> > > > > > Cortex-A57: 1319537
> > > > > > Cortex-A55: 1530923
> > > > > >
> > > > > > *Errata Description:*
> > > > > >
> > > > > > A speculative Address Translation (AT) instruction
> translates using
> > > > > > registers that are associated with an out-of-context
> translation
> > > > > > regime and caches the resulting translation in the TLB.
> A subsequent
> > > > > > translation request that is generated when the out-
> of-context
> > > > > > translation regime is current uses the previous cached
> TLB entry
> > > > > > producing an incorrect virtual to physical mapping.
> > > > > >
> > > > > > *Probable solution is to implement below fix in
> context.S file:*
> > > > > >
> > > > > > *During ELx (1 or 2) context save:*
> > > > > >
> > > > > > ·Operate TCR_ELx(1/2) to disable page table walk by
> operating EPD bits
> > > > > >
> > > > > > oThis will avoid any page table walk for S-EL1 or S-EL2.
> This will
> > > > > > help in avoiding caching of translations in TLB
> > > > > >
> > > > > > for S-EL1/S-EL2 in EL3.
> > > > > >
> > > > > > ·Save all system registers (which is already available)
> except TCR
> > > > > >
> > > > > > ·Clear EPD bits of TCR and then save. (Just to enable
> PTW during
> > > > > > restore context).
> > > > > >
> > > > > > *During ELx (1 or 2) context restore:*
> > > > > >
> > > > > > * Operate TCR_ELx(1/2) to disable page table walk
> by operating EPD bits
> > > > > > * Restore all system registers (which are saved
> during context save)
> > > > > > except TCR register.
> > > > > > * Restore TCR_ELx(1/2) register (which enable back
> PTW).
> > > > > >
> > > > > > With above we ensured that there will be no page
> table walk for S-EL1
> > > > > > and S-EL2 in EL3.
> > > > > >
> > > > > > is this proper other way to fix this problem? Need
> some suggestion/use
> > > > > > cases where and all we need this workaround in TF-A
> code.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Manish Badarkhe
> > > > > >
> > > > > > IMPORTANT NOTICE: The contents of this email and
> any attachments are
> > > > > > confidential and may also be privileged. If you are not
> the intended
> > > > > > recipient, please notify the sender immediately and
> do not disclose
> > > > > > the contents to any other person, use it for any
> purpose, or store or
> > > > > > copy the information in any medium. Thank you.
> > > > > >
> > > > >
> > > > > --
> > > > > TF-A mailing list
> > > > > TF-A(a)lists.trustedfirmware.org
> > > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > >
> > > > > IMPORTANT NOTICE: The contents of this email and any
> attachments are confidential and may also be privileged. If you are not the
> intended recipient, please notify the sender immediately and do not disclose
> the contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
> > > > > --
> > > > > TF-A mailing list
> > > > > TF-A(a)lists.trustedfirmware.org
> > > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > >
> > > > > IMPORTANT NOTICE: The contents of this email and any
> attachments are confidential and may also be privileged. If you are not the
> intended recipient, please notify the sender immediately and do not disclose
> the contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
> > > > > --
> > > > > TF-A mailing list
> > > > > TF-A(a)lists.trustedfirmware.org
> > > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > >
> > > > > IMPORTANT NOTICE: The contents of this email and any
> attachments are confidential and may also be privileged. If you are not the
> intended recipient, please notify the sender immediately and do not disclose
> the contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
> > > >
> > > >
> > > > IMPORTANT NOTICE: The contents of this email and any
> attachments are confidential and may also be privileged. If you are not the
> intended recipient, please notify the sender immediately and do not disclose
> the contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
> > > >
> > >
> > > IMPORTANT NOTICE: The contents of this email and any attachments
> are confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
> >
> >
> > --
> > TF-A mailing list
> > TF-A(a)lists.trustedfirmware.org
> > https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> >
> --
> TF-A mailing list
> TF-A(a)lists.trustedfirmware.org
> https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Hi all,
I've received very little feedback on version 2 of the proposal, which
hints that we are reaching an agreement. Thus, I plan to finalize the
proposal this week. This can then become part of our development flow
for all trustedfirmware.org projects.
Thanks again for all the inputs!
Regards,
Sandrine Bailleux
Hi Francois,
On Mon, Apr 20, 2020 at 11:45:02AM +0000, François Ozog via TF-A wrote:
> Hi,
>
> I am trying to identify a mechanism to enforce a form of two-way
> isolation between BL33 runtime services in OS, for instance:
> - a pair of 2MB areas that could be RO by one entity and RW by the other
> - an execute only BL33 2MB area?
Stupid Q! Are you referring to isolation between EFI runtime services and the
OS?
It is not clear what you mean by BL33 runtime services?
cheers,
Achin
>
> This is similar to hypervisor except it only deals with memory, no
> vCPU, no GIC virtualization...
>
> Could EL3 or EL2 install protective mappings ? BL33 could ask either
> EL2 hypervisor or SecureMonitor to actually install them.
>
> Cordially,
>
> FF
Hi Raghu
One more addition/correction to my previous emails: the proposed errata workaround is
applied to both normal and secure world switches to EL3.
Thanks
Manish Badarkhe
On 29/04/2020, 12:25, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
Hi Raghu
On 29/04/2020, 02:00, "Raghu K" <raghu.ncstate(a)icloud.com> wrote:
Hi Manish,
Thanks.
>> we don’t have any AT instances in minimum execution window after context switching from S-EL(1/2)
>> to EL3 and before updating TCR register.
1) What is the minimum execution window? Does that not change based on micro-architecture?
[MB] I am not sure about the exact minimum execution window. IMO, it really depends on when "context_save" gets called after
entering EL3 from S-EL1/2. It may vary with the micro-architecture. We need an expert's comment here.
2) Do we know that the "execution window" is exactly the same for all the CPU's this errata applies to?
[MB] It may be, but we need not worry about that if no AT instruction executes in that window.
Also, it appears we are only talking about switching from S-EL1/2 to EL3. The same issue can happen when you go from NS-EL1/EL2 to EL3 as well. There also seems to be an assumption in the patch you submitted that this errata happens only during a so called context-switch. From my reading, the cortex-Ax errata notices don't limit the errata to occur only during "context-switches" in the "conditions" section and can occur while executing ANY code, although the work around section does muddy the waters a bit.
[MB] In Linux, this workaround is already in place at NS-EL2. Hence we only considered the Secure EL side for this workaround.
Yes, the errata is not limited to a particular condition, but this errata is not as straightforward as the other errata currently handled in the code. We can't simply apply it at reset and leave the system as is.
Back to the problem: speculative execution of an AT instruction using an out-of-context regime results in a page table walk and generates an incorrect
translation, which is cached in the TLB. To avoid this issue we thought of disabling PTW for that particular EL.
For example, if an AT instruction for EL1 is present in EL3 code, we have to make sure that speculative execution of this AT does not result in an incorrect translation being cached in the TLB. If the system always stays in EL3 (looping in EL3 without going back and forth to/from a lower EL), then
this workaround is not needed.
Hence we thought of placing this workaround at the context-switch boundaries. On "context save" (close to EL3 entry) we save all EL system registers (S-EL1/S-EL2) with PTW disabled and continue EL3 execution with PTW disabled, ensuring that we do not cache any incorrect translation for S-EL1/S-EL2. On "context restore" (close to EL3 exit) we again disable PTW, restore all system registers for the EL (S-EL1/S-EL2) except TCR, and then restore TCR.
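The save/restore ordering described above can be sketched in C as follows. This is only an illustrative model under stated assumptions, not the TF-A patch: the struct and function names are hypothetical, a plain struct stands in for the live system registers, and a single register stands in for "all other system registers". The only architectural facts used are that TCR_EL1.EPD0 is bit 7 and TCR_EL1.EPD1 is bit 23, and that setting them disables table walks through TTBR0/TTBR1. Unlike the "clear EPD bits and then save" step of the original proposal, the sketch saves the lower EL's TCR value unmodified, so that a lower EL that had deliberately set an EPD bit gets it back.

```c
#include <stdint.h>

/* TCR_EL1 walk-disable bits (Armv8-A): EPD0 is bit 7, EPD1 is bit 23. */
#define TCR_EPD0_BIT (UINT64_C(1) << 7)
#define TCR_EPD1_BIT (UINT64_C(1) << 23)
#define TCR_EPD_MASK (TCR_EPD0_BIT | TCR_EPD1_BIT)

/* Hypothetical model: used both for the "live" registers and the saved copy. */
struct el1_regs {
    uint64_t tcr_el1;
    uint64_t sctlr_el1; /* stands in for all other EL1 system registers */
};

/* Context save: disable walks first, save TCR (original value) last. */
static void sketch_context_save(struct el1_regs *hw, struct el1_regs *ctx)
{
    uint64_t lower_el_tcr = hw->tcr_el1; /* value the lower EL was using */

    hw->tcr_el1 |= TCR_EPD_MASK;         /* no lower-EL table walks from here on */
    ctx->sctlr_el1 = hw->sctlr_el1;      /* save the remaining registers */
    ctx->tcr_el1 = lower_el_tcr;         /* saved with the lower EL's own EPD bits */
}

/* Context restore: walks stay disabled until TCR is written back last. */
static void sketch_context_restore(struct el1_regs *hw, const struct el1_regs *ctx)
{
    hw->tcr_el1 |= TCR_EPD_MASK;         /* keep walks disabled during restore */
    hw->sctlr_el1 = ctx->sctlr_el1;      /* restore the remaining registers */
    hw->tcr_el1 = ctx->tcr_el1;          /* re-enables walks as the final step */
}
```

Note that this ordering does not by itself close the window between EL3 entry and the first TCR write, which is the concern raised in the thread.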
3) Has there been any work done to actually reproduce this issue and also to see that this actually fixes the issue?
[MB] No, this issue is hard to reproduce.
4) Has the CPU errata framework(cpu_ops etc.) been considered to possibly implement the errata? Sprinkling erratas through common framework code does not seem like a good idea.
[MB] We took a different approach for this errata: since it applies to most of the CPUs, the workaround is enabled by default and anybody can disable it using a macro. It can't be placed in cpu_ops.
Thanks
Raghu
Thanks
Manish Badarkhe
On 4/28/20 1:44 AM, Manish Badarkhe wrote:
> Hi Raghu
>
> Please see my replies inline.
>
> Regards
> Manish Badarkhe
>
> On 28/04/2020, 11:29, "Raghu Krishnamurthy" <raghu.ncstate(a)icloud.com> wrote:
>
> Hi Manish,
>
> Understood.
>
> >>Hence before entering in EL3, we ensured that PTW is disabled (at
> context save)
>
> The context save and restore functions are executed in EL3. So how are
> you disabling PTW before entering EL3 ?
>
> Yes, I put it wrongly. We thought "context_save/restore" is the best place to disable PTW without much impact on the
> code because we don’t have any AT instances in minimum execution window after context switching from S-EL(1/2)
> to EL3 and before updating TCR register.
>
> -Raghu
>
> Thanks
> Manish Badarkhe
>
> On 4/27/20 10:53 PM, Manish Badarkhe wrote:
> > Hi Raghu
> >
> > This workaround is specifically needed for speculative AT instruction behaviour in an out-of-context regime,
> > that is, executing an AT instruction for lower ELs (S-EL1/S-EL2) in a higher EL, i.e. EL3.
> >
> > The behaviour of an AT instruction is unaltered when it is executed in the same regime (when the AT instruction targets the same EL
> > where it is executing), and there is no possibility of executing an AT instruction for a higher EL in a lower EL.
> >
> > Hence before entering in EL3, we ensured that PTW is disabled (at context save) and restore PTW back during
> > exit of EL3. (at context restore).
> >
> > Thanks
> > Manish Badarkhe
> >
> > On 28/04/2020, 01:23, "Raghu K" <raghu.ncstate(a)icloud.com> wrote:
> >
> > Hi Manish,
> >
> > >>Hence proposed solution will work as it is
> >
> > [RK]If you are sure go ahead. I'm not convinced, but that may be because
> > i don't understand the errata fully/correctly.
> >
> > >>This workaround is very specific during context switching
> >
> > [RK] Context switching has many meanings depending on the context(OS,
> > hypervisor, TF-A world switch etc). The errata document i saw does not
> > elaborate on this. Perhaps clarifying this will help understand why the
> > solution you proposed will work.
> >
> > The solution below in points 2 and 3 have the same problem on entry and
> > exit, mentioned in my first email. Before you call
> > el1_sysregs_context_save, an AT instruction could have speculatively
> > executed through speculation of branches that occur BEFORE you call this
> > function, when TCR still has the enable bit set. The fact that you don't
> > have an AT instruction in the context save routine or any routine for
> > that matter, does not guarantee that the hardware did not speculate
> > through some other means to reach an AT instruction. The same applies to
> > the context_restore routines. There is no guarantee right after you
> > finish the restore routing(and hence TCR has the enable bit set), that
> > the CPU cannot speculate to an AT instruction.
> > So i'm not clear how you can say for certain that there was no
> > speculative AT instruction with the proposal below.
> >
> > Thanks
> > Raghu
> >
> > On 4/27/20 10:08 AM, Manish Badarkhe wrote:
> > > Hi All,
> > >
> > > Just update/correct details.
> > >
> > > Thanks
> > > Manish Badarkhe
> > >
> > > On 27/04/2020, 22:13, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
> > >
> > > Hi Raghu
> > >
> > > Please ignore my answer on question 2.
> > >
> > > With internal discussion came to below conclusion:
> > > 1. This workaround is very specific during context switching.
> > > 2. In the context save routine (el1_sysregs_context_save or el2_sysregs_context_save),
> > > the first step in the proposed solution is to disable the page table walk, and there is no
> > > AT instruction in the context save routine.
> > > This ensures that no speculative AT instruction can execute without the TCR update in place.
> > > 3. In the context restore routine (el1_sysregs_context_restore or el2_sysregs_context_restore),
> > > the first step in the proposed solution is to disable the page table walk, and there is no
> > > AT instruction in the context restore routine.
> > > This ensures that no speculative AT instruction can execute without the TCR update in place.
> > >
> > > Hence proposed solution will work as it is ensuring no caching of translations in TLB while speculative AT instruction execution.
> > >
> > > Thanks
> > > Manish Badarkhe
> > >
> > > On 27/04/2020, 13:38, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
> > >
> > > Hi Raghu
> > >
> > > Please see my answers inline
> > >
> > > On 25/04/2020, 06:38, "TF-A on behalf of Raghu K via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
> > >
> > > Hi Manish,
> > >
> > > Before I agree or disagree with the suggested fix, the following would
> > > be interesting to know/discuss. Please feel free to correct me if i've
> > > misunderstood something.
> > >
> > > 1) Are "speculative" AT instructions subject to TCR_ELx control bits for
> > > all the listed CPU's? I imagine the answer is yes but would be good to
> > > get confirmation. I could not find any evidence in the instruction
> > > description or pseudocode in the ARMv8 ARM. It is possible to play many
> > > tricks on speculative execution of instructions such as skipping checks
> > > and doing them only when the CPU knows the instruction will be
> > > committed. If this is the case, changing TCR_ELx bits may not work. The
> > > errata document is vague about how to fix it.
> > >
> > > The speculative AT instruction may behave as you mentioned. We need more
> > > opinions on this.
> > > I based the proposed fix on the Linux workaround for the same errata.
> > > The Linux workaround is available in the mainline kernel:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
> > >
> > > 2) Assuming the answer to question 1 is yes, your proposal may not work
> > > as is. In the worst case, as soon as you enter EL3, the very first thing
> > > that may happen, before you ever operate/write to TCR_ELx, is a
> > > speculative AT instruction that caches a bad translation in the TLB's.
> > > The same thing can happen on the exit path. As soon as you restore the
> > > TCR_ELx register, the first thing that can happen is a speculative AT
> > > that caches a bad translation. However, the el3_exit path does have DSB
> > > before ERET, so we will not speculate to an AT instruction if there are
> > > no branches between the instruction that sets TCR_ELx and the ERET.
> > > Somewhere in between, it looks like we will need a TLBI NSH to be
> > > certain there are no bad translation cached. This obviously has a
> > > potential performance cost on the lower EL's. Every entry into EL3
> > > flushes the TLB for lower EL's.
> > >
> > > Yes, this seems to be a valid case on the entry and exit paths.
> > > In that case, I am not quite sure where we need to avoid PTW.
> > > Also, "TLBI NSH" works but it may cause a performance issue.
> > > We need some more opinions/thoughts on this.
> > >
> > > Just thinking: does the sequence mentioned for context save not ensure that
> > > PTW is disabled?
> > > Something like the following as the last step in the ELx (1/2) context save (elaborated more):
> > > > · Save the TCR register with PTW enabled (EPD=0), just to enable PTW during
> > > > context restore. Do not modify the TCR_ELx register here; just save its value for restore.
> > > > This ensures that during entry into EL3 there will be no chance of a PTW
> > > > while executing an AT instruction.
> > >
> > > Thanks
> > >
> > > Raghu
> > >
> > > Thanks
> > > Manish Badarkhe
> > >
> > > On 4/24/20 2:56 AM, Manish Badarkhe via TF-A wrote:
> > > >
> > > > Hi All
> > > >
> > > > We are trying to implement errata which is applicable for below CPUs:
> > > >
> > > > <CPUs> : <Errata No.>
> > > >
> > > > Cortex-A53: 1530924
> > > >
> > > > Cortex-A76: 1165522
> > > > Cortex-A72: 1319367
> > > > Cortex-A57: 1319537
> > > > Cortex-A55: 1530923
> > > >
> > > > *Errata Description:*
> > > >
> > > > A speculative Address Translation (AT) instruction translates using
> > > > registers that are associated with an out-of-context translation
> > > > regime and caches the resulting translation in the TLB. A subsequent
> > > > translation request that is generated when the out-of-context
> > > > translation regime is current uses the previous cached TLB entry
> > > > producing an incorrect virtual to physical mapping.
> > > >
> > > > *Probable solution is to implement the fix below in the context.S file:*
> > > >
> > > > *During ELx (1 or 2) context save:*
> > > >
> > > > * Write TCR_ELx(1/2) to disable the page table walk by setting the EPD bits.
> > > >   This avoids any page table walk for S-EL1 or S-EL2, which helps
> > > >   avoid caching translations for S-EL1/S-EL2 in the TLB while in EL3.
> > > > * Save all system registers (which is already done) except TCR.
> > > > * Clear the EPD bits of TCR and then save it (just to enable PTW during
> > > >   context restore).
> > > >
> > > > *During ELx (1 or 2) context restore:*
> > > >
> > > > * Write TCR_ELx(1/2) to disable the page table walk by setting the EPD bits.
> > > > * Restore all system registers (which were saved during context save)
> > > >   except the TCR register.
> > > > * Restore the TCR_ELx(1/2) register (which re-enables PTW).
> > > >
> > > > With the above we ensure that there will be no page table walk for S-EL1
> > > > and S-EL2 in EL3.
> > > >
> > > > Is this a proper way to fix this problem, or is there another way? We need suggestions on the use
> > > > cases where we need this workaround in TF-A code.
> > > >
> > > > Thanks
> > > >
> > > > Manish Badarkhe
> > > >
--
TF-A mailing list
TF-A(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
This event has been changed.
Title: TF-A Tech Forum
We run an open technical forum call for anyone to participate and it is not
restricted to Trusted Firmware project members. It will operate under the
guidance of the TF TSC. Feel free to forward this invite to
colleagues. Invites are via the TF-A mailing list and also published on the
Trusted Firmware website. Details are
here: https://www.trustedfirmware.org/meetings/tf-a-technical-forum/Tr…
Firmware is inviting you to a scheduled Zoom meeting.
Join Zoom Meeting: https://zoom.us/j/9159704974
Meeting ID: 915 970 4974
One tap mobile:
+16465588656,,9159704974# US (New York)
+16699009128,,9159704974# US (San Jose)
Dial by your location:
+1 646 558 8656 US (New York)
+1 669 900 9128 US (San Jose)
877 853 5247 US Toll-free
888 788 0099 US Toll-free
Meeting ID: 915 970 4974
Find your local number: https://zoom.us/u/ad27hc6t7h
When: Every 2 weeks from 18:00 to 19:00 on Thursday from Thu 7 May to Thu
30 Jul Central European Time - Paris (changed)
Where: Zoom
Calendar: tf-a(a)lists.trustedfirmware.org
Who:
(Guest list has been hidden at organiser's request)
Event details:
https://www.google.com/calendar/event?action=VIEW&eid=N3ZoNDBuZzZnM2k4cGszY…
Hi Bin Wu,
Glad if this helped!
Hi Thomas,
Thanks for the heads up!
Regards,
Olivier.
________________________________________
From: 吴斌(郅隆) <zhilong.wb(a)alibaba-inc.com>
Sent: 21 April 2020 13:52
To: Thomas Abraham; Olivier Deprez; TF-A
Subject: Re: RE: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Dear All,
Thanks again for all your help. Your professionalism and assistance impressed me.
BRs,
Bin Wu
------------------ Original Message ------------------
From: Thomas Abraham <thomas.abraham(a)arm.com>
Sent: Tue Apr 21 19:38:38 2020
To: Olivier Deprez <Olivier.Deprez(a)arm.com>, TF-A <tf-a-bounces(a)lists.trustedfirmware.org>, 吴斌(郅隆) <zhilong.wb(a)alibaba-inc.com>
Subject: RE: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hi,
Looking at the mail chain below, this is probably being tested on the RD-N1-Edge platform. A regression was noticed in the dmc620 RAS error handling in the code pushed to Linaro for the RD-N1-Edge platform. This will be fixed later today and the patches will be merged into the Linaro repos. They should then be accessible using the usual repo init/sync commands.
Thanks,
Thomas.
> -----Original Message-----
> From: TF-A On Behalf Of Olivier
> Deprez via TF-A
> Sent: Tuesday, April 21, 2020 4:45 PM
> To: TF-A ; Raghu K via TF-A
> a(a)lists.trustedfirmware.org>; 吴斌(郅隆)
> Subject: Re: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event -
> 0xC4000061 and BL31 Crashed
>
> Hi Raghu,
>
> Yes, you're right, we probably need a few return code checks here and there. I
> may submit a patch and verify it doesn't break anything else.
>
> Hi Bin Wu,
>
> I had noticed the following sequence originating from linux sdei driver init
> down to TF-A:
>
> INFO: SDEI: Private events initialized on 81000100
> INFO: SDEI: Private events initialized on 81000200
> INFO: SDEI: Private events initialized on 81000300
> INFO: SDEI: Private events initialized on 81010000
> INFO: SDEI: Private events initialized on 81010100
> INFO: SDEI: Private events initialized on 81010200
> INFO: SDEI: Private events initialized on 81010300
> INFO: SDEI: > VER
> INFO: SDEI: < VER:1000000000000
> INFO: SDEI: > P_RESET():81000000
> INFO: SDEI: < P_RESET:0
> INFO: SDEI: > P_RESET():81000200
> INFO: SDEI: < P_RESET:0
> INFO: SDEI: > P_RESET():81000300
> INFO: SDEI: < P_RESET:0
> INFO: SDEI: > P_RESET():81010000
> INFO: SDEI: < P_RESET:0
> INFO: SDEI: > P_RESET():81010100
> INFO: SDEI: < P_RESET:0
> INFO: SDEI: > P_RESET():81010200
> INFO: SDEI: < P_RESET:0
> INFO: SDEI: > P_RESET():81010300
> INFO: SDEI: < P_RESET:0
> INFO: SDEI: > P_RESET():81000100
> INFO: SDEI: < P_RESET:0
> INFO: SDEI: > S_RESET():81000100
> INFO: SDEI: < S_RESET:0
> INFO: SDEI: > UNMASK:81000000
> INFO: SDEI: < UNMASK:0
> INFO: SDEI: > UNMASK:81000100
> INFO: SDEI: < UNMASK:0
> INFO: SDEI: > UNMASK:81000200
> INFO: SDEI: < UNMASK:0
> INFO: SDEI: > UNMASK:81000300
> INFO: SDEI: < UNMASK:0
> INFO: SDEI: > UNMASK:81010000
> INFO: SDEI: < UNMASK:0
> INFO: SDEI: > UNMASK:81010100
> INFO: SDEI: < UNMASK:0
> INFO: SDEI: > UNMASK:81010200
> INFO: SDEI: < UNMASK:0
> INFO: SDEI: > UNMASK:81010300
> INFO: SDEI: < UNMASK:0
> INFO: SDEI: > INFO(n:804, 0)
> INFO: SDEI: < INFO:0
> INFO: SDEI: > INFO(n:805, 0)
> INFO: SDEI: < INFO:0
>
> There is an Sdei Info request about events 804 and 805.
> Although I don't see any register or enable event service call, so I wonder if
> this demo code is missing something or expects that the platform
> implements such event definition natively.
>
> This does not look like the flows described in
> https://trustedfirmware-a.readthedocs.io/en/latest/components/sdei.html
> for regular SDEI usage or explicit dispatch of events.
>
> Maybe we should involve Linaro ppl on the expected init sequence and
> dependency to TF-A (platform files).
>
> Regards,
> Olivier.
>
>
> ________________________________________
> From: TF-A on behalf of 吴斌(郅隆) via TF-A
> Sent: 21 April 2020 08:45
> To: TF-A; Raghu K via TF-A
> Subject: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event -
> 0xC4000061 and BL31 Crashed
>
> Hi Olivier and All,
>
> Thank you so much for your help. It helps me understand the internals.
> As the next step, I need to check the registration flow for this event_num (804) on the kernel
> side, am I right?
>
>
> BRs,
> Bin Wu
> ------------------ Original Message ------------------
> From: TF-A
> Sent: Tue Apr 21 09:51:49 2020
> To: Raghu K via TF-A
> Subject: Re: [TF-A] Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and
> BL31 Crashed
> Nice debug! Apart from the issue you pointed out, there is also the
> issue with not checking the return code. The ras handler should really
> be checking or panic'ing if there is an unexpected error code from
> spm_sp_call and sdei_dispatch_event.
>
> -Raghu
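The return-code checking suggested above could look roughly like the sketch below. It is a simplified illustration, not the actual RAS handler: `handle_ras_event`, the `ras_result` codes and the stub functions are hypothetical, and the two function pointers merely stand in for spm_sp_call and sdei_dispatch_event, whose real signatures take arguments. The point is only that each step is checked and a failure is reported to the caller instead of being ignored.

```c
#include <stdio.h>

/* Hypothetical result codes for the sketch. */
enum ras_result { RAS_OK = 0, RAS_ERR_SP_CALL = 1, RAS_ERR_DISPATCH = 2 };

/* Stand-in type for the two calls; each returns 0 on success. */
typedef int (*ras_step_fn)(void);

/* Stubs used to exercise the success and failure paths. */
static int stub_ok(void)   { return 0; }
static int stub_fail(void) { return -1; }

/* Run the SP call, then the event dispatch, stopping at the first failure. */
static enum ras_result handle_ras_event(ras_step_fn sp_call, ras_step_fn dispatch)
{
    int rc = sp_call();
    if (rc != 0) {
        fprintf(stderr, "SP call failed: %d\n", rc);
        return RAS_ERR_SP_CALL;
    }

    rc = dispatch();
    if (rc != 0) {
        fprintf(stderr, "event dispatch failed: %d\n", rc);
        return RAS_ERR_DISPATCH;
    }

    return RAS_OK;
}
```

In firmware, the caller would more likely panic on a non-zero result than return, so that a failed dispatch cannot fall through to el3_exit with a stale context.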
>
> On 4/20/20 2:37 PM, Olivier Deprez via TF-A wrote:
> > Hi Bin Wu,
> >
> > Here's an early observation. On receiving the RAS fiq interrupt the
> following occurs:
> >
> > ehf_el3_interrupt_handler => sgi_ras_intr_handler => spm_sp_call
> (enters/exit the SP to handle the injected RAS error) => sdei_dispatch_event
> >
> > se = get_event_entry(map);
> > if (!can_sdei_state_trans(se, DO_DISPATCH))
> > return -1;
> >
> > p *map
> > $6 = {ev_num = 804, intr = 0, map_flags = 112, reg_count = 0, lock = {lock =
> 0}}
> > p *se
> > $4 = {ep = 0, arg = 0, affinity = 0, reg_flags = 0, state = 0 '\0'}
> >
> > sdei_dispatch_event exits in error at this stage, this does not seem a
> correct behavior.
> > The SDEI handler is not called in NS world and context remains unchanged.
> > The interrupt handler blindly returns to S-EL1 SP context at same location
> where it last exited.
> > sgi_ras_intr_handler => ehf_el3_interrupt_handler => vector_entry
> fiq_aarch64 => el3_exit => re-enters the SP with X0=0xC4000061
> > SP then exits but the EL3 context has not been setup for SP entry leading
> to crash.
> >
> > IMO there is an issue around mapping SDEI event number to RAS interrupt
> number leading to sdei_dispatch_event exiting early.
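To illustrate why a dispatch attempt on an event whose state is 0 is rejected, here is a deliberately simplified model of an event state gate. It is an assumption-laden sketch, not TF-A's can_sdei_state_trans(): the flag names, the `ev_entry` struct and both functions are hypothetical, and the model only encodes the general SDEI rule that an event has to be registered and enabled before it can be delivered to its handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state flags for the sketch. */
#define EV_REGISTERED (1u << 0)
#define EV_ENABLED    (1u << 1)
#define EV_RUNNING    (1u << 2)

/* Hypothetical event entry; only the state matters here. */
struct ev_entry {
    uint8_t state;
};

/* Dispatch is allowed only for a registered, enabled, not-running event. */
static bool can_dispatch(const struct ev_entry *se)
{
    const uint8_t need = EV_REGISTERED | EV_ENABLED;

    return ((se->state & need) == need) && ((se->state & EV_RUNNING) == 0u);
}

/* Returns 0 and marks the event running, or -1 if the state forbids it. */
static int dispatch_event(struct ev_entry *se)
{
    if (!can_dispatch(se))
        return -1;

    se->state |= EV_RUNNING;
    return 0;
}
```

With `state = 0`, as in the debugger dump above, `dispatch_event` in this model returns -1, which matches the early exit observed in the thread.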
> >
> > Regards,
> > Olivier.
> >
> >
> > ________________________________________
> > From: TF-A on behalf of Matteo Carlini via TF-A
> > Sent: 14 April 2020 10:41
> > To: 吴斌(郅隆); tf-a(a)lists.trustedfirmware.org; Thomas Abraham; Deepak Pandey
> > Cc: nd
> > Subject: Re: [TF-A] Reply: Re: [RAS] BL32 UnRecognized Event - 0xC4000061
> > and BL31 Crashed
> >
> > Looping in Thomas & Deepak, who are responsible for the RD-N1 landing team
> > platform releases. They might be able to help.
> >
> > Thanks
> > Matteo
> >
> > From: TF-A On Behalf Of 吴斌(郅隆) via TF-A
> > Sent: 14 April 2020 06:47
> > To: TF-A ; Raghu Krishnamurthy via TF-A
> > Subject: [TF-A] Reply: Re: [RAS] BL32 UnRecognized Event - 0xC4000061
> > and BL31 Crashed
> >
> > Hi Raghu,
> >
> > I really appreciate your help.
> >
> > I downloaded this software stack from git.linaro.org. It includes ATF,
> > the kernel, edk2 and so on.
> > The user guide I used from Linaro is:
> > https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst#obtaining-the-rd-n1-edge-and-rd-n1-edge-dual-fast-model
> >
> > 1) What platform you are running on? Can this issue be reproduced
> > outside your testing environment, perhaps on FVP or QEMU?
> > A: I am running on the ARM N1-Edge FVP platform. The issue can be
> > reproduced on this FVP platform.
> >
> > 2) What version of TF-A and StandaloneMM is being used? Preferably the
> > commit-id, so that we can be sure we are looking at the same code.
> > A: TF-A: https://git.linaro.org/landing-teams/working/arm/arm-tf.git
> > tag: RD-INFRA-20191024-RC0
> > StandaloneMM seems to be built from edk2 and edk2-platforms, so I am only
> > giving the edk2 and edk2-platforms version information. If I missed
> > anything, please let me know.
> > edk2: https://git.linaro.org/landing-teams/working/arm/edk2.git
> > tag: RD-INFRA-20191024-RC0
> > edk2-platforms: https://git.linaro.org/landing-teams/working/arm/edk2-platforms.git
> > tag: RD-INFRA-20191024-RC0
> >
> > 3) What version of the kernel and sdei driver is being used?
> > A: kernel-release: https://git.linaro.org/landing-teams/working/arm/kernel-release.git
> > tag: RD-INFRA-20191024-RC0
> > The SDEI driver is included in the kernel. Do I need to provide the SDEI
> > driver version? If so, please let me know.
> >
> > 4) I can't tell from looking at the log but do you know if writing 0x123
> > to sdei_ras_poison causes a DMC620 interrupt or an SError or external
> > abort through memory access?
> > A: Sorry, the Linaro guide only says that this injects a DMC-620 single-bit
> > RAS error, so I am also not sure which exception type it will trigger.
> >
> > BRs,
> > Bin Wu
> >
> > ------------------ Original Message ------------------
> > From: TF-A
> > Sent: Tue Apr 14 01:25:47 2020
> > To: Raghu Krishnamurthy via TF-A
> > Subject: Re: [TF-A] [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31
> > Crashed
> > Hello,
> >
> > >>Does BL31 need to send 0xC4000061 event to BL32 again?
> >
> > I don't think it will. It is really odd that
> > 0xC4000061 (SP_EVENT_COMPLETE_AARCH64) ever reaches the BL32/MM handler.
> > This is from looking at the upstream code quickly but it definitely
> > depends on the platform you are running, what version of TF-A you are
> > using, build options used. Is it possible that the unhandled exception
> > is occurring after successful handling of the DMC620 error but there is
> > a following issue that occurs right after, causing the crash?
> > From the register dump it looks like there was an Instruction abort
> > exception at address 0 while running in EL3. Something seems to have
> > gone seriously wrong to have 0xC4000061 ever go back to BL32 and to get
> > an instruction abort at address 0.
> >
> > >>Does current TF-A support to run RAS test? It seems BL31 will crash.
> > See above. The answer really depends on the factors mentioned above.
> >
> > The following would be helpful to know:
> > 1) What platform you are running on? Can this issue be reproduced
> > outside your testing environment, perhaps on FVP or QEMU?
> > 2) What version of TF-A and StandaloneMM is being used? Preferably the
> > commit-id, so that we can be sure we are looking at the same code.
> > 3) What version of the kernel and sdei driver is being used?
> > 4) I can't tell from looking at the log but do you know if writing 0x123
> > to sdei_ras_poison causes a DMC620 interrupt or an SError or external
> > abort through memory access?
> >
> > Thanks
> > Raghu
> >
> >
> > On 4/13/20 12:16 AM, 吴斌(郅隆) via TF-A wrote:
> >> Dear Friends,
> >>
> >> I am using TF-A to test the RAS feature.
> >> When I trigger a DMC620 RAS error in Linux (echo 0x123 >
> >> /sys/kernel/debug/sdei_ras_poison),
> >> BL32 receives
> >> UnRecognized Event - 0xC4000061 (SP_EVENT_COMPLETE_AARCH64) and finally
> >> BL31 crashes.
> >>
> >> In my understanding, 0xC4000061 should be consumed by BL31 and not sent
> >> to BL32 again.
> >>
> >> A piece of the error log is below:
> >>
> >> *************************************
> >>
> >> CperWrite - CperAddress@0xFF610064
> >> CperWrite - 1 Section@FFBE91A8, Length 80, SectionType@FFBE9138
> >> CperWrite - Got Error Section: Platform Memory.
> >> MmEntryPoint Done
> >> Received delegated event
> >> X0 : 0xC4000061
> >> X1 : 0x0
> >> X2 : 0x0
> >> X3 : 0x0
> >> Received event - 0xC4000061 on cpu 0
> >> UnRecognized Event - 0xC4000061
> >> Failed delegated event 0xC4000061, Status 0x2
> >> Unhandled Exception in EL3.
> >> x30 = 0x0000000000000000
> >> x0 = 0x00000000ff007e00
> >> x1 = 0xfffffffffffffffe
> >> x2 = 0x00000000600003c0
> >> x3 = 0x0000000000000000
> >> x4 = 0x0000000000000000
> >> x5 = 0x0000000000000000
> >> x6 = 0x00000000ff015080
> >> x7 = 0x0000000000000000
> >> x8 = 0x00000000c4000061
> >> x9 = 0x0000000000000021
> >> x10 = 0x0000000000000040
> >> x11 = 0x00000000ff00f2b0
> >> x12 = 0x00000000ff0118c0
> >> x13 = 0x0000000000000002
> >> x14 = 0x00000000ff016b70
> >> x15 = 0x00000000ff003f20
> >> x16 = 0x0000000000000044
> >> x17 = 0x00000000ff010430
> >> x18 = 0x0000000000000e3c
> >> x19 = 0x0000000000000000
> >> For the rest of the error log, please refer to the attachment.
> >>
> >> My questions are:
> >> 1. Does BL31 need to send 0xC4000061 event to BL32 again?
> >> 2. Does current TF-A support to run RAS test? It seems BL31 will crash.
> >>
> >> Appreciate your help.
> >>
> >> BRs,
> >> Bin Wu
> >>
> > --
> > TF-A mailing list
> > TF-A(a)lists.trustedfirmware.org
> > https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
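Editorial note on the RAS thread above: Raghu's point about unchecked return codes can be illustrated with a small Python toy model. This is only a sketch under stated assumptions, not TF-A code. The function names echo those mentioned in the thread, but the state flag values, the returned strings and the exception are hypothetical and chosen purely for illustration.

```python
# Toy model of the flow traced in the thread. Flag values are hypothetical.
SDEI_STATE_REGISTERED = 0x1
SDEI_STATE_ENABLED = 0x2


def sdei_dispatch_event(event_state):
    """Return 0 if the event can be dispatched, -1 otherwise.

    Assumption: dispatch is only allowed for a registered and enabled event,
    which loosely mirrors the can_sdei_state_trans() check quoted above.
    """
    required = SDEI_STATE_REGISTERED | SDEI_STATE_ENABLED
    if (event_state & required) != required:
        return -1
    return 0


def ras_handler_unchecked(event_state):
    """Ignore the return code and resume the SP, as described in the thread."""
    sdei_dispatch_event(event_state)
    return "resume_sp"


def ras_handler_checked(event_state):
    """Stop on an unexpected error code instead of resuming blindly."""
    ret = sdei_dispatch_event(event_state)
    if ret != 0:
        raise RuntimeError(f"sdei_dispatch_event failed: {ret}")
    return "dispatched"
```

With `state = 0`, as in the debugger dump, the unchecked handler silently carries on, while the checked variant fails loudly at the point where the dispatch was refused.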
Hi All,
Just updating/correcting the details.
Thanks
Manish Badarkhe
On 27/04/2020, 22:13, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
Hi Raghu
Please ignore my answer to question 2.
After an internal discussion, we came to the following conclusion:
1. This workaround is specific to context switching.
2. In the context save routine (el1_sysregs_context_save or el2_sysregs_context_save),
the first step of the proposed solution is to disable the page table walk, and no
AT instruction is executed in the context save routine.
This ensures that no speculative AT instruction can execute before the TCR update.
3. In the context restore routine (el1_sysregs_context_restore or el2_sysregs_context_restore),
the first step of the proposed solution is to disable the page table walk, and no
AT instruction is executed in the context restore routine.
This ensures that no speculative AT instruction can execute before the TCR update.
Hence the proposed solution will work, as it ensures that a speculative AT instruction cannot cache translations in the TLB.
Thanks
Manish Badarkhe
Hi Raghu
Please ignore my answer to question 2.
After an internal discussion, we came to the following conclusion:
1. This workaround is specific to context switching.
2. In the context save routine (el1_sysregs_context_save or el2_sysregs_context_save),
the first step performed is to disable the page table walk. Also, no AT instruction is executed in that context save routine.
This ensures that no speculative AT instruction can execute before the TCR update.
3. In the context restore routine (el1_sysregs_context_restore or el2_sysregs_context_restore),
the first step performed is to disable the page table walk. Also, no AT instruction is executed in that path.
This ensures that no speculative AT instruction can execute before the TCR update.
Hence the proposed solution will work, as it ensures that a speculative AT instruction cannot cache translations in the TLB.
Thanks
Manish Badarkhe
On 27/04/2020, 13:38, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
Hi Raghu
Please see my answers inline
On 25/04/2020, 06:38, "TF-A on behalf of Raghu K via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
> Hi Manish,
> Before I agree or disagree with the suggested fix, the following would
> be interesting to know/discuss. Please feel free to correct me if I've
> misunderstood something.
> 1) Are "speculative" AT instructions subject to TCR_ELx control bits for
> all the listed CPUs? I imagine the answer is yes but would be good to
> get confirmation. I could not find any evidence in the instruction
> description or pseudocode in the ARMv8 ARM. It is possible to play many
> tricks on speculative execution of instructions such as skipping checks
> and doing them only when the CPU knows the instruction will be
> committed. If this is the case, changing TCR_ELx bits may not work. The
> errata document is vague about how to fix it.
The speculative AT instruction may behave as you mentioned. We need more
opinions on this.
I proposed the fix by referring to the Linux workaround for the same errata.
The Linux workaround is available in the mainline kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
> 2) Assuming the answer to question 1 is yes, your proposal may not work
> as is. In the worst case, as soon as you enter EL3, the very first thing
> that may happen, before you ever operate/write to TCR_ELx, is a
> speculative AT instruction that caches a bad translation in the TLBs.
> The same thing can happen on the exit path. As soon as you restore the
> TCR_ELx register, the first thing that can happen is a speculative AT
> that caches a bad translation. However, the el3_exit path does have DSB
> before ERET, so we will not speculate to an AT instruction if there are
> no branches between the instruction that sets TCR_ELx and the ERET.
> Somewhere in between, it looks like we will need a TLBI NSH to be
> certain there are no bad translations cached. This obviously has a
> potential performance cost on the lower ELs. Every entry into EL3
> flushes the TLB for lower ELs.
Yes, this seems to be a valid case on the entry and exit paths.
In that case, I am not quite sure where we need to avoid the PTW.
Also, "TLBI NSH" works but it may cause a performance issue.
We need some more opinions/thoughts on this.
Just thinking: doesn't the sequence mentioned for context save already ensure
that PTW is disabled?
Something like the following as the last step in the ELx (1/2) context save (elaborated further):
    · Save the TCR register with PTW enabled (EPD=0), just to enable PTW during
      context restore. Do not write to the TCR_ELx register here; just save its value for restore.
      This ensures that on entry to EL3 there is no chance of a PTW
      while executing an AT instruction.
> Thanks
> Raghu
Thanks
Manish Badarkhe
Hi Manish,
Before I agree or disagree with the suggested fix, the following would
be interesting to know/discuss. Please feel free to correct me if I've
misunderstood something.
1) Are "speculative" AT instructions subject to TCR_ELx control bits for
all the listed CPUs? I imagine the answer is yes but would be good to
get confirmation. I could not find any evidence in the instruction
description or pseudocode in the ARMv8 ARM. It is possible to play many
tricks on speculative execution of instructions such as skipping checks
and doing them only when the CPU knows the instruction will be
committed. If this is the case, changing TCR_ELx bits may not work. The
errata document is vague about how to fix it.
2) Assuming the answer to question 1 is yes, your proposal may not work
as is. In the worst case, as soon as you enter EL3, the very first thing
that may happen, before you ever operate/write to TCR_ELx, is a
speculative AT instruction that caches a bad translation in the TLBs.
The same thing can happen on the exit path. As soon as you restore the
TCR_ELx register, the first thing that can happen is a speculative AT
that caches a bad translation. However, the el3_exit path does have DSB
before ERET, so we will not speculate to an AT instruction if there are
no branches between the instruction that sets TCR_ELx and the ERET.
Somewhere in between, it looks like we will need a TLBI NSH to be
certain there are no bad translations cached. This obviously has a
potential performance cost on the lower ELs. Every entry into EL3
flushes the TLB for lower ELs.
Thanks
Raghu
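Editorial note: the window Raghu describes in point 2 can be made concrete with a tiny event-replay model in Python. This is a sketch only and rests on the assumption in his question 1, namely that a speculative AT honours the EPD bits. The event names are hypothetical, and "tlbi" stands for whatever TLB invalidation would be used. The EPD0/EPD1 bit positions are those of TCR_EL1.

```python
EPD_BITS = (1 << 7) | (1 << 23)   # TCR_EL1.EPD0 and TCR_EL1.EPD1


def run(events, tcr=0):
    """Replay a list of events and report whether a stale entry is cached.

    Assumption: a speculative AT can walk the tables (and so pollute the
    TLB) unless both EPD bits are set at the time it executes.
    """
    polluted = False
    for ev in events:
        if ev == "set_epd":          # workaround step: disable PTW
            tcr |= EPD_BITS
        elif ev == "restore_tcr":    # exit path: PTW enabled again
            tcr &= ~EPD_BITS
        elif ev == "spec_at":        # speculative AT instruction
            if (tcr & EPD_BITS) != EPD_BITS:
                polluted = True
        elif ev == "tlbi":           # invalidation clears the stale entry
            polluted = False
    return polluted
```

For example, `run(["spec_at", "set_epd"])` reports a polluted TLB (speculation before the TCR write on entry), `run(["set_epd", "spec_at"])` does not, and appending `"tlbi"` to the first sequence clears it again, at the performance cost mentioned above.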
On 4/24/20 2:56 AM, Manish Badarkhe via TF-A wrote:
Hi All
We are trying to implement a workaround for an erratum that applies to the CPUs below:
<CPU> : <Erratum No.>
Cortex-A53: 1530924
Cortex-A76: 1165522
Cortex-A72: 1319367
Cortex-A57: 1319537
Cortex-A55: 1530923
Errata Description:
A speculative Address Translation (AT) instruction translates using registers that are associated with an out-of-context translation regime and caches the resulting translation in the TLB. A subsequent translation request that is generated when the out-of-context translation regime is current uses the previous cached TLB entry producing an incorrect virtual to physical mapping.
Probable solution is to implement below fix in context.S file:
During ELx (1 or 2) context save:
· Operate TCR_ELx(1/2) to disable page table walk by operating EPD bits
o This will avoid any page table walk for S-EL1 or S-EL2. This will help in avoiding caching of translations in TLB
for S-EL1/S-EL2 in EL3.
· Save all system registers (which is already available) except TCR
· Clear EPD bits of TCR and then save. (Just to enable PTW during restore context).
During ELx (1 or 2) context restore:
* Set the EPD bits in TCR_ELx(1/2) to disable page table walks.
* Restore all system registers (those saved during context save) except the TCR register.
* Restore the TCR_ELx(1/2) register (which re-enables PTW).
With the above, we ensure that no page table walk takes place for S-EL1 and S-EL2 while in EL3.
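The save and restore steps above can be sketched roughly as follows. This is only a model in plain C, not the actual context.S assembly: the "registers" are fields of a hypothetical `struct sim_cpu`, the saved context is a hypothetical `struct sim_ctx` (not TF-A's real context layout), and all function names are made up for illustration. The only architectural facts assumed are the TCR_EL1 bit positions EPD0 (bit 7) and EPD1 (bit 23); real code would use mrs/msr followed by the appropriate barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TCR_EL1 bits that disable translation table walks on a TLB miss. */
#define TCR_EPD0_BIT (UINT64_C(1) << 7)   /* walks using TTBR0_EL1 */
#define TCR_EPD1_BIT (UINT64_C(1) << 23)  /* walks using TTBR1_EL1 */
#define TCR_EPD_MASK (TCR_EPD0_BIT | TCR_EPD1_BIT)

/* Hypothetical stand-in for the live lower-EL system registers. */
struct sim_cpu {
    uint64_t tcr;
    uint64_t sctlr;
    uint64_t ttbr0;
};

/* Hypothetical saved-context layout (illustration only). */
struct sim_ctx {
    uint64_t tcr;
    uint64_t sctlr;
    uint64_t ttbr0;
};

/* True when both EPD bits are set, i.e. no page table walk can start. */
static bool walks_disabled(const struct sim_cpu *cpu)
{
    return (cpu->tcr & TCR_EPD_MASK) == TCR_EPD_MASK;
}

static void sketch_ctx_save(struct sim_cpu *cpu, struct sim_ctx *ctx)
{
    /* Step 1: set the EPD bits in the live TCR to disable PTW. */
    cpu->tcr |= TCR_EPD_MASK;

    /* Step 2: save all other system registers (only two modelled here). */
    ctx->sctlr = cpu->sctlr;
    ctx->ttbr0 = cpu->ttbr0;

    /* Step 3: save TCR with the EPD bits cleared, so that restoring
     * this value later re-enables PTW. */
    ctx->tcr = cpu->tcr & ~TCR_EPD_MASK;
}

static void sketch_ctx_restore(struct sim_cpu *cpu, const struct sim_ctx *ctx)
{
    /* Step 1: keep PTW disabled while the other registers change. */
    cpu->tcr |= TCR_EPD_MASK;

    /* Step 2: restore every saved register except TCR. */
    cpu->sctlr = ctx->sctlr;
    cpu->ttbr0 = ctx->ttbr0;
    assert(walks_disabled(cpu));

    /* Step 3: restore TCR last, which re-enables PTW. */
    cpu->tcr = ctx->tcr;
}
```

One thing the model makes visible: because step 3 of the save path clears both EPD bits unconditionally, any EPD setting that the lower EL had programmed itself would not survive a save/restore cycle as the sequence is written.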
Is this a proper way to fix this problem, or is there a better one? We would appreciate suggestions on the use cases where this workaround is needed in TF-A code.
Thanks
Manish Badarkhe
Hi,
On 4/21/20 7:23 AM, Soby Mathew via TF-A wrote:
> My view is that smaller patches are easier to review and we should try to break up the patches to logical chunks where possible. I haven't taken a look at the patches myself but I am sure there will be ways to break it up for ease of review.
I would like to strongly echo this. I find big patches so hard to
review. There are only so many things the human brain can comprehend in
one go. Smaller patches are just easier to reason about, they focus on
one thing and it is easier to get your head around them because the
entire patch and the interaction it may have with other components
"fits" in one's head. Thus, it is much easier to reach a good level of
confidence at review time.
Also, I believe there is a natural tendency to get discouraged at
the sight of big patches; smaller patches have a much better chance of
getting reviewed quickly. They can also be dealt with incrementally. Say
in a 10-patch stack, it may happen that the first 5 are good to go,
while the sixth is more controversial and requires more discussion. In
this case, being able to merge the 5 patches is a first step in the
right direction.
Ideally, one should think about how to split the patches in a logical,
manageable way early during development. It is true that if it is an
afterthought, breaking a huge patch down into smaller ones is a lot
of work. This is why it needs doing upfront IMO.
Cheers,
Sandrine