Hi Andre,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
> >>>>> Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: >>>>> improve performance by avoiding single byte writes") causing >>>>> BL31 boot failure on Renesas RZ/G2 platforms. >>>>> >>>>> On 08/10/2020 20:05, Biju Das wrote: >>>>>> Then on target, found that BL31 is failed to boot[3], which >>>>>> is fixed by reverting the commit 75fab6496e5fce9a11 >>>>>> ("libc: memset: improve performance by avoiding single byte >>>>>> writes"), see >>>>> the logs [3]. >>>>> >>>>> Mmmh, interesting. Can you build with "DEBUG=1" to get more >>>>> output from BL31? >>>> >>>> Sure Will do and provide feedback. >> >>>>> I see some calls to memset() from code in drivers/renesas/rcar. >>>>> Can you add some debug prints at the top of memset() to
dump
>>>>> the parameters on each call? To see what breaks it? >>>> >>>> OK. >> >> I believe commenting [1] fixed the issue which corresponds to >> memset > parameters[2]. >> Looks like calling rcar_log_init-->memset from [3] is causing the
issue.
> > Many thanks for debugging this, that's helpful! > > But without your patch it is working fine. Only thing is, you > have added extra code to >> improve the speed. I am not sure how it is impacting here. >> Currently I don't > have access to Lauterbach to debug further. Any thoughts ??? > > So this RCAR_BL31_LOG_BASE address is mapped as device
memory,
what
> kind of device is this? SRAM? Or something non-volatile, even? > Even if it's SRAM, sometimes the controller managing this does > not support all AMBA accesses (hence it's a device, not memory).
It is DRAM only, with Security protected attribute.
Mmh, interesting. Why is it then mapped as device?
#define MAP_ATFW_LOG
MAP_REGION_FLAT(RCAR_BL31_LOG_BASE,
\ RCAR_BL31_LOG_SIZE, \ MT_DEVICE | MT_RW | MT_SECURE)
You could try using MT_NORMAL instead. Or ...
> In general letting the compiler access device memory directly is > asking for trouble, every access should go through MMIO accessors. > Linux has > memset_io() for those kind of problems. > > Can you try the attached patch? That should downgrade the stores > to 32-bit ones. Not a solution, but would help to isolate the problem.
Patch didn't fix the issue.
I see, bummer. Does it work if you apply this on top?
diff --git a/lib/libc/memset.c b/lib/libc/memset.c index 5ad6ede51..7478871e3 100644 --- a/lib/libc/memset.c +++ b/lib/libc/memset.c @@ -7,6 +7,7 @@ #include <stddef.h> #include <string.h> #include <stdint.h> +#include <lib/mmio.h>
void *memset(void *dst, int val, size_t count) { @@ -34,7 +35,8 @@ void *memset(void *dst, int val, size_t count) /* Use 32-bit writes for as long as possible. */ ptr32 = (void *)ptr; for (; count >= 4; count -= 4) { -*ptr32++ = fill; +mmio_write_32((uintptr_t)ptr32, fill); +ptr32++; }
/* Handle the remaining part byte-per-byte. */
However I copied the same code emmc_memset to rcar_memset()
and
with that it works fine.
Looks like byte access works OK. Any thoughts??
That's weird, I just see "str w1, [x3, x5]" in the disassembly, and since everything is nicely aligned, it should actually work. That is still with the note that letting the compiler access device memory is wrong, but the generated code is actually very similar in this special case
here.
Issue is with your original commit[1], it is entering infinite loop. I
commented that while loop and it is fine.
[1]https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/ li b/libc/memset.c#n43
That's very weird. I stared at the code for a while, and can't see the issue. I also put it through my unit tests again, and tried to mimic your special case (clearing 81840 bytes at 0xyyyy0010). That worked
fine for me every time.
Do you see the actual problem?
Can you send me the resulting binary: build/rcar/release/libc/memset.o? Just to check what your compiler made of it?
Here it is [1] https://gitlab.com/bijudas/debug
Thanks, that is actually identical to what my compiler produced, so that means toolchain-wise we are on the same page.
Just a suggestion, Why can't we use finite for loops with pointer arithmetic
instead of the while loops?
- first for loop with ptr64 and ptr64 +(remainingcount >> 3) and
- second for loop with ptr and ptr + (remainingcount & 7)
But what would that change?
Hmm. The result same, it didn't fix the issue.
Please see the code generated by compiler for working case and non
working case.
Just to clarify: with "working" you mean "commenting the last while loop"?
Yes. I just want to highlight that some kind of issue is happening just before the while loop.
Which breaks the algorithm, as we would lose the tail bytes to be filled, in case the end of the to-be-filled region is not 64-bit aligned? So it's not an option to just skip that. It just happens to work in your case because the end *is* 64-bit aligned, so the final loop is not needed.
Agreed.
And besides, I still don't see the problem. I stepped through the disassembly, with (<aligned ptr>, 0, 81840) for the parameters and don't see why there would be a problem. Once it's done with the 64-bit write section, both x2 and x6 are 0, so the cmp at 0x48 sets the Z bit, and "b.eq 68" branches to the ret. So the whole final loop is skipped, as intended.
Please see the line inserted by compiler for non working case. 3c:928000e4 movx4, #0xfffffffffffffff8 // #-8
Yes, that has a particular purpose (which is not very obvious because this is - Os optimised): It is used to *subtract* the already filled bytes from the original length, to then serve as the limit for the final while loop - which is actually turned in a for loop in assembly! There is no msub in AArch64, so the compiler uses madd and makes one of the factors negative.
So, can you please double check that: memset(0x44040010, 0, 81840); is really the call that hangs your system?
Yes, This is called from console.S-->rcar_log_int-->memset
And that it really does because so it's being stuck in the last while loop?
After further investigation, not sure it is related to while loop or some kind of corruption?, since I don't have access to JTAG further debugging. Mostly some kind of corruption leading to crash.
The new code introduced, some increased code size, which is causing bl31 boot failure, when we call memset from console.S
Case 1) console.S -->rcar_log_int -->memset --> bl31 not booting.
Case 2) If I comment, rcar_log_init from console.S --> bl31 boots fine.
Case 3) If I comment, rcar_log_init from console.S and call rcar_log_init from bl31_early_platform_setup2 --> bl31 boots fine.
Any thoughts?
And can you make this RCAR_BL31_LOG_BASE region MT_NORMAL memory and see if the problem still persists?
Yes , still the problem persists. The current mapping "MT_DEVICE" is as per "Use of coherent memory in TF-A" section as mentioned in [1]. By changing the region as MT_NORMAL also didn't help , see the patch [2]. Please share your opinion on this?
[1] https://github.com/ARM-software/arm-trusted-firmware/blob/master/docs/design... [2] diff --git a/include/lib/xlat_tables/xlat_tables_v2.h b/include/lib/xlat_tables/xlat_tables_v2.h index 359b9839a..c50727a42 100644 --- a/include/lib/xlat_tables/xlat_tables_v2.h +++ b/include/lib/xlat_tables/xlat_tables_v2.h @@ -86,6 +86,7 @@ #define MT_DEVICE U(0) #define MT_NON_CACHEABLE U(1) #define MT_MEMORY U(2) +#define MT_NORMAL U(4) /* Values up to 7 are reserved to add new memory types in the future */
#define MT_RO (U(0) << MT_PERM_SHIFT) diff --git a/plat/renesas/rcar/aarch64/platform_common.c b/plat/renesas/rcar/aarch64/platform_common.c index b0a88cb6b..744c79c98 100644 --- a/plat/renesas/rcar/aarch64/platform_common.c +++ b/plat/renesas/rcar/aarch64/platform_common.c @@ -64,7 +64,7 @@ const uint8_t version_of_renesas[VERSION_OF_RENESAS_MAXLEN]
#define MAP_ATFW_LOG MAP_REGION_FLAT(RCAR_BL31_LOG_BASE, \ RCAR_BL31_LOG_SIZE, \ - MT_DEVICE | MT_RW | MT_SECURE) + MT_NORMAL | MT_RW | MT_SECURE)
Sure, having no tail loop simplifies the algorithm, but is wrong. So what is the point here?
Agreed, I just want to give some pointers related to this issue, so that we can narrow down the issue. that's all.
Cheers, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647