Hi Andre,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 09/10/2020 20:45, Biju Das wrote:
Hi,
I have found the issue. Please see inline.
On 09/10/2020 18:37, Biju Das wrote:
Hi Biju,
>>> Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve >>> performance by avoiding single byte writes") causing BL31 boot >>> failure on Renesas RZ/G2 platforms. >>> >>> On 08/10/2020 20:05, Biju Das wrote: >>>> Then on target, found that BL31 is failed to boot[3], which is >>>> fixed by reverting the commit 75fab6496e5fce9a11 >>>> ("libc: memset: improve performance by avoiding single byte >>>> writes"), see >>> the logs [3]. >>> >>> Mmmh, interesting. Can you build with "DEBUG=1" to get more >>> output from BL31? >> >> Sure Will do and provide feedback.
>>> I see some calls to memset() from code in drivers/renesas/rcar. >>> Can you add some debug prints at the top of memset() to dump >>> the parameters on each call? To see what breaks it? >> >> OK.
I believe commenting [1] fixed the issue which corresponds to memset
parameters[2].
Looks like calling rcar_log_init-->memset from [3] is causing the issue.
Many thanks for debugging this, that's helpful!
But without your patch it is working fine. Only thing is, you have added extra code to
improve the speed. I am not sure how it is impacting here. Currently I don't
have access to Lauterbach to debug further. Any thoughts ???
So this RCAR_BL31_LOG_BASE address is mapped as device memory,
what
kind of device is this? SRAM? Or something non-volatile, even? Even if it's SRAM, sometimes the controller managing this does not support all AMBA accesses (hence it's a device, not memory).
It is DRAM only, with Security protected attribute.
Mmh, interesting. Why is it then mapped as device?
#define MAP_ATFW_LOG
MAP_REGION_FLAT(RCAR_BL31_LOG_BASE,
\ RCAR_BL31_LOG_SIZE, \ MT_DEVICE | MT_RW | MT_SECURE)
You could try using MT_NORMAL instead. Or ...
In general letting the compiler access device memory directly is asking for trouble, every access should go through MMIO accessors. Linux has memset_io() for those kind of problems.
Can you try the attached patch? That should downgrade the stores to 32-bit ones. Not a solution, but would help to isolate the problem.
Patch didn't fix the issue.
I see, bummer. Does it work if you apply this on top?
diff --git a/lib/libc/memset.c b/lib/libc/memset.c index 5ad6ede51..7478871e3 100644 --- a/lib/libc/memset.c +++ b/lib/libc/memset.c @@ -7,6 +7,7 @@ #include <stddef.h> #include <string.h> #include <stdint.h> +#include <lib/mmio.h>
void *memset(void *dst, int val, size_t count) { @@ -34,7 +35,8 @@ void *memset(void *dst, int val, size_t count) /* Use 32-bit writes for as long as possible. */ ptr32 = (void *)ptr; for (; count >= 4; count -= 4) { -*ptr32++ = fill; +mmio_write_32((uintptr_t)ptr32, fill); +ptr32++; }
/* Handle the remaining part byte-per-byte. */
However I copied the same code emmc_memset to rcar_memset() and
with that it works fine.
Looks like byte access works OK. Any thoughts??
That's weird, I just see "str w1, [x3, x5]" in the disassembly, and since everything is nicely aligned, it should actually work. That is still with the note that letting the compiler access device memory is wrong, but the generated code is actually very similar in this special case
here.
Issue is with your original commit[1], it is entering infinite loop. I
commented that while loop and it is fine.
[1]https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/li b/libc/memset.c#n43
That's very weird. I stared at the code for a while, and can't see the issue. I also put it through my unit tests again, and tried to mimic your special case (clearing 81840 bytes at 0xyyyy0010). That worked fine for me every time. Do you see the actual problem?
Can you send me the resulting binary: build/rcar/release/libc/memset.o? Just to check what your compiler made of it?
Here it is [1] https://gitlab.com/bijudas/debug
Just a suggestion, Why can't we use finite for loops with pointer arithmetic instead of the while loops? 1) first for loop with ptr64 and ptr64 +(remainingcount >> 3) and 2) second for loop with ptr and ptr + (remainingcount & 7)
Please see the code generated by compiler for working case and non working case.
Please see the line inserted by compiler for non working case. 3c:928000e4 movx4, #0xfffffffffffffff8 // #-8
biju@biju-VirtualBox:~/work/test/trusted-firmware-a$ /usr/share/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-objdump -S build/rzg/release/libc/memset.o
build/rzg/release/libc/memset.o: file format elf64-littleaarch64
Disassembly of section .text.memset:
0000000000000000 <memset>: 0:b4000342 cbzx2, 68 <memset+0x68> 4:12001c21 andw1, w1, #0xff 8:aa0003e5 movx5, x0 c:f24008a6 andsx6, x5, #0x7 10:54000261 b.ne5c <memset+0x5c> // b.any 14:d3781c24 ubfizx4, x1, #8, #8 18:92401c23 andx3, x1, #0xff 1c:aa030083 orrx3, x4, x3 20:d2800004 movx4, #0x0 // #0 24:aa034063 orrx3, x3, x3, lsl #16 28:aa038063 orrx3, x3, x3, lsl #32 2c:cb040047 subx7, x2, x4 30:f1001cff cmpx7, #0x7 34:540001c8 b.hi6c <memset+0x6c> // b.pmore 38:d343fc43 lsrx3, x2, #3 3c:928000e4 movx4, #0xfffffffffffffff8 // #-8 40:8b030ca5 addx5, x5, x3, lsl #3 44:9b040862 maddx2, x3, x4, x2 48:eb06005f cmpx2, x6 4c:540000e0 b.eq68 <memset+0x68> // b.none 50:382668a1 strbw1, [x5, x6] 54:910004c6 addx6, x6, #0x1 58:17fffffc b48 <memset+0x48> 5c:380014a1 strbw1, [x5], #1 60:f1000442 subsx2, x2, #0x1 64:54fffd41 b.nec <memset+0xc> // b.any 68:d65f03c0 ret 6c:f82468a3 strx3, [x5, x4] 70:91002084 addx4, x4, #0x8 74:17ffffee b2c <memset+0x2c>
Disassembly with commenting last while loop (Working case) --------------------------------------------
biju@biju-VirtualBox:~/work/test/trusted-firmware-a$ /usr/share/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-objdump -S memset.o
memset.o: file format elf64-littleaarch64
Disassembly of section .text.memset:
0000000000000000 <memset>: 0:b4000262 cbzx2, 4c <memset+0x4c> 4:12001c21 andw1, w1, #0xff 8:aa0003e3 movx3, x0 c:f2400864 andsx4, x3, #0x7 10:54000181 b.ne40 <memset+0x40> // b.any 14:92401c25 andx5, x1, #0xff 18:d3781c21 ubfizx1, x1, #8, #8 1c:aa050021 orrx1, x1, x5 20:aa014021 orrx1, x1, x1, lsl #16 24:aa018021 orrx1, x1, x1, lsl #32 28:cb040045 subx5, x2, x4 2c:f1001cbf cmpx5, #0x7 30:540000e9 b.ls4c <memset+0x4c> // b.plast 34:f8246861 strx1, [x3, x4] 38:91002084 addx4, x4, #0x8 3c:17fffffb b28 <memset+0x28> 40:38001461 strbw1, [x3], #1 44:f1000442 subsx2, x2, #0x1 48:54fffe21 b.nec <memset+0xc> // b.any 4c:d65f03c0 ret
Regards, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647