Hi All,
I am porting Renesas RZ/G2M[1] platform to TF-A Master branch(commitid: eeb77da646844) . Initially faced a compilation error [2] which is fixed by using "#define" instead of "static const uint64_t". Then on target, found that BL31 is failed to boot[3], which is fixed by reverting the commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes"), see the logs [3].
Q1) Have you seen this issue? Please share your thoughts on this issues.
[1] https://www.renesas.com/us/en/products/microcontrollers-microprocessors/rz/r...
[2] https://lists.trustedfirmware.org/pipermail/tf-a/2020-October/000763.html
[3] Failure logs ------------ NOTICE: BL2: RZ/G Initial Program Loader(CA57) Rev.2.0.6 NOTICE: BL2: PRR is RZG G2M Ver.1.3 NOTICE: BL2: Board is HiHope RZ/G2M Rev.4.0 NOTICE: BL2: Boot device is QSPI Flash(40MHz) NOTICE: BL2: LCM state is unknown NOTICE: BL2: DDR3200(rev.0.40) NOTICE: BL2: [COLD_BOOT] NOTICE: BL2: DRAM Split is 2ch NOTICE: BL2: QoS is default setting(rev.0.19) NOTICE: BL2: DRAM refresh interval 1.95 usec NOTICE: BL2: Periodic Write DQ Training NOTICE: BL2: CH0: 400000000 - 47fffffff, 2 GiB NOTICE: BL2: CH2: 600000000 - 67fffffff, 2 GiB NOTICE: BL2: FDT at 0xe632a648 NOTICE: BL2: v2.3(release):v2.3-778-g4fd58b045 NOTICE: BL2: Built : 18:14:35, Oct 8 2020 NOTICE: BL2: Normal boot NOTICE: BL2: dst=0xe6328300 src=0x8180000 len=512(0x200) NOTICE: BL2: dst=0x43f00000 src=0x8180400 len=6144(0x1800) NOTICE: rcar_file_len: len: 0x0003e000 NOTICE: BL2: dst=0x44000000 src=0x81c0000 len=253952(0x3e000) NOTICE: rcar_file_len: len: 0x00100000 NOTICE: BL2: dst=0x50000000 src=0x8300000 len=1048576(0x100000) NOTICE: BL2: Booting BL31
[4]Working logs after reverting the commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") -------------------------------------- NOTICE: BL2: RZ/G Initial Program Loader(CA57) Rev.2.0.6 NOTICE: BL2: PRR is RZG G2M Ver.1.3 NOTICE: BL2: Board is HiHope RZ/G2M Rev.4.0 NOTICE: BL2: Boot device is QSPI Flash(40MHz) NOTICE: BL2: LCM state is unknown NOTICE: BL2: DDR3200(rev.0.40) NOTICE: BL2: [COLD_BOOT] NOTICE: BL2: DRAM Split is 2ch NOTICE: BL2: QoS is default setting(rev.0.19) NOTICE: BL2: DRAM refresh interval 1.95 usec NOTICE: BL2: Periodic Write DQ Training NOTICE: BL2: CH0: 400000000 - 47fffffff, 2 GiB NOTICE: BL2: CH2: 600000000 - 67fffffff, 2 GiB NOTICE: BL2: FDT at 0xe632a648 NOTICE: BL2: v2.3(release):v2.3-779-geba02ef39 NOTICE: BL2: Built : 18:16:47, Oct 8 2020 NOTICE: BL2: Normal boot NOTICE: BL2: dst=0xe6328300 src=0x8180000 len=512(0x200) NOTICE: BL2: dst=0x43f00000 src=0x8180400 len=6144(0x1800) NOTICE: rcar_file_len: len: 0x0003e000 NOTICE: BL2: dst=0x44000000 src=0x81c0000 len=253952(0x3e000) NOTICE: rcar_file_len: len: 0x00100000 NOTICE: BL2: dst=0x50000000 src=0x8300000 len=1048576(0x100000) NOTICE: BL2: Booting BL31
U-Boot 2020.10-00535-gd0e9210a2f (Oct 08 2020 - 12:34:09 +0100)
CPU: Renesas Electronics R8A774A1 rev 1.3 Model: HopeRun HiHope RZ/G2M with sub board DRAM: 3.9 GiB MMC: mmc@ee100000: 0, mmc@ee160000: 1 Loading Environment from MMC... OK In: serial@e6e88000 Out: serial@e6e88000 Err: serial@e6e88000 Net: eth0: ethernet@e6800000 Hit any key to stop autoboot: 0
Regards, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
On 08/10/2020 20:05, Biju Das wrote:
Hi,
I am porting Renesas RZ/G2M[1] platform to TF-A Master branch(commitid: eeb77da646844) . Initially faced a compilation error [2] which is fixed by using "#define" instead of "static const uint64_t".
What is your exact fix? That compilation error sounds like a more serious issue, I am scratching my head how such a change would really fix things?
And do you see this with mainline? I tried: $ make bl31 PLAT=rcar LSI=M3 MBEDTLS_DIR=../mbedtls.git on origin/master and it built just fine.
Then on target, found that BL31 is failed to boot[3], which is fixed by reverting the commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes"), see the logs [3].
Mmmh, interesting. Can you build with "DEBUG=1" to get more output from BL31?
I see some calls to memset() from code in drivers/renesas/rcar. Can you add some debug prints at the top of memset() to dump the parameters on each call? To see what breaks it? Are you using memset on some I/O memory, by any chance? And that doesn't support all access types, like 64-bit stores?
Cheers, Andre
Q1) Have you seen this issue? Please share your thoughts on this issues.
[1] https://www.renesas.com/us/en/products/microcontrollers-microprocessors/rz/r...
[2] https://lists.trustedfirmware.org/pipermail/tf-a/2020-October/000763.html
[3] Failure logs
NOTICE: BL2: RZ/G Initial Program Loader(CA57) Rev.2.0.6 NOTICE: BL2: PRR is RZG G2M Ver.1.3 NOTICE: BL2: Board is HiHope RZ/G2M Rev.4.0 NOTICE: BL2: Boot device is QSPI Flash(40MHz) NOTICE: BL2: LCM state is unknown NOTICE: BL2: DDR3200(rev.0.40) NOTICE: BL2: [COLD_BOOT] NOTICE: BL2: DRAM Split is 2ch NOTICE: BL2: QoS is default setting(rev.0.19) NOTICE: BL2: DRAM refresh interval 1.95 usec NOTICE: BL2: Periodic Write DQ Training NOTICE: BL2: CH0: 400000000 - 47fffffff, 2 GiB NOTICE: BL2: CH2: 600000000 - 67fffffff, 2 GiB NOTICE: BL2: FDT at 0xe632a648 NOTICE: BL2: v2.3(release):v2.3-778-g4fd58b045 NOTICE: BL2: Built : 18:14:35, Oct 8 2020 NOTICE: BL2: Normal boot NOTICE: BL2: dst=0xe6328300 src=0x8180000 len=512(0x200) NOTICE: BL2: dst=0x43f00000 src=0x8180400 len=6144(0x1800) NOTICE: rcar_file_len: len: 0x0003e000 NOTICE: BL2: dst=0x44000000 src=0x81c0000 len=253952(0x3e000) NOTICE: rcar_file_len: len: 0x00100000 NOTICE: BL2: dst=0x50000000 src=0x8300000 len=1048576(0x100000) NOTICE: BL2: Booting BL31
[4]Working logs after reverting the commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes")
NOTICE: BL2: RZ/G Initial Program Loader(CA57) Rev.2.0.6 NOTICE: BL2: PRR is RZG G2M Ver.1.3 NOTICE: BL2: Board is HiHope RZ/G2M Rev.4.0 NOTICE: BL2: Boot device is QSPI Flash(40MHz) NOTICE: BL2: LCM state is unknown NOTICE: BL2: DDR3200(rev.0.40) NOTICE: BL2: [COLD_BOOT] NOTICE: BL2: DRAM Split is 2ch NOTICE: BL2: QoS is default setting(rev.0.19) NOTICE: BL2: DRAM refresh interval 1.95 usec NOTICE: BL2: Periodic Write DQ Training NOTICE: BL2: CH0: 400000000 - 47fffffff, 2 GiB NOTICE: BL2: CH2: 600000000 - 67fffffff, 2 GiB NOTICE: BL2: FDT at 0xe632a648 NOTICE: BL2: v2.3(release):v2.3-779-geba02ef39 NOTICE: BL2: Built : 18:16:47, Oct 8 2020 NOTICE: BL2: Normal boot NOTICE: BL2: dst=0xe6328300 src=0x8180000 len=512(0x200) NOTICE: BL2: dst=0x43f00000 src=0x8180400 len=6144(0x1800) NOTICE: rcar_file_len: len: 0x0003e000 NOTICE: BL2: dst=0x44000000 src=0x81c0000 len=253952(0x3e000) NOTICE: rcar_file_len: len: 0x00100000 NOTICE: BL2: dst=0x50000000 src=0x8300000 len=1048576(0x100000) NOTICE: BL2: Booting BL31
U-Boot 2020.10-00535-gd0e9210a2f (Oct 08 2020 - 12:34:09 +0100)
CPU: Renesas Electronics R8A774A1 rev 1.3 Model: HopeRun HiHope RZ/G2M with sub board DRAM: 3.9 GiB MMC: mmc@ee100000: 0, mmc@ee160000: 1 Loading Environment from MMC... OK In: serial@e6e88000 Out: serial@e6e88000 Err: serial@e6e88000 Net: eth0: ethernet@e6800000 Hit any key to stop autoboot: 0
Regards, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
Hi Andre,
Thanks for the feedback.
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 08/10/2020 20:05, Biju Das wrote:
Hi,
I am porting Renesas RZ/G2M[1] platform to TF-A Master branch(commitid: eeb77da646844) . Initially faced a compilation error [2]
which is fixed by using "#define" instead of "static const uint64_t".
What is your exact fix? That compilation error sounds like a more serious issue, I am scratching my head how such a change would really fix things?
Looks like this issue is related to toolchain, I was using aarch64-linux-gcc-7 on U-buntu 18.04 Host toolchain. The compilation is error is fixed by the below changes.
#define BL31_RO_BASE BL_CODE_BASE #define BL31_RO_LIMIT BL_CODE_END #define BL31_COHERENT_RAM_BASE BL_COHERENT_RAM_BASE #define BL31_COHERENT_RAM_LIMIT BL_COHERENT_RAM_END
The warning fix should not lead to compilation error with the old toolchains. It should be backward compatible to the old toolchains right? Please share your views.
And do you see this with mainline? I tried:
Yes
$ make bl31 PLAT=rcar LSI=M3 MBEDTLS_DIR=../mbedtls.git on origin/master and it built just fine.
Thanks for the hint, After using the toolchain[1], I got the same results as your's. it compiles successfully. So the issue is related to using older tool chains. [1] https://developer.arm.com/tools-and-software/open-source-software/developer-...
Then on target, found that BL31 is failed to boot[3], which is fixed by reverting the commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes"), see
the logs [3].
Mmmh, interesting. Can you build with "DEBUG=1" to get more output from BL31?
Sure Will do and provide feedback.
I see some calls to memset() from code in drivers/renesas/rcar. Can you add some debug prints at the top of memset() to dump the parameters on each call? To see what breaks it?
OK.
Are you using memset on some I/O memory, by any chance? And that doesn't support all access types, like 64-bit stores?
Will check and let you know.
Cheers, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
On 09/10/2020 08:12, Biju Das wrote:
Hi Biju,
Thanks for the feedback.
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 08/10/2020 20:05, Biju Das wrote:
Hi,
I am porting Renesas RZ/G2M[1] platform to TF-A Master branch(commitid: eeb77da646844) . Initially faced a compilation error [2]
which is fixed by using "#define" instead of "static const uint64_t".
What is your exact fix? That compilation error sounds like a more serious issue, I am scratching my head how such a change would really fix things?
Looks like this issue is related to toolchain, I was using aarch64-linux-gcc-7 on U-buntu 18.04 Host toolchain. The compilation is error is fixed by the below changes.
#define BL31_RO_BASE BL_CODE_BASE #define BL31_RO_LIMIT BL_CODE_END #define BL31_COHERENT_RAM_BASE BL_COHERENT_RAM_BASE #define BL31_COHERENT_RAM_LIMIT BL_COHERENT_RAM_END
The warning fix should not lead to compilation error with the old toolchains. It should be backward compatible to the old toolchains right? Please share your views.
And do you see this with mainline? I tried:
Yes
$ make bl31 PLAT=rcar LSI=M3 MBEDTLS_DIR=../mbedtls.git on origin/master and it built just fine.
Thanks for the hint, After using the toolchain[1], I got the same results as your's. it compiles successfully. So the issue is related to using older tool chains. [1] https://developer.arm.com/tools-and-software/open-source-software/developer-...
Ah, that's a good point. I originally tried on my x86 laptop, where I have a GCC 9 cross toolchain installed. But indeed on my arm64 desktop with Ubuntu 18.04 it fails, as you wrote.
I believe that should be fixed in the code.
Can you file an issue on https://developer.trustedfirmware.org/? Then we can have the discussion there.
Cheers, Andre
Then on target, found that BL31 is failed to boot[3], which is fixed by reverting the commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes"), see
the logs [3].
Mmmh, interesting. Can you build with "DEBUG=1" to get more output from BL31?
Sure Will do and provide feedback.
I see some calls to memset() from code in drivers/renesas/rcar. Can you add some debug prints at the top of memset() to dump the parameters on each call? To see what breaks it?
OK.
Are you using memset on some I/O memory, by any chance? And that doesn't support all access types, like 64-bit stores?
Will check and let you know.
Cheers, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
Hi Andre,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 08/10/2020 20:05, Biju Das wrote:
Hi,
I am porting Renesas RZ/G2M[1] platform to TF-A Master branch(commitid: eeb77da646844) . Initially faced a compilation error [2]
which is fixed by using "#define" instead of "static const uint64_t".
What is your exact fix? That compilation error sounds like a more serious issue, I am scratching my head how such a change would really fix
things?
Looks like this issue is related to toolchain, I was using aarch64-linux-gcc-7
on U-buntu 18.04 Host toolchain.
The compilation is error is fixed by the below changes.
#define BL31_RO_BASE BL_CODE_BASE #define BL31_RO_LIMIT BL_CODE_END #define BL31_COHERENT_RAM_BASE BL_COHERENT_RAM_BASE #define BL31_COHERENT_RAM_LIMIT BL_COHERENT_RAM_END
The warning fix should not lead to compilation error with the old
toolchains.
It should be backward compatible to the old toolchains right? Please share
your views.
And do you see this with mainline? I tried:
Yes
$ make bl31 PLAT=rcar LSI=M3 MBEDTLS_DIR=../mbedtls.git on origin/master and it built just fine.
Thanks for the hint, After using the toolchain[1], I got the same results as
your's. it compiles successfully. So the issue is related to using older tool chains.
[1] https://developer.arm.com/tools-and-software/open-source-
software/deve
loper-tools/gnu-toolchain/gnu-a/downloads
Ah, that's a good point. I originally tried on my x86 laptop, where I have a GCC 9 cross toolchain installed. But indeed on my arm64 desktop with Ubuntu 18.04 it fails, as you wrote.
I believe that should be fixed in the code.
Can you file an issue on https://developer.trustedfirmware.org/? Then we can have the discussion there.
Ok. Filed the issue https://developer.trustedfirmware.org/T842
Cheers, Biju
Then on target, found that BL31 is failed to boot[3], which is fixed by reverting the commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes"), see
the logs [3].
Mmmh, interesting. Can you build with "DEBUG=1" to get more output from BL31?
Sure Will do and provide feedback.
I see some calls to memset() from code in drivers/renesas/rcar. Can you add some debug prints at the top of memset() to dump the parameters on each call? To see what breaks it?
OK.
Are you using memset on some I/O memory, by any chance? And that doesn't support all access types, like 64-bit stores?
Will check and let you know.
Cheers, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
Hi Andre,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 08/10/2020 20:05, Biju Das wrote:
Then on target, found that BL31 is failed to boot[3], which is fixed by reverting the commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes"), see
the logs [3].
Mmmh, interesting. Can you build with "DEBUG=1" to get more output from BL31?
Sure Will do and provide feedback.
I see some calls to memset() from code in drivers/renesas/rcar. Can you add some debug prints at the top of memset() to dump the parameters on each call? To see what breaks it?
OK.
I believe commenting [1] fixed the issue which corresponds to memset parameters[2]. Looks like calling rcar_log_init-->memset from [3] is causing the issue. But without your patch it is working fine. Only thing is, you have added extra code to improve the speed. I am not sure how it is impacting here. Currently I don't have access to Lauterbach to debug further. Any thoughts ???
[1] https://elixir.bootlin.com/arm-trusted-firmware/latest/source/drivers/renesa... [2 ]NOTICE: BLDEBUG: dst=0x44040010, val=0, count=81840 [3] https://elixir.bootlin.com/arm-trusted-firmware/latest/source/drivers/renesa...
Regards, Biju
Are you using memset on some I/O memory, by any chance? And that doesn't support all access types, like 64-bit stores?
Will check and let you know.
Cheers, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
On 09/10/2020 18:37, Biju Das wrote:
Hi Biju,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 08/10/2020 20:05, Biju Das wrote:
Then on target, found that BL31 is failed to boot[3], which is fixed by reverting the commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes"), see
the logs [3].
Mmmh, interesting. Can you build with "DEBUG=1" to get more output from BL31?
Sure Will do and provide feedback.
I see some calls to memset() from code in drivers/renesas/rcar. Can you add some debug prints at the top of memset() to dump the parameters on each call? To see what breaks it?
OK.
I believe commenting [1] fixed the issue which corresponds to memset parameters[2]. Looks like calling rcar_log_init-->memset from [3] is causing the issue.
Many thanks for debugging this, that's helpful!
But without your patch it is working fine. Only thing is, you have added extra code to
improve the speed. I am not sure how it is impacting here. Currently I don't have access to Lauterbach to debug further. Any thoughts ???
So this RCAR_BL31_LOG_BASE address is mapped as device memory, what kind of device is this? SRAM? Or something non-volatile, even? Even if it's SRAM, sometimes the controller managing this does not support all AMBA accesses (hence it's a device, not memory).
In general letting the compiler access device memory directly is asking for trouble, every access should go through MMIO accessors. Linux has memset_io() for those kind of problems.
Can you try the attached patch? That should downgrade the stores to 32-bit ones. Not a solution, but would help to isolate the problem.
If that fixes it, we could either have a memset_io32() implementation, or you can revert to those memset() routines I see in drivers/renesas/rcar already, for instance by exporting emmc_memset().
Cheers, Andre
[1] https://elixir.bootlin.com/arm-trusted-firmware/latest/source/drivers/renesa... [2 ]NOTICE: BLDEBUG: dst=0x44040010, val=0, count=81840 [3] https://elixir.bootlin.com/arm-trusted-firmware/latest/source/drivers/renesa...
Regards, Biju
Are you using memset on some I/O memory, by any chance? And that doesn't support all access types, like 64-bit stores?
Will check and let you know.
Cheers, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
Hi Andre,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 09/10/2020 18:37, Biju Das wrote:
Hi Biju,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 08/10/2020 20:05, Biju Das wrote:
Then on target, found that BL31 is failed to boot[3], which is fixed by reverting the commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes"), see
the logs [3].
Mmmh, interesting. Can you build with "DEBUG=1" to get more output from BL31?
Sure Will do and provide feedback.
I see some calls to memset() from code in drivers/renesas/rcar. Can you add some debug prints at the top of memset() to dump the parameters on each call? To see what breaks it?
OK.
I believe commenting [1] fixed the issue which corresponds to memset
parameters[2].
Looks like calling rcar_log_init-->memset from [3] is causing the issue.
Many thanks for debugging this, that's helpful!
But without your patch it is working fine. Only thing is, you have added extra code to
improve the speed. I am not sure how it is impacting here. Currently I don't
have access to Lauterbach to debug further. Any thoughts ???
So this RCAR_BL31_LOG_BASE address is mapped as device memory, what kind of device is this? SRAM? Or something non-volatile, even? Even if it's SRAM, sometimes the controller managing this does not support all AMBA accesses (hence it's a device, not memory).
It is DRAM only, with Security protected attribute.
In general letting the compiler access device memory directly is asking for trouble, every access should go through MMIO accessors. Linux has memset_io() for those kind of problems.
Can you try the attached patch? That should downgrade the stores to 32-bit ones. Not a solution, but would help to isolate the problem.
Patch didn't fix the issue.
If that fixes it, we could either have a memset_io32() implementation, or you can revert to those memset() routines I see in drivers/renesas/rcar already, for instance by exporting emmc_memset().
However I copied the same code emmc_memset to rcar_memset() and with that it works fine. Looks like byte access works OK. Any thoughts??
Cheers, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
On 09/10/2020 19:26, Biju Das wrote:
Hi Biju,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 09/10/2020 18:37, Biju Das wrote:
Hi Biju,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 08/10/2020 20:05, Biju Das wrote: > Then on target, found that BL31 is failed to boot[3], which is > fixed by reverting the commit 75fab6496e5fce9a11 > ("libc: memset: improve performance by avoiding single byte > writes"), see the logs [3].
Mmmh, interesting. Can you build with "DEBUG=1" to get more output from BL31?
Sure Will do and provide feedback.
I see some calls to memset() from code in drivers/renesas/rcar. Can you add some debug prints at the top of memset() to dump the parameters on each call? To see what breaks it?
OK.
I believe commenting [1] fixed the issue which corresponds to memset
parameters[2].
Looks like calling rcar_log_init-->memset from [3] is causing the issue.
Many thanks for debugging this, that's helpful!
But without your patch it is working fine. Only thing is, you have added extra code to
improve the speed. I am not sure how it is impacting here. Currently I don't
have access to Lauterbach to debug further. Any thoughts ???
So this RCAR_BL31_LOG_BASE address is mapped as device memory, what kind of device is this? SRAM? Or something non-volatile, even? Even if it's SRAM, sometimes the controller managing this does not support all AMBA accesses (hence it's a device, not memory).
It is DRAM only, with Security protected attribute.
Mmh, interesting. Why is it then mapped as device?
#define MAP_ATFW_LOG MAP_REGION_FLAT(RCAR_BL31_LOG_BASE, \ RCAR_BL31_LOG_SIZE, \ MT_DEVICE | MT_RW | MT_SECURE)
You could try using MT_NORMAL instead. Or ...
In general letting the compiler access device memory directly is asking for trouble, every access should go through MMIO accessors. Linux has memset_io() for those kind of problems.
Can you try the attached patch? That should downgrade the stores to 32-bit ones. Not a solution, but would help to isolate the problem.
Patch didn't fix the issue.
I see, bummer. Does it work if you apply this on top?
diff --git a/lib/libc/memset.c b/lib/libc/memset.c index 5ad6ede51..7478871e3 100644 --- a/lib/libc/memset.c +++ b/lib/libc/memset.c @@ -7,6 +7,7 @@ #include <stddef.h> #include <string.h> #include <stdint.h> +#include <lib/mmio.h>
void *memset(void *dst, int val, size_t count) { @@ -34,7 +35,8 @@ void *memset(void *dst, int val, size_t count) /* Use 32-bit writes for as long as possible. */ ptr32 = (void *)ptr; for (; count >= 4; count -= 4) { - *ptr32++ = fill; + mmio_write_32((uintptr_t)ptr32, fill); + ptr32++; }
/* Handle the remaining part byte-per-byte. */
However I copied the same code emmc_memset to rcar_memset() and with that it works fine. Looks like byte access works OK. Any thoughts??
That's weird, I just see "str w1, [x3, x5]" in the disassembly, and since everything is nicely aligned, it should actually work. That is still with the note that letting the compiler access device memory is wrong, but the generated code is actually very similar in this special case here.
Cheers, Andre
Hi Andre,
I have found the issue. Please see inline.
On 09/10/2020 18:37, Biju Das wrote:
Hi Biju,
> Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve > performance by avoiding single byte writes") causing BL31 boot > failure on Renesas RZ/G2 platforms. > > On 08/10/2020 20:05, Biju Das wrote: >> Then on target, found that BL31 is failed to boot[3], which is >> fixed by reverting the commit 75fab6496e5fce9a11 >> ("libc: memset: improve performance by avoiding single byte >> writes"), see > the logs [3]. > > Mmmh, interesting. Can you build with "DEBUG=1" to get more > output from BL31?
Sure Will do and provide feedback.
> I see some calls to memset() from code in drivers/renesas/rcar. > Can you add some debug prints at the top of memset() to dump the > parameters on each call? To see what breaks it?
OK.
I believe commenting [1] fixed the issue which corresponds to memset
parameters[2].
Looks like calling rcar_log_init-->memset from [3] is causing the issue.
Many thanks for debugging this, that's helpful!
But without your patch it is working fine. Only thing is, you have added extra code to
improve the speed. I am not sure how it is impacting here. Currently I don't
have access to Lauterbach to debug further. Any thoughts ???
So this RCAR_BL31_LOG_BASE address is mapped as device memory,
what
kind of device is this? SRAM? Or something non-volatile, even? Even if it's SRAM, sometimes the controller managing this does not support all AMBA accesses (hence it's a device, not memory).
It is DRAM only, with Security protected attribute.
Mmh, interesting. Why is it then mapped as device?
#define MAP_ATFW_LOG MAP_REGION_FLAT(RCAR_BL31_LOG_BASE, \ RCAR_BL31_LOG_SIZE, \ MT_DEVICE | MT_RW | MT_SECURE)
You could try using MT_NORMAL instead. Or ...
In general letting the compiler access device memory directly is asking for trouble, every access should go through MMIO accessors. Linux has memset_io() for those kind of problems.
Can you try the attached patch? That should downgrade the stores to 32-bit ones. Not a solution, but would help to isolate the problem.
Patch didn't fix the issue.
I see, bummer. Does it work if you apply this on top?
diff --git a/lib/libc/memset.c b/lib/libc/memset.c index 5ad6ede51..7478871e3 100644 --- a/lib/libc/memset.c +++ b/lib/libc/memset.c @@ -7,6 +7,7 @@ #include <stddef.h> #include <string.h> #include <stdint.h> +#include <lib/mmio.h>
void *memset(void *dst, int val, size_t count) { @@ -34,7 +35,8 @@ void *memset(void *dst, int val, size_t count) /* Use 32-bit writes for as long as possible. */ ptr32 = (void *)ptr; for (; count >= 4; count -= 4) { -*ptr32++ = fill; +mmio_write_32((uintptr_t)ptr32, fill); +ptr32++; }
/* Handle the remaining part byte-per-byte. */
However I copied the same code emmc_memset to rcar_memset() and
with that it works fine.
Looks like byte access works OK. Any thoughts??
That's weird, I just see "str w1, [x3, x5]" in the disassembly, and since everything is nicely aligned, it should actually work. That is still with the note that letting the compiler access device memory is wrong, but the generated code is actually very similar in this special case here.
Issue is with your original commit[1], it is entering infinite loop. I commented that while loop and it is fine.
[1]https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/lib/libc/me...
Regards, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
On 09/10/2020 20:45, Biju Das wrote:
Hi,
I have found the issue. Please see inline.
On 09/10/2020 18:37, Biju Das wrote:
Hi Biju,
>> Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve >> performance by avoiding single byte writes") causing BL31 boot >> failure on Renesas RZ/G2 platforms. >> >> On 08/10/2020 20:05, Biju Das wrote: >>> Then on target, found that BL31 is failed to boot[3], which is >>> fixed by reverting the commit 75fab6496e5fce9a11 >>> ("libc: memset: improve performance by avoiding single byte >>> writes"), see >> the logs [3]. >> >> Mmmh, interesting. Can you build with "DEBUG=1" to get more >> output from BL31? > > Sure Will do and provide feedback.
>> I see some calls to memset() from code in drivers/renesas/rcar. >> Can you add some debug prints at the top of memset() to dump the >> parameters on each call? To see what breaks it? > > OK.
I believe commenting [1] fixed the issue which corresponds to memset
parameters[2].
Looks like calling rcar_log_init-->memset from [3] is causing the issue.
Many thanks for debugging this, that's helpful!
But without your patch it is working fine. Only thing is, you have added extra code to
improve the speed. I am not sure how it is impacting here. Currently I don't
have access to Lauterbach to debug further. Any thoughts ???
So this RCAR_BL31_LOG_BASE address is mapped as device memory,
what
kind of device is this? SRAM? Or something non-volatile, even? Even if it's SRAM, sometimes the controller managing this does not support all AMBA accesses (hence it's a device, not memory).
It is DRAM only, with Security protected attribute.
Mmh, interesting. Why is it then mapped as device?
#define MAP_ATFW_LOG MAP_REGION_FLAT(RCAR_BL31_LOG_BASE, \ RCAR_BL31_LOG_SIZE, \ MT_DEVICE | MT_RW | MT_SECURE)
You could try using MT_NORMAL instead. Or ...
In general letting the compiler access device memory directly is asking for trouble, every access should go through MMIO accessors. Linux has memset_io() for those kind of problems.
Can you try the attached patch? That should downgrade the stores to 32-bit ones. Not a solution, but would help to isolate the problem.
Patch didn't fix the issue.
I see, bummer. Does it work if you apply this on top?
diff --git a/lib/libc/memset.c b/lib/libc/memset.c index 5ad6ede51..7478871e3 100644 --- a/lib/libc/memset.c +++ b/lib/libc/memset.c @@ -7,6 +7,7 @@ #include <stddef.h> #include <string.h> #include <stdint.h> +#include <lib/mmio.h>
void *memset(void *dst, int val, size_t count) { @@ -34,7 +35,8 @@ void *memset(void *dst, int val, size_t count) /* Use 32-bit writes for as long as possible. */ ptr32 = (void *)ptr; for (; count >= 4; count -= 4) { -*ptr32++ = fill; +mmio_write_32((uintptr_t)ptr32, fill); +ptr32++; }
/* Handle the remaining part byte-per-byte. */
However I copied the same code emmc_memset to rcar_memset() and
with that it works fine.
Looks like byte access works OK. Any thoughts??
That's weird, I just see "str w1, [x3, x5]" in the disassembly, and since everything is nicely aligned, it should actually work. That is still with the note that letting the compiler access device memory is wrong, but the generated code is actually very similar in this special case here.
Issue is with your original commit[1], it is entering infinite loop. I commented that while loop and it is fine.
[1]https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/lib/libc/me...
That's very weird. I stared at the code for a while, and can't see the issue. I also put it through my unit tests again, and tried to mimic your special case (clearing 81840 bytes at 0xyyyy0010). That worked fine for me every time. Do you see the actual problem?
Can you send me the resulting binary: build/rcar/release/libc/memset.o? Just to check what your compiler made of it?
Cheers, Andre
Hi Andre,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 09/10/2020 20:45, Biju Das wrote:
Hi,
I have found the issue. Please see inline.
On 09/10/2020 18:37, Biju Das wrote:
Hi Biju,
>>> Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve >>> performance by avoiding single byte writes") causing BL31 boot >>> failure on Renesas RZ/G2 platforms. >>> >>> On 08/10/2020 20:05, Biju Das wrote: >>>> Then on target, found that BL31 is failed to boot[3], which is >>>> fixed by reverting the commit 75fab6496e5fce9a11 >>>> ("libc: memset: improve performance by avoiding single byte >>>> writes"), see >>> the logs [3]. >>> >>> Mmmh, interesting. Can you build with "DEBUG=1" to get more >>> output from BL31? >> >> Sure Will do and provide feedback.
>>> I see some calls to memset() from code in drivers/renesas/rcar. >>> Can you add some debug prints at the top of memset() to dump >>> the parameters on each call? To see what breaks it? >> >> OK.
I believe commenting [1] fixed the issue which corresponds to memset
parameters[2].
Looks like calling rcar_log_init-->memset from [3] is causing the issue.
Many thanks for debugging this, that's helpful!
But without your patch it is working fine. Only thing is, you have added extra code to
improve the speed. I am not sure how it is impacting here. Currently I don't
have access to Lauterbach to debug further. Any thoughts ???
So this RCAR_BL31_LOG_BASE address is mapped as device memory,
what
kind of device is this? SRAM? Or something non-volatile, even? Even if it's SRAM, sometimes the controller managing this does not support all AMBA accesses (hence it's a device, not memory).
It is DRAM only, with Security protected attribute.
Mmh, interesting. Why is it then mapped as device?
#define MAP_ATFW_LOG
MAP_REGION_FLAT(RCAR_BL31_LOG_BASE,
\ RCAR_BL31_LOG_SIZE, \ MT_DEVICE | MT_RW | MT_SECURE)
You could try using MT_NORMAL instead. Or ...
In general letting the compiler access device memory directly is asking for trouble, every access should go through MMIO accessors. Linux has memset_io() for those kind of problems.
Can you try the attached patch? That should downgrade the stores to 32-bit ones. Not a solution, but would help to isolate the problem.
Patch didn't fix the issue.
I see, bummer. Does it work if you apply this on top?
diff --git a/lib/libc/memset.c b/lib/libc/memset.c index 5ad6ede51..7478871e3 100644 --- a/lib/libc/memset.c +++ b/lib/libc/memset.c @@ -7,6 +7,7 @@ #include <stddef.h> #include <string.h> #include <stdint.h> +#include <lib/mmio.h>
void *memset(void *dst, int val, size_t count) { @@ -34,7 +35,8 @@ void *memset(void *dst, int val, size_t count) /* Use 32-bit writes for as long as possible. */ ptr32 = (void *)ptr; for (; count >= 4; count -= 4) { -*ptr32++ = fill; +mmio_write_32((uintptr_t)ptr32, fill); +ptr32++; }
/* Handle the remaining part byte-per-byte. */
However I copied the same code emmc_memset to rcar_memset() and
with that it works fine.
Looks like byte access works OK. Any thoughts??
That's weird, I just see "str w1, [x3, x5]" in the disassembly, and since everything is nicely aligned, it should actually work. That is still with the note that letting the compiler access device memory is wrong, but the generated code is actually very similar in this special case
here.
Issue is with your original commit[1], it is entering infinite loop. I
commented that while loop and it is fine.
[1]https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/li b/libc/memset.c#n43
That's very weird. I stared at the code for a while, and can't see the issue. I also put it through my unit tests again, and tried to mimic your special case (clearing 81840 bytes at 0xyyyy0010). That worked fine for me every time. Do you see the actual problem?
Can you send me the resulting binary: build/rcar/release/libc/memset.o? Just to check what your compiler made of it?
Here it is [1] https://gitlab.com/bijudas/debug
Just a suggestion, Why can't we use finite for loops with pointer arithmetic instead of the while loops? 1) first for loop with ptr64 and ptr64 +(remainingcount >> 3) and 2) second for loop with ptr and ptr + (remainingcount & 7)
Please see the code generated by compiler for working case and non working case.
Please see the line inserted by compiler for non working case. 3c:928000e4 movx4, #0xfffffffffffffff8 // #-8
biju@biju-VirtualBox:~/work/test/trusted-firmware-a$ /usr/share/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-objdump -S build/rzg/release/libc/memset.o
build/rzg/release/libc/memset.o: file format elf64-littleaarch64
Disassembly of section .text.memset:
0000000000000000 <memset>: 0:b4000342 cbzx2, 68 <memset+0x68> 4:12001c21 andw1, w1, #0xff 8:aa0003e5 movx5, x0 c:f24008a6 andsx6, x5, #0x7 10:54000261 b.ne5c <memset+0x5c> // b.any 14:d3781c24 ubfizx4, x1, #8, #8 18:92401c23 andx3, x1, #0xff 1c:aa030083 orrx3, x4, x3 20:d2800004 movx4, #0x0 // #0 24:aa034063 orrx3, x3, x3, lsl #16 28:aa038063 orrx3, x3, x3, lsl #32 2c:cb040047 subx7, x2, x4 30:f1001cff cmpx7, #0x7 34:540001c8 b.hi6c <memset+0x6c> // b.pmore 38:d343fc43 lsrx3, x2, #3 3c:928000e4 movx4, #0xfffffffffffffff8 // #-8 40:8b030ca5 addx5, x5, x3, lsl #3 44:9b040862 maddx2, x3, x4, x2 48:eb06005f cmpx2, x6 4c:540000e0 b.eq68 <memset+0x68> // b.none 50:382668a1 strbw1, [x5, x6] 54:910004c6 addx6, x6, #0x1 58:17fffffc b48 <memset+0x48> 5c:380014a1 strbw1, [x5], #1 60:f1000442 subsx2, x2, #0x1 64:54fffd41 b.nec <memset+0xc> // b.any 68:d65f03c0 ret 6c:f82468a3 strx3, [x5, x4] 70:91002084 addx4, x4, #0x8 74:17ffffee b2c <memset+0x2c>
Disassembly with commenting last while loop (Working case) --------------------------------------------
biju@biju-VirtualBox:~/work/test/trusted-firmware-a$ /usr/share/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-objdump -S memset.o
memset.o: file format elf64-littleaarch64
Disassembly of section .text.memset:
0000000000000000 <memset>: 0:b4000262 cbzx2, 4c <memset+0x4c> 4:12001c21 andw1, w1, #0xff 8:aa0003e3 movx3, x0 c:f2400864 andsx4, x3, #0x7 10:54000181 b.ne40 <memset+0x40> // b.any 14:92401c25 andx5, x1, #0xff 18:d3781c21 ubfizx1, x1, #8, #8 1c:aa050021 orrx1, x1, x5 20:aa014021 orrx1, x1, x1, lsl #16 24:aa018021 orrx1, x1, x1, lsl #32 28:cb040045 subx5, x2, x4 2c:f1001cbf cmpx5, #0x7 30:540000e9 b.ls4c <memset+0x4c> // b.plast 34:f8246861 strx1, [x3, x4] 38:91002084 addx4, x4, #0x8 3c:17fffffb b28 <memset+0x28> 40:38001461 strbw1, [x3], #1 44:f1000442 subsx2, x2, #0x1 48:54fffe21 b.nec <memset+0xc> // b.any 4c:d65f03c0 ret
Regards, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
On 10/10/2020 11:07, Biju Das wrote:
Hi Biju,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 09/10/2020 20:45, Biju Das wrote:
Hi,
I have found the issue. Please see inline.
On 09/10/2020 18:37, Biju Das wrote:
Hi Biju,
>>>> Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve >>>> performance by avoiding single byte writes") causing BL31 boot >>>> failure on Renesas RZ/G2 platforms. >>>> >>>> On 08/10/2020 20:05, Biju Das wrote: >>>>> Then on target, found that BL31 is failed to boot[3], which is >>>>> fixed by reverting the commit 75fab6496e5fce9a11 >>>>> ("libc: memset: improve performance by avoiding single byte >>>>> writes"), see >>>> the logs [3]. >>>> >>>> Mmmh, interesting. Can you build with "DEBUG=1" to get more >>>> output from BL31? >>> >>> Sure Will do and provide feedback. > >>>> I see some calls to memset() from code in drivers/renesas/rcar. >>>> Can you add some debug prints at the top of memset() to dump >>>> the parameters on each call? To see what breaks it? >>> >>> OK. > > I believe commenting [1] fixed the issue which corresponds to > memset parameters[2]. > Looks like calling rcar_log_init-->memset from [3] is causing the issue.
Many thanks for debugging this, that's helpful!
But without your patch it is working fine. Only thing is, you have added extra code to > improve the speed. I am not sure how it is impacting here. > Currently I don't have access to Lauterbach to debug further. Any thoughts ???
So this RCAR_BL31_LOG_BASE address is mapped as device memory,
what
kind of device is this? SRAM? Or something non-volatile, even? Even if it's SRAM, sometimes the controller managing this does not support all AMBA accesses (hence it's a device, not memory).
It is DRAM only, with Security protected attribute.
Mmh, interesting. Why is it then mapped as device?
#define MAP_ATFW_LOG
MAP_REGION_FLAT(RCAR_BL31_LOG_BASE,
\ RCAR_BL31_LOG_SIZE, \ MT_DEVICE | MT_RW | MT_SECURE)
You could try using MT_NORMAL instead. Or ...
In general letting the compiler access device memory directly is asking for trouble, every access should go through MMIO accessors. Linux has memset_io() for those kind of problems.
Can you try the attached patch? That should downgrade the stores to 32-bit ones. Not a solution, but would help to isolate the problem.
Patch didn't fix the issue.
I see, bummer. Does it work if you apply this on top?
diff --git a/lib/libc/memset.c b/lib/libc/memset.c index 5ad6ede51..7478871e3 100644 --- a/lib/libc/memset.c +++ b/lib/libc/memset.c @@ -7,6 +7,7 @@ #include <stddef.h> #include <string.h> #include <stdint.h> +#include <lib/mmio.h>
void *memset(void *dst, int val, size_t count) { @@ -34,7 +35,8 @@ void *memset(void *dst, int val, size_t count) /* Use 32-bit writes for as long as possible. */ ptr32 = (void *)ptr; for (; count >= 4; count -= 4) { -*ptr32++ = fill; +mmio_write_32((uintptr_t)ptr32, fill); +ptr32++; }
/* Handle the remaining part byte-per-byte. */
However I copied the same code emmc_memset to rcar_memset() and
with that it works fine.
Looks like byte access works OK. Any thoughts??
That's weird, I just see "str w1, [x3, x5]" in the disassembly, and since everything is nicely aligned, it should actually work. That is still with the note that letting the compiler access device memory is wrong, but the generated code is actually very similar in this special case
here.
Issue is with your original commit[1], it is entering infinite loop. I
commented that while loop and it is fine.
[1]https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/li b/libc/memset.c#n43
That's very weird. I stared at the code for a while, and can't see the issue. I also put it through my unit tests again, and tried to mimic your special case (clearing 81840 bytes at 0xyyyy0010). That worked fine for me every time. Do you see the actual problem?
Can you send me the resulting binary: build/rcar/release/libc/memset.o? Just to check what your compiler made of it?
Here it is [1] https://gitlab.com/bijudas/debug
Thanks, that is actually identical to what my compiler produced, so that means toolchain-wise we are on the same page.
Just a suggestion, Why can't we use finite for loops with pointer arithmetic instead of the while loops?
- first for loop with ptr64 and ptr64 +(remainingcount >> 3) and
- second for loop with ptr and ptr + (remainingcount & 7)
But what would that change?
Please see the code generated by compiler for working case and non working case.
Just to clarify: with "working" you mean "commenting the last while loop"? Which breaks the algorithm, as we would lose the tail bytes to be filled, in case the end of the to-be-filled region is not 64-bit aligned? So it's not an option to just skip that. It just happens to work in your case because the end *is* 64-bit aligned, so the final loop is not needed.
And besides, I still don't see the problem. I stepped through the disassembly, with (<aligned ptr>, 0, 81840) for the parameters and don't see why there would be a problem. Once it's done with the 64-bit write section, both x2 and x6 are 0, so the cmp at 0x48 sets the Z bit, and "b.eq 68" branches to the ret. So the whole final loop is skipped, as intended.
Please see the line inserted by compiler for non working case. 3c:928000e4 movx4, #0xfffffffffffffff8 // #-8
Yes, that has a particular purpose (which is not very obvious because this is -Os optimised): It is used to *subtract* the already filled bytes from the original length, to then serve as the limit for the final while loop - which is actually turned in a for loop in assembly! There is no msub in AArch64, so the compiler uses madd and makes one of the factors negative.
So, can you please double check that: memset(0x44040010, 0, 81840); is really the call that hangs your system? And that it really does because so it's being stuck in the last while loop?
And can you make this RCAR_BL31_LOG_BASE region MT_NORMAL memory and see if the problem still persists?
biju@biju-VirtualBox:~/work/test/trusted-firmware-a$ /usr/share/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-objdump -S build/rzg/release/libc/memset.o
build/rzg/release/libc/memset.o: file format elf64-littleaarch64
Disassembly of section .text.memset:
0000000000000000 <memset>: 0:b4000342 cbzx2, 68 <memset+0x68> 4:12001c21 andw1, w1, #0xff 8:aa0003e5 movx5, x0 c:f24008a6 andsx6, x5, #0x7 10:54000261 b.ne5c <memset+0x5c> // b.any 14:d3781c24 ubfizx4, x1, #8, #8 18:92401c23 andx3, x1, #0xff 1c:aa030083 orrx3, x4, x3 20:d2800004 movx4, #0x0 // #0 24:aa034063 orrx3, x3, x3, lsl #16 28:aa038063 orrx3, x3, x3, lsl #32 2c:cb040047 subx7, x2, x4 30:f1001cff cmpx7, #0x7 34:540001c8 b.hi6c <memset+0x6c> // b.pmore 38:d343fc43 lsrx3, x2, #3 3c:928000e4 movx4, #0xfffffffffffffff8 // #-8 40:8b030ca5 addx5, x5, x3, lsl #3 44:9b040862 maddx2, x3, x4, x2 48:eb06005f cmpx2, x6 4c:540000e0 b.eq68 <memset+0x68> // b.none 50:382668a1 strbw1, [x5, x6] 54:910004c6 addx6, x6, #0x1 58:17fffffc b48 <memset+0x48> 5c:380014a1 strbw1, [x5], #1 60:f1000442 subsx2, x2, #0x1 64:54fffd41 b.nec <memset+0xc> // b.any 68:d65f03c0 ret 6c:f82468a3 strx3, [x5, x4] 70:91002084 addx4, x4, #0x8 74:17ffffee b2c <memset+0x2c>
Disassembly with commenting last while loop (Working case)
biju@biju-VirtualBox:~/work/test/trusted-firmware-a$ /usr/share/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-objdump -S memset.o
memset.o: file format elf64-littleaarch64
Disassembly of section .text.memset:
0000000000000000 <memset>: 0:b4000262 cbzx2, 4c <memset+0x4c> 4:12001c21 andw1, w1, #0xff 8:aa0003e3 movx3, x0 c:f2400864 andsx4, x3, #0x7 10:54000181 b.ne40 <memset+0x40> // b.any 14:92401c25 andx5, x1, #0xff 18:d3781c21 ubfizx1, x1, #8, #8 1c:aa050021 orrx1, x1, x5 20:aa014021 orrx1, x1, x1, lsl #16 24:aa018021 orrx1, x1, x1, lsl #32 28:cb040045 subx5, x2, x4 2c:f1001cbf cmpx5, #0x7 30:540000e9 b.ls4c <memset+0x4c> // b.plast 34:f8246861 strx1, [x3, x4] 38:91002084 addx4, x4, #0x8 3c:17fffffb b28 <memset+0x28> 40:38001461 strbw1, [x3], #1 44:f1000442 subsx2, x2, #0x1 48:54fffe21 b.nec <memset+0xc> // b.any 4c:d65f03c0 ret
Sure, having no tail loop simplifies the algorithm, but is wrong. So what is the point here?
Cheers, Andre
Hi Andre,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
> >>>>> Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: >>>>> improve performance by avoiding single byte writes") causing >>>>> BL31 boot failure on Renesas RZ/G2 platforms. >>>>> >>>>> On 08/10/2020 20:05, Biju Das wrote: >>>>>> Then on target, found that BL31 is failed to boot[3], which >>>>>> is fixed by reverting the commit 75fab6496e5fce9a11 >>>>>> ("libc: memset: improve performance by avoiding single byte >>>>>> writes"), see >>>>> the logs [3]. >>>>> >>>>> Mmmh, interesting. Can you build with "DEBUG=1" to get more >>>>> output from BL31? >>>> >>>> Sure Will do and provide feedback. >> >>>>> I see some calls to memset() from code in drivers/renesas/rcar. >>>>> Can you add some debug prints at the top of memset() to
dump
>>>>> the parameters on each call? To see what breaks it? >>>> >>>> OK. >> >> I believe commenting [1] fixed the issue which corresponds to >> memset > parameters[2]. >> Looks like calling rcar_log_init-->memset from [3] is causing the
issue.
> > Many thanks for debugging this, that's helpful! > > But without your patch it is working fine. Only thing is, you > have added extra code to >> improve the speed. I am not sure how it is impacting here. >> Currently I don't > have access to Lauterbach to debug further. Any thoughts ??? > > So this RCAR_BL31_LOG_BASE address is mapped as device
memory,
what
> kind of device is this? SRAM? Or something non-volatile, even? > Even if it's SRAM, sometimes the controller managing this does > not support all AMBA accesses (hence it's a device, not memory).
It is DRAM only, with Security protected attribute.
Mmh, interesting. Why is it then mapped as device?
#define MAP_ATFW_LOG
MAP_REGION_FLAT(RCAR_BL31_LOG_BASE,
\ RCAR_BL31_LOG_SIZE, \ MT_DEVICE | MT_RW | MT_SECURE)
You could try using MT_NORMAL instead. Or ...
> In general letting the compiler access device memory directly is > asking for trouble, every access should go through MMIO accessors. > Linux has > memset_io() for those kind of problems. > > Can you try the attached patch? That should downgrade the stores > to 32-bit ones. Not a solution, but would help to isolate the problem.
Patch didn't fix the issue.
I see, bummer. Does it work if you apply this on top?
diff --git a/lib/libc/memset.c b/lib/libc/memset.c index 5ad6ede51..7478871e3 100644 --- a/lib/libc/memset.c +++ b/lib/libc/memset.c @@ -7,6 +7,7 @@ #include <stddef.h> #include <string.h> #include <stdint.h> +#include <lib/mmio.h>
void *memset(void *dst, int val, size_t count) { @@ -34,7 +35,8 @@ void *memset(void *dst, int val, size_t count) /* Use 32-bit writes for as long as possible. */ ptr32 = (void *)ptr; for (; count >= 4; count -= 4) { -*ptr32++ = fill; +mmio_write_32((uintptr_t)ptr32, fill); +ptr32++; }
/* Handle the remaining part byte-per-byte. */
However I copied the same code emmc_memset to rcar_memset()
and
with that it works fine.
Looks like byte access works OK. Any thoughts??
That's weird, I just see "str w1, [x3, x5]" in the disassembly, and since everything is nicely aligned, it should actually work. That is still with the note that letting the compiler access device memory is wrong, but the generated code is actually very similar in this special case
here.
Issue is with your original commit[1], it is entering infinite loop. I
commented that while loop and it is fine.
[1]https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/ li b/libc/memset.c#n43
That's very weird. I stared at the code for a while, and can't see the issue. I also put it through my unit tests again, and tried to mimic your special case (clearing 81840 bytes at 0xyyyy0010). That worked
fine for me every time.
Do you see the actual problem?
Can you send me the resulting binary: build/rcar/release/libc/memset.o? Just to check what your compiler made of it?
Here it is [1] https://gitlab.com/bijudas/debug
Thanks, that is actually identical to what my compiler produced, so that means toolchain-wise we are on the same page.
Just a suggestion, Why can't we use finite for loops with pointer arithmetic
instead of the while loops?
- first for loop with ptr64 and ptr64 +(remainingcount >> 3) and
- second for loop with ptr and ptr + (remainingcount & 7)
But what would that change?
Hmm. The result same, it didn't fix the issue.
Please see the code generated by compiler for working case and non
working case.
Just to clarify: with "working" you mean "commenting the last while loop"?
Yes. I just want to highlight that some kind of issue is happening just before the while loop.
Which breaks the algorithm, as we would lose the tail bytes to be filled, in case the end of the to-be-filled region is not 64-bit aligned? So it's not an option to just skip that. It just happens to work in your case because the end *is* 64-bit aligned, so the final loop is not needed.
Agreed.
And besides, I still don't see the problem. I stepped through the disassembly, with (<aligned ptr>, 0, 81840) for the parameters and don't see why there would be a problem. Once it's done with the 64-bit write section, both x2 and x6 are 0, so the cmp at 0x48 sets the Z bit, and "b.eq 68" branches to the ret. So the whole final loop is skipped, as intended.
Please see the line inserted by compiler for non working case. 3c:928000e4 movx4, #0xfffffffffffffff8 // #-8
Yes, that has a particular purpose (which is not very obvious because this is - Os optimised): It is used to *subtract* the already filled bytes from the original length, to then serve as the limit for the final while loop - which is actually turned in a for loop in assembly! There is no msub in AArch64, so the compiler uses madd and makes one of the factors negative.
So, can you please double check that: memset(0x44040010, 0, 81840); is really the call that hangs your system?
Yes, This is called from console.S-->rcar_log_int-->memset
And that it really does because so it's being stuck in the last while loop?
After further investigation, not sure it is related to while loop or some kind of corruption?, since I don't have access to JTAG further debugging. Mostly some kind of corruption leading to crash.
The new code introduced, some increased code size, which is causing bl31 boot failure, when we call memset from console.S
Case 1) console.S -->rcar_log_int -->memset --> bl31 not booting.
Case 2) If I comment, rcar_log_init from console.S --> bl31 boots fine.
Case 3) If I comment, rcar_log_init from console.S and call rcar_log_init from bl31_early_platform_setup2 --> bl31 boots fine.
Any thoughts?
And can you make this RCAR_BL31_LOG_BASE region MT_NORMAL memory and see if the problem still persists?
Yes , still the problem persists. The current mapping "MT_DEVICE" is as per "Use of coherent memory in TF-A" section as mentioned in [1]. By changing the region as MT_NORMAL also didn't help , see the patch [2]. Please share your opinion on this?
[1] https://github.com/ARM-software/arm-trusted-firmware/blob/master/docs/design... [2] diff --git a/include/lib/xlat_tables/xlat_tables_v2.h b/include/lib/xlat_tables/xlat_tables_v2.h index 359b9839a..c50727a42 100644 --- a/include/lib/xlat_tables/xlat_tables_v2.h +++ b/include/lib/xlat_tables/xlat_tables_v2.h @@ -86,6 +86,7 @@ #define MT_DEVICE U(0) #define MT_NON_CACHEABLE U(1) #define MT_MEMORY U(2) +#define MT_NORMAL U(4) /* Values up to 7 are reserved to add new memory types in the future */
#define MT_RO (U(0) << MT_PERM_SHIFT) diff --git a/plat/renesas/rcar/aarch64/platform_common.c b/plat/renesas/rcar/aarch64/platform_common.c index b0a88cb6b..744c79c98 100644 --- a/plat/renesas/rcar/aarch64/platform_common.c +++ b/plat/renesas/rcar/aarch64/platform_common.c @@ -64,7 +64,7 @@ const uint8_t version_of_renesas[VERSION_OF_RENESAS_MAXLEN]
#define MAP_ATFW_LOG MAP_REGION_FLAT(RCAR_BL31_LOG_BASE, \ RCAR_BL31_LOG_SIZE, \ - MT_DEVICE | MT_RW | MT_SECURE) + MT_NORMAL | MT_RW | MT_SECURE)
Sure, having no tail loop simplifies the algorithm, but is wrong. So what is the point here?
Agreed, I just want to give some pointers related to this issue, so that we can narrow down the issue. that's all.
Cheers, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
Hi Andre,
Looks like the issue is related to unhandled Exception in EL3. I have enabled serial prints in BL31, that is pointing to this. Please see the below logs. Any ideas??
INFO: ARM GICv2 driver initialized NOTICE: BL2: RZ/G Initial Program Loader(CA57) Rev.2.0.6 NOTICE: BL2: PRR is RZG G2M Ver.1.3 NOTICE: BL2: Board is HiHope RZ/G2M Rev.4.0 NOTICE: BL2: Boot device is QSPI Flash(40MHz) NOTICE: BL2: LCM state is unknown NOTICE: BL2: DDR3200(rev.0.40) NOTICE: BL2: [COLD_BOOT] NOTICE: BL2: DRAM Split is 2ch NOTICE: BL2: QoS is default setting(rev.0.19) NOTICE: BL2: DRAM refresh interval 1.95 usec NOTICE: BL2: Periodic Write DQ Training NOTICE: BL2: CH0: 400000000 - 47fffffff, 2 GiB NOTICE: BL2: CH2: 600000000 - 67fffffff, 2 GiB NOTICE: BL2: FDT at 0xe63236c8 NOTICE: BL2: v2.3(release):v2.3-786-ge62d7c7e9-dirty NOTICE: BL2: Built : 14:06:08, Oct 11 2020 NOTICE: BL2: Normal boot INFO: BL2: Doing platform setup INFO: BL2: Loading image id 3 NOTICE: BL2: dst=0xe6323300 src=0x8180000 len=512(0x200) NOTICE: BL2: dst=0x43f00000 src=0x8180400 len=6144(0x1800) WARNING: r-car ignoring the BL31 size from certificate,using RCAR_TRUSTED_SRAM_SIZE instead INFO: Loading image id=3 at address 0x44000000 NOTICE: rcar_file_len: len: 0x0003e000 NOTICE: BL2: dst=0x44000000 src=0x81c0000 len=253952(0x3e000) INFO: Image id=3 loaded: 0x44000000 - 0x4403e000 INFO: BL2: Loading image id 5 INFO: Loading image id=5 at address 0x50000000 NOTICE: rcar_file_len: len: 0x00100000 NOTICE: BL2: dst=0x50000000 src=0x8300000 len=1048576(0x100000) INFO: Image id=5 loaded: 0x50000000 - 0x50100000 NOTICE: BL2: Booting BL31 INFO: Entry point address = 0x44000000 INFO: SPSR = 0x3cd Unhandled Exception in EL3. x30 = 0x0000000000000000 x0 = 0x0000000000000001 x1 = 0x0000000000000000 x2 = 0x0000000000000000 x3 = 0x0000000000000004 x4 = 0x0000000000000047 x5 = 0x0000000044053fc0 x6 = 0x0000000000000000 x7 = 0x0000000000000000 x8 = 0x00000000e6322218 x9 = 0x00000000440000e8 x10 = 0x000000004400546c x11 = 0x00000000e63220c8 x12 = 0x00000000000000b8 x13 = 0x00000000000000a4 x14 = 0x00000000e63234d8 x15 = 0x00000000e63236c8 x16 = 0x00000000e630e730 x17 = 0x0000000044005010 x18 = 0x0000000000000013 x19 = 0x000000004400003c x20 = 0x0000000000000000 x21 = 0x0000000000000000 x22 = 0x0000000000000000 x23 = 0x0000000000000000 x24 = 0x0000000000000000 x25 = 0x0000000000000000 x26 = 0x0000000000000000 x27 = 0x0000000000000000 x28 = 0x0000000000000000 x29 = 0x00000000440084d0 scr_el3 = 0x0000000000000238 sctlr_el3 = 0x0000000030c5183a cptr_el3 = 0x0000000000000000 tcr_el3 = 0x0000000000000000 daif = 0x00000000000003c0 mair_el3 = 0x44e048e000098aa4 spsr_el3 = 0x00000000800002cc elr_el3 = 0x0000000000000000 ttbr0_el3 = 0x0000000000000000 esr_el3 = 0x0000000002000000 far_el3 = 0x0000000000000000 spsr_el1 = 0x0000000000000000 elr_el1 = 0x0000000000000000 spsr_abt = 0x0000000000000010 spsr_und = 0x0000000000000010 spsr_irq = 0x0000000000000010 spsr_fiq = 0x0000000000000010 sctlr_el1 = 0x0000000000c50838 actlr_el1 = 0x0000000000000000 cpacr_el1 = 0x0000000000300000 csselr_el1 = 0x0000000000000000 sp_el1 = 0x00000000e6360000 esr_el1 = 0x0000000000000000 ttbr0_el1 = 0x0000000000000000 ttbr1_el1 = 0x0000000000000000 mair_el1 = 0x44e048e000098aa4 amair_el1 = 0x0000000000000000 tcr_el1 = 0x0000000000000000 tpidr_el1 = 0x0000000000000000 tpidr_el0 = 0x0000000000000000 tpidrro_el0 = 0x0000000000000000 par_el1 = 0x0000000000000000 mpidr_el1 = 0x0000000080000000 afsr0_el1 = 0x0000000000000000 afsr1_el1 = 0x0000000000000000 contextidr_el1 = 0x0000000000000000 vbar_el1 = 0x0000000000000000 cntp_ctl_el0 = 0x0000000000000000 cntp_cval_el0 = 0x0000000000000000 cntv_ctl_el0 = 0x0000000000000000 cntv_cval_el0 = 0x0000000000000000 cntkctl_el1 = 0x0000000000000000 sp_el0 = 0x00000000440084d0 isr_el1 = 0x0000000000000000 dacr32_el2 = 0x0000000055555555 ifsr32_el2 = 0x0000000000000000 cpuectlr_el1 = 0x0000001b00000040 cpumerrsr_el1 = 0x0000000000000000 l2merrsr_el1 = 0x0000000000000000 gicc_hppir = 0x00000000000003ff gicc_ahppir = 0x00000000000003ff gicc_ctlr = 0x0000000000000000 gicd_ispendr regs (Offsets 0x200 - 0x278) Offset: value 0000000000000200: 0x0000000000000000 0000000000000208: 0x0000000000000000 0000000000000210: 0x0000000000000000 0000000000000218: 0x0000000000000000 0000000000000220: 0x0000000000000000 0000000000000228: 0x0000000000000000 0000000000000230: 0x0000000000000000 0000000000000238: 0x0000000000000000 0000000000000240: 0x0000000000000000 0000000000000248: 0x0000000000000000 0000000000000250: 0x0000000000000000 0000000000000258: 0x0000000000000000 0000000000000260: 0x0000000000000000 0000000000000268: 0x0000000000000000 0000000000000270: 0x0000000000000000 0000000000000278: 0x0000000000000000 cci_snoop_ctrl_cluster0x10000000000000000 cci_snoop_ctrl_cluster1x10000000000000000
Working log by commenting memset in rcar_log_init ------------------------------------------------------------------------
INFO: ARM GICv2 driver initialized NOTICE: BL2: RZ/G Initial Program Loader(CA57) Rev.2.0.6 NOTICE: BL2: PRR is RZG G2M Ver.1.3 NOTICE: BLDEBUG:dst=0xe63234b8, val = 0, count=80 NOTICE: BL2: Board is HiHope RZ/G2M Rev.4.0 NOTICE: BL2: Boot device is QSPI Flash(40MHz) NOTICE: BL2: LCM state is unknown NOTICE: BL2: DDR3200(rev.0.40) NOTICE: BL2: [COLD_BOOT] NOTICE: BL2: DRAM Split is 2ch NOTICE: BL2: QoS is default setting(rev.0.19) NOTICE: BL2: DRAM refresh interval 1.95 usec NOTICE: BL2: Periodic Write DQ Training NOTICE: BLDEBUG:dst=0xe63236c8, val = 0, count=4096 NOTICE: BLDEBUG:dst=0xe63234b8, val = 0, count=80 NOTICE: BL2: CH0: 400000000 - 47fffffff, 2 GiB NOTICE: BL2: CH2: 600000000 - 67fffffff, 2 GiB NOTICE: BLDEBUG:dst=0xe6323760, val = 0, count=20 NOTICE: BLDEBUG:dst=0xe6323760, val = 0, count=16 NOTICE: BL2: FDT at 0xe63236c8 NOTICE: BL2: v2.3(release):v2.3-787-g662de50a8-dirty NOTICE: BL2: Built : 13:08:51, Oct 11 2020 NOTICE: BL2: Normal boot INFO: BL2: Doing platform setup INFO: BL2: Loading image id 3 NOTICE: BLDEBUG:dst=0xe6323300, val = 0, count=512 NOTICE: BLDEBUG:dst=0xe6324728, val = 0, count=24 NOTICE: BL2: dst=0xe6323300 src=0x8180000 len=512(0x200) NOTICE: BL2: dst=0x43f00000 src=0x8180400 len=6144(0x1800) NOTICE: BLDEBUG:dst=0xe6324728, val = 0, count=24 WARNING: r-car ignoring the BL31 size from certificate,using RCAR_TRUSTED_SRAM_SIZE instead INFO: Loading image id=3 at address 0x44000000 NOTICE: rcar_file_len: len: 0x0003e000 NOTICE: BL2: dst=0x44000000 src=0x81c0000 len=253952(0x3e000) NOTICE: BLDEBUG:dst=0xe6324728, val = 0, count=24 INFO: Image id=3 loaded: 0x44000000 - 0x4403e000 NOTICE: BLDEBUG:dst=0xe6324740, val = 0, count=40 NOTICE: BLDEBUG:dst=0x4403e200, val = 0, count=224 INFO: BL2: Loading image id 5 NOTICE: BLDEBUG:dst=0xe6323300, val = 0, count=512 NOTICE: BLDEBUG:dst=0xe6324728, val = 0, count=24 INFO: Loading image id=5 at address 0x50000000 NOTICE: rcar_file_len: len: 0x00100000 NOTICE: BL2: dst=0x50000000 src=0x8300000 len=1048576(0x100000) NOTICE: BLDEBUG:dst=0xe6324728, val = 0, count=24 INFO: Image id=5 loaded: 0x50000000 - 0x50100000 NOTICE: BLDEBUG:dst=0xe6324740, val = 0, count=40 NOTICE: BL2: Booting BL31 INFO: Entry point address = 0x44000000 INFO: SPSR = 0x3cd NOTICE: BL3-1 : Rev.2.0.6 NOTICE: BL31: v2.3(release):v2.3-787-g662de50a8-dirty NOTICE: BL31: Built : 13:08:55, Oct 11 2020 INFO: ARM GICv2 driver initialized INFO: BL31: Initializing runtime services NOTICE: BLDEBUG:dst=0x4400b9b0, val = -1, count=128 INFO: BL31: Preparing for EL3 exit to normal world INFO: Entry point address = 0x50000000 INFO: SPSR = 0x3c5
U-Boot 2020.10-00535-gd0e9210a2f (Oct 08 2020 - 12:34:09 +0100)
CPU: Renesas Electronics R8A774A1 rev 1.3 Model: HopeRun HiHope RZ/G2M with sub board DRAM: 3.9 GiB MMC: mmc@ee100000: 0, mmc@ee160000: 1 Loading Environment from MMC... OK In: serial@e6e88000 Out: serial@e6e88000 Err: serial@e6e88000 Net: eth0: ethernet@e6800000 Hit any key to stop autoboot: 0
Regards, Biju
-----Original Message----- From: Biju Das Sent: 11 October 2020 11:01 To: André Przywara andre.przywara@arm.com; TF- A@lists.trustedfirmware.org; Madhukar Pappireddy madhukar.pappireddy@arm.com; Joanna.Farley@arm.com Cc: Chris Paterson Chris.Paterson2@renesas.com; Prabhakar Mahadev Lad prabhakar.mahadev-lad.rj@bp.renesas.com; Marek Vasut marek.vasut@gmail.com; Marek Vasut marek.vasut+renesas@gmail.com Subject: RE: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
Hi Andre,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
>> >>>>>> Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: >>>>>> improve performance by avoiding single byte writes") >>>>>> causing >>>>>> BL31 boot failure on Renesas RZ/G2 platforms. >>>>>> >>>>>> On 08/10/2020 20:05, Biju Das wrote: >>>>>>> Then on target, found that BL31 is failed to boot[3], >>>>>>> which is fixed by reverting the commit 75fab6496e5fce9a11 >>>>>>> ("libc: memset: improve performance by avoiding single >>>>>>> byte writes"), see >>>>>> the logs [3]. >>>>>> >>>>>> Mmmh, interesting. Can you build with "DEBUG=1" to get
more
>>>>>> output from BL31? >>>>> >>>>> Sure Will do and provide feedback. >>> >>>>>> I see some calls to memset() from code in
drivers/renesas/rcar.
>>>>>> Can you add some debug prints at the top of memset() to
dump
>>>>>> the parameters on each call? To see what breaks it? >>>>> >>>>> OK. >>> >>> I believe commenting [1] fixed the issue which corresponds to >>> memset >> parameters[2]. >>> Looks like calling rcar_log_init-->memset from [3] is causing >>> the
issue.
>> >> Many thanks for debugging this, that's helpful! >> >> But without your patch it is working fine. Only thing is, you >> have added extra code to >>> improve the speed. I am not sure how it is impacting here. >>> Currently I don't >> have access to Lauterbach to debug further. Any thoughts ??? >> >> So this RCAR_BL31_LOG_BASE address is mapped as device
memory,
what >> kind of device is this? SRAM? Or something non-volatile, even? >> Even if it's SRAM, sometimes the controller managing this does >> not support all AMBA accesses (hence it's a device, not memory). > > It is DRAM only, with Security protected attribute.
Mmh, interesting. Why is it then mapped as device?
#define MAP_ATFW_LOG
MAP_REGION_FLAT(RCAR_BL31_LOG_BASE,
\ RCAR_BL31_LOG_SIZE, \ MT_DEVICE | MT_RW | MT_SECURE)
You could try using MT_NORMAL instead. Or ...
>> In general letting the compiler access device memory directly >> is asking for trouble, every access should go through MMIO
accessors.
>> Linux has >> memset_io() for those kind of problems. >> >> Can you try the attached patch? That should downgrade the >> stores to 32-bit ones. Not a solution, but would help to isolate the
problem.
> > Patch didn't fix the issue.
I see, bummer. Does it work if you apply this on top?
diff --git a/lib/libc/memset.c b/lib/libc/memset.c index 5ad6ede51..7478871e3 100644 --- a/lib/libc/memset.c +++ b/lib/libc/memset.c @@ -7,6 +7,7 @@ #include <stddef.h> #include <string.h> #include <stdint.h> +#include <lib/mmio.h>
void *memset(void *dst, int val, size_t count) { @@ -34,7 +35,8 @@ void *memset(void *dst, int val, size_t count) /* Use 32-bit writes for as long as possible. */ ptr32 = (void *)ptr; for (; count >= 4; count -= 4) { -*ptr32++ = fill; +mmio_write_32((uintptr_t)ptr32, fill); +ptr32++; }
/* Handle the remaining part byte-per-byte. */
> However I copied the same code emmc_memset to rcar_memset()
and
with that it works fine. > Looks like byte access works OK. Any thoughts??
That's weird, I just see "str w1, [x3, x5]" in the disassembly, and since everything is nicely aligned, it should actually work. That is still with the note that letting the compiler access device memory is wrong, but the generated code is actually very similar in this special case
here.
Issue is with your original commit[1], it is entering infinite loop. I
commented that while loop and it is fine.
[1]https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tre e/ li b/libc/memset.c#n43
That's very weird. I stared at the code for a while, and can't see the issue. I also put it through my unit tests again, and tried to mimic your special case (clearing 81840 bytes at 0xyyyy0010). That worked
fine for me every time.
Do you see the actual problem?
Can you send me the resulting binary:
build/rcar/release/libc/memset.o?
Just to check what your compiler made of it?
Here it is [1] https://gitlab.com/bijudas/debug
Thanks, that is actually identical to what my compiler produced, so that means toolchain-wise we are on the same page.
Just a suggestion, Why can't we use finite for loops with pointer arithmetic
instead of the while loops?
- first for loop with ptr64 and ptr64 +(remainingcount >> 3) and
- second for loop with ptr and ptr + (remainingcount & 7)
But what would that change?
Hmm. The result same, it didn't fix the issue.
Please see the code generated by compiler for working case and non
working case.
Just to clarify: with "working" you mean "commenting the last while loop"?
Yes. I just want to highlight that some kind of issue is happening just before the while loop.
Which breaks the algorithm, as we would lose the tail bytes to be filled, in case the end of the to-be-filled region is not 64-bit aligned? So it's not an option to just skip that. It just happens to work in your case because the end *is* 64-bit aligned, so the final loop is not needed.
Agreed.
And besides, I still don't see the problem. I stepped through the disassembly, with (<aligned ptr>, 0, 81840) for the parameters and don't see why there would be a problem. Once it's done with the 64-bit write section, both x2 and x6 are 0, so the cmp at 0x48 sets the Z bit, and "b.eq 68" branches to the
ret.
So the whole final loop is skipped, as intended.
Please see the line inserted by compiler for non working case. 3c:928000e4 movx4, #0xfffffffffffffff8 // #-8
Yes, that has a particular purpose (which is not very obvious because this is - Os optimised): It is used to *subtract* the already filled bytes from the original length, to then serve as the limit for the final while loop - which is actually turned in a for loop in assembly! There is no msub in AArch64, so the compiler uses madd and makes one of the factors negative.
So, can you please double check that: memset(0x44040010, 0, 81840); is really the call that hangs your system?
Yes, This is called from console.S-->rcar_log_int-->memset
And that it really does because so it's being stuck in the last while loop?
After further investigation, not sure it is related to while loop or some kind of corruption?, since I don't have access to JTAG further debugging. Mostly some kind of corruption leading to crash.
The new code introduced, some increased code size, which is causing bl31 boot failure, when we call memset from console.S
Case 1) console.S -->rcar_log_int -->memset --> bl31 not booting.
Case 2) If I comment, rcar_log_init from console.S --> bl31 boots fine.
Case 3) If I comment, rcar_log_init from console.S and call rcar_log_init from bl31_early_platform_setup2 --> bl31 boots fine.
Any thoughts?
And can you make this RCAR_BL31_LOG_BASE region MT_NORMAL
memory and
see if the problem still persists?
Yes , still the problem persists. The current mapping "MT_DEVICE" is as per "Use of coherent memory in TF- A" section as mentioned in [1]. By changing the region as MT_NORMAL also didn't help , see the patch [2]. Please share your opinion on this?
[1] https://github.com/ARM-software/arm-trusted- firmware/blob/master/docs/design/firmware-design.rst [2] diff --git a/include/lib/xlat_tables/xlat_tables_v2.h b/include/lib/xlat_tables/xlat_tables_v2.h index 359b9839a..c50727a42 100644 --- a/include/lib/xlat_tables/xlat_tables_v2.h +++ b/include/lib/xlat_tables/xlat_tables_v2.h @@ -86,6 +86,7 @@ #define MT_DEVICE U(0) #define MT_NON_CACHEABLE U(1) #define MT_MEMORY U(2) +#define MT_NORMAL U(4) /* Values up to 7 are reserved to add new memory types in the future */
#define MT_RO (U(0) << MT_PERM_SHIFT) diff --git a/plat/renesas/rcar/aarch64/platform_common.c b/plat/renesas/rcar/aarch64/platform_common.c index b0a88cb6b..744c79c98 100644 --- a/plat/renesas/rcar/aarch64/platform_common.c +++ b/plat/renesas/rcar/aarch64/platform_common.c @@ -64,7 +64,7 @@ const uint8_t version_of_renesas[VERSION_OF_RENESAS_MAXLEN]
#define MAP_ATFW_LOG MAP_REGION_FLAT(RCAR_BL31_LOG_BASE, \ RCAR_BL31_LOG_SIZE, \
MT_DEVICE | MT_RW | MT_SECURE)
MT_NORMAL | MT_RW | MT_SECURE)
Sure, having no tail loop simplifies the algorithm, but is wrong. So what is the point here?
Agreed, I just want to give some pointers related to this issue, so that we can narrow down the issue. that's all.
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
On 11/10/2020 14:13, Biju Das wrote:
Hi Andre,
Looks like the issue is related to unhandled Exception in EL3. I have enabled serial prints in BL31, that is pointing to this. Please see the below logs. Any ideas??
INFO: ARM GICv2 driver initialized NOTICE: BL2: RZ/G Initial Program Loader(CA57) Rev.2.0.6 NOTICE: BL2: PRR is RZG G2M Ver.1.3 NOTICE: BL2: Board is HiHope RZ/G2M Rev.4.0 NOTICE: BL2: Boot device is QSPI Flash(40MHz) NOTICE: BL2: LCM state is unknown NOTICE: BL2: DDR3200(rev.0.40) NOTICE: BL2: [COLD_BOOT] NOTICE: BL2: DRAM Split is 2ch NOTICE: BL2: QoS is default setting(rev.0.19) NOTICE: BL2: DRAM refresh interval 1.95 usec NOTICE: BL2: Periodic Write DQ Training NOTICE: BL2: CH0: 400000000 - 47fffffff, 2 GiB NOTICE: BL2: CH2: 600000000 - 67fffffff, 2 GiB NOTICE: BL2: FDT at 0xe63236c8 NOTICE: BL2: v2.3(release):v2.3-786-ge62d7c7e9-dirty NOTICE: BL2: Built : 14:06:08, Oct 11 2020 NOTICE: BL2: Normal boot INFO: BL2: Doing platform setup INFO: BL2: Loading image id 3 NOTICE: BL2: dst=0xe6323300 src=0x8180000 len=512(0x200) NOTICE: BL2: dst=0x43f00000 src=0x8180400 len=6144(0x1800) WARNING: r-car ignoring the BL31 size from certificate,using RCAR_TRUSTED_SRAM_SIZE instead INFO: Loading image id=3 at address 0x44000000 NOTICE: rcar_file_len: len: 0x0003e000 NOTICE: BL2: dst=0x44000000 src=0x81c0000 len=253952(0x3e000) INFO: Image id=3 loaded: 0x44000000 - 0x4403e000 INFO: BL2: Loading image id 5 INFO: Loading image id=5 at address 0x50000000 NOTICE: rcar_file_len: len: 0x00100000 NOTICE: BL2: dst=0x50000000 src=0x8300000 len=1048576(0x100000) INFO: Image id=5 loaded: 0x50000000 - 0x50100000 NOTICE: BL2: Booting BL31 INFO: Entry point address = 0x44000000 INFO: SPSR = 0x3cd Unhandled Exception in EL3.
So this is happening before the first output of BL31?
Unfortunately the information in the crash dump below is not very helpful, other than some NULL pointer sneaking into the PC somehow (ELR_EL3 is NULL). But LR is 0 as well, and ESR says "unknown reason". You could go through the list of possible situations listed in the ARMv8 ARM, D13.2.38 "ESR_EL3, Exception Syndrome Register (EL3)", under "ISS encoding for exceptions with an unknown reason".
Cheers, Andre.
x30 = 0x0000000000000000 x0 = 0x0000000000000001 x1 = 0x0000000000000000 x2 = 0x0000000000000000 x3 = 0x0000000000000004 x4 = 0x0000000000000047 x5 = 0x0000000044053fc0 x6 = 0x0000000000000000 x7 = 0x0000000000000000 x8 = 0x00000000e6322218 x9 = 0x00000000440000e8 x10 = 0x000000004400546c x11 = 0x00000000e63220c8 x12 = 0x00000000000000b8 x13 = 0x00000000000000a4 x14 = 0x00000000e63234d8 x15 = 0x00000000e63236c8 x16 = 0x00000000e630e730 x17 = 0x0000000044005010 x18 = 0x0000000000000013 x19 = 0x000000004400003c x20 = 0x0000000000000000 x21 = 0x0000000000000000 x22 = 0x0000000000000000 x23 = 0x0000000000000000 x24 = 0x0000000000000000 x25 = 0x0000000000000000 x26 = 0x0000000000000000 x27 = 0x0000000000000000 x28 = 0x0000000000000000 x29 = 0x00000000440084d0 scr_el3 = 0x0000000000000238 sctlr_el3 = 0x0000000030c5183a cptr_el3 = 0x0000000000000000 tcr_el3 = 0x0000000000000000 daif = 0x00000000000003c0 mair_el3 = 0x44e048e000098aa4 spsr_el3 = 0x00000000800002cc elr_el3 = 0x0000000000000000 ttbr0_el3 = 0x0000000000000000 esr_el3 = 0x0000000002000000 far_el3 = 0x0000000000000000 spsr_el1 = 0x0000000000000000 elr_el1 = 0x0000000000000000 spsr_abt = 0x0000000000000010 spsr_und = 0x0000000000000010 spsr_irq = 0x0000000000000010 spsr_fiq = 0x0000000000000010 sctlr_el1 = 0x0000000000c50838 actlr_el1 = 0x0000000000000000 cpacr_el1 = 0x0000000000300000 csselr_el1 = 0x0000000000000000 sp_el1 = 0x00000000e6360000 esr_el1 = 0x0000000000000000 ttbr0_el1 = 0x0000000000000000 ttbr1_el1 = 0x0000000000000000 mair_el1 = 0x44e048e000098aa4 amair_el1 = 0x0000000000000000 tcr_el1 = 0x0000000000000000 tpidr_el1 = 0x0000000000000000 tpidr_el0 = 0x0000000000000000 tpidrro_el0 = 0x0000000000000000 par_el1 = 0x0000000000000000 mpidr_el1 = 0x0000000080000000 afsr0_el1 = 0x0000000000000000 afsr1_el1 = 0x0000000000000000 contextidr_el1 = 0x0000000000000000 vbar_el1 = 0x0000000000000000 cntp_ctl_el0 = 0x0000000000000000 cntp_cval_el0 = 0x0000000000000000 cntv_ctl_el0 = 0x0000000000000000 cntv_cval_el0 = 0x0000000000000000 cntkctl_el1 = 0x0000000000000000 sp_el0 = 0x00000000440084d0 isr_el1 = 0x0000000000000000 dacr32_el2 = 0x0000000055555555 ifsr32_el2 = 0x0000000000000000 cpuectlr_el1 = 0x0000001b00000040 cpumerrsr_el1 = 0x0000000000000000 l2merrsr_el1 = 0x0000000000000000 gicc_hppir = 0x00000000000003ff gicc_ahppir = 0x00000000000003ff gicc_ctlr = 0x0000000000000000 gicd_ispendr regs (Offsets 0x200 - 0x278) Offset: value 0000000000000200: 0x0000000000000000 0000000000000208: 0x0000000000000000 0000000000000210: 0x0000000000000000 0000000000000218: 0x0000000000000000 0000000000000220: 0x0000000000000000 0000000000000228: 0x0000000000000000 0000000000000230: 0x0000000000000000 0000000000000238: 0x0000000000000000 0000000000000240: 0x0000000000000000 0000000000000248: 0x0000000000000000 0000000000000250: 0x0000000000000000 0000000000000258: 0x0000000000000000 0000000000000260: 0x0000000000000000 0000000000000268: 0x0000000000000000 0000000000000270: 0x0000000000000000 0000000000000278: 0x0000000000000000 cci_snoop_ctrl_cluster0x10000000000000000 cci_snoop_ctrl_cluster1x10000000000000000
Working log by commenting memset in rcar_log_init
INFO: ARM GICv2 driver initialized NOTICE: BL2: RZ/G Initial Program Loader(CA57) Rev.2.0.6 NOTICE: BL2: PRR is RZG G2M Ver.1.3 NOTICE: BLDEBUG:dst=0xe63234b8, val = 0, count=80 NOTICE: BL2: Board is HiHope RZ/G2M Rev.4.0 NOTICE: BL2: Boot device is QSPI Flash(40MHz) NOTICE: BL2: LCM state is unknown NOTICE: BL2: DDR3200(rev.0.40) NOTICE: BL2: [COLD_BOOT] NOTICE: BL2: DRAM Split is 2ch NOTICE: BL2: QoS is default setting(rev.0.19) NOTICE: BL2: DRAM refresh interval 1.95 usec NOTICE: BL2: Periodic Write DQ Training NOTICE: BLDEBUG:dst=0xe63236c8, val = 0, count=4096 NOTICE: BLDEBUG:dst=0xe63234b8, val = 0, count=80 NOTICE: BL2: CH0: 400000000 - 47fffffff, 2 GiB NOTICE: BL2: CH2: 600000000 - 67fffffff, 2 GiB NOTICE: BLDEBUG:dst=0xe6323760, val = 0, count=20 NOTICE: BLDEBUG:dst=0xe6323760, val = 0, count=16 NOTICE: BL2: FDT at 0xe63236c8 NOTICE: BL2: v2.3(release):v2.3-787-g662de50a8-dirty NOTICE: BL2: Built : 13:08:51, Oct 11 2020 NOTICE: BL2: Normal boot INFO: BL2: Doing platform setup INFO: BL2: Loading image id 3 NOTICE: BLDEBUG:dst=0xe6323300, val = 0, count=512 NOTICE: BLDEBUG:dst=0xe6324728, val = 0, count=24 NOTICE: BL2: dst=0xe6323300 src=0x8180000 len=512(0x200) NOTICE: BL2: dst=0x43f00000 src=0x8180400 len=6144(0x1800) NOTICE: BLDEBUG:dst=0xe6324728, val = 0, count=24 WARNING: r-car ignoring the BL31 size from certificate,using RCAR_TRUSTED_SRAM_SIZE instead INFO: Loading image id=3 at address 0x44000000 NOTICE: rcar_file_len: len: 0x0003e000 NOTICE: BL2: dst=0x44000000 src=0x81c0000 len=253952(0x3e000) NOTICE: BLDEBUG:dst=0xe6324728, val = 0, count=24 INFO: Image id=3 loaded: 0x44000000 - 0x4403e000 NOTICE: BLDEBUG:dst=0xe6324740, val = 0, count=40 NOTICE: BLDEBUG:dst=0x4403e200, val = 0, count=224 INFO: BL2: Loading image id 5 NOTICE: BLDEBUG:dst=0xe6323300, val = 0, count=512 NOTICE: BLDEBUG:dst=0xe6324728, val = 0, count=24 INFO: Loading image id=5 at address 0x50000000 NOTICE: rcar_file_len: len: 0x00100000 NOTICE: BL2: dst=0x50000000 src=0x8300000 len=1048576(0x100000) NOTICE: BLDEBUG:dst=0xe6324728, val = 0, count=24 INFO: Image id=5 loaded: 0x50000000 - 0x50100000 NOTICE: BLDEBUG:dst=0xe6324740, val = 0, count=40 NOTICE: BL2: Booting BL31 INFO: Entry point address = 0x44000000 INFO: SPSR = 0x3cd NOTICE: BL3-1 : Rev.2.0.6 NOTICE: BL31: v2.3(release):v2.3-787-g662de50a8-dirty NOTICE: BL31: Built : 13:08:55, Oct 11 2020 INFO: ARM GICv2 driver initialized INFO: BL31: Initializing runtime services NOTICE: BLDEBUG:dst=0x4400b9b0, val = -1, count=128 INFO: BL31: Preparing for EL3 exit to normal world INFO: Entry point address = 0x50000000 INFO: SPSR = 0x3c5
U-Boot 2020.10-00535-gd0e9210a2f (Oct 08 2020 - 12:34:09 +0100)
CPU: Renesas Electronics R8A774A1 rev 1.3 Model: HopeRun HiHope RZ/G2M with sub board DRAM: 3.9 GiB MMC: mmc@ee100000: 0, mmc@ee160000: 1 Loading Environment from MMC... OK In: serial@e6e88000 Out: serial@e6e88000 Err: serial@e6e88000 Net: eth0: ethernet@e6800000 Hit any key to stop autoboot: 0
Hi Andre,
Found the root cause, the new changes corrupting the registers used by console.S. So I have fixed the issue. Please find the attached patch.
Marek or Andre,
I have tested this patch only on RZ/G2 platform and currently I don't have access to R-Car board. So please test this patch on R-Car boards and if it is OK, please apply to renesas tree.
Cheers, Biju
-----Original Message----- From: André Przywara andre.przywara@arm.com Sent: 11 October 2020 17:38 To: Biju Das biju.das.jz@bp.renesas.com; TF-A@lists.trustedfirmware.org; Madhukar Pappireddy madhukar.pappireddy@arm.com; Joanna.Farley@arm.com Cc: Chris Paterson Chris.Paterson2@renesas.com; Prabhakar Mahadev Lad prabhakar.mahadev-lad.rj@bp.renesas.com; Marek Vasut marek.vasut@gmail.com; Marek Vasut marek.vasut+renesas@gmail.com Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 11/10/2020 14:13, Biju Das wrote:
Hi Andre,
Looks like the issue is related to unhandled Exception in EL3. I have enabled
serial prints in BL31, that is pointing to this.
Please see the below logs. Any ideas??
INFO: SPSR = 0x3cd Unhandled Exception in EL3.
So this is happening before the first output of BL31?
Thanks for the hint.
Unfortunately the information in the crash dump below is not very helpful, other than some NULL pointer sneaking into the PC somehow (ELR_EL3 is NULL). But LR is 0 as well, and ESR says "unknown reason". You could go through the list of possible situations listed in the ARMv8 ARM, D13.2.38 "ESR_EL3, Exception Syndrome Register (EL3)", under "ISS encoding for exceptions with an unknown reason".
This made me to think that there is register corruption.
Regards, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
On 11/10/2020 19:23, Biju Das wrote:
Hi Biju,
Found the root cause, the new changes corrupting the registers used by console.S. So I have fixed the issue. Please find the attached patch.
Good find for the source of the register corruption, but unfortunately this patch won't cut it:
- I am not sure you are actually allowed to call a C function from this context. I need to check back with the team tomorrow to find the reason these console_*_register functions are all in assembly. I don't see any assembly callers, but there is (was?) surely a reason behind this. I think this was meant to be called either early in the game (before we have a SP), or during a crash, again with no C context.
- Since this function is called *from* C, you need to adhere to the AAPCS(64) standard, which means your function must preserve x19-x28. So you would need to save x24 and x25 to the stack. If we have one already at this point (see above).
Cheers, Andre
Marek or Andre,
I have tested this patch only on RZ/G2 platform and currently I don't have access to R-Car board. So please test this patch on R-Car boards and if it is OK, please apply to renesas tree.
Cheers, Biju
-----Original Message----- From: André Przywara andre.przywara@arm.com Sent: 11 October 2020 17:38 To: Biju Das biju.das.jz@bp.renesas.com; TF-A@lists.trustedfirmware.org; Madhukar Pappireddy madhukar.pappireddy@arm.com; Joanna.Farley@arm.com Cc: Chris Paterson Chris.Paterson2@renesas.com; Prabhakar Mahadev Lad prabhakar.mahadev-lad.rj@bp.renesas.com; Marek Vasut marek.vasut@gmail.com; Marek Vasut marek.vasut+renesas@gmail.com Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 11/10/2020 14:13, Biju Das wrote:
Hi Andre,
Looks like the issue is related to unhandled Exception in EL3. I have enabled
serial prints in BL31, that is pointing to this.
Please see the below logs. Any ideas??
INFO: SPSR = 0x3cd Unhandled Exception in EL3.
So this is happening before the first output of BL31?
Thanks for the hint.
Unfortunately the information in the crash dump below is not very helpful, other than some NULL pointer sneaking into the PC somehow (ELR_EL3 is NULL). But LR is 0 as well, and ESR says "unknown reason". You could go through the list of possible situations listed in the ARMv8 ARM, D13.2.38 "ESR_EL3, Exception Syndrome Register (EL3)", under "ISS encoding for exceptions with an unknown reason".
This made me to think that there is register corruption.
Regards, Biju
Hi Andre,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 11/10/2020 19:23, Biju Das wrote:
Hi Biju,
Found the root cause, the new changes corrupting the registers used by
console.S.
So I have fixed the issue. Please find the attached patch.
Good find for the source of the register corruption, but unfortunately this patch won't cut it:
- I am not sure you are actually allowed to call a C function from this context. I
need to check back with the team tomorrow to find the reason these console_*_register functions are all in assembly. I don't see any assembly callers, but there is (was?) surely a reason behind this. I think this was meant to be called either early in the game (before we have a SP), or during a crash, again with no C context.
- Since this function is called *from* C, you need to adhere to the
AAPCS(64) standard, which means your function must preserve x19-x28. So you would need to save x24 and x25 to the stack. If we have one already at this point (see above).
May be we could call "rcar_log_init" from "c" file, ie, just after the " rcar_console_runtime_init()" instead of assembly? https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/plat/rene...
Cheers, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
On 12/10/2020 10:22, Biju Das wrote:
Hi Biju,
Subject: Re: commit 75fab6496e5fce9a11 ("libc: memset: improve performance by avoiding single byte writes") causing BL31 boot failure on Renesas RZ/G2 platforms.
On 11/10/2020 19:23, Biju Das wrote:
Hi Biju,
Found the root cause, the new changes corrupting the registers used by
console.S.
So I have fixed the issue. Please find the attached patch.
Good find for the source of the register corruption, but unfortunately this patch won't cut it:
- I am not sure you are actually allowed to call a C function from this context. I
need to check back with the team tomorrow to find the reason these console_*_register functions are all in assembly. I don't see any assembly callers, but there is (was?) surely a reason behind this. I think this was meant to be called either early in the game (before we have a SP), or during a crash, again with no C context.
- Since this function is called *from* C, you need to adhere to the
AAPCS(64) standard, which means your function must preserve x19-x28. So you would need to save x24 and x25 to the stack. If we have one already at this point (see above).
May be we could call "rcar_log_init" from "c" file, ie, just after the " rcar_console_runtime_init()" instead of assembly? https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/plat/rene...
I would just try to move console_rcar_init() from the assembly file up to its caller, and implement it there, in C. I don't see it being called from assembly, so that should be fine. There seem to be some issues to be solved, but that might be the better route.
Cheers, Andre
On 11/10/2020 11:00, Biju Das wrote:
Hi,
....
So, can you please double check that: memset(0x44040010, 0, 81840); is really the call that hangs your system?
Yes, This is called from console.S-->rcar_log_int-->memset
And that it really does because so it's being stuck in the last while loop?
After further investigation, not sure it is related to while loop or some kind of corruption?, since I don't have access to JTAG further debugging. Mostly some kind of corruption leading to crash.
The new code introduced, some increased code size, which is causing bl31 boot failure, when we call memset from console.S> Case 1) console.S -->rcar_log_int -->memset --> bl31 not booting. Case 2) If I comment, rcar_log_init from console.S --> bl31 boots fine. Case 3) If I comment, rcar_log_init from console.S and call rcar_log_init from bl31_early_platform_setup2 --> bl31 boots fine. Any thoughts?
So does it tip something over the edge? Can you compare the program headers in the ELF file to see if some grew bigger? $ readelf -l build/.../bl31/bl31.elf Compare a working with a non-working version. Also you might want to check the section headers (-S), to see if something sticks out.
Another things I stumbled upon the other day is stack size: If you overrun this (too many/too big local variables or too deep function call nesting), then BL31 silently hangs.
What is the memory map of your platform, and BL31 in particular? You run from DRAM, don't you? I see that most Renesas DTs in the kernel reserve the first 128MB as secure memory. Are there any other self-imposed limits, for instance, in BL31 in particular?
Another related thing to check is the number of page tables, that might be too tight for your code (although I think the build system would warn).
And can you make this RCAR_BL31_LOG_BASE region MT_NORMAL memory and see if the problem still persists?
Yes , still the problem persists. The current mapping "MT_DEVICE" is as per "Use of coherent memory in TF-A" section as mentioned in [1]. By changing the region as MT_NORMAL also didn't help , see the patch [2].
I see, thanks for trying this!
Cheers, Andre
Please share your opinion on this?
[1] https://github.com/ARM-software/arm-trusted-firmware/blob/master/docs/design... [2] diff --git a/include/lib/xlat_tables/xlat_tables_v2.h b/include/lib/xlat_tables/xlat_tables_v2.h index 359b9839a..c50727a42 100644 --- a/include/lib/xlat_tables/xlat_tables_v2.h +++ b/include/lib/xlat_tables/xlat_tables_v2.h @@ -86,6 +86,7 @@ #define MT_DEVICE U(0) #define MT_NON_CACHEABLE U(1) #define MT_MEMORY U(2) +#define MT_NORMAL U(4) /* Values up to 7 are reserved to add new memory types in the future */
#define MT_RO (U(0) << MT_PERM_SHIFT) diff --git a/plat/renesas/rcar/aarch64/platform_common.c b/plat/renesas/rcar/aarch64/platform_common.c index b0a88cb6b..744c79c98 100644 --- a/plat/renesas/rcar/aarch64/platform_common.c +++ b/plat/renesas/rcar/aarch64/platform_common.c @@ -64,7 +64,7 @@ const uint8_t version_of_renesas[VERSION_OF_RENESAS_MAXLEN]
#define MAP_ATFW_LOG MAP_REGION_FLAT(RCAR_BL31_LOG_BASE, \ RCAR_BL31_LOG_SIZE, \
MT_DEVICE | MT_RW | MT_SECURE)
MT_NORMAL | MT_RW | MT_SECURE)
Sure, having no tail loop simplifies the algorithm, but is wrong. So what is the point here?
Agreed, I just want to give some pointers related to this issue, so that we can narrow down the issue. that's all.
Cheers, Biju
Renesas Electronics Europe GmbH, Geschaeftsfuehrer/President: Carsten Jauch, Sitz der Gesellschaft/Registered office: Duesseldorf, Arcadiastrasse 10, 40472 Duesseldorf, Germany, Handelsregister/Commercial Register: Duesseldorf, HRB 3708 USt-IDNr./Tax identification no.: DE 119353406 WEEE-Reg.-Nr./WEEE reg. no.: DE 14978647
tf-a@lists.trustedfirmware.org