Hi Thomas,
I got the build error message below:

/home/bin.wu/rdn1edge/uefi/edk2/edk2-platforms/Platform/ARM/SgiPkg/Drivers/PlatformDxe/PlatformDxe.c:57:43: error: 'gRdDanielAcpiTablesFileGuid' undeclared (first use in this function)
     Status = LocateAndInstallAcpiFromFv (&gRdDanielAcpiTablesFileGuid);
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/bin.wu/rdn1edge/uefi/edk2/edk2-platforms/Platform/ARM/SgiPkg/Drivers/PlatformDxe/PlatformDxe.c:57:43: note: each undeclared identifier is reported only once for each function it appears in
GNUmakefile:429: recipe for target '/home/bin.wu/rdn1edge/uefi/edk2/Build/ArmSgi/DEBUG_GCC5/AARCH64/Platform/ARM/SgiPkg/Drivers/PlatformDxe/PlatformDxeMm/OUTPUT/PlatformDxe.obj' failed
make: *** [/home/bin.wu/rdn1edge/uefi/edk2/Build/ArmSgi/DEBUG_GCC5/AARCH64/Platform/ARM/SgiPkg/Drivers/PlatformDxe/PlatformDxeMm/OUTPUT/PlatformDxe.obj] Error 1
build.py...
 : error 7000: Failed to execute command
    make -s tbuild [/home/bin.wu/rdn1edge/uefi/edk2/Build/ArmSgi/DEBUG_GCC5/AARCH64/Platform/ARM/SgiPkg/Drivers/PlatformDxe/PlatformDxeMm]
build.py...

Do I need to cherry-pick and merge any other patches?
The patches below have been merged.

kernel:
1. UNTESTED firmware: arm_sdei: Add support for binding interrupts as SDE events
2. UNTESTED firmware: arm_sdei: Allow events to be disabled from within their ha..
3. firmware/arm_sdei.c: use mm_communicate smc to inject platform error

build_scripts:
1. configs: reenable ras support for platforms with dmc620 controller [refinfra]
2. build-uefi.sh: pass compile time defined macros correctly to the build

edk2-platform:
1. Platform/ARM/Sgi: Fix changes that make RAS optional
BRs, Bin Wu
------------------------------------------------------------------
From: Thomas Abraham <thomas.abraham@arm.com>
Sent: Wednesday, April 22, 2020 20:27
To: 吴斌(郅隆) <zhilong.wb@alibaba-inc.com>; Olivier Deprez <Olivier.Deprez@arm.com>; Raghu K via TF-A <tf-a@lists.trustedfirmware.org>
Subject: RE: Re: RE: Re: RE: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hi Bin Wu,
> It seems the rdn1edge master codebase uses the kernel-release.git repository and its refinfra branch.
Yes, that is correct. For all the components in the platform stack, the patches will be held in the ‘refinfra’ branch in those component repos.
> Has this patch been merged into this kernel branch?
Yes, it has been merged. These are temporary patches which will eventually be pushed to upstream.
Thanks, Thomas.
From: 吴斌(郅隆) <zhilong.wb@alibaba-inc.com>
Sent: Wednesday, April 22, 2020 5:51 PM
To: Thomas Abraham <thomas.abraham@arm.com>; Olivier Deprez <Olivier.Deprez@arm.com>; Raghu K via TF-A <tf-a@lists.trustedfirmware.org>
Subject: Re: RE: Re: RE: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hi Thomas,
Could you please confirm which kernel branch this patch has been pushed to?
I use rdn1edge.xml to track the master branch of rdn1edge. The rdn1edge.xml shows:

<project remote="linaro" name="landing-teams/working/arm/kernel-release" path="linux" revision="refinfra" upstream="refinfra"/>

It seems the rdn1edge master codebase uses the kernel-release.git repository and its refinfra branch.
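The manifest entry quoted above can also be checked mechanically. The following is only a minimal sketch in Python: the enclosing <manifest> element is added here so the fragment parses on its own, and the helper name project_revisions is hypothetical, not part of the repo tool.

```python
import xml.etree.ElementTree as ET

# The <project> line is copied from rdn1edge.xml as quoted in this mail.
# The surrounding <manifest> root is added for this sketch only.
MANIFEST = """<manifest>
  <project remote="linaro" name="landing-teams/working/arm/kernel-release" path="linux" revision="refinfra" upstream="refinfra"/>
</manifest>"""


def project_revisions(manifest_xml):
    """Map each project's checkout path (or its name) to its revision."""
    root = ET.fromstring(manifest_xml)
    return {
        project.get("path", project.get("name")): project.get("revision")
        for project in root.iter("project")
    }


if __name__ == "__main__":
    for path, revision in project_revisions(MANIFEST).items():
        print(f"{path}: {revision}")
```

Run against the full rdn1edge.xml, this would list the branch or tag each component tracks, which is the information needed to tell which kernel branch a patch has to land on.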
But this kernel branch's latest log shows:
5 days    UNTESTED firmware: arm_sdei: Add support for binding interrupts as SDE events    [refinfra]    James Morse
Has this patch been merged into this kernel branch?
BRs,
Bin Wu
------------------------------------------------------------------
From: Thomas Abraham <thomas.abraham@arm.com>
Date: April 22, 2020 18:30:14
To: Olivier Deprez <Olivier.Deprez@arm.com>; 吴斌(郅隆) <zhilong.wb@alibaba-inc.com>; Raghu K via TF-A <tf-a@lists.trustedfirmware.org>
Subject: RE: Re: RE: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hi Bin Wu,
The relevant changes to get DMC-620 RAS error handling functional again have been pushed to the Linaro repositories. If there are any questions on obtaining the updates, please let me know. A quick reference would be https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git...
Thanks, Thomas.
-----Original Message-----
From: Olivier Deprez <Olivier.Deprez@arm.com>
Sent: Tuesday, April 21, 2020 7:36 PM
To: 吴斌(郅隆) <zhilong.wb@alibaba-inc.com>; Thomas Abraham <thomas.abraham@arm.com>; Raghu K via TF-A <tf-a@lists.trustedfirmware.org>
Subject: Re: Re: RE: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hi Bin Wu,
Glad if this helped!
Hi Thomas,
Thanks for the heads up!
Regards, Olivier.
From: 吴斌(郅隆) <zhilong.wb@alibaba-inc.com>
Sent: 21 April 2020 13:52
To: Thomas Abraham; Olivier Deprez; TF-A
Subject: Re: RE: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Dear All,
Thanks again for all your help. Your professionalism and assistance impressed me.
BRs,
Bin Wu
------------------ Original Message ------------------
From: Thomas Abraham <thomas.abraham@arm.com>
Sent: Tue Apr 21 19:38:38 2020
To: Olivier Deprez <Olivier.Deprez@arm.com>, TF-A <tf-a-bounces@lists.trustedfirmware.org>, 吴斌(郅隆) <zhilong.wb@alibaba-inc.com>
Subject: RE: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hi,
Looking at the mail chain below, this is probably being tested on the RD-N1-Edge platform. A regression was noticed in the dmc620 RAS error handling in the code pushed to Linaro for the RD-N1-Edge platform. This will be fixed later today and the patches will be merged into the Linaro repos. It should then be accessible using the usual repo init/sync commands.
Thanks,
Thomas.
-----Original Message-----
From: TF-A On Behalf Of Olivier Deprez via TF-A
Sent: Tuesday, April 21, 2020 4:45 PM
To: TF-A; Raghu K via TF-A <tf-a@lists.trustedfirmware.org>; 吴斌(郅隆)
Subject: Re: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hi Raghu,
Yes, you're right, we probably need a few return code checks here and here. I
may submit a patch and verify it doesn't break anything else.
Hi Bin Wu,
I had noticed the following sequence, originating from the Linux SDEI driver
init down to TF-A:
INFO: SDEI: Private events initialized on 81000100
INFO: SDEI: Private events initialized on 81000200
INFO: SDEI: Private events initialized on 81000300
INFO: SDEI: Private events initialized on 81010000
INFO: SDEI: Private events initialized on 81010100
INFO: SDEI: Private events initialized on 81010200
INFO: SDEI: Private events initialized on 81010300
INFO: SDEI: > VER
INFO: SDEI: < VER:1000000000000
INFO: SDEI: > P_RESET():81000000
INFO: SDEI: < P_RESET:0
INFO: SDEI: > P_RESET():81000200
INFO: SDEI: < P_RESET:0
INFO: SDEI: > P_RESET():81000300
INFO: SDEI: < P_RESET:0
INFO: SDEI: > P_RESET():81010000
INFO: SDEI: < P_RESET:0
INFO: SDEI: > P_RESET():81010100
INFO: SDEI: < P_RESET:0
INFO: SDEI: > P_RESET():81010200
INFO: SDEI: < P_RESET:0
INFO: SDEI: > P_RESET():81010300
INFO: SDEI: < P_RESET:0
INFO: SDEI: > P_RESET():81000100
INFO: SDEI: < P_RESET:0
INFO: SDEI: > S_RESET():81000100
INFO: SDEI: < S_RESET:0
INFO: SDEI: > UNMASK:81000000
INFO: SDEI: < UNMASK:0
INFO: SDEI: > UNMASK:81000100
INFO: SDEI: < UNMASK:0
INFO: SDEI: > UNMASK:81000200
INFO: SDEI: < UNMASK:0
INFO: SDEI: > UNMASK:81000300
INFO: SDEI: < UNMASK:0
INFO: SDEI: > UNMASK:81010000
INFO: SDEI: < UNMASK:0
INFO: SDEI: > UNMASK:81010100
INFO: SDEI: < UNMASK:0
INFO: SDEI: > UNMASK:81010200
INFO: SDEI: < UNMASK:0
INFO: SDEI: > UNMASK:81010300
INFO: SDEI: < UNMASK:0
INFO: SDEI: > INFO(n:804, 0)
INFO: SDEI: < INFO:0
INFO: SDEI: > INFO(n:805, 0)
INFO: SDEI: < INFO:0
There is an SDEI INFO request for events 804 and 805. However, I don't see
any register or enable event service call, so I wonder if this demo code is
missing something or expects the platform to implement such event
definitions natively.
This does not look like the flows described in
https://trustedfirmware-a.readthedocs.io/en/latest/components/sdei.html
for regular SDEI usage or explicit dispatch of events.
Maybe we should involve the Linaro people on the expected init sequence and
the dependency on TF-A (platform files).
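The observation above (INFO queries for events 804 and 805, but no register or enable call) can be checked mechanically on a longer capture. This is a minimal sketch in Python, assuming the trace uses the "INFO: SDEI: > NAME" request format shown in this mail. The tags REG and ENABLE for the register and enable calls are assumptions and should be adjusted to whatever your TF-A build actually prints. The helper names are hypothetical.

```python
import re

# A request line looks like "INFO: SDEI: > UNMASK:81000000" or
# "INFO: SDEI: > INFO(n:804, 0)"; response lines use "<" and are ignored.
REQUEST_RE = re.compile(r"SDEI:\s*>\s*([A-Z_]+)")
INFO_EVENT_RE = re.compile(r"SDEI:\s*>\s*INFO\(n:(\d+)")

# Assumed log tags for the SDEI register and enable calls.
EXPECTED_CALLS = {"REG", "ENABLE"}


def summarize_sdei_trace(lines):
    """Return (set of request names seen, list of event numbers queried via INFO)."""
    calls = set()
    queried_events = []
    for line in lines:
        request = REQUEST_RE.search(line)
        if request:
            calls.add(request.group(1))
        info = INFO_EVENT_RE.search(line)
        if info:
            queried_events.append(int(info.group(1)))
    return calls, queried_events


def missing_calls(calls):
    """List the expected calls that never appear in the trace."""
    return sorted(EXPECTED_CALLS - calls)


if __name__ == "__main__":
    import sys

    seen, events = summarize_sdei_trace(sys.stdin)
    print("requests seen:", sorted(seen))
    print("events queried via INFO:", events)
    print("expected but missing:", missing_calls(seen))
```

Feeding the trace from this mail on standard input would report the INFO queries and flag both assumed tags as missing, matching the manual reading.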
Regards,
Olivier.
From: TF-A on behalf of 吴斌(郅隆) via TF-A
Sent: 21 April 2020 08:45
To: TF-A; Raghu K via TF-A
Subject: [TF-A] Re: Re: Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hi Olivier and All,
Thank you so much for your help. It helps me understand the internals.
As the next step, I need to check the registration flow for event_num 804
on the kernel side, am I right?
BRs,
Bin Wu
------------------ Original Message ------------------
From: TF-A
Sent: Tue Apr 21 09:51:49 2020
To: Raghu K via TF-A
Subject: Re: [TF-A] Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Nice debug! Apart from the issue you pointed out, there is also the
issue with not checking the return code. The ras handler should really
be checking or panic'ing if there is an unexpected error code from
spm_sp_call and sdei_dispatch_event.
-Raghu
On 4/20/20 2:37 PM, Olivier Deprez via TF-A wrote:
Hi Bin Wu,
Here's an early observation. On receiving the RAS FIQ interrupt, the
following occurs:
ehf_el3_interrupt_handler => sgi_ras_intr_handler => spm_sp_call
(enters/exits the SP to handle the injected RAS error) => sdei_dispatch_event
    se = get_event_entry(map);
    if (!can_sdei_state_trans(se, DO_DISPATCH))
        return -1;
p *map
$6 = {ev_num = 804, intr = 0, map_flags = 112, reg_count = 0, lock = {lock = 0}}
p *se
$4 = {ep = 0, arg = 0, affinity = 0, reg_flags = 0, state = 0 '\0'}
sdei_dispatch_event exits with an error at this stage, which does not seem
to be correct behavior.
The SDEI handler is not called in the NS world and the context remains
unchanged.
The interrupt handler blindly returns to the S-EL1 SP context at the same
location where it last exited.
sgi_ras_intr_handler => ehf_el3_interrupt_handler => vector_entry
fiq_aarch64 => el3_exit => re-enters the SP with X0=0xC4000061
The SP then exits, but the EL3 context has not been set up for SP entry,
leading to the crash.
IMO there is an issue around mapping the SDEI event number to the RAS
interrupt number, leading to sdei_dispatch_event exiting early.
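The failure mode described above can be illustrated with a small model. This is only a sketch in Python, not the TF-A implementation: it assumes dispatch is permitted only for an event that is both registered and enabled, and the flag values, class and function names are hypothetical. It also shows the return code check suggested elsewhere in this thread, where the RAS handler stops instead of silently returning to the SP.

```python
from dataclasses import dataclass

# Hypothetical state flags for this model (not the TF-A definitions).
REGISTERED = 1 << 0
ENABLED = 1 << 1


@dataclass
class EventEntry:
    ev_num: int
    state: int = 0  # 0 mirrors the "state = 0" seen in the gdb dump


def can_dispatch(entry):
    """Model assumption: an event must be registered and enabled."""
    needed = REGISTERED | ENABLED
    return (entry.state & needed) == needed


def dispatch_event(entry):
    """Return 0 on success, -1 if the state check rejects the dispatch."""
    if not can_dispatch(entry):
        return -1
    return 0


def ras_interrupt_handler(entry):
    """Check the dispatch result instead of ignoring it."""
    ret = dispatch_event(entry)
    if ret != 0:
        raise RuntimeError(
            f"dispatch of event {entry.ev_num} failed with {ret}"
        )
    return ret


if __name__ == "__main__":
    never_registered = EventEntry(ev_num=804)
    print(dispatch_event(never_registered))
```

In this model, an event that the kernel never registered or enabled always fails the check, so an unchecked caller would carry on as if the event had been delivered.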
Regards,
Olivier.
From: TF-A on behalf of Matteo Carlini via TF-A
Sent: 14 April 2020 10:41
To: 吴斌(郅隆); tf-a@lists.trustedfirmware.org; Thomas Abraham; Deepak Pandey
Cc: nd
Subject: Re: [TF-A] Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Looping-in Thomas & Deepak, responsible for the RD-N1 landing team
platforms releases. They might be able to help.
Thanks
Matteo
From: TF-A On Behalf Of 吴斌(郅隆) via TF-A
Sent: 14 April 2020 06:47
To: TF-A; Raghu Krishnamurthy via TF-A
Subject: [TF-A] Re: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hi Raghu,
I really appreciate your help.
I downloaded this software stack from git.linaro.org. It includes ATF, the
kernel, edk2 and so on.
The user guide I used from Linaro is:
https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst#obtaining-the-rd-n1-edge-and-rd-n1-edge-dual-fast-model
- What platform are you running on? Can this issue be reproduced
outside your testing environment, perhaps on FVP or QEMU?
A: I am running on the ARM N1-Edge FVP platform. It can be reproduced on
this FVP platform.
- What version of TF-A and StandaloneMM is being used? Preferably the
commit-id, so that we can be sure we are looking at the same code.
A: TF-A: https://git.linaro.org/landing-teams/working/arm/arm-tf.git tag:RD-INFRA-20191024-RC0
StandaloneMM seems to be built from edk2 and edk2-platform, so I just give
the edk2 and edk2-platform version information. If I missed anything,
please let me know.
edk2: https://git.linaro.org/landing-teams/working/arm/edk2.git tag:RD-INFRA-20191024-RC0
edk2-platform: https://git.linaro.org/landing-teams/working/arm/edk2-platforms.git tag:RD-INFRA-20191024-RC0
- What version of the kernel and sdei driver is being used?
A: kernel-release: https://git.linaro.org/landing-teams/working/arm/kernel-release.git tag:RD-INFRA-20191024-RC0
The sdei driver is included in the kernel. Do I need to provide the sdei
driver version? If so, please let me know.
- I can't tell from looking at the log, but do you know if writing 0x123
to sde_ras_poison causes a DMC620 interrupt or an SError or external
abort through memory access?
A: Sorry, Linaro only mentions that it will inject a DMC-620 single-bit RAS
error, so I am also not sure which exception type it will trigger.
BRs,
Bin Wu
------------------ Original Message ------------------
From: TF-A
Sent: Tue Apr 14 01:25:47 2020
To: Raghu Krishnamurthy via TF-A
Subject: Re: [TF-A] [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hello,
> Does BL31 need to send the 0xC4000061 event to BL32 again?
I don't think it will. It is really odd that
0xC4000061 (SP_EVENT_COMPLETE_AARCH64) ever reaches the BL32/MM handler.
This is from looking at the upstream code quickly, but it definitely
depends on the platform you are running, what version of TF-A you are
using, and the build options used. Is it possible that the unhandled
exception is occurring after successful handling of the DMC620 error but
there is a following issue that occurs right after, causing the crash?
From the register dump it looks like there was an Instruction abort
exception at address 0 while running in EL3. Something seems to have
gone seriously wrong to have 0xC4000061 ever go back to BL32 and to get
an instruction abort at address 0.
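For reference, the value 0xC4000061 discussed above can be split into the fields defined by the Arm SMC Calling Convention (bit 31: fast call, bit 30: SMC64, bits 29:24: owning entity, bits 15:0: function number). This is a minimal sketch in Python; the type and function names are hypothetical.

```python
from typing import NamedTuple


class SmcFunctionId(NamedTuple):
    fast_call: bool        # bit 31
    smc64: bool            # bit 30
    owner: int             # bits 29:24, owning entity number
    function_number: int   # bits 15:0


def decode_smc_function_id(fid):
    """Split a 32-bit SMC function ID into its SMCCC fields."""
    return SmcFunctionId(
        fast_call=bool((fid >> 31) & 0x1),
        smc64=bool((fid >> 30) & 0x1),
        owner=(fid >> 24) & 0x3F,
        function_number=fid & 0xFFFF,
    )


if __name__ == "__main__":
    fields = decode_smc_function_id(0xC4000061)
    print(fields)
    print(hex(fields.function_number))
```

For 0xC4000061 this gives a fast SMC64 call with owning entity 4 (the Standard Secure Service range) and function number 0x61. That makes it a completion call issued by the secure partition towards EL3, which is consistent with the point above that it should not arrive at the BL32/MM handler as an event.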
> Does the current TF-A support running the RAS test? It seems BL31 will crash.
See above. The answer really depends on the factors mentioned above.
The following would be helpful to know:
- What platform are you running on? Can this issue be reproduced
outside your testing environment, perhaps on FVP or QEMU?
- What version of TF-A and StandaloneMM is being used? Preferably the
commit-id, so that we can be sure we are looking at the same code.
- What version of the kernel and sdei driver is being used?
- I can't tell from looking at the log, but do you know if writing 0x123
to sde_ras_poison causes a DMC620 interrupt or an SError or external
abort through memory access?
Thanks
Raghu
On 4/13/20 12:16 AM, 吴斌(郅隆) via TF-A wrote:
Dear Friends,
I am using TF-A to test the RAS feature.
When I trigger a DMC620 RAS error in Linux (echo 0x123 >
/sys/kernel/debug/sdei_ras_poison), BL32 receives
UnRecognized Event - 0xC4000061 (SP_EVENT_COMPLETE_AARCH64) and finally
BL31 crashes.
In my understanding, 0xC4000061 should be consumed by BL31, not sent to
BL32 again.
A piece of the error log is below:
CperWrite - CperAddress@0xFF610064
CperWrite - 1 Section@FFBE91A8, Length 80, SectionType@FFBE9138
CperWrite - Got Error Section: Platform Memory.
MmEntryPoint Done
Received delegated event
X0 : 0xC4000061
X1 : 0x0
X2 : 0x0
X3 : 0x0
Received event - 0xC4000061 on cpu 0
UnRecognized Event - 0xC4000061
Failed delegated event 0xC4000061, Status 0x2
Unhandled Exception in EL3.
x30 = 0x0000000000000000
x0 = 0x00000000ff007e00
x1 = 0xfffffffffffffffe
x2 = 0x00000000600003c0
x3 = 0x0000000000000000
x4 = 0x0000000000000000
x5 = 0x0000000000000000
x6 = 0x00000000ff015080
x7 = 0x0000000000000000
x8 = 0x00000000c4000061
x9 = 0x0000000000000021
x10 = 0x0000000000000040
x11 = 0x00000000ff00f2b0
x12 = 0x00000000ff0118c0
x13 = 0x0000000000000002
x14 = 0x00000000ff016b70
x15 = 0x00000000ff003f20
x16 = 0x0000000000000044
x17 = 0x00000000ff010430
x18 = 0x0000000000000e3c
x19 = 0x0000000000000000
For more of the error log, please refer to the attachment.
My questions are:
- Does BL31 need to send the 0xC4000061 event to BL32 again?
- Does the current TF-A support running the RAS test? It seems BL31 will crash.
Appreciate your help.
BRs,
Bin Wu
--
TF-A mailing list
TF-A@lists.trustedfirmware.org
IMPORTANT NOTICE: The contents of this email and any attachments are
confidential and may also be privileged. If you are not the intended
recipient, please notify the sender immediately and do not disclose the
contents to any other person, use it for any purpose, or store or copy the
information in any medium. Thank you.