------------------------------------------------------------------
From: Thomas Abraham <thomas.abraham@arm.com>
Date: 2020-04-22 18:30:14
To: Olivier Deprez <Olivier.Deprez@arm.com>; 吴斌(郅隆) <zhilong.wb@alibaba-inc.com>; Raghu K via TF-A <tf-a@lists.trustedfirmware.org>
Subject: RE: Reply: RE: [TF-A] Reply: Re: Reply: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hi Bin Wu,
The changes needed to get dmc-620 RAS error handling working again have been pushed to the Linaro repositories. If there are any questions on obtaining the updates, please let me know. A quick reference would be https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/tree/docs/rdn1edge/user-guide.rst
Thanks,
Thomas.
> -----Original Message-----
> From: Olivier Deprez <Olivier.Deprez@arm.com>
> Sent: Tuesday, April 21, 2020 7:36 PM
> To: 吴斌(郅隆) <zhilong.wb@alibaba-inc.com>; Thomas Abraham
> <thomas.abraham@arm.com>; Raghu K via TF-A <tf-a@lists.trustedfirmware.org>
> Subject: Re: Reply: RE: [TF-A] Reply: Re: Reply: Re: [RAS] BL32 UnRecognized
> Event - 0xC4000061 and BL31 Crashed
>
> Hi Bin Wu,
>
> Glad if this helped!
>
> Hi Thomas,
>
> Thanks for the heads up!
>
> Regards,
> Olivier.
>
>
> ________________________________________
> From: 吴斌(郅隆) <zhilong.wb@alibaba-inc.com>
> Sent: 21 April 2020 13:52
> To: Thomas Abraham; Olivier Deprez; TF-A
> Subject: Reply: RE: [TF-A] Reply: Re: Reply: Re: [RAS] BL32 UnRecognized
> Event - 0xC4000061 and BL31 Crashed
>
> Dear All,
>
> Thanks for all your help again. Your professionalism and assistance impressed
> me.
>
> BRs,
> Bin Wu
> ------------------ Original Message ------------------
> From: Thomas Abraham <thomas.abraham@arm.com>
> Sent: Tue Apr 21 19:38:38 2020
> To: Olivier Deprez <Olivier.Deprez@arm.com>, TF-A <tf-a-bounces@lists.trustedfirmware.org>, 吴斌(郅隆) <zhilong.wb@alibaba-inc.com>
> Subject: RE: [TF-A] Reply: Re: Reply: Re: [RAS] BL32 UnRecognized Event -
> 0xC4000061 and BL31 Crashed
> Hi,
>
>
>
> Looking into the mail chain below, this is probably being tested on the
> RD-N1-Edge platform. A regression was noticed in the dmc620 RAS error
> handling in the code pushed to Linaro for the RD-N1-Edge platform. This will be
> fixed later today and the patches will be merged into the Linaro repos. The fix
> should then be accessible using the usual repo init/sync commands.
>
>
>
> Thanks,
>
> Thomas.
>
>
>
> > -----Original Message-----
>
> > From: TF-A On Behalf Of Olivier Deprez via TF-A
>
> > Sent: Tuesday, April 21, 2020 4:45 PM
>
> > To: TF-A; Raghu K via TF-A <tf-a@lists.trustedfirmware.org>; 吴斌(郅隆)
>
> > Subject: Re: [TF-A] Reply: Re: Reply: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
>
> >
>
> > Hi Raghu,
>
> >
>
> > Yes you're right, we probably need a few return code checks here and there. I
>
> > may submit a patch and verify it doesn't break anything else.
>
> >
>
> > Hi Bin Wu,
>
> >
>
> > I had noticed the following sequence originating from linux sdei driver init
>
> > down to TF-A:
>
> >
>
> > INFO: SDEI: Private events initialized on 81000100
>
> > INFO: SDEI: Private events initialized on 81000200
>
> > INFO: SDEI: Private events initialized on 81000300
>
> > INFO: SDEI: Private events initialized on 81010000
>
> > INFO: SDEI: Private events initialized on 81010100
>
> > INFO: SDEI: Private events initialized on 81010200
>
> > INFO: SDEI: Private events initialized on 81010300
>
> > INFO: SDEI: > VER
>
> > INFO: SDEI: < VER:1000000000000
>
> > INFO: SDEI: > P_RESET():81000000
>
> > INFO: SDEI: < P_RESET:0
>
> > INFO: SDEI: > P_RESET():81000200
>
> > INFO: SDEI: < P_RESET:0
>
> > INFO: SDEI: > P_RESET():81000300
>
> > INFO: SDEI: < P_RESET:0
>
> > INFO: SDEI: > P_RESET():81010000
>
> > INFO: SDEI: < P_RESET:0
>
> > INFO: SDEI: > P_RESET():81010100
>
> > INFO: SDEI: < P_RESET:0
>
> > INFO: SDEI: > P_RESET():81010200
>
> > INFO: SDEI: < P_RESET:0
>
> > INFO: SDEI: > P_RESET():81010300
>
> > INFO: SDEI: < P_RESET:0
>
> > INFO: SDEI: > P_RESET():81000100
>
> > INFO: SDEI: < P_RESET:0
>
> > INFO: SDEI: > S_RESET():81000100
>
> > INFO: SDEI: < S_RESET:0
>
> > INFO: SDEI: > UNMASK:81000000
>
> > INFO: SDEI: < UNMASK:0
>
> > INFO: SDEI: > UNMASK:81000100
>
> > INFO: SDEI: < UNMASK:0
>
> > INFO: SDEI: > UNMASK:81000200
>
> > INFO: SDEI: < UNMASK:0
>
> > INFO: SDEI: > UNMASK:81000300
>
> > INFO: SDEI: < UNMASK:0
>
> > INFO: SDEI: > UNMASK:81010000
>
> > INFO: SDEI: < UNMASK:0
>
> > INFO: SDEI: > UNMASK:81010100
>
> > INFO: SDEI: < UNMASK:0
>
> > INFO: SDEI: > UNMASK:81010200
>
> > INFO: SDEI: < UNMASK:0
>
> > INFO: SDEI: > UNMASK:81010300
>
> > INFO: SDEI: < UNMASK:0
>
> > INFO: SDEI: > INFO(n:804, 0)
>
> > INFO: SDEI: < INFO:0
>
> > INFO: SDEI: > INFO(n:805, 0)
>
> > INFO: SDEI: < INFO:0
>
> >
>
> > There is an SDEI INFO request for events 804 and 805.
>
> > However, I don't see any register or enable event service call, so I wonder if this demo code is missing something or expects the platform to implement such an event definition natively.
>
> >
>
> > This does not look like the flows described in
>
> > https://trustedfirmware-a.readthedocs.io/en/latest/components/sdei.html
>
> > for regular SDEI usage or explicit dispatch of events.
>
> >
>
> > Maybe we should involve Linaro people on the expected init sequence and
>
> > dependency to TF-A (platform files).
>
> >
>
> > Regards,
>
> > Olivier.
>
> >
>
> >
>
> > ________________________________________
>
> > From: TF-A on behalf of 吴斌(郅隆) via TF-A
>
> > Sent: 21 April 2020 08:45
>
> > To: TF-A; Raghu K via TF-A
>
> > Subject: [TF-A] Reply: Re: Reply: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
>
> >
>
> > Hi Olivier and All,
>
> >
>
> > Thank you so much for your help. It helped me understand the internals.
>
> > As the next step, I need to check the registration flow for event_num 804 on the kernel side, am I right?
>
> >
>
> >
>
> > BRs,
>
> > Bin Wu
>
> > ------------------ Original Message ------------------
>
> > From: TF-A
>
> > Sent: Tue Apr 21 09:51:49 2020
>
> > To: Raghu K via TF-A
>
> > Subject: Re: [TF-A] Reply: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
>
> > Nice debug! Apart from the issue you pointed out, there is also the
>
> > issue with not checking the return code. The ras handler should really
>
> > be checking or panic'ing if there is an unexpected error code from
>
> > spm_sp_call and sdei_dispatch_event.
>
> >
>
> > -Raghu
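
A minimal sketch of the return-code checking suggested above, in C. The two steps are passed in as function pointers standing in for spm_sp_call and sdei_dispatch_event; the real TF-A functions take arguments that are omitted here, and ras_handle, step_fn, step_ok, step_fail and the result names are all hypothetical. In firmware the failure branches would more likely end in a panic than return a code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in type for the two calls made by the RAS handler. */
typedef int (*step_fn)(void);

enum ras_result { RAS_OK = 0, RAS_SP_FAILED = 1, RAS_DISPATCH_FAILED = 2 };

/*
 * Run the secure-partition step, then the SDEI dispatch step, and stop at
 * the first failure instead of carrying on as if both had succeeded.
 */
enum ras_result ras_handle(step_fn sp_call, step_fn dispatch)
{
    assert(sp_call != NULL && dispatch != NULL);

    if (sp_call() != 0) {
        return RAS_SP_FAILED;       /* the SP did not handle the error */
    }
    if (dispatch() != 0) {
        return RAS_DISPATCH_FAILED; /* the event could not be dispatched */
    }
    return RAS_OK;
}

/* Example stubs: one step that succeeds and one that fails. */
int step_ok(void)   { return 0; }
int step_fail(void) { return -1; }
```

With these stubs, ras_handle(step_ok, step_fail) reports RAS_DISPATCH_FAILED, which corresponds to the case discussed in this thread where the dispatch returns -1 and the caller ignores it.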
>
> >
>
> > On 4/20/20 2:37 PM, Olivier Deprez via TF-A wrote:
>
> > > Hi Bin Wu,
>
> > >
>
> > > Here's an early observation. On receiving the RAS fiq interrupt the
>
> > following occurs:
>
> > >
>
> > > ehf_el3_interrupt_handler => sgi_ras_intr_handler => spm_sp_call (enters/exits the SP to handle the injected RAS error) => sdei_dispatch_event
>
> > >
>
> > > se = get_event_entry(map);
>
> > > if (!can_sdei_state_trans(se, DO_DISPATCH))
>
> > > return -1;
>
> > >
>
> > > p *map
>
> > > $6 = {ev_num = 804, intr = 0, map_flags = 112, reg_count = 0, lock = {lock = 0}}
>
> > > p *se
>
> > > $4 = {ep = 0, arg = 0, affinity = 0, reg_flags = 0, state = 0 '\0'}
>
> > >
>
> > > sdei_dispatch_event exits in error at this stage, which does not seem to be correct behavior.
>
> > > The SDEI handler is not called in the NS world and the context remains unchanged.
>
> > > The interrupt handler blindly returns to the S-EL1 SP context at the same location where it last exited.
>
> > > sgi_ras_intr_handler => ehf_el3_interrupt_handler => vector_entry fiq_aarch64 => el3_exit => re-enters the SP with X0=0xC4000061
>
> > > The SP then exits, but the EL3 context has not been set up for SP entry, leading to a crash.
>
> > >
>
> > > IMO there is an issue around mapping the SDEI event number to the RAS interrupt number, leading to sdei_dispatch_event exiting early.
>
> > >
>
> > > Regards,
>
> > > Olivier.
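
One possible reading of the dump above is that the event entry (state = 0, reg_count = 0) was never registered or enabled by the client, so the dispatch is refused. The sketch below models that with a simplified state machine in C. The state bits and the ev_* helpers are hypothetical and are not TF-A's actual state encoding or API; only the idea of rejecting a dispatch for an event that is not yet registered and enabled is taken from the thread.

```c
/* Simplified event state bits; hypothetical, not TF-A's actual encoding. */
enum {
    EV_REGISTERED = 1u << 0,
    EV_ENABLED    = 1u << 1,
    EV_RUNNING    = 1u << 2
};

struct event_entry {
    unsigned int state;
};

/* Each helper returns 0 on success and -1 if the transition is not allowed. */
int ev_register(struct event_entry *se)
{
    if (se->state & EV_REGISTERED) {
        return -1;              /* already registered */
    }
    se->state |= EV_REGISTERED;
    return 0;
}

int ev_enable(struct event_entry *se)
{
    if (!(se->state & EV_REGISTERED)) {
        return -1;              /* cannot enable an unregistered event */
    }
    se->state |= EV_ENABLED;
    return 0;
}

int ev_dispatch(struct event_entry *se)
{
    const unsigned int need = EV_REGISTERED | EV_ENABLED;

    if ((se->state & need) != need || (se->state & EV_RUNNING)) {
        return -1;              /* not ready, or handler already running */
    }
    se->state |= EV_RUNNING;
    return 0;
}
```

In this model an entry whose state is 0 fails ev_dispatch immediately, and only succeeds once ev_register and ev_enable have both been called.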
>
> > >
>
> > >
>
> > > ________________________________________
>
> > > From: TF-A on behalf of Matteo Carlini via TF-A
>
> > > Sent: 14 April 2020 10:41
>
> > > To: 吴斌(郅隆); tf-a@lists.trustedfirmware.org; Thomas Abraham; Deepak Pandey
>
> > > Cc: nd
>
> > > Subject: Re: [TF-A] Reply: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
>
> > >
>
> > > Looping-in Thomas & Deepak, responsible for the RD-N1 landing team
>
> > platforms releases. They might be able to help.
>
> > >
>
> > > Thanks
>
> > > Matteo
>
> > >
>
> > > From: TF-A On Behalf Of 吴斌(郅隆) via TF-A
>
> > > Sent: 14 April 2020 06:47
>
> > > To: TF-A ; Raghu Krishnamurthy via TF-A
>
> > > Subject: [TF-A] Reply: Re: [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
>
> > >
>
> > > Hi Raghu,
>
> > >
>
> > > Really appreciate your help.
>
> > >
>
> > > I downloaded this software stack from git.linaro.org. The stack includes ATF, kernel, edk2 and so on.
>
> > > The user guide I used from Linaro is:
>
> > > https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst#obtaining-the-rd-n1-edge-and-rd-n1-edge-dual-fast-model
>
> > >
>
> > > 1) What platform you are running on? Can this issue be reproduced
>
> > > outside your testing environment, perhaps on FVP or QEMU?
>
> > > A: I am running on the ARM N1-Edge FVP platform. The issue can be reproduced on this FVP platform.
>
> > >
>
> > > 2) What version of TF-A and StandaloneMM is being used? Preferably the commit-id, so that we can be sure we are looking at the same code.
>
> > > A: TF-A: https://git.linaro.org/landing-teams/working/arm/arm-tf.git tag:RD-INFRA-20191024-RC0
>
> > > StandaloneMM seems to be built from edk2 and edk2-platforms, so I am just giving the edk2 and edk2-platforms version information. If I missed anything, please let me know.
>
> > > edk2: https://git.linaro.org/landing-teams/working/arm/edk2.git tag:RD-INFRA-20191024-RC0
>
> > > edk2-platforms: https://git.linaro.org/landing-teams/working/arm/edk2-platforms.git tag:RD-INFRA-20191024-RC0
>
> > >
>
> > > 3) What version of the kernel and sdei driver is being used?
>
> > > A: kernel-release: https://git.linaro.org/landing-teams/working/arm/kernel-release.git tag:RD-INFRA-20191024-RC0
>
> > > The sdei driver is included in the kernel. Do I need to provide the sdei driver version? If so, please let me know.
>
> > > 4) I can't tell from looking at the log but do you know if writing 0x123
>
> > > to sde_ras_poison causes a DMC620 interrupt or an SError or external
>
> > > abort through memory access ?
>
> > > A: Sorry, Linaro only mentions that it will inject the DMC-620 single-bit RAS error, so I am also not sure which exception type it will trigger.
>
> > >
>
> > > BRs,
>
> > > Bin Wu
>
> > >
>
> > > ------------------ Original Message ------------------
>
> > > From: TF-A
>
> > > Sent: Tue Apr 14 01:25:47 2020
>
> > > To: Raghu Krishnamurthy via TF-A
>
> > > Subject: Re: [TF-A] [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
>
> > > Hello,
>
> > >
>
> > > >>Does BL31 need to send 0xC4000061 event to BL32 again?
>
> > >
>
> > > I don't think it will. It is really odd that
>
> > > 0xC4000061(SP_EVENT_COMPLETE_AARCH64) ever reaches the BL32/MM handler.
>
> > > This is from looking at the upstream code quickly but it definitely
>
> > > depends on the platform you are running, what version of TF-A you are
>
> > > using, build options used. Is it possible that the unhandled exception
>
> > > is occurring after successful handling of the DMC620 error but there is
>
> > > a following issue that occurs right after, causing the crash?
>
> > > From the register dump it looks like there was an Instruction abort
>
> > > exception at address 0 while running in EL3. Something seems to have
>
> > > gone seriously wrong to have 0xC4000061 ever go back to BL32 and to get an instruction abort at address 0.
>
> > >
>
> > > >>Does current TF-A support to run RAS test? It seems BL31 will crash.
>
> > > See above. The answer really depends on the factors mentioned above.
>
> > >
>
> > > The following would be helpful to know:
>
> > > 1) What platform you are running on? Can this issue be reproduced
>
> > > outside your testing environment, perhaps on FVP or QEMU?
>
> > > 2) What version of TF-A and StandaloneMM is being used? Preferably the commit-id, so that we can be sure we are looking at the same code.
>
> > > 3) What version of the kernel and sdei driver is being used?
>
> > > 4) I can't tell from looking at the log but do you know if writing 0x123
>
> > > to sde_ras_poison causes a DMC620 interrupt or an SError or external
>
> > > abort through memory access ?
>
> > >
>
> > > Thanks
>
> > > Raghu
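
For readers unfamiliar with the value discussed above, 0xC4000061 can be read as an SMC function identifier. The sketch below, in C, splits it into the fields defined by the Arm SMC Calling Convention (bit 31: fast call, bit 30: 64-bit convention, bits 29:24: owning entity, bits 15:0: function number). The struct and function names are hypothetical helpers for illustration and do not come from TF-A.

```c
#include <stdint.h>

/* Decoded fields of a 32-bit SMC function identifier (hypothetical type). */
struct smc_fid {
    unsigned int fast;     /* bit 31: 1 = fast call */
    unsigned int smc64;    /* bit 30: 1 = SMC64 calling convention */
    unsigned int owner;    /* bits 29:24: owning entity number */
    unsigned int func_num; /* bits 15:0: function number */
};

struct smc_fid smc_fid_decode(uint32_t fid)
{
    struct smc_fid f;

    f.fast     = (fid >> 31) & 0x1u;
    f.smc64    = (fid >> 30) & 0x1u;
    f.owner    = (fid >> 24) & 0x3fu;
    f.func_num = fid & 0xffffu;
    return f;
}
```

Decoding 0xC4000061 this way gives a fast SMC64 call owned by entity 4 (standard secure services) with function number 0x61, which matches the identifier being an EL3-bound completion call rather than an event meant for the secure partition.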
>
> > >
>
> > >
>
> > > On 4/13/20 12:16 AM, 吴斌(郅隆) via TF-A wrote:
>
> > >> Dear Friends,
>
> > >>
>
> > >> I am using TF-A to test RAS feature.
>
> > >> When I triggered a DMC620 RAS error in Linux (echo 0x123 > /sys/kernel/debug/sdei_ras_poison),
>
> > >> BL32 received "UnRecognized Event - 0xC4000061" (SP_EVENT_COMPLETE_AARCH64) and finally BL31 crashed.
>
> > >>
>
> > >> In my understanding, 0xC4000061 should be consumed by BL31, not sent to BL32 again.
>
> > >>
>
> > >> Part of the error log is below:
>
> > >>
>
> > >> *************************************
>
> > >>
>
> > >> CperWrite - CperAddress@0xFF610064
>
> > >> CperWrite - 1 Section@FFBE91A8, Length 80, SectionType@FFBE9138
>
> > >> CperWrite - Got Error Section: Platform Memory.
>
> > >> MmEntryPoint Done
>
> > >> Received delegated event
>
> > >> X0 : 0xC4000061
>
> > >> X1 : 0x0
>
> > >> X2 : 0x0
>
> > >> X3 : 0x0
>
> > >> Received event - 0xC4000061 on cpu 0
>
> > >> UnRecognized Event - 0xC4000061
>
> > >> Failed delegated event 0xC4000061, Status 0x2
>
> > >> Unhandled Exception in EL3.
>
> > >> x30 = 0x0000000000000000
>
> > >> x0 = 0x00000000ff007e00
>
> > >> x1 = 0xfffffffffffffffe
>
> > >> x2 = 0x00000000600003c0
>
> > >> x3 = 0x0000000000000000
>
> > >> x4 = 0x0000000000000000
>
> > >> x5 = 0x0000000000000000
>
> > >> x6 = 0x00000000ff015080
>
> > >> x7 = 0x0000000000000000
>
> > >> x8 = 0x00000000c4000061
>
> > >> x9 = 0x0000000000000021
>
> > >> x10 = 0x0000000000000040
>
> > >> x11 = 0x00000000ff00f2b0
>
> > >> x12 = 0x00000000ff0118c0
>
> > >> x13 = 0x0000000000000002
>
> > >> x14 = 0x00000000ff016b70
>
> > >> x15 = 0x00000000ff003f20
>
> > >> x16 = 0x0000000000000044
>
> > >> x17 = 0x00000000ff010430
>
> > >> x18 = 0x0000000000000e3c
>
> > >> x19 = 0x0000000000000000
>
> > >> For the rest of the error log, please refer to the attachment.
>
> > >>
>
> > >> My question is,
>
> > >> 1. Does BL31 need to send 0xC4000061 event to BL32 again?
>
> > >> 2. Does current TF-A support to run RAS test? It seems BL31 will crash.
>
> > >>
>
> > >> Appreciate your help.
>
> > >>
>
> > >> BRs,
>
> > >> Bin Wu
>
> > >>
>
> > > --
>
> > > TF-A mailing list
>
> > > TF-A@lists.trustedfirmware.org
>
> > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a
>
> > IMPORTANT NOTICE: The contents of this email and any attachments are
>
> > confidential and may also be privileged. If you are not the intended
>
> > recipient, please notify the sender immediately and do not disclose the
>
> > contents to any other person, use it for any purpose, or store or copy the
>
> > information in any medium. Thank you.
>