Hi Sandeep
(I accidentally dropped the TF-A list in my last reply - now re-adding).
-----Original Message----- From: Sandeep Tripathy sandeep.tripathy@broadcom.com Sent: 05 December 2019 17:17
On Thu, Dec 5, 2019 at 9:54 PM Dan Handley Dan.Handley@arm.com wrote:
Hi Sandeep
-----Original Message----- From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Sandeep Tripathy via TF-A Sent: 05 December 2019 12:00
My query is more on the spec. The OS (e.g. Linux), TF-A and the PSCI spec all seem to assume that the OS manages an independent system, i.e. 'all' the masters in a coherent domain. What other reason could possibly encourage not following a shutdown sequence?
Do you mean "to not follow a *graceful* shutdown sequence"?
Yes, exactly. Thanks!
If so I can think of 3 reasons:
- It's much slower than a non-graceful shutdown.
But this is certainly not a concern for smaller embedded systems.
True, but TF-A tries to be a reference for all systems.
- There is no observable difference between a graceful and non-graceful shutdown from the calling OS's point of view. The OS presumably has no knowledge of other masters it does not manage.
Can the CCN state machine go bad because one participating entity just goes off without marking its exit? Please note I have not seen this issue; it is my assumption.
It depends on the interconnect. Arm interconnects designed for pre-v8.2 systems required explicit programming to take the master out of the coherency domain. Arm interconnects for v8.2+ systems do this automatically via hardware system coherency signals. The TF-A off/reset platform interfaces have provision to do this programming if necessary, but only for the running cluster, which is another reason not to use these PSCI functions in this scenario.
- It's hard for firmware to implement in the multicore situation.
Agree. It is complex to initiate and ensure 'other cores' power down in firmware.
I haven't yet seen a reason why SYSTEM_SUSPEND won't work instead.
I think you are suggesting either using the PSCI system suspend hook in the reboot/power-off path, or using system suspend from the OS itself? Either should work.
I'm suggesting to just do a normal SYSTEM_SUSPEND (suspend to RAM) from the OS.
@Sudeep, I agree the alternate approaches to solving the data loss problem work, and maybe those are the best suited. The past thread[1] is somewhat related but diverged in multiple directions. I wanted to know about and focus on the above 3 points, especially point 2.
Regards
Dan.
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Thu, Dec 5, 2019 at 11:13 PM Dan Handley Dan.Handley@arm.com wrote:
[...]
We use the reset/reset2 platform interfaces for the coherency exit. I thought there might be some dependency on a proper core and cluster power-down sequence, like clearing the SMP bit and flushing the local caches, and it did not seem wrong to do so. Though it does assume, at the leaf level, that the other cores have done their bit. Not sure which is the right place to handle that; if it is firmware, it is complex.
On Fri, Dec 06, 2019 at 12:21:47PM +0530, Sandeep Tripathy wrote:
[...]
So let's get into the details. The OS has either initiated, or been asked by other masters in the system to perform, either SYSTEM_OFF or SYSTEM_RESET. Now, IIUC, the OS can ensure all the data it needs to preserve is written to non-volatile memory, and then the poweroff/reboot sequence you described in earlier mails gets executed. So what is there in volatile memory (RAM or caches) that you have to preserve at that point?
/me is still struggling to understand the use case. I am asking again because you mentioned the requirements have diverged since the original thread on LKML. If it's the same, can we please continue there, by first getting through the quite a few open questions in that thread?
-- Regards, Sudeep
Hi Sudeep,
My intention here was to figure out all the reasons to deliberately skip the power-down sequence. I feel the only reasons we do not do a graceful power-down sequence are:
1. In many-core systems (maybe 100(s) of cores, to have a perceivable impact), e.g. servers, it can increase the reboot time.
2. We do not see a valid generic use case where things can fail.
3. Complexity. I think this is a matter of intent for the TF-A folks :)
If the above is true I can close the thread and suggest keeping the solution in a plat-specific hook.
On Fri, Dec 6, 2019 at 4:21 PM Sudeep Holla sudeep.holla@arm.com wrote:
[...]
I will dig into that thread and check where it ended up, as I was not directly involved in it. tl;dr:
1. If the secondary cores do not exit the coherency domain, the interconnect can fail, so just doing 'reboot' can potentially break such a system. Solution: could 'system_off'/'system_reset'/'system_reset2' from any core take care of doing this for all the cores in the firmware implementation? The implementation could live in the plat-specific hooks for the respective calls.
2. Maybe the CPU caches (NS and S dirty lines) hold data of value, e.g. maybe some log lines updated by a core are still in cache. Solution: follow a graceful power-down sequence for all cores. I don't know how; maybe IPI to bring all the other cores to EL3 and force the power-down sequence for them in order. Solution 2: avoid 'reboot'; initiate CPU hotplug / suspend if possible.
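[Editorial aside] The ordering being proposed above (secondaries flush and exit coherency before the last core resets) can be sketched as a toy model. This is purely illustrative Python, not TF-A code; the class and function names are invented, and the flush() stands in for real EL3 cache maintenance (a dcsw_op_all(DCCISW)-style clean+invalidate):

```python
# Toy model (illustrative only, not TF-A code): compare an abrupt reset,
# where cores are reset while still holding dirty cache lines, with a
# graceful sequence where each core flushes and exits coherency first.

class Core:
    def __init__(self, name):
        self.name = name
        self.cache = {}              # dirty lines: address -> value
        self.in_snoop_domain = True

    def write(self, addr, value):
        # Write-back cache: the line stays dirty in the local cache.
        self.cache[addr] = value

    def flush(self, mem):
        # Clean + invalidate: dirty lines reach main memory.
        mem.update(self.cache)
        self.cache.clear()

    def power_off(self, mem, graceful):
        if graceful:
            self.flush(mem)               # flush local caches first
            self.in_snoop_domain = False  # then exit the coherency domain
        self.cache.clear()                # power off: cache contents are lost

def reboot(cores, mem, graceful):
    # Firmware would bring the secondaries down first (e.g. via an IPI
    # pulling them to EL3), then the initiating core goes down last.
    primary, secondaries = cores[0], cores[1:]
    for core in secondaries:
        core.power_off(mem, graceful)
    primary.power_off(mem, graceful)

def run(graceful):
    mem = {}
    cores = [Core(f"cpu{i}") for i in range(4)]
    cores[2].write(0x1000, "last log line")  # dirty line on a secondary
    reboot(cores, mem, graceful)
    return mem.get(0x1000)

print("abrupt  :", run(graceful=False))   # dirty line never reaches RAM
print("graceful:", run(graceful=True))    # dirty line written back first
```

Under this (over-simplified) model, only the graceful path preserves the secondary's dirty log line in RAM; the abrupt path loses it, which is exactly point 2 above.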
On Sat, Dec 07, 2019 at 12:45:41PM +0530, Sandeep Tripathy wrote:
Hi Sudeep, My intention here was to figure out all the reasons to skip power down sequence deliberately.
Looking at it the other way around: do we have reasons to execute the power-down sequence for reboot, except for the *special* use case / design you have in your system?
I feel the only reason we do not do graceful power down sequence is:-
- In many core (may be 100(s) to have perceivable impact) server
systems it can increase the time of reboot.
For sure, this is one of the main reasons, but not the only one.
- we do not see a valid generic use case where things can fail.
Indeed. Even in your case, since you have not given complete system design details and have not explained why the other suggested alternatives don't work, we have to treat your case as custom.
- complexity. I think this is a matter of intent for tf-a folks :)
If above is true I can close the thread and suggest to keep the solution to plat-specific hook.
Sorry, but we can't help unless this is a generic use case.
IIUC, since the secondaries are parked in the OS, you need to pull them into the secure side using an IPI and then execute the so-called power-down sequence. To be honest, I don't like it, as it's being pushed into TF-A because the OS refuses to do it, with *very valid* reasons.
[...]
I will dig into that thread and check where it ended up as I was not directly involved in that. tl:dr; 1-if the secondary cores do not exit coherency domain interconnect can fail. So just doing 'reboot' can potentially fail such system. solution: 'system_off'/'system_reset/system_reset2' from any core may take care to do this for all the cores in firmware implementation ? And implementation can be in plat_specific hooks for respective call. 2-may be the cpu caches (NS and S dirty lines) can be of value. eg: may be some logs updated by core are in cache.
One thing I have not seen answered so far: what is this data that the OS maintains in RAM/caches, that it is responsible for, yet fails to write to non-volatile memory before executing shutdown? It looks like a design flaw; the OS *must* take care to ensure it has saved all its data. The whole discussion is based on that, and I have never got a response to that question.
solution: follow graceful power down sequence for all. I don't know how. May be do IPI to bring all others to el3 and force power down sequence for them in order.
As I already guessed this and mentioned above, the idea sounds bad.
solution2: avoid 'reboot'. initiate cpuhp / suspend if possible.
Maybe, but it depends on your design.
-- Regards, Sudeep
Hi Sudeep,
On Sat, Dec 7, 2019 at 5:07 PM Sudeep Holla sudeep.holla@arm.com wrote:
[...]
One thing that I have not seen answered so far is what's those data that OS maintains in RAM/caches that it's responsible for and fail to write it to non-volatile memory before executing shutdown. It looks like some design flaw and OS *must* take care to ensure that it has saved all the data. The whole discussion is based on that and have never got response for that question.
*What is this data that the OS maintains in RAM/caches that it is responsible for?* Any software, be it an application or a driver, sharing coherent memory with other masters can assume that it need not ever do explicit cache ops, and that coherency is guaranteed by the platform (firmware/hardware/OS). Data written to the coherent memory region may still be in the L1/L2 D$ when we do a graceful/abrupt shutdown/reboot of this *slave system*, while other master(s) not managed by the *slave system* (Linux/TF-A) are still functional and can snoop the data. In that case such application(s) would have to do explicit cache flushes of coherent memory on the reboot/shutdown event.
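[Editorial aside] The scenario above (dirty lines in a powering-off core's cache that another master may still want to snoop) can be modelled in a few lines. This is an illustrative toy with invented names, not a description of any real interconnect:

```python
# Toy model (illustrative only): a second master snoops a coherent region.
# While the producing core is in the snoop domain, a coherent read returns
# its dirty cached line; if the core powers off abruptly, the dirty line
# is lost and the other master sees stale main memory.

class Interconnect:
    def __init__(self):
        self.mem = {}    # main memory contents
        self.cores = []  # cache-carrying masters in the snoop domain

    def snoop_read(self, addr):
        # Coherent read: a dirty copy in any snoop-domain cache wins.
        for core in self.cores:
            if core.in_snoop_domain and addr in core.cache:
                return core.cache[addr]
        return self.mem.get(addr, "stale")

class CacheMaster:
    def __init__(self, ic):
        self.ic = ic
        self.cache = {}
        self.in_snoop_domain = True
        ic.cores.append(self)

    def write(self, addr, value):
        self.cache[addr] = value  # write-back: dirty in cache only

    def power_off(self, graceful):
        if graceful:
            self.ic.mem.update(self.cache)  # flush before leaving coherency
        self.in_snoop_domain = False
        self.cache.clear()

def demo(graceful):
    ic = Interconnect()
    cpu = CacheMaster(ic)
    cpu.write(0x2000, "shared state")
    assert ic.snoop_read(0x2000) == "shared state"  # visible while coherent
    cpu.power_off(graceful)
    return ic.snoop_read(0x2000)  # what the other master sees afterwards

print(demo(graceful=True))    # "shared state"
print(demo(graceful=False))   # "stale"
```

The point of the sketch: hardware coherency hides the cache from everyone only while the core participates in snooping, so the last writer must clean its dirty lines before leaving the domain, whether that is done by the application, the OS, or firmware.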
Thanks Sandeep
On Mon, Dec 09, 2019 at 02:50:43PM +0530, Sandeep Tripathy wrote:
Hi Sudeep,
[...]
*what's those data that OS maintains in RAM/caches that it's responsible for * Any software be it an application/driver sharing the coherent memory with another masters can assume that it need not do explicit cache-ops ever and coherency is guaranteed by platform (firmware/hardware/os)?
OK, you are missing something obvious in such a design. If there are other slaves and masters depending on this slave (OSPM), then the master initiating the shutdown of this slave (OSPM) must be aware of it and can broadcast it across. If the master can't, then the firmware dealing with this slave's shutdown must. You simply can't assume the things you currently are. Sounds like a design gap in such a multi-master/slave system to me.
The data updated to the coherent memory region may be in L1/L2 D$ and we want a graceful/abrupt shutdown/reboot of this *slave system* where other master(s) not managed by *slave system*
Yes, of course slaves don't manage masters. Not sure how the master and slave communicate in such a setup. Looks like some communication gap between them :)
'linux/tf-a' are still functional and can snoop the data. In this case such application(s) have to do explicit cache flush on reboot/shutdown event on a coherent memory.
The absence of a shutdown/reboot notification in such a system seems to me to be the root of all these problems.
-- Regards, Sudeep
Hi Sudeep,
On Mon, Dec 9, 2019 at 3:53 PM Sudeep Holla sudeep.holla@arm.com wrote:
[...]
OK, you are missing something obvious in such design. If there are other slaves and masters depending on this slave(OSPM), then the master initiating the shutdown of this slave(OSPM) must be aware of it and can broadcast the same across.
Of course it will. The issue is not about the notification mechanism.
[...]
Absence of Shutdown/Reboot notification in such a system seems to be the root of all such problems to me.
I did not say notifications don't exist, or that applications can't do cache ops, along with many other things or communication protocols they might have to run on a shutdown/reboot (not relevant here). Trust me, we have discussed various approaches here so far, and other CENH works :)
The generic discussion is: is it the responsibility of an application to do cache maintenance on *coherent memory* in the shutdown/reboot path, when it never has to do so in its normal course? Is it not valid to expect such a mechanism from the underlying platform firmware or hardware? It is the core which is going down, and it is expected to do so gracefully if possible. If the limitation is time, that is understandable, but not exciting for smaller systems.
Thanks Sandeep
On Mon, Dec 09, 2019 at 05:29:21PM +0530, Sandeep Tripathy wrote:
[...]
Of Course it will. The issue is not about notification mechanism.
And what do those masters do with *this particular* notification? Why can't they stop snooping into the caches that belong to / are maintained by the other slave (OS), or request the firmware to stop it?
[...]
I did not say notification does not exist or applications can't do cache ops along with many other stuff or communication protocol it might have to do on a shutdown/reboot (not relevant).
OK
Trust me various approaches we discussed here so far and other CENH works :).
I am not saying the other approaches were not tried/discussed, but I was not aware of them. I am also still not aware of the full design of your system.
CENH ?
The generic discussion is: Is it the responsibility of an application to do cache maintenance on a *coherent memory* in shutdown/reboot path where it never have to do so in its normal course.
The OS doesn't (or can't, as it's about to shut down) care about the data in this case. The notification is an indication to the application or the other masters.
Is it not valid to expect such mechanism from the underlying platform firmware or hardware. It is the core which is going down and expected to do so in a graceful manner if possible. If the limitation is time, understandable but not exciting for smaller systems.
Not sure if that's the only reason. The core has also notified that it's about to power off or reboot; the OS takes care to save what it needs, and the platform may give the others a chance to do the same via notifications.
-- Regards, Sudeep
Hi Sudeep,
I am very specific about the core caches. An app/driver, or any number of entities for that matter, may be updating cacheable coherent memory range(s). On a shutdown/reboot notification, what ought they to do? If we ask them each to do their respective range flushes (cache maintenance of coherent memory), that is less generic than having the core infrastructure give the coherency guarantee (1. OS: especially more than just halting the secondary/other cores, or maybe CPU-hotplugging them out, etc.; 2. TF-A: PSCI shutdown/reset/reset2 doing a graceful power-down to take care of the initiating core). Maybe this can be under a flag, to choose between a faster reboot and a graceful power-down sequence, if it qualifies as generic at all?
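[Editorial aside] A minimal sketch of the "infrastructure gives the guarantee" idea above, with entirely hypothetical names (no such TF-A or Linux API is implied): drivers register their coherent ranges once at init, and a single flag-gated shutdown hook flushes them all, instead of each app flushing on its own notification:

```python
# Hypothetical sketch: a central registry of coherent ranges, swept once
# in the shutdown path, gated by a graceful-reboot policy flag.

GRACEFUL_REBOOT = True   # build/boot-time policy flag (hypothetical)

_coherent_ranges = []    # (base, size) pairs registered by drivers/apps
flushed = []             # record of what the shutdown path cleaned

def register_coherent_range(base, size):
    # Called by a driver/app once at init, instead of it having to do
    # its own range flush on every shutdown/reboot notification.
    _coherent_ranges.append((base, size))

def flush_range(base, size):
    # Stand-in for real cache maintenance by address range.
    flushed.append((base, size))

def system_reset():
    if GRACEFUL_REBOOT:
        # One central sweep gives the coherency guarantee generically.
        for base, size in _coherent_ranges:
            flush_range(base, size)
    # ... then the usual (non-graceful) reset follows.

register_coherent_range(0x8000_0000, 0x1000)  # e.g. a shared log buffer
register_coherent_range(0x8010_0000, 0x4000)  # e.g. a mailbox region
system_reset()
print(flushed)  # both ranges cleaned by the infrastructure, not the apps
```

With the flag cleared, the sweep is skipped and the system keeps the fast non-graceful reboot, which is the trade-off being proposed.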
Thanks Sandeep On Mon, Dec 9, 2019 at 5:45 PM Sudeep Holla sudeep.holla@arm.com wrote:
On Mon, Dec 09, 2019 at 05:29:21PM +0530, Sandeep Tripathy wrote:
Hi Sudeep,
On Mon, Dec 9, 2019 at 3:53 PM Sudeep Holla sudeep.holla@arm.com wrote:
On Mon, Dec 09, 2019 at 02:50:43PM +0530, Sandeep Tripathy wrote:
Hi Sudeep,
[...]
*what's the data that the OS maintains in RAM/caches that it's responsible for* Can any software, be it an application or a driver, sharing coherent memory with other masters, assume that it need not ever do explicit cache ops and that coherency is guaranteed by the platform (firmware/hardware/OS)?
OK, you are missing something obvious in such a design. If there are other slaves and masters depending on this slave (OSPM), then the master initiating the shutdown of this slave (OSPM) must be aware of it and can broadcast the same across.
Of course it will. The issue is not about the notification mechanism.
And what's done in those masters with *this particular* notification ?
Anything, but preferably not cache maintenance.
Why can't it stop snooping into the caches (or request firmware to) that belong to / are maintained by the other slave (OS)?
Sure, the respective clusters are to be taken out of the snoop domain by firmware as part of the PSCI platform-specific hooks. But what about their caches? I don't think there is a pull mechanism by which the CCN or another interconnect can voluntarily pull the dirty lines from an exiting core's caches, as an alternative to having them flushed by the respective cores.
If the master can't, then the firmware dealing with this slave's shutdown must. You simply can't assume the things you currently are. Sounds like a design gap in such a multi master-slave system to me.
The data updated to the coherent memory region may be in the L1/L2 D$, and we want a graceful/abrupt shutdown/reboot of this *slave system* where the other master(s) not managed by the *slave system*
Yes, of course slaves don't manage masters. Not sure how the master and slave communicate in such a setup. Looks like some communication gap between them :)
('linux/tf-a') are still functional and can snoop the data. In this case such application(s) have to do an explicit cache flush on a reboot/shutdown event on coherent memory.
I am not saying other approaches are not tried/discussed. But I was not aware of it. Also I am still not aware of the full design of your system yet.
I think now we have narrowed the discussion down to a very specific cache-maintenance ownership issue.
CENH ?
sorry. pls ignore cute embedded .. hacks !
On Mon, Dec 09, 2019 at 08:36:18PM +0530, Sandeep Tripathy wrote:
Hi Sudeep, I am very specific about the core caches. The app/driver or many entity for that matter are updating cacheable coherent memory range(s).
Are we referring to OS apps/drivers, or to something on the other masters? Please be as specific as possible.
On shutdown/reboot notification what they ought to do ?
If on the same slave OS, then stop all on-going transactions. There are hooks to do that; simply put, it can map to remove-device calls. And the other masters also have notification to do the same.
If we ask them to do respective range flushes (cache mnt of a coherent memory) that is less generic compared to if the core infrastructure gives the coherency guarantee
Sure, but I need clarity on above to answer this.
(1-OS: especially more than just halting for secondary(other) cores or may be cpuhp secondary(other) cores etc, 2- tf-a psci shutdown/reset/reset2 to do graceful pwrdown to take care of the initiating core). May be this can be under a flag to choose between faster reboot vs graceful power down sequence if at all it qualifies as generic ?
I think we have had enough discussion so far to tell that this is not a generic requirement.
[...]
And what's done in those masters with *this particular* notification ?
Anything but not cache maintenance preferably.
And what does that *anything* include in your case? Please be specific, with an example. I am still failing to understand the role of this fancy master who wants its slave to take care of cache maintenance for it.
Why can't it stop snooping into caches(or request firmware to) that belong to/maintained by the other slave(OS) ?
Sure. the respective clusters to be taken out of snoop domain by firmware as part of psci plat specific hooks.
OK
But what about their caches. I don't think there is a pull mechanism that ccn/or other interconnect can voluntarily pull the dirty lines from exiting core caches as an alternate to have them flushed by the respective cores.
OK. So why are there dirty lines in the local caches? Why can't the driver/device take action on shutdown?
I am not saying other approaches are not tried/discussed. But I was not aware of it. Also I am still not aware of the full design of your system yet.
I think now we have narrowed down the discussion to very specific cache maintenance ownership issue.
Not sure yet, let's see ;)
CENH ?
sorry. pls ignore cute embedded .. hacks !
Hmmm...
-- Regards, Sudeep
Hi Sudeep, On Mon, Dec 9, 2019 at 9:07 PM Sudeep Holla sudeep.holla@arm.com wrote:
On Mon, Dec 09, 2019 at 08:36:18PM +0530, Sandeep Tripathy wrote:
Hi Sudeep, I am very specific about the core caches. The app/driver or many entity for that matter are updating cacheable coherent memory range(s).
Are we referring OS apps/drivers or something on the other masters ? Please be as specific as possible.
Linux applications and drivers running on the Arm cluster and sharing DDR over CCN with other masters. Not sure the specifics will make it any clearer; for example, a Broadcom SmartNIC or any accelerator on PCIe. I think the details will only distract the discussion into platform specifics, whereas the issue is not platform-specific.
Assume one application is logging some data to the said coherent cacheable DDR; on a reboot/shutdown notification it will stop logging. If the CPU cache is not flushed, the log is lost. The cache may be small, but the buffer can be huge for a range flush.
On shutdown/reboot notification what they ought to do ?
If on the same slave OS, then stop all on-going transactions. There are hooks to do that. Simply, it can map to be remove device calls. And the other masters also have notification to do the same.
Done; stopped all transactions. But the data lies in the local caches.
I think we had enough discussions so far to tell this is not a generic requirement.
Thanks for the quick responses. Looks like finally we have to agree to disagree.
[...]
And what's done in those masters with *this particular* notification ?
Anything but not cache maintenance preferably.
And what does that *anything* include in your case. Please be specific with example. I am still failing to understand the role of this fancy master who wants its slave to take care of cache maintenance for it.
"Anything" can also be nothing, or, as you mentioned above, it can just stop further transactions. But it is agnostic of the underlying caches.
OK. SO why are dirty lines in local caches, why can't the driver/device take action on shutdown ?
Since the driver/app is working on coherent memory, by convention it can assume no explicit maintenance is needed.
Thanks Sandeep
On Mon, Dec 09, 2019 at 10:14:25PM +0530, Sandeep Tripathy wrote:
Hi Sudeep, On Mon, Dec 9, 2019 at 9:07 PM Sudeep Holla sudeep.holla@arm.com wrote:
On Mon, Dec 09, 2019 at 08:36:18PM +0530, Sandeep Tripathy wrote:
Hi Sudeep, I am very specific about the core caches. The app/driver or many entity for that matter are updating cacheable coherent memory range(s).
Are we referring OS apps/drivers or something on the other masters ? Please be as specific as possible.
Linux application and drivers running on arm cluster and sharing DDR over CCN with other masters. Not sure if the specifics will make it any clearer. for ex: Broadcom smartNIC or any accelerator on PCIe. I think the details will only distract the discussion to platform specifics where as the issue is not.
Sure. Now I feel that it has nothing to do with the external master; it's just the Linux OS (slave) and its logging application. Please shout if that's not the case. The producer is the OS while the consumer is the external master/slave.
Assume one application is logging some data to the said coherent cacheable ddr. on reboot /shutdown notification it will stop logging.
The application has to terminate cleanly when SIGTERM is sent (maybe using an appropriate handler), and it can intimate the same to the consumers so that they can consume the data before it's lost. That's exactly why I kept mentioning notification. It need not be a generic shutdown notification; it can be from the logging producer to the logging consumers.
If cpu cache is not flushed log is lost. The cache can be small but the buffer can be huge for a range flush.
If both producers and consumers are aware of logging being stopped due to shutdown or reboot, we must not be worried about caches here.
Thanks for the quick responses. looks like finally we have to agree to disagree.
Yes, I definitely disagree with the attempted approach in both the OS and TF-A to solve the issue as you have described it (or rather, as I have understood it).
[...]
Anything can be nothing also or as you mentioned above it can just stop further transactions. but it is agnostic of the underlying caches.
OK, but with more info, I feel that bringing caches into your problem is itself incorrect.
Since driver/app is working on coherent memory by convention it can assume no explicit maintenance.
Yes, but they (the consumers in your case) need to be aware of the start and stop of the producer.
-- Regards, Sudeep
Hi Sudeep,
On Mon, Dec 9, 2019 at 10:40 PM Sudeep Holla sudeep.holla@arm.com wrote:
Linux application and drivers running on arm cluster and sharing DDR over CCN with other masters. Not sure if the specifics will make it any clearer. for ex: Broadcom smartNIC or any accelerator on PCIe. I think the details will only distract the discussion to platform specifics where as the issue is not.
Sure. Now I feel that it's nothing to do with external master, just Linux OS(slave) and it's logging application. Please shout if that's not the case. The producer is OS while the consumer is external master/slave.
Yes in this example.
Assume one application is logging some data to the said coherent cacheable ddr. on reboot /shutdown notification it will stop logging.
The application has to terminate cleanly when SIGTERM is sent(may be using appropriate handler. And can intimate the same to the consumers so that they can consume the data before it's lost.
The DDR is never powered off in this scenario, so when and how to consume the log is up to the (consumer) application design. Assume it's an incrementing log, i.e. after a reboot this (producer) master will again continue to dump more records onto it. How would you suggest handling this? In this case both producer and consumer deliberately asked for coherent memory, so why should they also have to consider possible data loss just because the platform does not give full coherency, merely to avoid the extra time to flush the core caches? If they get a non-cached (coherent) memory range, they don't have to do anything, isn't it?
[...]
Anything can be nothing also or as you mentioned above it can just stop further transactions. but it is agnostic of the underlying caches.
OK, but with more info, I feel involving caches into your problem itself is incorrect.
negative.
Yes, but they(consumers in your case) need to be aware of start and stop of producer.
That is a valid constraint. But we should not force the application to do coherency management when it deliberately asked for coherent memory.
-- Regards, Sudeep
Thanks Sandeep
Hi Sandeep,
The more we discuss this, the more I think we will find all sorts of CENH (as you put it) done all over the place, and expecting the system to work just fine even when lots of interfaces/contracts are broken is just ..... (fill in your own word :))
I promise not to discuss these CENH any further after this email :)
On Tue, Dec 10, 2019 at 03:59:01PM +0530, Sandeep Tripathy wrote:
Hi Sudeep,
On Mon, Dec 9, 2019 at 10:40 PM Sudeep Holla sudeep.holla@arm.com wrote:
The application has to terminate cleanly when SIGTERM is sent(may be using appropriate handler. And can intimate the same to the consumers so that they can consume the data before it's lost.
The DDR is not powered off ever in this scenario. So when to/how to consume the log is up to the (consumer) application design.
CENH#1
Assume its an incrementing log ie. after reboot this (producer) master again will continue to dump more records on to it.
CENH#2
(I see the roles being exchanged; the OS was slave + producer, and I am not sure what you are referring to as master above.) Anyway, use kdump and features like that if you need a RAM dump of portions of the memory given to the kernel.
How would you suggest to handle this. In this case both producer and consumer deliberately asked for coherent memory so why it should also consider a possible data loss due to platforms not giving the coherency because it will add some time to flush the core caches.
CENH#3, not sure if such flexibility should be given to applications.
If they get non-cached(coherent) memory range they don't have to do anything isn't it ?
Applications must not try that; the kernel mostly provides cached memory from its memory allocator. I get the sense that this is some magic pre-allocated memory that is either reserved or taken out of kernel memory, and that the application (along with its driver) maps it coherent in some magic way.
-- Regards, Sudeep