Hi Sudeep,

I am talking specifically about the core caches. An app, a driver, or any other entity for that matter may be updating cacheable coherent memory range(s). On a shutdown/reboot notification, what ought they to do? If we ask them to flush their respective ranges (cache maintenance on coherent memory), that is less generic than having the core infrastructure give the coherency guarantee (1. OS: more than just halting the secondary (other) cores, or maybe cpuhp for the secondary (other) cores, etc.; 2. TF-A PSCI shutdown/reset/reset2 doing a graceful power down to take care of the initiating core). Maybe this could be under a flag to choose between a faster reboot and a graceful power-down sequence, if at all it qualifies as generic?
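To make the flag idea concrete, here is a toy model in plain C. It is not kernel or TF-A code; `shutdown_policy`, `core_model` and `system_shutdown` are hypothetical names made up for illustration and do not correspond to any existing Linux or TF-A interface. It only shows the trade-off: with the fast policy, whatever is still dirty in a core's caches never reaches memory, while with the graceful policy each exiting core writes its lines back first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy flag: not an existing Linux or TF-A option. */
enum shutdown_policy {
    SHUTDOWN_FAST,      /* just halt the cores; dirty lines are abandoned */
    SHUTDOWN_GRACEFUL   /* each core writes back its dirty lines first */
};

/* Toy model of one core; all fields are illustrative assumptions. */
struct core_model {
    bool online;
    unsigned int dirty_lines;   /* dirty lines held in this core's L1/L2 */
};

/*
 * Power down every core according to the policy.
 * Returns the number of dirty lines that never reached memory.
 */
static unsigned int system_shutdown(struct core_model *cores, size_t n,
                                    enum shutdown_policy policy)
{
    unsigned int lost = 0;

    assert(cores != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (policy == SHUTDOWN_GRACEFUL)
            cores[i].dirty_lines = 0;   /* models a write-back by the exiting core */
        lost += cores[i].dirty_lines;   /* anything still dirty is lost */
        cores[i].online = false;
    }
    return lost;
}
```

In this model the application never has to know about the shutdown at all; the cost of the graceful policy is the extra write-back time on every core.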
Thanks
Sandeep

On Mon, Dec 9, 2019 at 5:45 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
On Mon, Dec 09, 2019 at 05:29:21PM +0530, Sandeep Tripathy wrote:
Hi Sudeep,
On Mon, Dec 9, 2019 at 3:53 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
On Mon, Dec 09, 2019 at 02:50:43PM +0530, Sandeep Tripathy wrote:
Hi Sudeep,
[...]
*What is the data that the OS maintains in RAM/caches that it is responsible for?* Any software, be it an application or a driver, sharing coherent memory with other masters can assume that it never needs to do explicit cache ops and that coherency is guaranteed by the platform (firmware/hardware/OS)?
OK, you are missing something obvious in such a design. If there are other slaves and masters depending on this slave (OSPM), then the master initiating the shutdown of this slave (OSPM) must be aware of it and can broadcast the same across.
Of course it will. The issue is not about the notification mechanism.
And what's done in those masters with *this particular* notification ?
Anything, but preferably not cache maintenance.
Why can't it stop snooping into the caches that belong to, or are maintained by, the other slave (OS), or request firmware to do so?
Sure. The respective clusters are to be taken out of the snoop domain by firmware as part of the PSCI platform-specific hooks. But what about their caches? I don't think there is a pull mechanism by which the CCN or another interconnect can voluntarily pull the dirty lines from the exiting cores' caches as an alternative to having them flushed by the respective cores.
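The ordering concern here can be illustrated with a toy model in plain C. It is only a sketch under stated assumptions: `cluster_model`, `core_write`, `master_read` and `cluster_powerdown` are hypothetical names, a single cache line stands in for the whole coherent buffer, and leaving the snoop domain is reduced to clearing one flag. It shows that if the cluster leaves the snoop domain without the exiting core writing back first, another master falls back to the stale copy in memory.

```c
#include <stdbool.h>

/* Toy model: one line of a coherent buffer cached by an exiting cluster. */
struct cluster_model {
    int  mem_value;        /* copy in DRAM */
    int  cache_value;      /* copy in the cluster's L1/L2 */
    bool dirty;            /* cache copy is newer than DRAM */
    bool in_snoop_domain;  /* other masters can still snoop this cluster */
};

/* The core updates the coherent buffer; the new data stays in its cache. */
static void core_write(struct cluster_model *c, int v)
{
    c->cache_value = v;
    c->dirty = true;
}

/* What another master observes when it reads the line. */
static int master_read(const struct cluster_model *c)
{
    if (c->in_snoop_domain && c->dirty)
        return c->cache_value;   /* served by a snoop */
    return c->mem_value;         /* nothing to snoop: read DRAM */
}

/* Models the power-down path; flush_first is an illustrative switch. */
static void cluster_powerdown(struct cluster_model *c, bool flush_first)
{
    if (flush_first && c->dirty) {
        c->mem_value = c->cache_value;   /* write-back by the exiting core */
        c->dirty = false;
    }
    c->in_snoop_domain = false;  /* stands in for the platform hook */
}
```

With `flush_first` set, `master_read()` returns the same value before and after the power down; without it, the reader silently goes back to the old memory contents.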
If the master can't, then the firmware dealing with this slave's shutdown must. You simply can't assume the things you are currently assuming. Sounds like a design gap in such a multi master-slave system to me.
The data updated to the coherent memory region may be in L1/L2 D$ and we want a graceful/abrupt shutdown/reboot of this *slave system* where other master(s) not managed by *slave system*
Yes, of course slaves don't manage the master. Not sure how the master and slave communicate in such a setup. Looks like some communication gap between them :)
'linux/tf-a' are still functional and can snoop the data. In this case such application(s) have to do explicit cache flush on reboot/shutdown event on a coherent memory.
Absence of Shutdown/Reboot notification in such a system seems to be the root of all such problems to me.
I did not say the notification does not exist, or that applications can't do cache ops along with the many other things or communication protocols they might have to handle on a shutdown/reboot (not relevant).
OK
Trust me, the various approaches we discussed here so far, and other CENH, work :).
I am not saying other approaches are not tried/discussed. But I was not aware of it. Also I am still not aware of the full design of your system yet.
I think now we have narrowed down the discussion to very specific cache maintenance ownership issue.
CENH ?
Sorry, please ignore: cute embedded .. hacks!
The generic discussion is: is it the responsibility of an application to do cache maintenance on *coherent memory* in the shutdown/reboot path when it never has to do so in its normal course?
The OS doesn't (or can't, as it's about to shut down) care about the data in this case. The notification is an indication to the application or other masters.
Is it not valid to expect such a mechanism from the underlying platform firmware or hardware? It is the core which is going down, and it is expected to do so in a graceful manner if possible. If the limitation is time, that is understandable but not exciting for smaller systems.
Not sure if that's the only reason. The core has also notified that it's about to power off or reboot; the OS takes care to save what it needs, and the platform may give others the chance to do the same via notifications.
--
Regards,
Sudeep