Re: [TF-A] How should we manage patches affecting multiple users? - TF-A

28 Jun 2020


      My take on this is perhaps not unexpectantly fairly aligned with Sandrine's.
We as a project have over 30 platforms now up-streamed and the expectations is these are managed by the identified platform owners. We look to them to decide on which of any new capabilities offered are supported in their platforms. We have a strategy not to break up-streamed platforms except where we have clearly announced deprecation and this is all documented here https://trustedfirmware-a.readthedocs.io/en/latest/process/platform-compatib... and where interfaces have changed and the work is trivial the submitter is expected to attempt to make a good effort to migrate platforms.
To support this the current CI system hosted by Arm can only verify the builds of most platforms which is one of the gerrit approvals needed to merge a patch. Of course that does not mean the executables produced will necessarily run correctly which is where platform owners would need to assist with their own approvals and own testing if possible. With the Arm Juno platform and various Arm FVP models we do have the capability to test but that misses out all the other platforms up-streamed.
With the OpenCI coming, which we owe a Tech-Forum session on BTW,  the Juno and Arm FVP models will also be made available and if platform owners want they will be able to add their platforms to be tested if they want to invest the effort. Sadly migrating to the OpenCI will take time and will have to be delivered in phases but the hope is this will be useful to platform owners as well as for core changes with the visibility of the CI results. The various Arm platforms are generally owned and managed by different teams and go through the same processes as other non Arm platforms on getting patched up-streamed.
So when new core features are added for new Arm hardware IP support or new software features that change behaviour as previously we still look to the platform owners to decide if they want support for these in their platforms. For some trivial changes that may be in the form of reviewing changes prepared but for others that may be deciding if they want to invest the effort to implement themselves. For core changes as well as making patches available via Gerrit for review we have been making more use of the TF-A mailing list to announce the patches are available. Indeed we are trying to make use of the mailing list while work is still in design to discuss openly decisions that need to be made which may or may not effect platforms. The Tech-Forum is also being used to present and discuss ideas and ongoing work. This is all being done to be more open about upcoming changes.  We would welcome discussions on the mailing list and tech-forum sessions from platform owners about any subjects. This could be around coordinated support across platforms of new capabilities vs each platform following its own path. That of course holds challenges in coordination of effort and who's effort.
Hopefully the direction of travel here for the project helps with some of the suggestions but happy to discuss more here on the mailing list of in a tech-forum meeting.
Thanks
Joanna
On 25/06/2020, 03:16, "TF-A on behalf of Varun Wadekar via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Hi,
>> As you said, even a small change might break a platform and unfortunately we're not in a position to detect that with the CI yet.
I agree that this is a limitation today. If you remember,  I have been advocating for announcing on the email alias when in doubt. That way we minimize surprises for the platform owners.
>> IMO we cannot reasonably announce all such changes and I would think that contributors are responsible for keeping an eye on patches and testing them if they think they might be affected.
I understand. We should announce on the mailing list, when in doubt. I strongly disagree with the second part of the statement though. Asking platform owners to keep an eye on every change defeats the purpose of upstreaming. The burden must lie on the implementer of a change to try and upgrade consumers.
>> But I think that the same change might be considered as an improvement by some, and a useless/undesirable change by others.
Possible. And then we take call as a community. If it makes sense for some platforms to move ahead, then that should be OK too.
>> We might be able to reduce the number of build options in some cases but there will always be a need to retain most of them IMO
Thinking out loud - maybe moving to runtime checks makes sense in some cases instead of makefile variables e.g. CPU erratas. We should look at each case individually.
I guess, the main problem seems to be low level of testing. So any implementer wont be confident making a change that affects other platforms. That's when sending an email to the community makes sense. We tackle the situation as a team, instead of one person doing all the work.
-Varun
-----Original Message-----
    From: Sandrine Bailleux sandrine.bailleux@arm.com 
    Sent: Wednesday, June 24, 2020 4:54 AM
    To: Varun Wadekar vwadekar@nvidia.com
    Cc: tf-a@lists.trustedfirmware.org
    Subject: Re: [TF-A] How should we manage patches affecting multiple users?
External email: Use caution opening links or attachments
Hi Varun,
Thanks for sharing your opinion.
On 6/20/20 1:55 AM, Varun Wadekar wrote:
    > Hello @Sandrine Bailleux,
    >
    > Thanks for starting the email chain. My response to the issues you flagged in the email below.
    >
    > 1. This seems to be a bit tricky. As a consumer of an API, one would expect that it works as advertised. If the behavior changes, thus affecting the expectations or assumptions surrounding the API, then they should be announced. With so many platforms in the tree, we should always assume that even a small change will break a platform.
I agree. Whenever there is clear intention to change the expectations or assumptions of an API, this should be announced and discussed.
However, there are cases where we might want to rework the implementation of an API (to clean the code for example), make some improvements or extend its functionality. All of that might be done with no intent to step outside of the API original scope but still subtly break some code. As you said, even a small change might break a platform and unfortunately we're not in a position to detect that with the CI yet. IMO we cannot reasonably announce all such changes and I would think that contributors are responsible for keeping an eye on patches and testing them if they think they might be affected.
> 2. For improvements, we should strive to upgrade all consumers of the API/makefile/features. This will ensure that all platforms remain current and we reduce maintenance cost in the long run. I assume, that such improvements will give birth to additional configs/makefile settings/platform variables etc. We would be signing up for a lot of maintenance by allowing some platforms to lag.
    >
    > For straight forward changes (e.g. the change under discussion) I assume that the platforms will respond and we would get the changes merged relatively fast. For controversial features, this will spark a healthy discussion in the community.
I agree with you in principle. But I think that the same change might be considered as an improvement by some, and a useless/undesirable change by others.
For example, the change under discussion is an improvement from our point of view. It will lower the maintenance cost, as any new file added in the GIC driver in the future will be automatically pulled in by platforms that use the newly introduced GIC makefile, without the need to update all platforms makefiles. But this might not be what everyone wants, some platform owners might prefer continue cherry-picking specific GIC source files to retain control over what they include in their firmware. In this case, is it the right thing to do to "upgrade"
    them? I think this is debatable.
Maybe a better alternative is to simply make people aware of the change, provide a link to how the migration was done for Arm platforms in this case, so that they've got an example to guide them if they wish to embrace the change for their own platform ports.
> 3. For a case where the change breaks platforms, it makes sense to just upgrade all of them.
    >
    >>> Now the question we were pondering in Gerrit is, where does the responsibility of migrating all platforms/users lie for all these types of changes? Is the patch author responsible to do all the work?
    >
    > [VW] IMO, the author of such a patch is the best person to upgrade all affected platforms/users. The main reason being that he/she is already an expert for the patch.
Agree that the patch author is an expert for his own patch but that does not mean he's also an expert on how his patch should be applied for another platform port he's not familiar with.
> But, this might not be true in some cases where the author will need help from the community. For such cases, we should ask for help on the mailing list.
Yes, I think such cases require collaboration indeed.
> I also want to highlight the dangers of introducing make variables as a solution. The current situation is already unmanageable with so many build variables and weak handlers. We should try and avoid adding more.
Not sure how this ties with the original discussions but I agree with you, we've got far too many build options in this project. This makes testing a lot harder, as it means we have hundreds of valid combinations of build options to verify. It also makes the code harder to read and understand because it is sprinkled with #ifdefs.
Hopefully, switching to a KConfig-like configuration system in the future will help alleviate this issue, as at least it will allow us to express the valid combinations of build options in a more robust manner (today the build system tries to forbid invalid combinations but I am sure it's far from exhaustive in these types of checks) and also visualize what's been enabled/disabled more easily for a specific build.
Even though I acknowledge this problem, I am not sure we can completely solve it. Firmware is not general-purpose software and it requires a lot more control over the features you include or not because of stricter memory and CPU constraints. I think people need a way to turn on/off features at a fine granularity. We might be able to reduce the number of build options in some cases but there will always be a need to retain most of them IMO.
> Finally, I agree that there's no silver bullet and we have to deal with each situation differently. Communication is key, as you rightly said. As long we involve the community, we should be OK.
    >
    > -Varun
    >
    > -----Original Message-----
    > From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of 
    > Sandrine Bailleux via TF-A
    > Sent: Friday, June 19, 2020 2:57 AM
    > To: tf-a tf-a@lists.trustedfirmware.org
    > Subject: [TF-A] How should we manage patches affecting multiple users?
    >
    > External email: Use caution opening links or attachments
    >
    >
    > Hello everyone,
    >
    > I would like to start a discussion around the way we want to handle changes that affect all platforms/users. This subject came up in this code review [1].
    >
    > I'll present my views on the subject and would like to know what others think.
    >
    > First of all, what do we mean by "changes that affect all platforms"? It would be any change that is not self-contained within a platform port.
    > IOW, a change in some file outside of the plat/ directory. It could be in a driver, in a library, in the build system, in the platform interfaces called by the generic code... the list goes on.
    >
    > 1. Some are just implementation changes, they leave the interfaces unchanged. These do not break other platforms builds since the call sites don't need to be updated. However, they potentially change the responsibilities of the interface, in which case this might introduce behavioral changes that all users of the interface need to be aware of and adapt their code to.
    >
    > 2. Some other changes might introduce optional improvements. They introduce a new way of doing things, perhaps a cleaner or more efficient one. Users can safely stay with the old way of doing things without fear of breakage but they would benefit from migrating.
    >
    > This is the case of Alexei's patch [1], which introduces an 
    > intermediate
    > GICv2 makefile, that Arm platforms now include instead of pulling each individual file by themselves. At this point, it is just an improvement that introduces an indirection level in the build system. This is an improvement in terms of maintainability because if we add a new essential file to this driver in the future, we'll just need to add it in this new GICv2 makefile without having to update all platforms makefiles. However, platform makefiles can continue to pull in individual files if they wish to.
    >
    > 3. Some other changes might break backwards compatibility. This is something we should avoid as much as possible and we should always strive to provide a migration path. But there are cases where there might be no other way than to break things. In this case, all users have to be migrated.
    >
    >
    > Now the question we were pondering in Gerrit is, where does the responsibility of migrating all platforms/users lie for all these types of changes? Is the patch author responsible to do all the work?
    >
    > I think it's hard to come up with a one-size-fits-all answer. It really depends on the nature of changes, their consequences, the amount of work needed to do the migration, the ability for the patch author to test these changes, to name a few.
    >
    > I believe the patch author should migrate all users on a best-effort basis. If it's manageable then they should do the work and propose a patch to all affected users for them to review and test on their platforms.
    >
    > On the other hand, if it's too much work or impractical, then I think the best approach would be for the patch author to announce and discuss the changes on this mailing list. In some scenarios, the changes might be controversial and not all platform owners might agree that the patch brings enough benefit compared to the migration effort, in which case the changes might have to be withdrawn or re-thought. In any case, I believe communication is key here and no one should take any unilateral decision on their own if it affects other users.
    >
    > I realize this is a vast subject and that we probably won't come up with an answer/reach an agreement on all the aspects of the question. But I'd be interested in hearing other contributors' view and if they've got any experience managing these kinds of issues, perhaps in other open source project.
    >
    > Best regards,
    > Sandrine
    >
    > [1] 
    > https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/4538
    > --
    > TF-A mailing list
    > TF-A@lists.trustedfirmware.org
    > https://lists.trustedfirmware.org/mailman/listinfo/tf-a
    >
    -- 
    TF-A mailing list
    TF-A@lists.trustedfirmware.org
    https://lists.trustedfirmware.org/mailman/listinfo/tf-a