Hi Glen, Don, and others,
I've seen that a couple of TF-A patches I've been CCed on recently often seem to fail the CI run (Allow-CI+1) due to some strange build-time errors that don't seem to have anything to do with the patch at hand, and then one of the maintainers usually suggests that the patch needs a rebase, and the next CI run succeeds after that. Here are two recent examples:
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16160 https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/14666
I was wondering if this is a known problem and if the CI can do anything here to mitigate it and save developers from this extra layer of friction? I'm not really sure why a rebase was necessary in either of these examples or why the CI run failed before it (unless the whole repository was in a bad state that didn't build, but since all submissions are guarded by the CI that shouldn't have been possible?). But I also don't really understand why the rebase would make a difference for the CI anyway. Generally, when a patch is submitted in Gerrit, that means it is cherry-picked onto the current master (regardless of what parent commit it was uploaded with). Since the CI is supposed to be a test run for submission, I would expect that the CI should also test a patch by cherry-picking it onto the current master, not just by building the patch on top of whatever parent commit it was uploaded with. But since rebasing a patch evidently seems to make a difference to the CI, that suggests that it's currently doing the latter strategy? Should that maybe be changed to the former to avoid these kinds of issues?
If this isn't a known problem yet maybe it would be worth adding it to the JIRA?
Thanks, Julius
Hi Julius,
Let me try...
This relates to TF-A continuous integration model spanning multiple trees incl. TF-A, TF-A-tests, tf-a-ci-scripts, Hafnium, job configs etc. When Allow-CI+1 label is applied to a TF-A change, jenkins clones the tree at the TF-A change (which can be old vs tip of integration), and clones tip of master for others. This can easily get out of sync when a change is left unattended even for few days, see the example below and attached picture (apologies I didn't manage to express my ascii art directly within the email).
At some point in time a TFA0 change is developed (e.g. branched from master). CI master head is CI0 at this very moment. Allow-CI+1 is applied to TF-A change and passes. Later a new test build config CI1 change is merged to tf-a-ci-scripts requiring a companion TFA1 change to build properly. When we come later to review TFA0 we launch Allow CI+1 but now takes CI1 as tip of master. Build breaks because TFA0 tree misses TFA1 in its history. A TFA0 rebase to TFA0' solves the issue.
Now add all aforementioned trees in this picture, the possible occurrence for this problem grows significantly.
Regards, Olivier.
From: Julius Werner via TF-A tf-a@lists.trustedfirmware.org Sent: 12 August 2022 01:09 To: tf-a tf-a@lists.trustedfirmware.org; Glen Valante glen.valante@linaro.org; Don Harbin don.harbin@linaro.org Subject: [TF-A] TF-A CI often fail due to missing rebase Hi Glen, Don, and others,
I've seen that a couple of TF-A patches I've been CCed on recently often seem to fail the CI run (Allow-CI+1) due to some strange build-time errors that don't seem to have anything to do with the patch at hand, and then one of the maintainers usually suggests that the patch needs a rebase, and the next CI run succeeds after that. Here are two recent examples:
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16160 https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/14666
I was wondering if this is a known problem and if the CI can do anything here to mitigate it and save developers from this extra layer of friction? I'm not really sure why a rebase was necessary in either of these examples or why the CI run failed before it (unless the whole repository was in a bad state that didn't build, but since all submissions are guarded by the CI that shouldn't have been possible?). But I also don't really understand why the rebase would make a difference for the CI anyway. Generally, when a patch is submitted in Gerrit, that means it is cherry-picked onto the current master (regardless of what parent commit it was uploaded with). Since the CI is supposed to be a test run for submission, I would expect that the CI should also test a patch by cherry-picking it onto the current master, not just by building the patch on top of whatever parent commit it was uploaded with. But since rebasing a patch evidently seems to make a difference to the CI, that suggests that it's currently doing the latter strategy? Should that maybe be changed to the former to avoid these kinds of issues?
If this isn't a known problem yet maybe it would be worth adding it to the JIRA?
Thanks, Julius
Actually missed to answer the questions:
I would expect that the CI should also test a patch by cherry-picking it onto the current
master, not just by building the patch on top of whatever parent commit it was uploaded with. But since rebasing a patch evidently seems to make a difference to the CI, that suggests that it's currently doing the latter strategy?
Yes.
Should that maybe be changed to
the former to avoid these kinds of issues?
I believe this design relates to legacy. Indeed I wonder if we could modify scripts to clone tip of TF-A integration and cherry-pick the change. Perhaps there are gerrit option to help with this. But this would mean Allow-CI+1 can not always be applied if there are conflicts due to new changes merged in integration in the meanwhile.
________________________________________ From: Olivier Deprez via TF-A tf-a@lists.trustedfirmware.org Sent: 12 August 2022 10:16 To: tf-a; Glen Valante; Don Harbin; Julius Werner Subject: [TF-A] Re: TF-A CI often fail due to missing rebase
Hi Julius,
Let me try...
This relates to TF-A continuous integration model spanning multiple trees incl. TF-A, TF-A-tests, tf-a-ci-scripts, Hafnium, job configs etc. When Allow-CI+1 label is applied to a TF-A change, jenkins clones the tree at the TF-A change (which can be old vs tip of integration), and clones tip of master for others. This can easily get out of sync when a change is left unattended even for few days, see the example below and attached picture (apologies I didn't manage to express my ascii art directly within the email).
At some point in time a TFA0 change is developed (e.g. branched from master). CI master head is CI0 at this very moment. Allow-CI+1 is applied to TF-A change and passes. Later a new test build config CI1 change is merged to tf-a-ci-scripts requiring a companion TFA1 change to build properly. When we come later to review TFA0 we launch Allow CI+1 but now takes CI1 as tip of master. Build breaks because TFA0 tree misses TFA1 in its history. A TFA0 rebase to TFA0' solves the issue.
Now add all aforementioned trees in this picture, the possible occurrence for this problem grows significantly.
Regards, Olivier.
From: Julius Werner via TF-A tf-a@lists.trustedfirmware.org Sent: 12 August 2022 01:09 To: tf-a tf-a@lists.trustedfirmware.org; Glen Valante glen.valante@linaro.org; Don Harbin don.harbin@linaro.org Subject: [TF-A] TF-A CI often fail due to missing rebase
Hi Glen, Don, and others,
I've seen that a couple of TF-A patches I've been CCed on recently often seem to fail the CI run (Allow-CI+1) due to some strange build-time errors that don't seem to have anything to do with the patch at hand, and then one of the maintainers usually suggests that the patch needs a rebase, and the next CI run succeeds after that. Here are two recent examples:
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16160 https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/14666
I was wondering if this is a known problem and if the CI can do anything here to mitigate it and save developers from this extra layer of friction? I'm not really sure why a rebase was necessary in either of these examples or why the CI run failed before it (unless the whole repository was in a bad state that didn't build, but since all submissions are guarded by the CI that shouldn't have been possible?). But I also don't really understand why the rebase would make a difference for the CI anyway. Generally, when a patch is submitted in Gerrit, that means it is cherry-picked onto the current master (regardless of what parent commit it was uploaded with). Since the CI is supposed to be a test run for submission, I would expect that the CI should also test a patch by cherry-picking it onto the current master, not just by building the patch on top of whatever parent commit it was uploaded with. But since rebasing a patch evidently seems to make a difference to the CI, that suggests that it's currently doing the latter strategy? Should that maybe be changed to the former to avoid these kinds of issues?
If this isn't a known problem yet maybe it would be worth adding it to the JIRA?
Thanks, Julius -- TF-A mailing list -- tf-a@lists.trustedfirmware.org To unsubscribe send an email to tf-a-leave@lists.trustedfirmware.org
Hi Julius,
This is a common problem with CI pipelines where the pipeline is maintained separately from the code it builds. It’s not particularly unique to TF-A or the OpenCI, and usually the “solution” is to integrate the pipeline code into the repository, which has implications for securing the pipeline if the CI can be triggered by untrusted users (perhaps not as big a deal for a project like TF-A with the Allow-CI+1 requirement, but still a deal).
As Olivier suggested, we could have Jenkins rebase the patches under test onto integration before running tests but, yes, it could mean that a CI run might fail if it requires a manual rebase.
It’s usually not too much hassle, though, to just rebase, given that Gerrit can usually do it automatically with the click of a button. The patches will, inevitably, have to be rebased eventually before merging anyway.
Something that we could potentially look into is a merge trainhttps://docs.gitlab.com/ee/ci/pipelines/merge_trains.html/merge queuehttps://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/merging-a-pull-request-with-a-merge-queue, which would allow you to successfully run the CI even if there are breaking changes in the integration branch. The change would still need to be rebased whenever one of those breaking CI changes is merged into master, but if any of the changes in integration are rolled back for any reason then at least it doesn’t impact every change currently under review.
Chris
From: Olivier Deprez via TF-A tf-a@lists.trustedfirmware.org Date: Friday, 12 August 2022 at 09:25 To: tf-a tf-a@lists.trustedfirmware.org, Glen Valante glen.valante@linaro.org, Don Harbin don.harbin@linaro.org, Julius Werner jwerner@chromium.org, Olivier Deprez Olivier.Deprez@arm.com Subject: [TF-A] Re: TF-A CI often fail due to missing rebase Actually missed to answer the questions:
I would expect that the CI should also test a patch by cherry-picking it onto the current
master, not just by building the patch on top of whatever parent commit it was uploaded with. But since rebasing a patch evidently seems to make a difference to the CI, that suggests that it's currently doing the latter strategy?
Yes.
Should that maybe be changed to
the former to avoid these kinds of issues?
I believe this design relates to legacy. Indeed I wonder if we could modify scripts to clone tip of TF-A integration and cherry-pick the change. Perhaps there are gerrit option to help with this. But this would mean Allow-CI+1 can not always be applied if there are conflicts due to new changes merged in integration in the meanwhile.
________________________________________ From: Olivier Deprez via TF-A tf-a@lists.trustedfirmware.org Sent: 12 August 2022 10:16 To: tf-a; Glen Valante; Don Harbin; Julius Werner Subject: [TF-A] Re: TF-A CI often fail due to missing rebase
Hi Julius,
Let me try...
This relates to TF-A continuous integration model spanning multiple trees incl. TF-A, TF-A-tests, tf-a-ci-scripts, Hafnium, job configs etc. When Allow-CI+1 label is applied to a TF-A change, jenkins clones the tree at the TF-A change (which can be old vs tip of integration), and clones tip of master for others. This can easily get out of sync when a change is left unattended even for few days, see the example below and attached picture (apologies I didn't manage to express my ascii art directly within the email).
At some point in time a TFA0 change is developed (e.g. branched from master). CI master head is CI0 at this very moment. Allow-CI+1 is applied to TF-A change and passes. Later a new test build config CI1 change is merged to tf-a-ci-scripts requiring a companion TFA1 change to build properly. When we come later to review TFA0 we launch Allow CI+1 but now takes CI1 as tip of master. Build breaks because TFA0 tree misses TFA1 in its history. A TFA0 rebase to TFA0' solves the issue.
Now add all aforementioned trees in this picture, the possible occurrence for this problem grows significantly.
Regards, Olivier.
From: Julius Werner via TF-A tf-a@lists.trustedfirmware.org Sent: 12 August 2022 01:09 To: tf-a tf-a@lists.trustedfirmware.org; Glen Valante glen.valante@linaro.org; Don Harbin don.harbin@linaro.org Subject: [TF-A] TF-A CI often fail due to missing rebase
Hi Glen, Don, and others,
I've seen that a couple of TF-A patches I've been CCed on recently often seem to fail the CI run (Allow-CI+1) due to some strange build-time errors that don't seem to have anything to do with the patch at hand, and then one of the maintainers usually suggests that the patch needs a rebase, and the next CI run succeeds after that. Here are two recent examples:
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16160 https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/14666
I was wondering if this is a known problem and if the CI can do anything here to mitigate it and save developers from this extra layer of friction? I'm not really sure why a rebase was necessary in either of these examples or why the CI run failed before it (unless the whole repository was in a bad state that didn't build, but since all submissions are guarded by the CI that shouldn't have been possible?). But I also don't really understand why the rebase would make a difference for the CI anyway. Generally, when a patch is submitted in Gerrit, that means it is cherry-picked onto the current master (regardless of what parent commit it was uploaded with). Since the CI is supposed to be a test run for submission, I would expect that the CI should also test a patch by cherry-picking it onto the current master, not just by building the patch on top of whatever parent commit it was uploaded with. But since rebasing a patch evidently seems to make a difference to the CI, that suggests that it's currently doing the latter strategy? Should that maybe be changed to the former to avoid these kinds of issues?
If this isn't a known problem yet maybe it would be worth adding it to the JIRA?
Thanks, Julius -- TF-A mailing list -- tf-a@lists.trustedfirmware.org To unsubscribe send an email to tf-a-leave@lists.trustedfirmware.org -- TF-A mailing list -- tf-a@lists.trustedfirmware.org To unsubscribe send an email to tf-a-leave@lists.trustedfirmware.org
Hi Chris, Olivier,
Thanks for the detailed explanation about the interaction with other repositories, I understand the cause of the issue better now.
However, I still feel like changing the strategy with which Jenkins tests the patches would be an elegant and hopefully not too complicated solution. I know that this will cause a patch that actually doesn't rebase cleanly onto the current HEAD at the moment to fail the test, but isn't that a desirable feature? If they don't rebase cleanly that means the author will need to manually intervene before submission anyway, and the CI is supposed to test whether the patch is ready for submission, so I think it makes sense for the CI to fail and notify the author of the problem in that case.
It’s usually not too much hassle, though, to just rebase, given that Gerrit can usually do it automatically with the click of a button. The patches will, inevitably, have to be rebased eventually before merging anyway.
I don't think this applies to the patches currently affected by this problem. They fail because their parent is out of date with other repositories, not because the patch itself would somehow not apply cleanly. If it wasn't for the CI issue these patches should be able to get submitted (i.e. cherry-picked onto HEAD) cleanly by Gerrit without needing another rebase, because the rebase doesn't actually change anything about the patch itself, it just updates the parent.
I know it's just one button click, but I think the problem is more that this issue is hard to understand for new contributors. Before you know to click that button you first have to dig through the CI interface, find the errors, be utterly confused why you're getting build errors in directories completely unrelated to your patch, ask a maintainer for help, and then be told to click the rebase button. If there is an easy automated solution to eliminate all this unnecessary friction, I think it would be nice to implement it.
I know that this will cause a patch that
actually doesn't rebase cleanly onto the current HEAD at the moment to fail the test, but isn't that a desirable feature?
I agree. I don’t think this would be a particularly difficult change to make, but I’m just wondering if there are any impacts that we have not considered, particularly for those working on larger features or refactoring work in high-traffic areas of the code-base.
I don't think this applies to the patches currently affected by this
problem. They fail because their parent is out of date with other repositories, not because the patch itself would somehow not apply cleanly. If it wasn't for the CI issue these patches should be able to get submitted (i.e. cherry-picked onto HEAD) cleanly by Gerrit without needing another rebase, because the rebase doesn't actually change anything about the patch itself, it just updates the parent.
Sure, I don’t mean that they would necessarily have to be rebased manually.
Chris
From: Julius Werner jwerner@chromium.org Date: Monday, 15 August 2022 at 22:21 To: Chris Kay Chris.Kay@arm.com Cc: Olivier Deprez Olivier.Deprez@arm.com, tf-a tf-a@lists.trustedfirmware.org, Glen Valante glen.valante@linaro.org, Don Harbin don.harbin@linaro.org, Julius Werner jwerner@chromium.org Subject: Re: [TF-A] Re: TF-A CI often fail due to missing rebase Hi Chris, Olivier,
Thanks for the detailed explanation about the interaction with other repositories, I understand the cause of the issue better now.
However, I still feel like changing the strategy with which Jenkins tests the patches would be an elegant and hopefully not too complicated solution. I know that this will cause a patch that actually doesn't rebase cleanly onto the current HEAD at the moment to fail the test, but isn't that a desirable feature? If they don't rebase cleanly that means the author will need to manually intervene before submission anyway, and the CI is supposed to test whether the patch is ready for submission, so I think it makes sense for the CI to fail and notify the author of the problem in that case.
It’s usually not too much hassle, though, to just rebase, given that Gerrit can usually do it automatically with the click of a button. The patches will, inevitably, have to be rebased eventually before merging anyway.
I don't think this applies to the patches currently affected by this problem. They fail because their parent is out of date with other repositories, not because the patch itself would somehow not apply cleanly. If it wasn't for the CI issue these patches should be able to get submitted (i.e. cherry-picked onto HEAD) cleanly by Gerrit without needing another rebase, because the rebase doesn't actually change anything about the patch itself, it just updates the parent.
I know it's just one button click, but I think the problem is more that this issue is hard to understand for new contributors. Before you know to click that button you first have to dig through the CI interface, find the errors, be utterly confused why you're getting build errors in directories completely unrelated to your patch, ask a maintainer for help, and then be told to click the rebase button. If there is an easy automated solution to eliminate all this unnecessary friction, I think it would be nice to implement it.
Hi all,
I support Julius' proposition, I think this is a really good idea that would improve TF-A work flow, both for patch contributors and maintainers.
I too am unsure about the exact implications on the development of large features, long-standing patch stacks and the like. I can't think of any red flags, though, so I suggest we try to prototype something.
I can imagine we could use the staging instance of the OpenCI for that. For example, we could implement the rebase strategy on a branch of the tf-a-ci-scripts, and use that in the staging instance. When clicking the Allow-CI+1 label button, this would trigger the usual OpenCI job on the production CI servers with no visible change to TF-A contributors, but also trigger it on the staging instance in silent mode / private email mode. We could monitor that for some time before we take the plunge.
Cheers, Sandrine
On 8/16/22 17:17, Chris Kay via TF-A wrote:
I know that this will cause a patch that
actually doesn't rebase cleanly onto the current HEAD at the moment to fail the test, but isn't that a desirable feature?
I agree. I don’t think this would be a particularly difficult change to make, but I’m just wondering if there are any impacts that we have not considered, particularly for those working on larger features or refactoring work in high-traffic areas of the code-base.
I don't think this applies to the patches currently affected by this
problem. They fail because their parent is out of date with other repositories, not because the patch itself would somehow not apply cleanly. If it wasn't for the CI issue these patches should be able to get submitted (i.e. cherry-picked onto HEAD) cleanly by Gerrit without needing another rebase, because the rebase doesn't actually change anything about the patch itself, it just updates the parent.
Sure, I don’t mean that they would necessarily have to be rebased manually.
Chris
*From: *Julius Werner jwerner@chromium.org *Date: *Monday, 15 August 2022 at 22:21 *To: *Chris Kay Chris.Kay@arm.com *Cc: *Olivier Deprez Olivier.Deprez@arm.com, tf-a tf-a@lists.trustedfirmware.org, Glen Valante glen.valante@linaro.org, Don Harbin don.harbin@linaro.org, Julius Werner jwerner@chromium.org *Subject: *Re: [TF-A] Re: TF-A CI often fail due to missing rebase
Hi Chris, Olivier,
Thanks for the detailed explanation about the interaction with other repositories, I understand the cause of the issue better now.
However, I still feel like changing the strategy with which Jenkins tests the patches would be an elegant and hopefully not too complicated solution. I know that this will cause a patch that actually doesn't rebase cleanly onto the current HEAD at the moment to fail the test, but isn't that a desirable feature? If they don't rebase cleanly that means the author will need to manually intervene before submission anyway, and the CI is supposed to test whether the patch is ready for submission, so I think it makes sense for the CI to fail and notify the author of the problem in that case.
It’s usually not too much hassle, though, to just rebase, given that Gerrit can usually do it automatically with the click of a button. The patches will, inevitably, have to be rebased eventually before merging anyway.
I don't think this applies to the patches currently affected by this problem. They fail because their parent is out of date with other repositories, not because the patch itself would somehow not apply cleanly. If it wasn't for the CI issue these patches should be able to get submitted (i.e. cherry-picked onto HEAD) cleanly by Gerrit without needing another rebase, because the rebase doesn't actually change anything about the patch itself, it just updates the parent.
I know it's just one button click, but I think the problem is more that this issue is hard to understand for new contributors. Before you know to click that button you first have to dig through the CI interface, find the errors, be utterly confused why you're getting build errors in directories completely unrelated to your patch, ask a maintainer for help, and then be told to click the rebase button. If there is an easy automated solution to eliminate all this unnecessary friction, I think it would be nice to implement it.
tf-a@lists.trustedfirmware.org