Hi Julius,

This is a common problem with CI pipelines where the pipeline is maintained separately from the code it builds. It’s not particularly unique to TF-A or the OpenCI, and usually the “solution” is to integrate the pipeline code into the repository, which has implications for securing the pipeline if the CI can be triggered by untrusted users (perhaps not as big a deal for a project like TF-A with the Allow-CI+1 requirement, but still a deal).

As Olivier suggested, we could have Jenkins rebase the patches under test onto integration before running tests but, yes, it could mean that a CI run might fail if it requires a manual rebase.

It’s usually not too much hassle, though, to just rebase, given that Gerrit can usually do it automatically with the click of a button. The patches will, inevitably, have to be rebased eventually before merging anyway.

Something that we could potentially look into is a merge train/merge queue, which would allow you to successfully run the CI even if there are breaking changes in the integration branch. The change would still need to be rebased whenever one of those breaking CI changes is merged into master, but if any of the changes in integration are rolled back for any reason then at least it doesn’t impact every change currently under review.

Chris

From: Olivier Deprez via TF-A <tf-a@lists.trustedfirmware.org>
Date: Friday, 12 August 2022 at 09:25
To: tf-a <tf-a@lists.trustedfirmware.org>, Glen Valante <glen.valante@linaro.org>, Don Harbin <don.harbin@linaro.org>, Julius Werner <jwerner@chromium.org>, Olivier Deprez <Olivier.Deprez@arm.com>
Subject: [TF-A] Re: TF-A CI often fail due to missing rebase

Actually missed to answer the questions:

> I would expect that the CI should also test a patch by cherry-picking it onto the current
master, not just by building the patch on top of whatever parent
commit it was uploaded with. But since rebasing a patch evidently
seems to make a difference to the CI, that suggests that it's
currently doing the latter strategy?

Yes.

> Should that maybe be changed to
the former to avoid these kinds of issues?

I believe this design relates to legacy. Indeed I wonder if we could modify scripts to clone tip of TF-A integration and cherry-pick the change.
Perhaps there are gerrit option to help with this.
But this would mean Allow-CI+1 can not always be applied if there are conflicts due to new changes merged in integration in the meanwhile.

________________________________________
From: Olivier Deprez via TF-A <tf-a@lists.trustedfirmware.org>
Sent: 12 August 2022 10:16
To: tf-a; Glen Valante; Don Harbin; Julius Werner
Subject: [TF-A] Re: TF-A CI often fail due to missing rebase

Hi Julius,

Let me try...

This relates to TF-A continuous integration model spanning multiple trees incl. TF-A, TF-A-tests, tf-a-ci-scripts, Hafnium, job configs etc.
When Allow-CI+1 label is applied to a TF-A change, jenkins clones the tree at the TF-A change (which can be old vs tip of integration), and clones tip of master for others.
This can easily get out of sync when a change is left unattended even for few days, see the example below and attached picture (apologies I didn't manage to express my ascii art directly within the email).

At some point in time a TFA0 change is developed (e.g. branched from master). CI master head is CI0 at this very moment. Allow-CI+1 is applied to TF-A change and passes.
Later a new test build config CI1 change is merged to tf-a-ci-scripts requiring a companion TFA1 change to build properly.
When we come later to review TFA0 we launch Allow CI+1 but now takes CI1 as tip of master. Build breaks because TFA0 tree misses TFA1 in its history.
A TFA0 rebase to TFA0' solves the issue.

Now add all aforementioned trees in this picture, the possible occurrence for this problem grows significantly.

Regards,
Olivier.

From: Julius Werner via TF-A <tf-a@lists.trustedfirmware.org>
Sent: 12 August 2022 01:09
To: tf-a <tf-a@lists.trustedfirmware.org>; Glen Valante <glen.valante@linaro.org>; Don Harbin <don.harbin@linaro.org>
Subject: [TF-A] TF-A CI often fail due to missing rebase

Hi Glen, Don, and others,

I've seen that a couple of TF-A patches I've been CCed on recently
often seem to fail the CI run (Allow-CI+1) due to some strange
build-time errors that don't seem to have anything to do with the
patch at hand, and then one of the maintainers usually suggests that
the patch needs a rebase, and the next CI run succeeds after that.
Here are two recent examples:

https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/16160
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/14666

I was wondering if this is a known problem and if the CI can do
anything here to mitigate it and save developers from this extra layer
of friction? I'm not really sure why a rebase was necessary in either
of these examples or why the CI run failed before it (unless the whole
repository was in a bad state that didn't build, but since all
submissions are guarded by the CI that shouldn't have been possible?).
But I also don't really understand why the rebase would make a
difference for the CI anyway. Generally, when a patch is submitted in
Gerrit, that means it is cherry-picked onto the current master
(regardless of what parent commit it was uploaded with). Since the CI
is supposed to be a test run for submission, I would expect that the
CI should also test a patch by cherry-picking it onto the current
master, not just by building the patch on top of whatever parent
commit it was uploaded with. But since rebasing a patch evidently
seems to make a difference to the CI, that suggests that it's
currently doing the latter strategy? Should that maybe be changed to
the former to avoid these kinds of issues?

If this isn't a known problem yet maybe it would be worth adding it to the JIRA?

Thanks,
Julius
--
TF-A mailing list -- tf-a@lists.trustedfirmware.org
To unsubscribe send an email to tf-a-leave@lists.trustedfirmware.org
--
TF-A mailing list -- tf-a@lists.trustedfirmware.org
To unsubscribe send an email to tf-a-leave@lists.trustedfirmware.org