Hello,
ECLAIR version 3.13 was released at the end of last year, and as of now,
both TF-A and TF-M CI jobs are upgraded to this version. The jobs and
reports seem to be fine, but please let me know if you see any issues.
Just in case, we keep docker images with the previous version around,
so if there are issues or a need for a detailed comparison of results
from the old vs new version, we can easily do that.
Thanks,
Paul
Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro - http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog
Hello All,
The servers will stop processing jobs on 2024-1-20 at around
2:00pm UTC, as the servers will be put into "Shutdown mode".
This downtime is needed in order to upgrade Jenkins and some of
its plugins to the latest LTS release.
Start: 2024-1-13 03:00pm UTC
End: 2024-1-13 07:00pm UTC
Regards,
--
Kelley Spoon <kelley.spoon(a)linaro.org>
Hello,
Overloading and scalability issues have been a known problem with OpenCI
for a long time, usually arising around release cycles due to
elevated CI activity, but also occurring from time to time during
normal workflow. That's why last quarter we worked on an actionable
plan to leverage TuxSuite, a recent Linaro technology for
cloud-based building and testing, which proved itself well with other
projects. For this initial pilot project, we're looking to route TF-A
FVP tests (90+% of our test load) to TuxSuite away from LAVA, to
alleviate load on existing build and physical device test
infrastructure.
It was actively worked on during this month, and I'm happy to report
that initial development and preliminary testing on staging show
encouraging results. I'd still like to perform larger-scale testing on
staging, but otherwise think that we should be ready to deploy to
production next.
As it's a holiday season with not many working days left, I'm sending
this a bit earlier to make sure it's not forgotten or comes as a
surprise later. The plan is otherwise to test on staging today and/or
over the weekend, and then proceed with deployment and validation on
production in the following week(s).
Please let me know if you have questions or concerns.
Happy holidays,
Paul
Hello All,
The server will stop processing jobs on 2023-12-20 at around
6:00pm UTC, as the server will be put into "Shutdown mode".
This downtime is needed in order to upgrade Jenkins and some of
its plugins to the latest LTS release.
Start: 2023-12-20 07:00pm UTC
End: 2023-12-20 09:00pm UTC
If this upgrade remains stable in staging, we will plan to upgrade
ci.trustedfirmware.org and mbedtls.trustedfirmware.org shortly
after 2024-01-03.
Regards,
--
Kelley Spoon <kelley.spoon(a)linaro.org>
Hello All,
The jenkins server will stop processing jobs on 2023-10-13 at around
21:00 UTC, as the server will be put into "Shutdown mode".
This downtime is needed in order to upgrade Jenkins to address some
minor security vulnerabilities and upgrade internal dependencies off of
obsolete versions. This upgrade will be to the same version that's been
running on staging for the past week and a half.
Start: 2023-10-13 21:00 UTC
Finish: 2023-10-13 23:00 UTC
Regards,
--
Kelley Spoon <kelley.spoon(a)linaro.org>
Hello,
As you might know, in preparation for the upcoming TF-A release, there
was a motion to test how the system behaves under the extra load usually
caused by the release work, to anticipate what to expect and possibly to
make adjustments to improve the situation compared with the previous
releases (where overloads were all too common).
The testing was started a few weeks ago by Joanna, who was joined last
week by the OpenCI maintenance team, especially once the "death spiral"
system behavior, familiar from the previous releases, was detected. This
behavior was suspected, and has now been confirmed, to be caused by the
following circumstances:
1. A patch is submitted and enabled for testing (AllowCI+2) which causes
a large number of (LAVA) tests to fail due to timeout.
2. These tests keep FVP virtual devices in LAVA busy for much longer
than usual (~10x).
3. As there are many such tests, they block devices and cause the LAVA
queue to grow and become a bottleneck (400-500 tests waiting).
4. Jenkins jobs also retry failing LAVA tests a number of times as a
stopgap measure against non-deterministic, randomly failing tests.
5. As Jenkins jobs also have timeouts, waiting for queued/retried test
results caused them to time out.
6. All these factors have a positive feedback effect on each other,
causing that "death spiral" effect where both Jenkins and LAVA were
severely overloaded while doing nothing useful (effectively, waiting).
The whole system was effectively deadlocked: any newly started job just
kept waiting until its timeout to fail, requiring manual intervention to
clear this state.
The measures taken to address this situation were:
1. Decrease default LAVA test timeout to more reasonable values (better
average value, while being ready to override it as needed for
individual tests).
2. Decrease number of test failure retries.
3. Increase number of Jenkins and LAVA workers/devices/containers.
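To put the interaction of timeouts, retries, and device count described
above into rough numbers, here is a minimal back-of-the-envelope sketch.
All values in the usage note (timeouts, retry counts, number of devices
and failing tests) are made up for illustration and are not the actual
OpenCI settings; the model also assumes every attempt runs to its full
timeout and that devices are always kept busy.

```python
import math


def worst_case_device_seconds(timeout_s: int, retries: int) -> int:
    """Device time one always-timing-out test can consume.

    The first attempt and every retry each hold a device for the
    full timeout.
    """
    return timeout_s * (1 + retries)


def drain_time_seconds(failing_tests: int, devices: int,
                       timeout_s: int, retries: int) -> int:
    """Rough time for a batch of timing-out tests to clear the queue.

    Assumption: all attempts run to timeout and are processed in
    "waves" of at most `devices` attempts at a time.
    """
    attempts = failing_tests * (1 + retries)
    waves = math.ceil(attempts / devices)
    return waves * timeout_s


if __name__ == "__main__":
    # Hypothetical "before" and "after" settings, for illustration only.
    before = drain_time_seconds(failing_tests=100, devices=20,
                                timeout_s=3600, retries=3)
    after = drain_time_seconds(failing_tests=100, devices=40,
                               timeout_s=600, retries=1)
    print(f"before: {before} s, after: {after} s")
```

With these invented numbers, the same batch of 100 failing tests blocks
the devices for 72000 seconds before the changes and 3000 seconds after,
which is why all three measures pull in the same direction.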
Joanna's last test after this change showed that the system no
longer exhibits "death spiral" behavior under heavy, but realistic load.
I also performed an additional "extreme" test of running AllowCI+2
for multiple timeout-failing patches at once. I was still able to
reproduce a situation where a Jenkins job timed out on its side, but at
least there was no obvious "domino" effect on other jobs.
All this work was tracked via
https://linaro.atlassian.net/browse/TFC-498, which contains much more
detail in its subtickets. This task is closed now, per the above. Given
that OpenCI is a complex and busy system, it is hard to be 100% sure
that a single underlying issue caused the problems. So, if you see
unexpected/problematic behavior, please open a TFC ticket (which is
still the standard workflow to report and track issues).
Thanks,
Paul
Hello,
I would like to share some developments and updates regarding
TrustedFirmware MISRA testing throughout September:
1. MISRA CI testing for TF-M was formally launched. That doesn't mean
it runs at full capacity yet, but the infrastructure is in place, and
the next steps are for the TF-M team to see how it fits into their
development workflow, and decide how to address identified MISRA issues
- either record them as deviations or fix them in the TF-M source code
(likely a combination of both).
2. One of the developments done for the TF-M testing was the
implementation of a cumulative report across multiple configurations
(vs a myriad of individual per-configuration reports, which are hard to
follow). This feature was already forward-ported to the TF-A "daily"
build. It immediately made visible the fact that a MISRA mandatory rule
violation crept into the codebase:
https://ci-builds.trustedfirmware.org/static-files/llodfObQwsfBE_M8BN9W1URq…
, select "Mandatory rules - violations" (note that the link will expire
after some time).
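As an illustration of the cumulative-report idea mentioned above, here
is a minimal sketch of merging per-configuration findings into a single
view. This is not ECLAIR's report format or the actual CI
implementation: the function name, the (rule, file, line) tuple layout,
and the configuration and rule names are all hypothetical.

```python
from collections import defaultdict


def cumulative_report(per_config):
    """Merge per-configuration findings into one report.

    `per_config` maps a configuration name to an iterable of
    (rule, file, line) tuples (a hypothetical layout). The result maps
    each distinct finding to the sorted list of configurations that
    reported it, so a violation seen in many builds is listed once.
    """
    merged = defaultdict(set)
    for config, findings in per_config.items():
        for finding in findings:
            merged[finding].add(config)
    return {finding: sorted(configs) for finding, configs in merged.items()}


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = {
        "config-a": [("RULE_A", "src/foo.c", 10), ("RULE_B", "src/bar.c", 5)],
        "config-b": [("RULE_A", "src/foo.c", 10)],
    }
    for finding, configs in cumulative_report(sample).items():
        print(finding, "seen in", ", ".join(configs))
```

Keying on the finding rather than on the configuration is what makes a
single violation stand out instead of being repeated across many
per-configuration reports.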
Further development plans are:
1. Cooperate with the TF-M team regarding MISRA rule, etc.
configuration to get the reports into a shape useful for developers and
contributors.
2. Forward-port cumulative report feature to the TF-A "delta" (i.e.
patch) testing.
These will be worked on starting from October, subject to other feature
development and maintenance work.
Thanks,
Paul
Hello all,
The gerrit server on review.trustedfirmware.org will be offline for a
2-hour maintenance window starting today (Oct 4, 2023) at 19:00 UTC.
This downtime is needed in order to upgrade gerrit to the current version
(3.8.1).
Changelog is available at:
--
Kelley Spoon <kelley.spoon(a)linaro.org>
Hello All,
The jenkins server will stop processing jobs on 2023-10-03 at around
21:00 UTC, as the server will be put into "Shutdown mode".
This downtime is needed in order to upgrade Jenkins to address some minor
security vulnerabilities and upgrade internal dependencies off of obsolete
versions.
Start: 2023-10-03 21:00 UTC
Finish: 2023-10-03 23:00 UTC
--
Kelley Spoon <kelley.spoon(a)linaro.org>