*Attendees*: Don, Joanna, Matteo, Xinyu, Riku, Janos Follath, Anton, Dave
Rodgman, Ben Copeland, Shebu
*Minutes*:
- Glen: Staffing - Updated team on the staffing updates
- Theo and Paul joining.
- Leonardo transitioning over the next few weeks.
- Glen to meet w/ whole team tomorrow
- Leonardo still best person to contact on MBedTLS
- MBedTLS
- M2 completed, M3 blocked on need for plugins
- Riku to take care of plugins, targeting today to complete
- TFC-176
- Still completing
- TFC-82: Arthur was working on this. Want his updates released and
should complete
- TFC-87: Arthur has been working w/ Joanna on this. May take a couple
of passes, but initial solution in the next day
- TFC-183 - MUSCA B are off line.
- Ben: Dave looking into this. Failed due to debian update and debian
bug.
- Glen: Back it out?
- Ben recommends moving forward and resolving in next sprint (next
week). Urgency?
- Anton: Only MUSCA B offline? Preparing for next month release, so
need by then for sure.
- Agreed to put in next week
- Glen: NXP Boards coming up in Lab
- Glen: Chromebooks (Jacuzzi - TFC-45) received in Cambridge, got out of
customs!
- Glen: Creating card for next set of Chromebooks (Lazor)
- Glen: Renesas beginning to ship the h/w
- Shebu: Renesas intends to push board support for TF-M
- Matteo: Any priority calls to make?
- Glen: Some of the smaller backlog tickets. Glen will be moving items
out of backlog and assigning them.
- Matteo: To draft an email on the backlog and share w/ Glen.
- Shebu: Some SC Approved have yet to be touched.
- How much effort for Disaster Recovery? It could be a larger task.
Will break this down into subtasks and do the assessment.
- ACTION: Joanna, Anton, Matteo, Shebu to go thru backlog and SC
approved and make adjustments in the Kanban. Keep others updated via email.
Attendees: Don, Riku, Xinyu, Shebu, Dave Rodgman, Ben, Joanna
Minutes:
- Several platforms ready to add to the Cambridge lab. Includes NXP,
Cypress, Google
- TFC-181: patches in review
- TFC-176: Continue to work. Moved from SC Approved to In Progress.
- May have a temporary fix from Leonardo.
- Monitoring at this point. Joanna recommends holding off on ticket
TFC-96 until TFC-176 is stable so as not to add more variables into the
solution. After a few days of stability, can come back to this.
- Moved TFC-36 to REOPENED and prioritized
- Moved TFC-36 to SC Approved
- Verify MBed status. Still in Blocked.
- Action: Don check w/ Glen/Leonardo as to why the Mbed work still in
"Blocked"
- Anything else to work on here?
- Joanna: TFC-87: Would be good to start if time allows. Doesn't have
to all be done at once.
- Shebu: Update on staffing
- Two engineers from Linaro coming on to help for next 6 months to
supplement the staff. Theodore Grey & Paul Sokolovskyy (Welcome aboard!)
- Will want to make sure the team is appropriately loaded.
- Board additions will continue.
- Getting MBedTLS CI up and running is important too.
- Over the next couple of weeks Glen will be integrating them into
the team as they spin up.
*Attendees*: Don, Glen, Anton, Shebu, Matteo, Joanna, Ben C, Dave Rodgman,
Riku
*Minutes*:
- Servers went down - certificate expiration issue resolved. Blocker
- TFC-179/TFC-80 (duplicate) - certificates updated and these can be
resolved
- Glen: Adding boards:
- Have documentation on boards. Should we add vendor boards to
documentation?
- User's manual.
- Determine if public?
- Don: Assumed it was public.
- Shebu: Worth asking Eric, Julius, Sean, and others if they're OK to
do that.
- Glen: Can share what would be published in the User's Guide first.
- Would be OK to point to vendor info if works
- Anton: TF-A/M docs could also point to this
- Platform integration updates:
- Glen: Cypress/NXP in Cambridge. Will get integrated shortly.
- Google Chromebooks.
- Arthur's and Fathi's were returned to sender; the 3 platforms made it
to the lab.
- Working with Google to start the 2nd platform.
- MBedTLS
- Dave's team provided the AMI files.
- Dave: Darryl expected additional AMI's beyond Ubuntu image. Will
build additional configs from those after Ubuntu.
- TFC-176: Leonardo has a work-around (3 retries), but not getting to
root cause. Likely need a longer-term solution.
- Glen asking for Expect script review from Joanna.
- TFC-82: Github auth. Working, now looking to deploy.
- Anton: TF-M health check job. Expected OpenCI team to keep it up.
Sanity test is failing.
https://ci.trustedfirmware.org/view/TF-M/job/tf-m-infra-health/
- Don: Suggest creating a ticket.
- Action: Glen/Ben to open a ticket so that can track/assign.
Thanks
don
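The TFC-176 work-around recorded in the minutes above (3 retries) can be pictured as a small retry wrapper. This is only a sketch under stated assumptions: `run_with_retries` and `make_flaky_job` are hypothetical names, not taken from the OpenCI scripts, and "3 retries" is read here as "up to 3 attempts". As the minutes note, retrying masks the intermittent failure and does not get to the root cause.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_retries(job: Callable[[], T], max_attempts: int = 3) -> Tuple[T, int]:
    """Run `job` until it succeeds or `max_attempts` is used up.

    Returns (result, attempts_used). If every attempt fails, the last
    error is re-raised so the failure is not silently swallowed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job(), attempt
        except Exception as err:  # hypothetical: a real job would catch narrower errors
            last_error = err
    raise last_error


def make_flaky_job(failures_before_success: int) -> Callable[[], str]:
    """Hypothetical stand-in for a job that times out a few times, then passes."""
    state = {"calls": 0}

    def job() -> str:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TimeoutError("simulated timeout")
        return "pass"

    return job
```

Returning the attempt count alongside the result keeps the flakiness visible, which may help when looking for the longer-term solution.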
*Attendees*: Don, Glen, Anton, Riku, Joanna, Matteo, Xinyu, Dave Rodgman,
Shebu
*Minutes*:
- Glen: Shared TFC Board
- Boards:
- Chromebooks - returning to get out of customs and then sending back
again.
- Cypress/NXP being integrated in October. Received h/w in lab
- TFC-82: Arthur working on this while have time
- TFC-36 is ready for Arm team to review. Leonardo
- TFC-176: Intermittent failures. Leonardo continuing to isolate this
- TFC-20 integrated including a resolved regression. Any issues noticed?
- Arm provided AMI files. Go back to mbedTLS?
- Matteo: Will we finish TFC-36? Need to make sure this finishes.
- Glen: Agreed
- Joanna: Would like TFC-176 to continue at least part
time.
- Glen recommends re-evaluating after his work day today
- Joanna: Agree to keep it as a background task
- TFC-172: In backlog, not any data in the ticket on quantifying slow
or how to replace.
- Xinyu: No longer reproducing. Xinyu to update ticket and we will
close
- TFC-171: Seems to be solved from other infrastructure improvements
- Joanna: OK to close resolved. Not sure what it was but not
occurring now
- Arthur may have bandwidth for another task
- Joanna: TFC-87 may be a good one to work on. Currently the team
performs a work-around.
- Moved TFC-87 to SC Approved from Backlog
- *Action: Anton* to evaluate moving TFC-173 to SC Approved.
Don
Attendees: Riku, Don, Glen, Matteo, Shebu, Anton, Joanna, Dave Rodgman
Minutes:
- TFC-20: Git performance - infrastructure changes happened including
Leonardo infrastructure changes. Tested on stage. Should keep an eye on it
for next few days. New machine already added back in. Better performance
will be seen as well since more jobs can focus on builds and not clones.
- Expect scripts: Should wrap up this week
- Joanna: Brought up LAVA timeouts TFC-176. Pass on 2nd or 3rd
attempts. Initial analysis is too many parallel LAVA jobs. Starting out by
increasing timeout. Would prevent re-running jobs.
- Riku: Should add LAVA lab folks to this ticket since adjusting
timeouts
- Boards status:
- Chromebooks still dealing w/ import issues.
- *ACTION: Don* ask Julius to reject and resend the 3 boards to the
Cambridge lab
- Cypress and NXP platforms now in the lab
- LAVA team will be updating to latest release with the new board
configs included. Want this done in the next couple of weeks
- Arthur will be available for some other tasks.
- MBedTLS
- Dave: AMI images almost ready. Expect it soon.
- Glen: Linaro support prepared to copy to AMI's when they are queued
up.
- Glen: Joanna's new list of issues
- LAVA timeout was one.
- TFC-87: Joanna's team reviewing that one. CI reporting ticket. Need
some guidance/access from Leonardo
- Glen will let him know
Thanks
Don
*Attendees*: Joanna, Xinyu, Matteo, Janos, Glen, Riku, Shebu, Anton, Ben,
Don
*Actions*:
- Glen: Follow up on notes
- Glen: Set up sync meeting to hear Riku/Leonardo/Anton/Joanna on
proposing a solution on the git clone performance issue.
*Minutes*:
- Glen: TFC Kanban board review
- Glen: Chromebooks stuck in customs - working paperwork now
- Glen: Cypress & NXP platforms both underway
- Glen: Performance issues update: (TFC-171, 172, 164)
- Ben: Limited the number of CI jobs to help relieve a performance issue.
- Riku: Impact - slower builds.
- Anton: were we testing on staging?
- Ben: No
- Should we allocate resources to work on performance?
- Joanna: Would work on server scaling versus Expect scripts
- Riku: TF-M build, launches over 100 builds, then git clones turn
into 400 simultaneous git clones - need to re-factor to do clone
up front.
- Riku/Leonardo - 1-2 week estimate
- Anton has some ideas - sync w/ him on potential solution. Once
agreed, begin the work.
- Glen: Meeting set up for tomorrow to discuss code coverage state and
how Arm might be able to help.
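The "clone up front" refactor Riku describes in the minutes above can be sketched as a comparison of the git commands each approach would issue. This is an illustration only, not the actual CI change: the function names, the `cache` directory, and the placeholder URLs are all hypothetical, and the figure of 4 repositories per build is inferred from "over 100 builds" turning into "400 simultaneous git clones". The sketch only builds command lists; it does not run git.

```python
from typing import Dict, List, Tuple

Command = List[str]


def per_build_clone_commands(builds: List[str], repos: Dict[str, str]) -> List[Command]:
    """Current pattern: every build clones every repo over the network."""
    return [
        ["git", "clone", url, f"{build}/{name}"]
        for build in builds
        for name, url in repos.items()
    ]


def upfront_clone_commands(
    builds: List[str], repos: Dict[str, str], cache_dir: str = "cache"
) -> Tuple[List[Command], List[Command]]:
    """Refactored pattern: one network clone per repo, then local clones.

    Returns (network_commands, local_commands).
    """
    # One mirror clone per repository hits the remote server.
    network = [
        ["git", "clone", "--mirror", url, f"{cache_dir}/{name}.git"]
        for name, url in repos.items()
    ]
    # Each build then clones from the local mirror instead of the server.
    local = [
        ["git", "clone", f"{cache_dir}/{name}.git", f"{build}/{name}"]
        for build in builds
        for name in repos
    ]
    return network, local


# Hypothetical inputs: 100 builds, 4 placeholder repositories.
BUILDS = [f"build-{i}" for i in range(100)]
REPOS = {f"repo{i}": f"https://git.example.invalid/repo{i}.git" for i in range(4)}
```

With these inputs the per-build pattern issues 400 network clones, while the up-front pattern issues 4 network clones and leaves the remaining 400 as local copies.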
Hi Sherry,
I'm adding the triage maillist to the thread. As a best practice, let's cc
that list on items like this as it includes the stakeholders that
prioritize OpenCI tasks on a weekly basis (minimum) so it's helpful info in
that decision making.
I see you're already subscribed to the list which is great! :) Reviewing
the Aug 31st sync minutes, Expect scripts were determined to be the
priority. Code Coverage next steps are also discussed. Looks like Glen
was going to set up a sync meeting to further discuss this one... The
minutes could have called out this action more clearly:
- Code Coverage:
- A sync w/ Leonardo and Joanna to discuss next steps/Current status
on CC shall be planned. Glen to set up
> Is it suspended for pure priority reason, or any technical reason?
So with the above said, this is a prioritization decision made by the
triage stakeholders, not technical.
Hope this helps, please let me know if any questions or suggestions on
improving the process. :)
Best,
don
On Thu, 2 Sept 2021 at 06:35, Leonardo Sandoval <
leonardo.sandoval(a)linaro.org> wrote:
> Hi Sherry,
>
> In summary, for priority reasons.
>
> Right now I am working on some pending tickets for TF-A (expect scripts
> migration, TFC-36 <https://linaro.atlassian.net/browse/TFC-36>). Once I
> complete TFC-36 <https://linaro.atlassian.net/browse/TFC-36> and MbedTLS
> work is still on hold, I will move to TFC-7
> <https://linaro.atlassian.net/browse/TFC-7> immediately.
>
> Regards,
> lsg
>
>
>
> On Thu, 2 Sept 2021 at 02:15, Sherry Wu <Sherry.Wu(a)arm.com> wrote:
>
>> Hi Leonardo and Don,
>>
>>
>>
>> Just noticed that https://linaro.atlassian.net/browse/TFC-7 changed to
>> “TODO”. Wondering what’s latest update for the code coverage tool
>> integration on Open CI.
>>
>
>> Thanks,
>>
>> Sherry
>>
>>
>> IMPORTANT NOTICE: The contents of this email and any attachments are
>> confidential and may also be privileged. If you are not the intended
>> recipient, please notify the sender immediately and do not disclose the
>> contents to any other person, use it for any purpose, or store or copy the
>> information in any medium. Thank you.
>>
>
Hi Xinyu,
Thanks for the escalation.
I see Ben is in the loop, so that's the correct first step. I've also cc'd
the triage maillist to make sure all stakeholders are looped in.
In general, as a best practice, I would suggest having as much quantifiable
data as possible upfront in tickets like this to help better understand the
magnitude as well as how to reproduce. It looks like the tickets have
already started to capture this, but I also see Ben in the ticket
requesting more. Datapoints of interest in my mind:
- Clear details on how to reproduce: The task(s) where noticeable
degradation is seen - is this in parallel to when large builds have kicked
off? etc.
- Tasks invoked and level of degradation: for example, "Gerrit reviews -
previously took xyz seconds/minutes, now taking 20% more time (or 2x, 10x?,
failing & never completing?)," frequency, etc. The more details the
better! This will help determine the priority we place on resolving. :)
Perhaps coming up with a general template for this could be helpful.
- Are there other degradations beyond Gerrit?
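As one possible starting point for such a template, the "20% more time (or 2x, 10x?)" figure can be derived from a few timed samples. This is a rough sketch: the function names are hypothetical, and using the median of the samples is an assumption made here to dampen outliers, not something agreed in the ticket.

```python
from statistics import median
from typing import Sequence


def slowdown_factor(baseline_seconds: Sequence[float], current_seconds: Sequence[float]) -> float:
    """Ratio of the median current timing to the median baseline timing."""
    base = median(baseline_seconds)
    if base <= 0:
        raise ValueError("baseline timings must be positive")
    return median(current_seconds) / base


def describe_slowdown(task: str, baseline_seconds: Sequence[float], current_seconds: Sequence[float]) -> str:
    """One-line summary suitable for pasting into a ticket."""
    factor = slowdown_factor(baseline_seconds, current_seconds)
    percent_more = (factor - 1) * 100
    return f"{task}: {factor:.1f}x the baseline ({percent_more:.0f}% more time)"
```

A ticket would then carry the raw samples, when they were taken, and the summary line, so the triage meeting can compare reports on the same footing.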
Ben is most certainly much more qualified than me in knowing what support
is needed to isolate/resolve, and, as noted in the ticket, he is asking for
more details as well. Let's get the details captured in the ticket(s) in
preparation for next Tuesday's Triage meeting where we can prioritize the
resolution over other TFC tasks. :)
Ben, feel free to chime in and correct any of my assumptions/suggestions.
Regards,
Don
On Thu, 2 Sept 2021 at 01:41, Xinyu Zhang <Xinyu.Zhang(a)arm.com> wrote:
> Hi Don,
>
>
>
> We found that trustedfirmware.org is getting slow. Daily work of some
> developers would be influenced.
>
> Could you please help to take a look on this issue? Here is the TFC link:
> https://linaro.atlassian.net/browse/TFC-172
>
>
>
> BR,
>
> Xinyu
>
*Attendees*: Riku, Anton, Glen, Shebu, Ben C, Joanna, Janos, Matteo, Don,
Xinyu
*Minutes*:
- Glen: Begin w/ TFC Kanban Board
- Glen: H/W Status
- Working thru getting Chromebooks imported. In progress
- Cypress boards flashing and booting, starting on jobs work. In a
week ready for boards to be shipped to Cambridge.
- NXP board is in the queue.
- MBedTLS:
- Glen: Have advanced as far as can. Need AMI files from Dean A.
Blocked awaiting AMI files.
- Janos: Progressed since last week, but working on resolving
internal issues. No estimate yet from Dean.
- Glen: Plan to move to expect script efforts with team approval
- Matteo: Agree Expect Scripts are the next item to work. Would like
completed, so good direction to go.
- Code Coverage:
- A sync w/ Leonardo and Joanna to discuss next steps/Current status
on CC shall be planned. Glen to set up
- Glen: Performance/Fixes needed?
- Riku: Nightly jobs failing due to too many parallel checkouts of
same repo/same version. TFC-20. Need to decide how to implement
- For TF-M, looks currently like 1 week work to modify CI to reduce
parallel checkouts, TF-A still looking to get an estimate.
- Ben: TF-A may be a bit larger task, but still need to look.
- Ben: Other solutions?
- Glen: Could Arm TF-A / TF-M team do this to offload the work?
- Not currently
- Ben: Need a short-term and long-term solution. Short-term: added
a server. Long-term: make the configuration changes.
- Can use Staging Server to test it out.
- Git checkout taking lots of the build time.
- Shebu: Failing every night?
- Not sure.
- Riku: Last week, 1 success all other failed.
- Ben: The new server increasing capacity has caused this issue to
show itself.
- Work around is to potentially limit the number of parallel jobs.
- Next Steps:
- Leonardo scope TF-A.
- Create two subtasks for TF-A and TF-M
- Riku: Put in work-around to limit number of parallel jobs
- Need a TFC ticket here?
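The work-around of limiting the number of parallel jobs, listed in the next steps above, amounts to capping concurrency with a fixed-size worker pool. The sketch below is an assumption-laden illustration rather than the actual CI configuration: `run_limited` is a hypothetical name, and a short sleep stands in for a real checkout. It reports the peak number of jobs that ran at once so the cap can be checked.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_limited(job_count: int, max_parallel: int) -> int:
    """Run `job_count` dummy jobs, at most `max_parallel` at a time.

    Returns the peak number of jobs observed running simultaneously.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def job(_index: int) -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for a repository checkout
        with lock:
            running -= 1

    # The pool size is the cap: extra jobs wait in the queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        list(pool.map(job, range(job_count)))
    return peak
```

The trade-off matches what the minutes record elsewhere: fewer simultaneous checkouts relieve the server, at the cost of slower overall builds.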
Attendees: Janos Follath, Matteo, Joanna, Don, Glen, Xinyu, Ben C
Minutes:
- Cancel next week instance - folks are out
Platform enablement
- Chromebook: Arthur wrapping up the work, when receive the platforms
will be ready.
- Cypress: Got the info and moving forward again.
- NXP Platform: Some background work to see best way to integrate
- Focus back on Cypress
Other
- Expect scripts: in holding pattern with focus on MBedTLS
- MC: Two months away from end of FY. Would like Expect scripts
finalized. Would like as a background task when things get blocked
- MC: Would like to at a minimum know what needs to be done and what
help Leonardo may need. LAVA help for example? Anything that can be done
in parallel. So what is left and who needs to do which task?
- MBedTLS
- JF: Don't have AMI's yet, working to get them out. So blocked.
Can't set up environment.
- GV: Jumped forward to M3's for now. Will need to get to having
this up on a real system.
- GV: By end of the week, M3 should be done, having the AMI files
will be key at that point to move on.
- JF: Originally, scripts / files were to be pulled in. Now thinking
of a restricted repo and sharing it - may not have that option. What can
be used beyond a public repo?
- JF: Can Linaro see the private Arm instance?
- Glen: Have to check if can see the private repo *ACTION Glen*.
- Glen: Set up a sync to review repo sharing options *ACTION Glen*
- Moving to gitlab?
- JF: Not at this point
- JF: Could move the test repo to gitlab, but same issue.
- SC Approved:
- Code Coverage tool
- Joanna: A new ticket raised, TFC-160. Can we disable code coverage
for now until gitlab.arm.com is back up? Without it, jobs are
failing.
- MC: Should we wait?
- Joanna: TF-A has been down since Friday.
- MC: Can we see what Dean does to resolve?
- Plan to see if can disable code coverage until resolved.
- MC: Can we be explicit in the ticket on what Linaro is to do, i.e.,
request disable code coverage on TF-A