[Proposal] Split autopkgtest queueing and status verification out from main Britney process

As it stands now

Autopkgtests are a critical part of verifying that packages are safe to migrate into the release pocket and become generally available for users. We verify that reverse dependencies for a given package pass, or at the very least do not regress, before they migrate into the release pocket. That being said, they take up a substantial part of Britney runs.

I have compiled some average run times for Britney, and the amount of time the autopkgtest part of the run takes, between the start of the Plucky cycle and the 20250215T18:52:20Z run:
Average initialization time: 3 minutes, 10.5 seconds (11.2% of total)
Average autopkgtest processing time: 16 minutes, 46.5 seconds (59.3% of total)
Average excuses processing time: 8 minutes, 18.8 seconds (29.4% of total)
Average total run time: 28 minutes, 15.8 seconds

Here are a few observations from this data:

  • The average autopkgtest processing time is approximately double that of the average excuses processing time.
  • The average autopkgtest processing time is a majority of the average total run time, at approximately 59%.
  • Removing the autopkgtest processing time altogether would result in an average total run time of 11 minutes, 29.4 seconds.

Proposal

Introduce a new daemon between britney2-ubuntu and autopkgtest-cloud which will take care of all requests between the two. Here is an overview:

  • Britney starts a run, and dispatches requests to the new daemon, which stores them in a small, lightweight queue of its own.
  • The new daemon listens for requests from Britney and sends them to autopkgtest-cloud. Ideally this becomes a webhook long-term (sent by autopkgtest-cloud to the new daemon); until then, the new daemon regularly polls for updates to those autopkgtest results.
  • Britney finishes its run, and a new run is scheduled. At the start of that run, Britney sends an “update” request to the new daemon: the request contains the packages that are new since the previous run (or a full list of packages, if we prefer to store that on the new daemon), and the response contains the statuses of all of the pending requests, in batches.
  • Once the new daemon has returned the update, the results it has handed over are removed from its data; Britney stores those, since the daemon’s job is just to act as an intermediary. (A rough sketch of this exchange follows the list.)
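
As a concrete illustration, here is a minimal sketch of what that exchange could look like if the daemon spoke JSON over HTTP. The daemon address, endpoint paths, and payload fields are hypothetical, not an existing API:

```python
# Minimal sketch of the Britney <-> daemon exchange, assuming JSON over HTTP.
# The daemon address, endpoint paths, and payload fields are hypothetical.
import json
import urllib.request

DAEMON_URL = "http://localhost:8080"  # hypothetical address of the new daemon


def post_json(path, payload):
    """POST a JSON payload to the daemon and return the decoded JSON response."""
    req = urllib.request.Request(
        DAEMON_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# 1. During a run, Britney hands new test requests to the daemon's queue.
post_json("/requests", {
    "series": "plucky",
    "tests": [
        {"package": "glibc", "arch": "amd64", "triggers": ["glibc/2.41-1ubuntu1"]},
    ],
})

# 2. At the start of the next run, Britney asks for everything that finished
#    since last time; the daemon drops those entries once they are handed over.
update = post_json("/update", {"since": "20250215T185220Z"})
for result in update.get("results", []):
    print(result["package"], result["arch"], result["status"])
```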

This new daemon would also be responsible for:

  • Replacing the need for running retry-autopkgtest-regressions --only-unknown, ideally with something more robust: classify the errors and retry them en masse, or, for simple situations, just retry them and store some metadata to send as a report (a rough sketch of such classification follows this list).
  • Doing all of the current trigger creation logic that exists in Britney (it would be moved.)
  • Storing data about running and queued autopkgtests directly related to proposed migration, in a way that encourages improvements to and integration with autopkgtest-cloud.
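
On the first point, here is a rough sketch of the kind of error classification I have in mind; the failure categories, regex patterns, and retry limit are made up for illustration and would need to be derived from real autopkgtest-cloud logs:

```python
# Rough sketch of failure classification that could replace
# retry-autopkgtest-regressions --only-unknown. The categories and patterns
# below are illustrative assumptions, not an inventory of real failure modes.
import re

PATTERNS = {
    "infrastructure": re.compile(
        r"(Cannot allocate memory|Temporary failure resolving|connection reset)", re.I),
    "dependency": re.compile(r"Unable to (locate|correct) package", re.I),
}


def classify(log_tail: str) -> str:
    """Return a coarse failure class for a result's log tail."""
    for name, pattern in PATTERNS.items():
        if pattern.search(log_tail):
            return name
    return "unknown"


def wants_retry(log_tail: str, previous_retries: int, limit: int = 2) -> bool:
    """Retry infrastructure-looking failures a bounded number of times."""
    return classify(log_tail) == "infrastructure" and previous_retries < limit
```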

Benefits

Here are some of the obvious benefits of this approach:

  • Modularity: Britney and the new daemon can be updated separately, and a failure in one does not take down the other. Additionally, Britney logs will be much cleaner, and archived run output will take up significantly less space at scale.
  • Efficiency and speed: average Britney run time will drop substantially (the numbers above suggest up to roughly 59% if autopkgtest processing were removed entirely), allowing Ubuntu developers and all interested parties to receive results more quickly, especially during large transitions.
  • Maintainability and flexibility: the new daemon could be written in another programming language, like C++, Go, or Rust. Additionally, some people find the existing Britney code difficult to grasp, both for drive-by contributions and for setting up a local instance. This would greatly improve the setup process (allowing someone to swap in a mock or alternative daemon if needed) and allow for a lighter codebase within britney2-ubuntu.

Downsides

Here are the downsides I can think of:

  • While I volunteer to help with this effort, it’s going to take some time to write and polish the code everywhere needed to integrate this correctly. I’d deeply appreciate some help Canonical-side (or, if you’d like to drive it, please let me know).
  • This will make us diverge further from Debian. I’m not sure how far we already diverge in this respect, but unless we did this work in Debian as well, it may make future merges more difficult.
  • We need to get the documentation right, ideally so that the daemon is easy both to maintain/administer and to use. It probably won’t have any user-facing functionality, but it may have logs or associated status reports, and those can be difficult to get right.

Thoughts?

What do you think about this topic? Is it worth exploring?

It’s intentional. autopkgtest-cloud has a lot of logic to retry known temporary failures automatically, but it turns out that doing so isn’t particularly helpful, and there are a ton of gotchas where the package actually did cause the temporary failure.

That’s totally wrong. Britney needs to determine which packages must migrate together and hence provide the correct migration set as triggers when running the tests (it is not doing a good job at it so far and it needs more research as to why it fails to find the correct migration groups, but it’s a broader problem than just triggers).

That’s pretty much counter intuitive. Britney accesses a local database to avoid having to talk to the autopkgtest server because the server interactions are too slow.

I’ll admit, I think you should be more open minded/optimistic here. :slight_smile:

I’ve heard differently: that it’s usually helpful. In practice, I do see movement when running it. The same entries don’t show up twice, and the retries usually result in packages being fully fixed.

Nope, think about it harder. :slight_smile:

All of this is done within the autopkgtest Britney policy. Since the triggers are only used in the web status report and in the autopkgtest itself, having this new daemon already know about all of the packages in proposed would actually save a lot of time. Additionally, it could be written in a language much faster than Python.

Directly contrary to what you’re saying, autopkgtests influence migration policy, not the other way around.

Right, but it doesn’t need to happen in the same main Britney process. Given that it takes a majority of the run time, I still submit that completely separating out the daemons and having them communicate, either via a database or via HTTP depending on how you want to do the server setup, would greatly improve speeds.

@juliank I think you’re seeing this as counter-intuitive because nobody’s tried it before, and anyone who has was easily quieted by people who thought it couldn’t be done. Well, it can be done, it would be helpful, and I’ll volunteer to help make it a reality.

I’m seeing this as counterproductive because I had more or less exactly the same discussion with the Britney maintainer about triggers earlier this week, and the end result was: Britney defines the policy, it needs to determine the correct packages that need to migrate together, and it should not rely on external services to determine that.

My position on the topic was that we don’t need to bother calculating triggers: we can use no-strict-pinning, figure out which packages were needed from unstable/proposed, and then have Britney depend on that to determine which packages need to migrate together, such that tests can’t break after migration due to broken depends.

That idea was strongly rejected and I certainly agree with the position that Britney should be fixed instead.

You say the use of triggers is only in autopkgtest, and that’s true, but what you are missing is that the triggers are, or should be, the set of packages that need to migrate together.

That calculation needs to be done by Britney regardless. There’s actually nothing autopkgtest-specific about it.

For the autopkgtests it’s entirely sufficient to, once the list of packages that need to migrate together has been calculated, get their reverse test depends and then trigger all of them with the entire migration set.
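
A minimal sketch of that approach, assuming the migration set and a reverse test-dependency map have already been computed; the function name and data shapes are placeholders:

```python
# Sketch: given a migration set Britney has already computed, test every
# reverse test dependency with the whole set as triggers.
def tests_for_migration_set(migration_set, reverse_test_deps):
    """
    migration_set: list of "source/version" strings that must migrate together.
    reverse_test_deps: mapping of source package -> set of packages whose
                       autopkgtests depend on it.
    Returns (test_package, triggers) pairs, where triggers is the full set.
    """
    sources = {item.split("/")[0] for item in migration_set}
    to_test = set()
    for src in sources:
        to_test |= reverse_test_deps.get(src, set())
    return [(pkg, sorted(migration_set)) for pkg in sorted(to_test)]


# Example with made-up data:
pairs = tests_for_migration_set(
    ["glibc/2.41-1", "binutils/2.44-1"],
    {"glibc": {"systemd", "coreutils"}, "binutils": {"gcc-14"}},
)
for pkg, triggers in pairs:
    print(pkg, "triggered by", " ".join(triggers))
```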

Yes, but autopkgtest-cloud already retries all temporary failures 3 times before giving up on them; it’s not magically going to get better because something else retries them.

Sometimes it fails to properly give up (the worker crashes and the entry is not removed from the queue) and then spins in a loop retrying (differently) the same failure all the time.

It doesn’t, but what I meant to say is that eventually moving it to autopkgtest-cloud is not going to work.

I believe Britney should essentially be rendering status pulled in from sources. One component needs to figure out which migrations depend on which other migrations (an NP-complete problem); an apt-britney should be able to calculate that: basically, run the new solver for each package in proposed with no-strict-pinning and see what it would pick from proposed.

Another component needs to look if new migrations are incoming and iterate over them and trigger autopkgtests for them using the recursive set of depends as triggers.

Another component listens to the AMQP queue and records finished results into a database.

Another component migrates packages that are ready.

And finally you have a render component that renders everything into a yaml and an html.

These can be different processes communicating over AMQP, or threads.
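
For the component that listens to the queue and records results, a minimal sketch of what that could look like, using pika for AMQP and sqlite3 as a stand-in database; the queue name, message format, and schema are assumptions for illustration only:

```python
# Very rough sketch of the "listen to the AMQP queue and record finished
# results" component. Queue name, message format, and schema are assumptions.
import json
import sqlite3

import pika

db = sqlite3.connect("results.db")
db.execute("CREATE TABLE IF NOT EXISTS results "
           "(package TEXT, arch TEXT, triggers TEXT, status TEXT)")


def on_result(channel, method, properties, body):
    """Store one finished test result and acknowledge the message."""
    result = json.loads(body)
    db.execute("INSERT INTO results VALUES (?, ?, ?, ?)",
               (result["package"], result["arch"],
                " ".join(result.get("triggers", [])), result["status"]))
    db.commit()
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(queue="autopkgtest-results", on_message_callback=on_result)
channel.start_consuming()
```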

I strongly disagree with this point. That makes Britney into an unchangeable monolith; the logic in the new daemon should be an exact move of existing logic, but insisting that it should all be done in Britney itself is counterproductive. (Not trying to shoot the messenger here. :wink:)

We’re talking about two different things entirely here…

autopkgtest triggers and Heidi output don’t always match; they’re separate. Defining the triggers elsewhere does not change how Britney groups items. In fact, Britney should still be the SSOT on any grouping determinations, but there’s no reason why the actual work of retrying/requesting/etc. one by one can’t be split off into a separate process entirely.

While I’m not sure this is accurate, I think it’s outside the scope of this proposal. My point is that the new daemon should handle any Britney-related queue maintenance that autopkgtest-cloud doesn’t do.

That’s almost what it does. Almost.

Britney starts by getting the Packages and Sources files from the archive, then extracting them and combining them in a meaningful way.

It then does the grouping you’re talking about: what depends on what, what is cached?

Then it goes through a list of policies for each package. Each policy then produces its own pass/fail or HTML/YAML status based on the criteria set internally.

If we could lighten that autopkgtest Britney policy so that it does not have to sit there and make tens of thousands of AMQP requests, but instead controls another daemon which does that and reports back, then I think we’d be golden here.

To be clear, what I’m suggesting is simply a refactor of the autopkgtest Britney policy.
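
As a very rough illustration of that refactor, assuming a daemon client like the one sketched earlier, the slimmed-down policy could look something like this; the class shape is deliberately simplified and does not mirror the real britney2 policy interface:

```python
# Sketch only: the policy no longer talks AMQP itself; it reads cached results
# from the external daemon and queues anything that still needs to be tested.
class AutopkgtestPolicy:
    def __init__(self, daemon_client):
        self.daemon = daemon_client   # e.g. the HTTP helper sketched earlier
        self.pending = []

    def apply(self, source, version, tests_to_run):
        """Decide a verdict for one excuse from cached results; queue the rest."""
        verdict = "PASS"
        for test_pkg in tests_to_run:
            result = self.daemon.cached_result(test_pkg, source, version)
            if result is None:
                # Not finished yet: remember it and report the item as running.
                self.pending.append((test_pkg, source, version))
                verdict = "RUNNING"
            elif result == "REGRESSION" and verdict != "RUNNING":
                verdict = "REJECTED"
        return verdict

    def flush(self):
        """Hand all newly needed tests to the daemon in one batch at the end of the run."""
        if self.pending:
            self.daemon.submit(self.pending)
            self.pending = []
```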