Call for testing: A/B boot on Raspberry Pi

In questing, I’m intending to change the way we boot the Raspberry Pi rather substantially. Specifically, we’ll be moving to a system that will “test” new boot assets and automatically fall back to a “known good” configuration in the event the test fails.

I won’t go through all the gory details in this post (if you want those, please see the blog post here); this is simply a call for testing. If you have a supported model of Raspberry Pi, and a spare SD card, please consider trying out the new mechanism. Flash one of the questing dailies to your SD card:

Once booted (and configured, if you’re using the desktop daily), add my flash-kernel PPA and install the new flash-kernel-piboot package:

sudo add-apt-repository ppa:waveform/flash-kernel
sudo apt install flash-kernel-piboot

This will upgrade flash-kernel as well and, in the process, will migrate the content of the boot partition (mounted under /boot/firmware) to the new layout. The currently booted “known good” boot assets will be placed under /boot/firmware/current, untested new boot assets go in /boot/firmware/new, and older known good boot assets live under /boot/firmware/old.

The old/ directory is implicitly removed to make space when new/ assets are written (by flash-kernel). When untested new/ assets are present, the next boot (whether warm reboot or cold boot) will be a “double boot”. The second boot places the Pi in “tryboot” mode where the new/ assets are booted. If this succeeds, the new piboot-try-validate service will shuffle the directories around (current/ becomes old/, new/ becomes current/). If the boot fails, the next boot will implicitly use the “known good” assets under current/ and those under new/ will be marked “bad” (until they’re overwritten by the next run of flash-kernel).
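To make the directory shuffle concrete, here is a minimal simulation in a scratch directory. It is only an illustration of the end result described above, not the piboot-try-validate implementation: the helper name promote_new and the version marker files are made up for the example, and it uses two plain mv calls rather than anything atomic.

```shell
#!/bin/sh
# Illustrative simulation only: works in a scratch directory, never /boot/firmware.
boot=$(mktemp -d)
mkdir "$boot/current" "$boot/new"
echo "kernel-A" > "$boot/current/version"   # stands in for the known good assets
echo "kernel-B" > "$boot/new/version"       # stands in for the untested assets

# Hypothetical helper: the state change after a successful tryboot.
# old/ is assumed to be absent already (removed when new/ was written).
promote_new() {
    mv "$1/current" "$1/old"    # current/ becomes old/
    mv "$1/new" "$1/current"    # new/ becomes current/
}

promote_new "$boot"
cat "$boot/current/version"   # kernel-B
cat "$boot/old/version"       # kernel-A
```

After the shuffle there is no new/ directory left until flash-kernel next writes one.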

Questions

If you have any questions about the new system, please feel free to ask them here (or read the associated blog post, if the size of the scroll bar doesn’t put you off!)

Bugs

If you find any issues, please file a bug against flash-kernel and tag it raspi-image to bring it to my attention.

By the way: be brutal! This is meant to save you from bad boot setups after all. My one caveat here: if you do break it, I want to know precisely what you did! Keep notes, and try and keep the SD card in the broken state so I can ask you questions about it to try and figure out what went wrong.

Known issues

None yet.

I have an inkling this may cause (more) issues for “flaky” SD cards: the process does involve writing to the SD card, then immediately rebooting in the expectation those writes will be preserved.

Anecdotally: I did try this out on a (supposed) SanDisk card in my possession which I know to be flaky, and it worked happily on two occasions, but on another the boot partition wound up horribly corrupted half-way between the pre-migration and migrated state (this despite being after a reboot of the migrated state – in other words it appeared to lose a whole load of supposedly committed writes). However, I’m not considering that a bug: if your card is flaky and this breaks it, the solution is (and always has been) “buy a better card”.

@waveform The link to the Server Dailies appears to be incorrect.

I’m looking forward to giving this a try. Thanks.

Hmm, it wasn’t incorrect when I posted it but apparently it’s disappeared now, which is most unusual (typically when the dailies fail to build for some reason the old ones still hang around)…

Just checked the livefs build logs, and it looks like the last build succeeded (you could grab the image from there but I don’t want to update the post with that as those links tend to go stale fairly quickly).

I’ll check in with the release team and see if something odd is occurring on cdimage…

Yup, turned out there was some snafu with cdimage; this has now been corrected and the links in the main post should be working once more.

Will this be backported to Noble?

Would it matter if my boot drive is an M.2?

Will this be backported to Noble?

No, there are no plans to backport this. Partly this is because the switching logic relies on a new feature in GNU coreutils 9.5 (mv --exchange, which atomically exchanges two files/dirs). However, the primary intent is to run this experiment in an interim release, so that (assuming all goes well) we can have this in 26.04, the next LTS.
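For anyone curious what that coreutils feature looks like in practice, here is a small sketch using two scratch directories. The directory names and marker files are invented for the example, and this is not a copy of the actual switching logic. The fallback branch is a non-atomic three-step rename, included only so the example still runs where mv lacks --exchange.

```shell
#!/bin/sh
# Scratch directories standing in for two sets of boot assets.
work=$(mktemp -d)
mkdir "$work/a" "$work/b"
echo "from-a" > "$work/a/marker"
echo "from-b" > "$work/b/marker"

# -T (--no-target-directory) stops mv from moving a/ *into* b/.
if mv -T --exchange "$work/a" "$work/b" 2>/dev/null; then
    echo "exchanged in one step"
else
    # Older mv (or a filesystem without exchange support):
    # non-atomic fallback, shown purely for comparison.
    mv "$work/a" "$work/tmp"
    mv "$work/b" "$work/a"
    mv "$work/tmp" "$work/b"
    echo "exchanged in three steps"
fi

cat "$work/a/marker"   # from-b
cat "$work/b/marker"   # from-a
```

The difference matters on a boot partition: with the three-step version, a power cut between renames can leave a directory missing, which is the situation the single exchange avoids.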

Would it matter if my boot drive is an M.2?

No, this should work with any valid boot media for the Pi.

That said, there is currently an issue using it with “odd” layouts. Some discussion came up on the blog about users of PINN (née NOOBS), which constructs multi-boot setups on the storage. The current implementation of piboot-try has a baked-in assumption that the Ubuntu boot partition is the “first” boot partition. That holds when Ubuntu is the only OS, but not with PINN.

I’m hoping to correct that today (made some successful experiments last night), so at some point today there should be an updated version in the PPA which will work with that setup.

Anyway, yes, feel free to test this with SD or USB or NVMe boot drives – it should work with all of these. In theory it should even work with (block-based) netboot setups, but that’s not something I’ve tried yet.

As someone who uses a handful of Raspberry Pis as headless Ubuntu Servers I think this is a great feature! The current way kernels are handled has always made me a little nervous when booting a new kernel, especially when the Pi is at a remote location. And I won’t miss all those .bak files.

So far in my testing on a Raspberry Pi 5 everything works as expected. The following function in the fish shell lets me avoid the double reboot:

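# Only define the function where piboot-try is installed
if command -q piboot-try
    function piboot --description 'Reboot a Raspberry Pi and avoid a double boot'
        # When --test succeeds, let piboot-try handle the reboot itself
        if piboot-try --test
            sudo piboot-try --reboot
        else
            # Otherwise fall back to an ordinary reboot
            sudo reboot
        end
    end
end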
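For anyone not using fish, a rough POSIX shell equivalent might look like the following. It is only a sketch that mirrors the same logic and the same two piboot-try options (--test and --reboot) used above. Unlike the fish version, it defines the function unconditionally rather than checking that piboot-try is installed first.

```shell
#!/bin/sh
# Sketch of a POSIX sh counterpart to the fish function above.
piboot() {
    if piboot-try --test; then
        # --test succeeded: let piboot-try handle the reboot itself
        sudo piboot-try --reboot
    else
        # otherwise fall back to an ordinary reboot
        sudo reboot
    fi
}
```

Dropped into ~/.profile or ~/.bashrc, this gives a piboot command that behaves like the fish one.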

The only thing this new approach breaks for me is a script I have that modifies /boot/firmware/cmdline.txt since that file has moved. I don’t mind changing my script but perhaps a symbolic link might help avoid issues for others.

Since I only use LTS releases I guess I won’t be able to use this feature in “production” until 26.04.1. I’ll try to be patient. :slight_smile:

Great work, Dave!

Symbolic links are rather hard to do on a vfat :wink:

Whoops, that’s a good point.

I’ve just uploaded a new (hopefully final?) variant of flash-kernel-piboot to the PPA for testing (give it an hour or so to build and publish). Quite a bit has changed internally, but hopefully very little of it should be “noticeable” to users. Specifically:

  • PINN/NOOBS multi-boot setups are now supported. Migration has been tested, and the tryboot mechanism works smoothly.

  • The opt-out mentioned in the linked blog has also been tested, and works well (TL;DR: if you want to avoid these changes, you can override the relevant flash-kernel db entry to use “Method: pi” instead of “Method: pi-try” and the migration to the new setup will never occur).

  • A new validation step has been added. This is to answer the question “what is a good boot?”. Previously, simply reaching the piboot-try-validate service was deemed “good enough”, but what about (for example) a headless server in which a new kernel is missing a module to drive the ethernet port? The boot would still succeed, but the machine would be unreachable. There’s no way we can cover all these possible scenarios universally, but there’s now an /etc/flash-kernel/piboot-validate hook that can be used to handle this (the default simply runs true).

  • A couple of new features have been added to the piboot-try binary. It can now be used to “reset” the “bad” state of the “new” assets after failure (e.g. if you’ve corrected some external configuration that caused assets to be marked bad erroneously). It can also be used to restore the “old” assets to “current” in case you want to manually fall back to those.
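As an illustration of the validation hook mentioned above, here is a sketch of the kind of check a headless server might put in /etc/flash-kernel/piboot-validate. Several things here are assumptions rather than documented behaviour: that the hook's exit status decides whether the boot counts as good (suggested by the default simply running true), the interface name eth0, and the helper names link_is_up and any_link_up. The SYSFS_NET override exists only so the check can be tried out away from the device.

```shell
#!/bin/sh
# Hypothetical body for /etc/flash-kernel/piboot-validate (sketch only).
# Assumption: exit status 0 means "good boot", anything else means "bad".

# Overridable purely so the check can be exercised off-device.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeeds if the named interface reports operstate "up".
link_is_up() {
    [ "$(cat "$SYSFS_NET/$1/operstate" 2>/dev/null)" = "up" ]
}

# Succeeds if at least one of the named interfaces is up.
any_link_up() {
    for iface in "$@"; do
        link_is_up "$iface" && return 0
    done
    return 1
}

# In a real hook the final command would decide the result, e.g.:
#   any_link_up eth0
```

A link can take a few seconds to come up after boot, so a real hook would probably want to retry for a while before giving up. That is left out here to keep the sketch short.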

As someone who uses a handful of Raspberry Pis as headless Ubuntu Servers I think this is a great feature! The current way kernels are handled has always made me a little nervous when booting a new kernel, especially when the Pi is at a remote location. And I won’t miss all those .bak files.

Quite! From the flip-side, pushing out updates to however many Pi users we have, knowing we had basically no safety net, was making me more and more nervous too.

So far in my testing on a Raspberry Pi 5 everything works as expected. The following function in the fish shell lets me avoid the double reboot: […]

That’s exactly how I envisaged that being used. I was vaguely tempted to wrap all that logic up in one tool, but I figure the separation of --test and --reboot may prove useful for certain automated systems.

The only thing this new approach breaks for me is a script I have that modifies /boot/firmware/cmdline.txt since that file has moved. I don’t mind changing my script but perhaps a symbolic link might help avoid issues for others.

Symbolic links are rather hard to do on a vfat :wink:

Heh, precisely.

This is the one remaining thing that’s bugging me, as I know of a few scripts that assume they can edit /boot/firmware/cmdline.txt.

If this winds up causing issues related to that in questing, I might consider having, say, a /boot/firmware/cmdline.txt file which some service (probably piboot-try-reboot) handles auto-copying over /boot/firmware/{current,new}/cmdline.txt as needed. But which should it copy over? If both, there’s no fallback. If current then new will overwrite it, losing changes. If new then users may be surprised the change does not take effect until flash-kernel is re-run. Anyway, we’ll see how this works in questing and I can return to that question later.
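In the meantime, scripts that edit cmdline.txt could probe for the new location before falling back to the old one. A minimal sketch follows. The helper name find_cmdline is made up, and it assumes the migrated layout keeps the file at current/cmdline.txt under the boot partition, as described earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the cmdline.txt a script should edit.
# $1 is the boot partition mount point (defaults to /boot/firmware).
find_cmdline() {
    fw="${1:-/boot/firmware}"
    if [ -f "$fw/current/cmdline.txt" ]; then
        echo "$fw/current/cmdline.txt"   # migrated A/B layout
    elif [ -f "$fw/cmdline.txt" ]; then
        echo "$fw/cmdline.txt"           # pre-migration layout
    else
        return 1                         # neither layout found
    fi
}
```

A script could then use something like `target=$(find_cmdline) || exit 1` before editing. Note the caveat above still applies: an edit made only under current/ can be lost when new/ is later promoted over it.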

I just happened to upgrade these packages before your post and have run into a bug, which I thought I’d report here since it appears easy to recreate:

dem@questing ~> sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-6.14.0-1005-raspi
Using DTB: bcm2712-rpi-5-b.dtb
Installing /lib/firmware/6.14.0-1005-raspi/device-tree/broadcom/bcm2712-rpi-5-b.dtb into /boot/dtbs/6.14.0-1005-raspi/./bcm2712-rpi-5-b.dtb
Taking backup of bcm2712-rpi-5-b.dtb.
Installing new bcm2712-rpi-5-b.dtb.
flash-kernel: installing version 6.14.0-1005-raspi
Removing /boot/firmware/old/ and /boot/firmware/new/
Copying kernel assets to /boot/firmware/new/
Copying boot firmware to /boot/firmware/new/
Copying device trees to /boot/firmware/new/
Copying device tree overlays to /boot/firmware/new/overlays/
Completed; please be aware next reboot will boot twice
dem@questing ~> piboot-try --test
piboot is in an invalid or unexpected state:1/-/good/unknown
Please file an issue: ubuntu-bug flash-kernel

Edit: I think I got an earlier version of the packages from the PPA. Will test again.

Edit again: Yes, sorry about this, the error was with version “ppa7” but I now have “ppa12” and don’t see the issue.

Indeed – that’s one of the things I came across in testing. I hadn’t considered all the possible states that runtime could legitimately be in (previously I’d considered the states from the perspective of the two services, but that’s not enough for piboot-try “post-boot” which can be in several more states).

The reboot into tryboot should also be a bit quicker now (particularly for PINN/NOOBS users; previously it would load the PINN/NOOBS bootloader but we skip all that now). So the claim that the boot will take 2x as long probably isn’t entirely accurate; it’s probably something like 1.5x to 1.7x as long (guesstimating from a couple of tests here, but that’ll probably vary quite a bit by board and whether the server / desktop image is in use).

This is now uploaded to questing. I should caution this is going to be a relatively long transition for the questing dailies as it’s not “simply” a matter of updating flash-kernel. Specifically, the TODO list is as follows:

  • Firstly, the flash-kernel upload will require AA approval, as it defines a new binary package
  • If that approval is given, flash-kernel will land in questing release
  • The raspi platform seeds need to be updated to include flash-kernel-piboot (the merge is already prepped, but I’ll only commit that once the update lands to prevent earlier breakage of the dailies)
  • Finally, the gadget will need updating to provide the new boot partition structure on fresh images. I’m still prepping those changes, as they’re mildly entangled with other changes fixing an issue with mkswap.service

At some point during this, the questing dailies may break. Before the gadget is updated (last step), flash-kernel will be trying to migrate the boot partition structure during ubuntu-image’s attempt to build the images … I’ve no idea if that will work or not, but if it doesn’t then the builds will fail (until the gadget is updated).

Many thanks for all the testing and feedback from everyone who participated – it was much appreciated, and certainly helped improve the design. Anyway, I’ll keep an eye on things and try and shepherd all this to a conclusion!