Migration to rust-coreutils in 25.10

We’d like to migrate to rust-coreutils this cycle as the new default. People can revert if needed. Below are the relevant parts of the internal draft spec.

I’d like to upload the changes as soon as possible once the archive is open, probably next week.

Mechanisms for migrating

Migrating coreutils to a new package is an arduous task, as it is an Essential package which has the requirements that:

  1. It must work when merely unpacked; files must not disappear at any point.
  2. It must not conflict with other essential packages over the files it ships, because bootstrapping merely extracts the packages first and only then does the proper unpacking runs. Having the same files in multiple packages would lead to indeterminate results.

This ruled out two mechanisms from the start:

  • We cannot use alternatives to manage coreutils, because alternatives are configured in the maintainer script - there would be no cp and other binaries until we run the maintainer script.
  • We cannot use diversions and have both coreutils and coreutils-from-uutils (or similar) ship /usr/bin/cp: they would also clash at bootstrapping time when we extract the packages (assuming we need to bootstrap with both temporarily). It’s also not particularly clean to have all these diversions on a clean system.

The approach proposed therefore is as follows (“advanced dependency gymnastics”):

  • We rename the existing coreutils package to gnu-coreutils, and build it with a gnu prefix, e.g. gnucp (the gnu prefix has prior use in BSD world, otherwise we should have used gnu-).
  • We introduce a new source package, coreutils-from (https://git.launchpad.net/~juliank/+git/coreutils-from/), that builds the following binary packages:
    • Package: coreutils
      Pre-Depends: coreutils-from-uutils | coreutils-from
      Essential: yes
    • Package: coreutils-from-uutils
      Pre-Depends: rust-coreutils
      Provides: coreutils, coreutils-from
      Conflicts: coreutils-from
      Replaces: coreutils-from, coreutils (<< ${split})
      [optionally] Breaks: coreutils (<< ${split})
      Protected: yes
    • Package: coreutils-from-gnu
      Pre-Depends: gnu-coreutils
      [… as coreutils-from-uutils … ]
  • Explanation:
    • We need to use Pre-Depends here because coreutils is Essential and therefore must work when merely unpacked. This applies transitively and may be somewhat hard on the solver/ordering…
    • We need to mark the provider packages coreutils-from-* as Protected: yes to prevent the package manager or the user from trying to switch them, as that would fail: APT would remove the other provider first, and then the binaries would be missing and dpkg would fail (see revert mechanism).
    • We need the Replaces/Conflicts/Provides between the providers as they are not co-installable. We need to use coreutils-from here as we don’t want to conflict with the coreutils metapackage.
    • We need the versioned Replaces and Breaks against the old coreutils such that we can migrate the functionality from it. We can drop the Breaks if they cause issues (there may be a loop here, and apt could decide to upgrade coreutils before installing coreutils-from-*), but the Replaces are needed for unpacking to succeed.
  • The coreutils package is empty, it exists just to pull in a provider and to mark the functionality of coreutils as Essential. It’s mostly useful for upgrades.
  • The coreutils-from-* packages contain symlink farms for commands (potentially mini wrappers, see security impact), manual pages, and completions, such as:
    • /usr/bin/ls -> /usr/bin/coreutils
    • /usr/share/man/man1/ls.1.gz -> /usr/share/man/man1/rust-ls.1.gz
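
As a rough illustration of what such a symlink farm looks like, the following Python sketch builds the link map for a few tools and creates the links under a scratch directory rather than the real /usr. The tool list is a hypothetical subset, and the helper names farm_links and materialize are made up for this example; the actual packaging generates the links at build time in its own way.

```python
import os
import tempfile

MULTICALL = "/usr/bin/coreutils"  # multi-call binary path named in the post


def farm_links(tools):
    """Map each link path to its target, following the two examples above."""
    links = {}
    for tool in tools:
        links[f"/usr/bin/{tool}"] = MULTICALL
        links[f"/usr/share/man/man1/{tool}.1.gz"] = (
            f"/usr/share/man/man1/rust-{tool}.1.gz"
        )
    return links


def materialize(links, root):
    """Create the links under a scratch root instead of the real /usr."""
    for link, target in links.items():
        path = os.path.join(root, link.lstrip("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        os.symlink(target, path)  # symlink targets need not exist yet


links = farm_links(["ls", "cp", "rm"])  # hypothetical subset of the tools
with tempfile.TemporaryDirectory() as root:
    materialize(links, root)
    ls_target = os.readlink(os.path.join(root, "usr/bin/ls"))
```

Because the links are plain symlinks shipped in the package, they exist as soon as the package is extracted, which is what the Essential requirement asks for.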

Related Debian work

Debian is working on the same but sort of opposite thing, namely, supporting toybox and busybox as alternative providers of the coreutils functionality to provide smaller minimal images. The scheme has been proven to work in their experiments (https://salsa.debian.org/josch/busybox-is-coreutils-demo/).

Impact on minimal images

Assuming that we eventually want to drop the gnu coreutils to universe, and only support the rust-based version, we are looking at a significant problem for those stories:

Larger image size: A Docker image is currently 75 MB. Rust-coreutils comes in at 25 MB vs 7 MB for the classic coreutils, increasing the image size by 18 MB to 93 MB (+24%).
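
For reference, the arithmetic behind those figures can be checked in a few lines of Python; the only inputs are the three sizes quoted above, and the variable names are just for this example.

```python
# Sizes in MB, taken from the paragraph above.
image_mb = 75   # current Docker image
gnu_mb = 7      # classic GNU coreutils
rust_mb = 25    # rust-coreutils

delta_mb = rust_mb - gnu_mb               # extra space needed
new_image_mb = image_mb + delta_mb        # resulting image size
increase_pct = 100 * delta_mb / image_mb  # relative growth

print(delta_mb, new_image_mb, round(increase_pct))
```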

This can be worked around by continuing to commit to classic GNU coreutils on those platforms, but this increases the overhead of having to support and validate two implementations.

Security impact

AppArmor profiles don’t work correctly with a multi-call binary. A profile allowing /usr/bin/ls now needs to allow /usr/bin/coreutils (as AppArmor profiles follow symbolic links) and there is no way to identify which of the tools is being called in the profile. The upstream project may want to define an apparmor profile for coreutils with “hats” for the individual tools and then switch the hat on initialisation, but this only solves the issue partially (doesn’t help for inherited profiles, for example).

Two further options are listed below. They solve some problems, but there may be more:

  1. Build tiny wrapper binaries that ensure the coreutils binary is called with the right first argument, that is, /usr/bin/rm essentially does argv[0] = "rm"; execv("/usr/bin/coreutils", argv). This ensures that an apparmor profile that does e.g. /usr/bin/rm Ux works safely, but doesn’t necessarily work for other things.
  2. Build coreutils into a dynamic library that simply exposes coreutils_main() and then call that from tiny wrapper binaries
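
To make option 1 concrete, here is a minimal Python sketch of the wrapper logic. Real wrappers would presumably be small compiled binaries; Python is used here only to show the argv handling. The function names wrapper_argv and run_wrapper are hypothetical.

```python
import os

COREUTILS = "/usr/bin/coreutils"  # multi-call binary path named in the post


def wrapper_argv(applet, argv):
    """Return the argument vector with argv[0] forced to the applet name.

    Whatever name or path the wrapper was invoked under is discarded, so the
    multi-call binary always sees the intended applet as its first argument.
    """
    return [applet] + list(argv[1:])


def run_wrapper(applet, argv):
    """Replace the current process with the multi-call binary (does not return)."""
    os.execv(COREUTILS, wrapper_argv(applet, argv))


# What a wrapper installed as /usr/bin/rm would pass on:
example = wrapper_argv("rm", ["/some/other/name", "-f", "file.txt"])
```

The point of the design is that the executed path stays /usr/bin/rm from AppArmor's perspective, while the multi-call binary still dispatches on a trusted argv[0].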

Testing

With the changes as described above, the existing tests that get triggered for coreutils will automatically be triggered for coreutils-from-uutils, as the new coreutils depends on it (and it is installed in the chroots).

This doesn’t cover everything, as the coreutils are “essential” and not everything depends on them explicitly.

Upgrades

Per above, on upgrade the new coreutils binary package from the coreutils-from source package will be installed, which will pull in coreutils-from-uutils as it is the leftmost dependency.
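
The "leftmost dependency" behaviour can be sketched with a deliberately simplified model of how an or-group is satisfied: an already-installed package that satisfies any alternative is kept, otherwise the leftmost alternative naming a real package is chosen. This is an illustration only, not APT's actual solver; pick_provider is a hypothetical helper, and the Provides of coreutils-from-gnu are assumed to mirror those of coreutils-from-uutils as the spec indicates.

```python
# Real packages and the virtual names they provide (from the stanzas above;
# coreutils-from-gnu is assumed to mirror coreutils-from-uutils).
PROVIDES = {
    "coreutils-from-uutils": ["coreutils", "coreutils-from"],
    "coreutils-from-gnu": ["coreutils", "coreutils-from"],
}


def pick_provider(or_group, installed):
    """Pick a package that satisfies an or-group such as 'a | b'."""
    alternatives = [alt.strip() for alt in or_group.split("|")]
    # An installed package that is, or provides, any alternative wins.
    for alt in alternatives:
        for pkg in sorted(installed):
            if pkg == alt or alt in PROVIDES.get(pkg, []):
                return pkg
    # Otherwise take the leftmost alternative that is a real package.
    for alt in alternatives:
        if alt in PROVIDES:
            return alt
    return None


GROUP = "coreutils-from-uutils | coreutils-from"
on_upgrade = pick_provider(GROUP, installed={"coreutils"})
after_revert = pick_provider(GROUP, installed={"coreutils-from-gnu"})
```

Under this model an upgrade from the old coreutils selects coreutils-from-uutils, while a system that was reverted to coreutils-from-gnu keeps it, since it already provides coreutils-from.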

Revert mechanism

The simple mechanism that should work is
apt install coreutils-from-gnu coreutils-from-uutils- --allow-remove-essential

But we have a limitation here: APT tries to remove packages first, so the binaries would go missing and dpkg would then fail to unpack the new coreutils. This is an APT limitation and can be fixed in APT; dpkg supports just installing the new package and will remove the other package itself. That is, this works fine:

  • apt download coreutils-from-gnu
  • dpkg --install ./coreutils-from-gnu*.deb (will remove coreutils-from-uutils automagically)

However, to work with existing APT, we can adopt protective diversions: For all coreutils providers:

  • In their prerm script, run dpkg-divert --no-rename --add for all coreutils to make dpkg think the files have a different name and not remove the real ones.
  • In their preinst script, we run dpkg-divert --no-rename --remove to remove those protective diversions; in turn, when our provider is unpacked it will override the leftover protected binaries.

See prerm.in and preinst.in in https://git.launchpad.net/~juliank/+git/coreutils-from/tree/debian?h=main for the details.

So when you switch from coreutils-from-a to coreutils-from-b:

  1. coreutils-from-a prerm adds the diversions, making dpkg think that, for example, ls is ls.remove-bak
  2. coreutils-from-a removal does effectively nothing to the files because it will try to remove ls.remove-bak which does not exist, and keep ls untouched
  3. coreutils-from-b preinst undoes the diversion
  4. coreutils-from-b is being extracted, and as there is no diversion it will take over ls

Alternatively, when APT is fixed or dpkg is used directly, it executes in the order 1, 3, 4, 2, which also works (as the file ownership of ls has moved to coreutils-from-b by the point we remove coreutils-from-a files).
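
The two orderings can be checked with a toy model of the four numbered steps. This Python sketch tracks only one file, treats a diversion as applying to every package, and ignores everything else dpkg does; the names switch, fs, owned and diversions are all made up for the illustration.

```python
def switch(order, protect=True):
    """Simulate switching from provider a to provider b.

    Returns (present, owner): whether /usr/bin/ls exists after each step,
    and which provider's copy is on disk at the end.
    """
    LS = "/usr/bin/ls"
    fs = {LS: "a"}                    # path on disk -> provider it came from
    owned = {"a": {LS}, "b": set()}   # dpkg's view of file ownership
    diversions = {}                   # path -> name dpkg uses instead

    def prerm_a():                    # step 1: add protective diversion
        if protect:
            diversions[LS] = LS + ".remove-bak"

    def remove_a():                   # step 2: remove a's files as dpkg sees them
        for path in owned["a"]:
            fs.pop(diversions.get(path, path), None)
        owned["a"] = set()

    def preinst_b():                  # step 3: undo the diversion
        diversions.pop(LS, None)

    def unpack_b():                   # step 4: b takes over the file
        fs[diversions.get(LS, LS)] = "b"
        owned["b"].add(LS)
        owned["a"].discard(LS)

    steps = {1: prerm_a, 2: remove_a, 3: preinst_b, 4: unpack_b}
    present = []
    for n in order:
        steps[n]()
        present.append(LS in fs)
    return present, fs.get(LS)


apt_order = switch([1, 2, 3, 4])
dpkg_order = switch([1, 3, 4, 2])
unprotected = switch([1, 2, 3, 4], protect=False)
```

In this model ls stays present after every step in both protected orderings, while the unprotected APT ordering leaves a window in which ls is missing.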

Known issues

Why will oxidizr not be used to switch between rust-coreutils and gnu-coreutils? Only the other way around, as now.

That is surprisingly huge. I’ve inspected the .deb package contents, and the /usr/bin/coreutils file in the package is 23.7 MB. According to the debian/rules file, the build command it runs is make SELINUX_ENABLED=1 PROFILE=release MULTICALL=y, which on my machine creates a 13.1 MB file when building uutils from git (on commit 044b33d8cb147d63205f1ef562b5a2ce2bcee7c7) and running strip on the resulting binary. I wonder where the extra 10 MB came from?

The debian/rules file messes around with LTO by editing the Cargo.toml file under some conditions; perhaps it disables full LTO in more cases than it should. Switching from lto = true to lto = "thin" in Cargo.toml when building from git results in a 17 MB file.

Not sure where the remaining 6MB came from. Doesn’t seem to be debug info, or at least running strip on the binary from the .deb package doesn’t do anything. I would be surprised if this overhead isn’t avoidable.

There is also an easy win for binary size by using release-fast profile instead of release. That applies panic = "abort" and would cut another 1MB off the binary size. Unfortunately building through make does not support the release-fast profile currently; make just silently falls back to debug profile if you try to tell it to use release-fast. You would need to either get the upstream makefile fixed, or just patch in panic = "abort" into the Cargo.toml as a distribution patch.

If I run cargo build --profile=release-fast --features feat_selinux,unix (which the makefile seems to boil down to) on the git repo of uutils, I get a 12MB binary after stripping. This is half of the current size of the binary in Ubuntu repositories.

It is possible I am missing something, but there are clearly opportunities to reduce binary size still remaining, via panic = "abort" if nothing else, although I would be very much surprised if LTO is working as intended when that 23.7 MB binary is produced.

This looks good to me. On the size topic I think it is worth spending some time to see if there are things we can do to reduce the size without too much effort (maybe upstream can help here?) before we decide what to do about containers.

Would upstream be keen to provide a way where coreutils-uutils is exposed as a library only and all the simple binaries are generated? Otherwise I’d find it quite annoying to maintain such wrappers downstream for all the tools.

One annoying bit about rust shared libraries is that they can only be used with unsafe and do not propagate enough of rust type information. There are various proposals to improve on this, but they are not implemented yet.

How does apparmor interact with hard links? Could they be used for those applets that need different profiles?

We could have rust-coreutils use hardlinks in /usr/lib/cargo/bin/coreutils, and then do the symlink farm to that rather than /usr/bin/coreutils, that should work fine; usual caveats about hardlinks in packages apply but I don’t remember them.

In any case, the initial version that follows the spec is now in NEW.

Hard links are annoying across volume mounts, but I think we are mostly fine here as everything is under /usr these days.

Also, if one doesn’t need correct setuid/apparmor/selinux at unpack time, I wonder if symlinks can be there in the package, and subsequently replaced with hard links in postinst.

That would probably mean that anything needing setuid/apparmor/selinux has to be delayed to triggers, or retriggered a second time to fix things up.

But yes, these are largely optimizations which can be iterated upon after the initial puzzle for essential:yes has landed.

Blockers - recording them here, temporarily uploading GNU as default to allow gnu-coreutils to migrate, but this gives us a record.