To fsck or not to fsck at boot by default

mwhudson · March 23, 2021, 12:59am

We recently started work on a pair of bugs which reported that systems installed by curtin (including subiquity) do not fsck their filesystems on boot by default (because the “fs_passno” field of the fstab file is written as “0”). The question arose as to whether this is something we want to do: in the age of journalling filesystems, is it still desirable to run fsck?
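For illustration, here is a minimal Python sketch of how one might read the fs_passno field (the sixth column) out of fstab-formatted text. The function name `passno_by_mountpoint` and the sample entries (including the made-up UUID paths) are assumptions for the example, not anything curtin or subiquity actually ships.

```python
def passno_by_mountpoint(fstab_text):
    """Map each mount point in fstab-formatted text to its fs_passno."""
    result = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # not enough fields to identify a mount point
        # fs_freq and fs_passno are optional; a missing fs_passno means 0
        passno = int(fields[5]) if len(fields) > 5 else 0
        result[fields[1]] = passno
    return result


# Hypothetical fstab contents, similar in shape to what an installer writes.
SAMPLE = """\
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/disk/by-uuid/aaaa / ext4 defaults 0 0
/dev/disk/by-uuid/bbbb /boot/efi vfat umask=0077 0 1
"""

print(passno_by_mountpoint(SAMPLE))  # {'/': 0, '/boot/efi': 1}
```

A value of 0 for / in the output is the situation the bug reports describe: the filesystem is never handed to fsck at boot.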

Some filesystems do not support fsck in any meaningful way, for example xfs, zfs, btrfs. There are fsck.$fstype binaries for these filesystems but they do nothing.

ext2, ext3 and ext4 do have a non-trivial fsck, although for journalled ext3 and ext4 systems all it usually does is replay the journal, the same as the kernel driver would do on mount. But there are situations where it does more, and unjournalled filesystems (which aiui are usually either ext2 or filesystems that were created as ext2 and then upgraded, so not something that we need to worry about too much in a modern installer) should be fscked.

One thing that came up in this investigation is that nothing in modern Ubuntu cares about the historical distinction between 1 and 2 for the passno field talked about in the fstab man page.

It also seems wise to fsck the /boot/efi partition, if any.

I did a quick survey of other installers.

anaconda (used by fedora, centos and rhel) writes passno as 1 or 2 for certain filesystem types (ext?, /boot/efi, the ones with _check = True in https://github.com/storaged-project/blivet/blob/3.4-devel/blivet/formats/fs.py) and 0 otherwise
calamares writes passno as 1 or 2 for all filesystems apart from swap partitions
partman (and so both d-i based installers and ubiquity) writes 1 or 2 for ext?, fat32, btrfs, /boot/efi, and writes 0 for xfs and ntfs
the arch fstab generator writes passno 1 or 2 for any filesystem type that ships a fsck.$fstype binary and 0 otherwise
fedora cloud images have an ext4 / and have passno 1 in their fstab
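The Arch-style rule from the survey can be sketched roughly as below. This is only a Python approximation of the behaviour as described in the list, not a port of the Arch generator: the function name `arch_style_passno` and the `has_helper` override are hypothetical, and the choice of 1 for / and 2 for everything else follows the convention in the fstab man page.

```python
import shutil


def arch_style_passno(fstype, mountpoint, has_helper=None):
    """Return 1 or 2 if a fsck.<fstype> helper is available, else 0.

    has_helper is a hypothetical override so the rule can be exercised
    without depending on which helpers are installed on this machine.
    """
    if has_helper is None:
        # Assumption: the helper is findable on PATH (it often lives in
        # /sbin or /usr/sbin, which may not be on every user's PATH).
        has_helper = shutil.which(f"fsck.{fstype}") is not None
    if not has_helper:
        return 0
    return 1 if mountpoint == "/" else 2


print(arch_style_passno("ext4", "/", has_helper=True))       # 1
print(arch_style_passno("xfs", "/srv", has_helper=True))     # 2
print(arch_style_passno("ntfs", "/mnt/win", has_helper=False))  # 0
```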

My takeaway from this is that we should certainly write passno=1 for an ext-family filesystem and for /boot/efi. Whether we want to be smart about writing passno=0 for e.g. xfs, I don’t know. It doesn’t make much difference either way.
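As a rough sketch of that minimal policy (the names `EXT_FAMILY` and `proposed_passno` are invented for the example, and this is not curtin code), the rule could look something like this in Python:

```python
EXT_FAMILY = {"ext2", "ext3", "ext4"}


def proposed_passno(fstype, mountpoint):
    """passno=1 for ext-family filesystems and for /boot/efi, 0 otherwise."""
    if fstype in EXT_FAMILY or mountpoint == "/boot/efi":
        return 1
    return 0


print(proposed_passno("ext4", "/"))          # 1
print(proposed_passno("vfat", "/boot/efi"))  # 1
print(proposed_passno("xfs", "/srv"))        # 0
```

It always uses 1 rather than distinguishing 1 and 2, on the basis that nothing in modern Ubuntu cares about the difference.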

demyers · March 23, 2021, 1:29pm

As an end user I found the change to fs_passno confusing, but even if you revert it to 1 for ext4 don’t you also need to use tune2fs -i or tune2fs -c to get fsck to actually run?

Edited to add: Here’s more detail on what I’ve found confusing. These are three different physical systems using ext4 for /:

Legacy-installed bionic: Never checked on its own. Used tune2fs -i and it works as expected.
Subiquity-installed focal: Used tune2fs -i but it still wouldn’t check until I found the 0 in fstab.
dd-installed focal on Raspberry Pi 4: Wouldn’t check with either of these changes. Turns out the lack of an RTC breaks tune2fs -i, had to use tune2fs -c.
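To make the difference between the two tune2fs settings concrete, here is a deliberately simplified Python model of the two periodic-check triggers: the maximum mount count (set with tune2fs -c) and the time interval between checks (set with tune2fs -i). It is a sketch under stated assumptions, not e2fsck's actual logic, which has more conditions (unclean filesystems, clock sanity handling and so on). The function name and the "stuck clock" scenario are hypothetical.

```python
def periodic_check_due(mount_count, max_mount_count, last_check,
                       check_interval, now):
    """Simplified model of the two periodic-check triggers.

    Times are seconds since the epoch. A limit of 0 or less disables
    the corresponding trigger.
    """
    by_count = max_mount_count > 0 and mount_count >= max_mount_count
    by_time = check_interval > 0 and (now - last_check) >= check_interval
    return by_count or by_time


DAY = 86400
# Hypothetical board whose clock reads almost the same value at every boot.
stuck_now = 1_600_000_000

# Interval-based trigger only: never fires if the clock does not advance.
print(periodic_check_due(5, -1, stuck_now - 60, 30 * DAY, stuck_now))  # False
# Mount-count trigger only: does not depend on the clock at all.
print(periodic_check_due(5, 5, stuck_now - 60, 0, stuck_now))          # True
```

The mount-count trigger never consults the clock, which fits the observation that tune2fs -c worked on the RTC-less Pi where tune2fs -i did not.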

jimduchek · March 23, 2021, 6:25pm

The Arch behavior seems correct to me. I feel that “run the recommended checks of the filesystem” is a necessary step before mounting it, and should be the default state of things. In the case of some filesystems, those checks may be no-op (or nearly no-op) executables that take no (or little) time to run.

The installer should not attempt to be smarter than the maintainers of the utilities of that particular fs. If they included a fsck, it ought to be run. They may be no-ops now, but you cannot foresee the future, and while it seems unlikely for these filesystems, the maintainers may see fit to include some actual check/repair functionality in a future update.

mwhudson · March 23, 2021, 8:30pm

Hmm yes, I think you’re right. e2fsck definitely does some things before it checks the mount count, but not much, from a quick glance. However, passno = 1 is still required to get the behaviour one expects.

Ha! I guess that sort of makes sense, when you already know what’s going on.

I get where you’re coming from, but I’m not sure this is quite right. It seems some utilities ship a fsck because in the past they felt pressure to. For example, man fsck.zfs says this:

It is installed by ZoL because some Linux distributions expect a fsck helper for all filesystems.

Having said that, the arch behaviour does seem reasonable.

jimduchek · March 24, 2021, 2:02am

I kinda thought about this, but ultimately I think just running the included no-op fsck is the most ‘future-proof’. I’d put 99.99+% odds on the zfs fsck never doing anything, but I can’t put 100% on it. It costs basically nothing to run it now, but it may potentially be bad not to run it in the future. Should it be required later, a system update will have to mess with the user’s fstab (this seems bad), or updates may leave the (now necessary) fsck unrun, which is also bad. Spending 2 extra microseconds every boot for most users (a savvy admin who cares about those 2 us can change their fstab manually) seems like good insurance against the future.