Systemd service ExecStop fails because disks have been unmounted

henrylaw · December 5, 2024, 10:13pm

I have a simple backup application, in the form of a bash shell script, which has to run as soon as the user shuts down. I have a systemd service “run_at_shutdown.service” which calls the script via the “ExecStop” directive. The bash script then uses tar, rsync, etc. to copy files from the local machine to a couple of local (LVM) drives, mounted via fstab: the drives are /dd and /e.

The problem is that the tar-ing and rsync-ing take a couple of minutes, during which time systemd has blithely thundered on with the shutdown, so that either or both of the required drives have already been unmounted when they are needed.

I have implemented various bits of advice found on the web, including: an “After” dependency on network.target (which during shutdown should act as a “before” requirement); listing the required mounts in the “RequiresMountsFor” directive; and “RemainAfterExit”, but still the problem persists. How do I debug this? Is there some way I can make systemd list out exactly what it’s doing and why?

Or can anyone suggest something else to try?

Here is my service definition:

$ systemctl cat run_at_shutdown.service
# /etc/systemd/system/run_at_shutdown.service

[Unit]
Description=Run programs before shutdown
RequiresMountsFor=/home /e /dd /shared /mnt/WinW
Wants=system_backups.service
After=network.target

[Service]
Type=oneshot
User=myuser
Group=users
ExecStop=/usr/local/sbin/run_at_shutdown.sh
# This calls backup_and_replicate.sh, referred to in the log
RemainAfterExit=yes
TimeoutSec=480

[Install]
WantedBy=multi-user.target

And here’s an extract from the system log, showing an example of the failure

Dec 05 21:10:20 ceres myuser[2787]: /usr/local/sbin/backup_and_replicate.sh (2780) 2.001 running
...
Dec 05 21:10:21 ceres systemd[1933]: Stopped target Basic System.
...
Dec 05 21:10:21 ceres blkdeactivate[2655]:   [UMOUNT]: unmounting vg02-e (dm-4) mounted on /e... done
Dec 05 21:10:21 ceres blkdeactivate[2655]:   [UMOUNT]: unmounting vg01-dd (dm-2) mounted on /dd... done
...
Dec 05 21:10:21 ceres myuser[2880]: backup_and_replicate.sh (2780) tar-ing /dd/mozilla/Thunderbird/W
Dec 05 21:10:21 ceres run_at_shutdown.sh[2881]: tar: /e/W/TBird.tar: Cannot open: No such file or directory
Dec 05 21:10:21 ceres run_at_shutdown.sh[2881]: tar: Error is not recoverable: exiting now
...
Dec 05 21:10:21 ceres run_at_shutdown.sh[2888]: rsync: [sender] change_dir "/dd/mozilla/Thunderbird/W" failed: No such file or directory (2)
... etc

This is Mint 21.3

skaperen · December 6, 2024, 5:16am

i am confused about what you are actually doing. are you trying to run a backup for only that user when that user logs out? can users run cron jobs or leave processes running when they detach? are you trying to run a whole system backup when doing shutdown? both? are your backups intended to be incremental? or are you faux-incremental with periodic tarballs?

henrylaw · December 6, 2024, 8:06am

The whole purpose of this is to transfer the user’s Thunderbird profile, and the home directory to a central server, so that they can be used on an alternate computer if the everyday one fails. So the TBird profile is rsync’ed to one mounted drive and /home/myuser to another (they can then be accessed by the alternate machine in emergency). The TBird profile is also tar’ed up for a more conventional archiving process. All of this running as “myuser”, not root.

Yes, there are system backups on shutdown but they’re separate from this and run as root (and they work fine). The user does not leave cron jobs or running processes: all this happens when the user powers off at end-of-day.

henrylaw · December 6, 2024, 2:57pm

I’ve done a bit more investigation. The issue is that, although systemd will wait for my processes to finish before finally powering down the system, in the meantime it carries on stopping other units, most particularly the mounts.

I replaced my run_at_shutdown.sh (which is ExecStop-ed by my run_at_shutdown.service) with a small bash program that uses mountpoint to check every 30 seconds whether the file systems are still mounted. It runs for 6 minutes currently before exiting. Its output shows that the various filesystems are unmounted after 30 seconds; but the monitor program is allowed by systemd to run for its full six minutes.
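For anyone who wants to repeat the test, a monitor along these lines would do it. This is a minimal sketch rather than the exact script I ran: the function name watch_mounts and its arguments are made up for illustration, and it relies only on "mountpoint -q" from util-linux, which exits 0 when the path is a mountpoint.

```shell
#!/bin/sh
# Sketch of a mount monitor (hypothetical helper, not the original script).
# Usage: watch_mounts INTERVAL_SECONDS ROUNDS PATH...
# The body runs in a subshell, so its variables do not leak into the caller.
watch_mounts() (
    interval=$1
    rounds=$2
    shift 2
    i=1
    while [ "$i" -le "$rounds" ]; do
        for path in "$@"; do
            # mountpoint -q prints nothing and exits 0 only for a mountpoint
            if mountpoint -q "$path"; then
                echo "round $i: $path mounted"
            else
                echo "round $i: $path NOT mounted"
            fi
        done
        # No need to sleep after the final round
        if [ "$i" -lt "$rounds" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
)
```

Calling "watch_mounts 30 13 /e /dd" from the ExecStop script gives 13 checks, 30 seconds apart, which is roughly the six minutes described above. When run from the service, the output lands in the journal alongside the other shutdown messages.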

My reading of the RequiresMountsFor= directive is that it translates into Requires= and After= for the mount units for all named file systems. I also think I understand that everything that happens under “ExecStop” is in the reverse order, so “After” some file system is mounted at start time becomes “Let this unit finish before stopping the mount unit” when the service is stopping.
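One way to check that reading is to look at the After= property systemd has actually computed for the unit and see whether the mount units are in it. The helper below is only a sketch: the name missing_after is hypothetical, and it assumes the mount units for /e and /dd are called e.mount and dd.mount (running "systemd-escape -p --suffix=mount /dd" should confirm the name).

```shell
#!/bin/sh
# Sketch (hypothetical helper): print every unit that is NOT listed in an
# "After=..." line as produced by "systemctl show -p After <unit>".
# Usage: missing_after AFTER_LINE UNIT...
missing_after() (
    after_line=$1
    shift
    for unit in "$@"; do
        # Strip the "After=" prefix and pad with spaces so only whole
        # unit names match
        case " ${after_line#After=} " in
            *" $unit "*) ;;          # ordering dependency is present
            *) echo "$unit" ;;       # ordering dependency is missing
        esac
    done
)

# On the live system (assumes systemctl is available):
#   missing_after "$(systemctl show -p After run_at_shutdown.service)" e.mount dd.mount
```

If this prints nothing, the ordering dependencies are in place, and the unmounting is being done by something that is not bound by them. "systemctl list-dependencies --after run_at_shutdown.service" shows the same information as a tree.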

So my question boils down to this: why is systemd not respecting the RequiresMountsFor= directive, and what do I have to do to make it hold off stopping the mount units until my run_at_shutdown unit has finished?

henrylaw · December 6, 2024, 3:15pm

I’ve just found this bug which seems to suggest that I’m not the only one with this problem.