LXD failed to start, and the available disk space on the btrfs rootfs was drained and is now unaccounted for

Tag: Storage

Questions:

  1. How can I inspect an Ubuntu Server 22.04 system that uses btrfs as rootfs (/) with a snap-based lxd 5.15 stable/latest installation, to identify which folders, files, or anything else are occupying space on the rootfs partition, in a scenario where du and btrfs fi du don’t show it (see Output Reference below for more details)?

  2. Is snap or lxd writing log or data files somewhere in a hidden volume, namespace, or container?

  3. Is btrfs doing something hidden behind the scenes, such as default CoW (copy-on-write) behaviour, duplication, or caching, that keeps data blocks in use without being freed?

  4. …is there anything else worth asking or disclosing that could lead to more insight and a resolution?

Incident Background:

After I moved a machine to a different network, where it was allocated a different IP address, lxd didn’t start because it still used the old IP address stored in lxd’s dqlite database. It only started once I manually updated the address using lxd sql local "UPDATE config SET value='<NEW_IP_ADDRESS>:8443' WHERE key='core.https_address'". I did the same for core.storage_buckets_address.
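
For anyone repeating this fix, here is a minimal sketch of how the statement above could be generated for either key. The helper name build_config_update_sql is made up for illustration; only lxd sql local, the config table, and the two key names come from the post.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LXD): prints the UPDATE statement used
# above for a given config key and value. Single quotes inside the inputs
# are doubled so the result stays a valid SQL string literal.
build_config_update_sql() {
    local key="$1" value="$2" q="'"
    key=${key//$q/$q$q}
    value=${value//$q/$q$q}
    printf "UPDATE config SET value='%s' WHERE key='%s'\n" "$value" "$key"
}

# Usage (needs the LXD snap and root; not executed by this snippet):
#   sudo lxd sql local "$(build_config_update_sql core.https_address '<NEW_IP_ADDRESS>:8443')"
```

The same call with core.storage_buckets_address as the key would cover the second setting.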

Storage Problem:

During the failed lxd start attempts, the btrfs partition /dev/sdb2, mounted as the / rootfs, consumed 10+ GB of additional disk space, but I cannot see where it is used.

It didn’t stop draining the available space on the btrfs rootfs partition until the NEW_IP_ADDRESS was set and lxd started running.
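
One way to put a number on the drain while it is happening is to sample the available space twice and divide by the interval. This is only a rough sketch: the function names are hypothetical, and sample_avail assumes GNU df (for --output and -B1).

```shell
#!/usr/bin/env bash
# sample_avail MOUNTPOINT: available bytes as reported by df (statfs).
sample_avail() {
    df --output=avail -B1 "$1" | tail -n 1 | tr -d ' '
}

# drain_rate AVAIL_BEFORE AVAIL_AFTER SECONDS: bytes lost per second,
# using integer division. A negative result means space was gained.
drain_rate() {
    local before="$1" after="$2" seconds="$3"
    if [ "$seconds" -le 0 ]; then
        echo "interval must be positive" >&2
        return 1
    fi
    echo $(( (before - after) / seconds ))
}

# Usage (not executed by this snippet):
#   a=$(sample_avail /); sleep 60; b=$(sample_avail /)
#   echo "losing $(drain_rate "$a" "$b" 60) bytes/s on /"
```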

Now there is a discrepancy of approximately 20 GB that I cannot account for. I need to understand how the space is used, how to reclaim it, and how to avoid this happening again.

Output Reference:


$ uname -a
Linux server 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ btrfs version
btrfs-progs v5.16.2

$ snap info lxd
...
installed:          5.15-002fa0f             (25112) 181MB -

$ lsblk -d -o name,rota,size,type,mountpoints /dev/sdb2 # ROTA: 0 = SSD
NAME ROTA SIZE TYPE MOUNTPOINTS
sdb2    0  32G part /

$ lsblk -a
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0     0B  1 loop 
loop1    7:1    0  73.9M  1 loop /snap/core22/858
loop2    7:2    0 173.5M  1 loop /snap/lxd/25112
loop3    7:3    0  53.3M  1 loop /snap/snapd/19457
loop4    7:4    0     0B  0 loop 
loop5    7:5    0     0B  0 loop 
loop6    7:6    0     0B  0 loop 
loop7    7:7    0     0B  0 loop 
sdb      8:16   0 476.9G  0 disk 
├─sdb1   8:17   0     1G  0 part /boot/efi
├─sdb2   8:18   0    32G  0 part /
├─sdb3   8:19   0     2G  0 part [SWAP]

$ sudo parted /dev/sdb unit GiB print
Model: ATA WALRAM 512GB (scsi)
Disk /dev/sdb: 477GiB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 
Number  Start    End      Size     File system     Name   Flags
 1      0.00GiB  1.05GiB  1.05GiB  fat32                  boot, esp
 2      1.05GiB  33.1GiB  32.0GiB  btrfs
 3      33.1GiB  35.1GiB  2.00GiB  linux-swap(v1)         swap

$ sudo parted /dev/sdb print
Model: ATA WALRAM 512GB (scsi)
Disk /dev/sdb: 512GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 
Number  Start   End     Size    File system     Name   Flags
 1      1049kB  1128MB  1127MB  fat32                  boot, esp
 2      1128MB  35.5GB  34.4GB  btrfs
 3      35.5GB  37.6GB  2147MB  linux-swap(v1)         swap

$ sudo parted /dev/sdb2 print
Model: Unknown (unknown)
Disk /dev/sdb2: 34.4GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags: 
Number  Start  End     Size    File system  Flags
 1      0.00B  34.4GB  34.4GB  btrfs

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.1G  1.7M  3.1G   1% /run
/dev/sdb2        32G   30G  1.4G  96% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb1       1.1G  6.1M  1.1G   1% /boot/efi
tmpfs           1.0M     0  1.0M   0% /var/snap/lxd/common/ns
tmpfs           3.1G  4.0K  3.1G   1% /run/user/1000

$ sudo du -h -d 1 /
260M	/boot
0	/dev
6.8M	/etc
3.8G	/home
0	/media
0	/mnt
0	/opt
du: cannot access '/proc/27363/task/27363/fd/3': No such file or directory
du: cannot access '/proc/27363/task/27363/fdinfo/3': No such file or directory
du: cannot access '/proc/27363/fd/4': No such file or directory
du: cannot access '/proc/27363/fdinfo/4': No such file or directory
0	/proc
104K	/root
1.7M	/run
0	/srv
0	/sys
0	/tmp
3.0G	/usr
1.3G	/var
998M	/snap
9.1G	/

$ btrfs fi df /
Data, single: total=30.48GiB, used=29.60GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=488.00MiB, used=163.56MiB
GlobalReserve, single: total=18.52MiB, used=0.00B

$ sudo btrfs fi du -s --si /
     Total   Exclusive  Set shared  Filename
cannot access: 'efi:' Inappropriate ioctl for device
cannot access: 'dev:' Inappropriate ioctl for device
cannot access: 'proc:' Inappropriate ioctl for device
cannot access: 'run:' Inappropriate ioctl for device
cannot access: 'sys:' Inappropriate ioctl for device
cannot access: 'ns:' Inappropriate ioctl for device
cannot access: '19457:' Inappropriate ioctl for device
cannot access: '858:' Inappropriate ioctl for device
cannot access: '25112:' Inappropriate ioctl for device
    8.45GB      8.27GB     90.96MB  /

$ sudo btrfs fi du -s /*
     Total   Exclusive  Set shared  Filename
 110.29MiB   110.29MiB       0.00B  /bin
cannot access: 'efi:' Inappropriate ioctl for device
 253.26MiB   253.26MiB       0.00B  /boot
ERROR: cannot check space of '/dev': Inappropriate ioctl for device
   2.00MiB     2.00MiB       0.00B  /etc
   3.64GiB     3.64GiB     2.13MiB  /home
   2.30GiB     2.30GiB       0.00B  /lib
     0.00B       0.00B       0.00B  /lib32
     0.00B       0.00B       0.00B  /lib64
     0.00B       0.00B       0.00B  /libx32
     0.00B       0.00B       0.00B  /media
     0.00B       0.00B       0.00B  /mnt
     0.00B       0.00B       0.00B  /opt
ERROR: cannot check space of '/proc': Inappropriate ioctl for device
  56.00KiB    56.00KiB       0.00B  /root
ERROR: cannot check space of '/run': Inappropriate ioctl for device
  28.79MiB    28.79MiB       0.00B  /sbin
cannot access: '19457:' Inappropriate ioctl for device
cannot access: '858:' Inappropriate ioctl for device
cannot access: '25112:' Inappropriate ioctl for device
     0.00B       0.00B       0.00B  /snap
     0.00B       0.00B       0.00B  /srv
ERROR: cannot check space of '/sys': Inappropriate ioctl for device
     0.00B       0.00B       0.00B  /tmp
   2.77GiB     2.77GiB       0.00B  /usr
cannot access: 'ns:' Inappropriate ioctl for device
   1.20GiB     1.05GiB    84.61MiB  /var

$ sudo btrfs fi usage /
Overall:
    Device size:		  32.00GiB
    Device allocated:		  31.50GiB
    Device unallocated:		 513.00MiB
    Device missing:		     0.00B
    Used:			  29.92GiB
    Free (estimated):		   1.38GiB	(min: 1.13GiB)
    Free (statfs, df):		   1.38GiB
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		  18.52MiB	(used: 0.00B)
    Multiple profiles:		        no
Data,single: Size:30.48GiB, Used:29.60GiB (97.11%)
   /dev/sdb2	  30.48GiB
Metadata,DUP: Size:488.00MiB, Used:163.56MiB (33.52%)
   /dev/sdb2	 976.00MiB
System,DUP: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/sdb2	  64.00MiB
Unallocated:
   /dev/sdb2	 513.00MiB

$ journalctl --disk-usage
Archived and active journals take up 112.0M in the file system.

$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=15835356k,nr_inodes=3958839,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3178728k,mode=755,inode64)
/dev/sdb2 on / type btrfs (rw,relatime,ssd,space_cache=v2,subvolid=5,subvol=/)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=18845)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
none on /run/credentials/systemd-sysusers.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
/var/lib/snapd/snaps/core22_858.snap on /snap/core22/858 type squashfs (ro,nodev,relatime,errors=continue,x-gdu.hide)
/var/lib/snapd/snaps/lxd_25112.snap on /snap/lxd/25112 type squashfs (ro,nodev,relatime,errors=continue,x-gdu.hide)
/var/lib/snapd/snaps/snapd_19457.snap on /snap/snapd/19457 type squashfs (ro,nodev,relatime,errors=continue,x-gdu.hide)
/dev/sdb1 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/snapd/ns type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3178728k,mode=755,inode64)
nsfs on /run/snapd/ns/lxd.mnt type nsfs (rw)
tmpfs on /var/snap/lxd/common/ns type tmpfs (rw,relatime,size=1024k,mode=700,inode64)
nsfs on /var/snap/lxd/common/ns/shmounts type nsfs (rw)
nsfs on /var/snap/lxd/common/ns/mntns type nsfs (rw)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=3178724k,nr_inodes=794681,mode=700,uid=1000,gid=1000,inode64)

$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sdb2 during curtin installation
/dev/disk/by-uuid/31b13acd-b3c9-4fee-aa76-da9896c4d58b / btrfs defaults 0 1
/dev/disk/by-uuid/1b5aae77-e100-4fb8-aa54-571d3453f2b9 none swap sw 0 0
# /boot/efi was on /dev/sdb1 during curtin installation
/dev/disk/by-uuid/1C75-EDE6 /boot/efi vfat defaults 0 1

$ wget https://speed.hetzner.de/10GB.bin
--2023-08-15 10:53:06--  https://speed.hetzner.de/10GB.bin
Resolving speed.hetzner.de (speed.hetzner.de)... 88.198.248.254, 2a01:4f8:0:59ed::2
Connecting to speed.hetzner.de (speed.hetzner.de)|88.198.248.254|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10485760000 (9.8G) [application/octet-stream]
Saving to: ‘10GB.bin’
10GB.bin                                            14%[==============>                                                                                                  ]   1.38G  7.99MB/s    in 2m 57s  
Cannot write to ‘10GB.bin’ (Success).

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.1G  1.7M  3.1G   1% /run
/dev/sdb2        32G   32G  4.0K 100% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sdb1       1.1G  6.1M  1.1G   1% /boot/efi
tmpfs           1.0M     0  1.0M   0% /var/snap/lxd/common/ns
tmpfs           3.1G  4.0K  3.1G   1% /run/user/1000
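
The gap between what the filesystem reports as used and what du can actually see can be expressed as a single number. A small sketch, assuming GNU df and du plus any POSIX awk; gap_gib is a hypothetical helper, not an existing tool.

```shell
#!/usr/bin/env bash
# gap_gib USED_BYTES VISIBLE_BYTES: difference in GiB, two decimals.
# USED_BYTES is what statfs/df reports as used; VISIBLE_BYTES is what a
# directory walk (du) can reach.
gap_gib() {
    awk -v used="$1" -v seen="$2" \
        'BEGIN { printf "%.2f\n", (used - seen) / (1024 ^ 3) }'
}

# Usage (not executed by this snippet; du needs root to read everything):
#   used=$(df --output=used -B1 / | tail -n 1)
#   seen=$(sudo du -sxB1 / | cut -f1)
#   echo "unaccounted: $(gap_gib "$used" "$seen") GiB"
```

With the rounded figures above (30G used according to df, 9.1G according to du) this comes to roughly 20 GiB, which is the discrepancy described in the post.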

Hi,

Do you have any logs of the failed LXD startup attempts and the syslog during this period? Additionally, what was your process for moving the machine to the new network? Is it possible that the instances were still running even though the daemon itself had been stopped?

Logs are written for each instance but it is unlikely they would grow so large.

I have isolated the problem and there is a resolution; however, for general lxd resilience on a btrfs rootfs, it deserves further investigation when time allows.

  • Moving the machine to another network

    • shutdown (used either shutdown or systemctl, can’t remember)
    • power off
    • disconnected the ethernet cable and machine from the router
    • connected the ethernet cable and machine to other router
    • power on
    • lxd automatically attempted to start but failed
  • fixing lxd not starting and disk space drainage by updating the lxd dqlite database

    • …lxd couldn’t start because core was configured to use ip:port not just :port
    • …something continuously wrote to /var/snap/lxd/common/lxd/database/global/…files
    • …but I couldn’t see this on the filesystem, only that the available disk space continued to get drained
    • …until I tried lxd shutdown, snap stop lxd, snap disable lxd
    • …and the drainage continued every time I tried enabling and starting lxd again.
    • …at least, as far as I remember, until I ran the lxd sql local update to change the core addresses; lxd config set didn’t work because the server was in limbo.
  • fixing the df and du disk space discrepancy by deleting btrfs UNREACHABLE lxd database files with btdu

    • I installed btdu (https://github.com/CyberShadow/btdu)
    • ran sudo btdu /
    • dived into //< SINGLE >/< DATA >/< UNREACHABLE >/var/snap/lxd/common/lxd/database/global
    • …found the 20+ GB in there
    • then I copied /var/snap/lxd/common/lxd/database to a tmp folder
    • …and made sure lxd was not running and the snap was disabled
    • used btdu to delete the UNREACHABLE EXTENT path at the …/database level
    • …and the 20+ GB was made available to the system again :slight_smile:
    • also deleted “/< SINGLE >/< DATA >/< UNREACHABLE >/var/log”
    • …but I shouldn’t have, without having copied it to make a backup first like I did with the database folder
    • …this not only deleted the “UNREACHABLE” files and freed the space
    • …it also deleted the live reachable database folders from the active rootfs filesystem
    • then I copied back the temporary database folder
    • and enabled the lxd snap: sudo snap enable lxd
    • …however it wouldn’t start, because /var/log had also been deleted with btdu when deleting the “UNREACHABLE”
    • so I recreated /var/log with: mkdir -p /var/log
    • and retried reenabling the lxd snap: sudo snap enable lxd
    • …and then it worked
    • …although btdu showed something in “/< SINGLE >/< DATA >/< ERROR >”
    • …however, after a restart, as suggested by btdu, that disappeared.
    • …and then lxd was running again, with the space reclaimed.
  • thoughts

    • …if I had configured lxd with just the port instead of the ip address, the problem of lxd trying to start with the old ip found in the database would not have occurred
    • …if I had used e.g. ext4 instead of btrfs on the rootfs, or had kept the lxd database off the btrfs filesystem, there would have been no issue; but using btrfs shouldn’t be a problem in the first place
    • …it may have something to do with btrfs’ default CoW, but then lxd could handle that itself, or at least the docs could mention it and explain how to disable CoW for the database files, to ensure resilience.
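
Given how the btdu cleanup above also removed live folders, a verified copy of every path under the node being deleted seems worth taking first. A minimal sketch, with a hypothetical helper name and a placeholder destination; it only uses cp -a and diff -r.

```shell
#!/usr/bin/env bash
# backup_dir SRC DEST: copy SRC into DEST preserving attributes, then
# compare the copy against the original. Returns non-zero if SRC is
# missing, the copy fails, or the contents differ.
backup_dir() {
    local src="$1" dest="$2"
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    cp -a "$src" "$dest"/ || return 1
    diff -r "$src" "$dest/$(basename "$src")" >/dev/null
}

# Usage with the paths from the post (not executed by this snippet;
# /path/to/backup is a placeholder):
#   sudo snap stop lxd && sudo snap disable lxd
#   sudo bash -c "$(declare -f backup_dir); backup_dir /var/snap/lxd/common/lxd/database /path/to/backup"
#   sudo bash -c "$(declare -f backup_dir); backup_dir /var/log /path/to/backup"
```

Stopping and disabling the snap before copying matches the order used in the steps above, so the database is not being written to while it is backed up.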

If you are going to work on making lxd more resilient to this edge case in the future, let me see what logs I can dig out of the notes I took while working on the resolution.

…while this comment can be a solution, and I’ll mark it as such for now, I think a real solution is to build some resilience into lxd so it can handle a btrfs context and scenarios such as this.
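
On the CoW point: btrfs has a per-file No_COW attribute (chattr +C). When set on a directory it is inherited by files created there afterwards; it does not convert existing non-empty files, and it also turns off data checksums for the affected files. Whether this would have prevented the unreachable extents here is not established in this thread, so the following is only a sketch for checking and setting the flag. has_nocow_flag is a hypothetical helper and the directory path is a placeholder.

```shell
#!/usr/bin/env bash
# has_nocow_flag LSATTR_LINE: succeeds if the flags column of one line of
# lsattr output contains 'C' (No_COW). Lowercase 'c' (compressed) does
# not match, and the path after the first space is ignored.
has_nocow_flag() {
    local flags="${1%% *}"
    [[ "$flags" == *C* ]]
}

# Usage (not executed by this snippet):
#   sudo chattr +C /path/to/new/empty/dir      # set before any files exist
#   if has_nocow_flag "$(lsattr -d /path/to/new/empty/dir)"; then
#       echo "new files in this directory will be created without CoW"
#   fi
```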