Once I killed the forkfile process, all the hanging snapshot operations completed (and I had quite a few, since I'm using lxd-snapper).
To me it looks like someone started copying a file from the server, that copy somehow hung indefinitely (the timestamps in /proc are from 16 hours earlier), and it blocked snapshot creation and deletion.
Am I interpreting this correctly? Shouldn't forkfile have some kind of timeout? How do I debug this further? Should I open an issue?
Which storage driver are you using? I have just tried copying a big blob from a container (on ZFS), and the deletion doesn't block; instead, it interrupts the lxc file pull command. What version of LXD are you using?
I'm using ZFS; LXD is lxd 5.19-31ff7b6 26093 5.19/stable
I tried the same: I pulled a big file over VPN and, while it was pulling, ran lxc snapshot on the same container. The snapshot stayed blocked for as long as the pull was running. For comparison, a snapshot with no pull in progress:
~$ time lxc snapshot git
real 0m0.307s
user 0m0.111s
sys 0m0.025s
$ lxc file pull git/backups/git-1701753905-16.4.1-ee.0_gitlab_backup.tar /scratch/
Pulling /scratch/git-1701753905-16.4.1-ee.0_gitlab_backup.tar from backups/git-1701753905-16.4.1-ee.0_gitlab_backup.tar: 2.75GB (109.55MB/s)
And in another shell, while the pull was running:
~$ time lxc snapshot git
real 1m14.336s
user 0m0.098s
sys 0m0.064s
Normally a snapshot takes about a second, but here it waited until the pull was over.
I talked to my team today, and what happened is that an admin started a pull and his laptop ran out of battery in the middle of it. So I guess the idle detection is not working as it should, or maybe I have some specific config that confuses it.
And indeed, I started a pull from my laptop, disconnected my VPN right after, and managed to reproduce the exact situation: the forkfile process hung forever (OK, half an hour, but long after the pull had timed out on the client side).
Can you please confirm whether lxc stop --force <instance> or lxc delete --force <instance> works in these scenarios, where a stalled remote client holds forkfile open for an extended period?