VM in ERROR state

I’m getting an LXD VM in ERROR state, but the log from lxc info --show-log is empty and lxd.log doesn’t have anything unusual either, and the VM itself won’t start up.

This is a throwaway VM so I don’t need to salvage it, but I’d be curious about what went wrong. Any tips on troubleshooting or gathering more diagnostics?

For context, I’m deploying a large-ish juju bundle inside that VM using containers, so it could be a resource shortage. The VM is running noble, as are the containers inside. The LXD version is 5.21.2-2f4ba6b.

The main idea would be checking the kernel logs (dmesg) after trying to start the instance. There could be something helpful from QEMU there.

Run lxc monitor --pretty while trying to start the instance to check for events on LXD; this could be more informative than lxd.log. Trying to start the instance with lxc start <vm-name> --debug doesn’t hurt, although I don’t think it contains any info that isn’t already in lxc monitor.
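To capture all of that in one go, something like the following could work. It is only a sketch: the function name and the log file names are made up for illustration, and it assumes sudo is available for reading dmesg.

```shell
#!/bin/sh
# Sketch: record LXD events, client debug output and kernel messages
# around one start attempt. All log file names are arbitrary choices.
collect_start_diagnostics() {
    vm="$1"

    # Stream LXD events to a file in the background during the attempt.
    lxc monitor --pretty > "lxd-events-$vm.log" 2>&1 &
    monitor_pid=$!
    sleep 1  # give the monitor a moment to connect

    # Try the start with client-side debug output and remember the result.
    start_rc=0
    lxc start "$vm" --debug > "lxc-start-$vm.log" 2>&1 || start_rc=$?

    # Stop the monitor, then save the most recent kernel messages.
    kill "$monitor_pid" 2>/dev/null || true
    wait "$monitor_pid" 2>/dev/null || true
    sudo dmesg | tail -n 100 > "dmesg-$vm.log"

    return "$start_rc"
}
```

Calling collect_start_diagnostics <vm-name> leaves three log files in the current directory that can be attached to a reply here.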

If it is a problem with resource limits, try resizing the VM’s root disk with lxc config device set <vm-name> root size="100GiB" or, if that fails, lxc config device override <vm-name> root size="100GiB". Make sure to adjust the size of your storage pool if needed; whether the pool is full can be checked with lxc storage info <pool_name>.
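The set-then-override fallback can be wrapped in a small helper. This is a rough sketch; resize_root and show_pool_usage are hypothetical names, and the fallback assumes that set fails because the root disk is inherited from a profile.

```shell
#!/bin/sh
# Sketch: grow the root disk of an instance, falling back to "override"
# when "set" is rejected (e.g. the root device comes from a profile).
resize_root() {
    vm="$1"
    size="$2"
    lxc config device set "$vm" root size="$size" ||
        lxc config device override "$vm" root size="$size"
}

# Sketch: print usage information for a storage pool before resizing.
show_pool_usage() {
    lxc storage info "$1"
}
```

For example, show_pool_usage <pool_name> followed by resize_root <vm-name> 100GiB.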

If none of that helps, you could try making a backup of the instance with lxc export vm-name ./out.tar.xz --optimized-storage and then importing it with lxc import ./out.tar.xz test-vm to check disk integrity. Even if this doesn’t help recover the instance, it might at least fail in a more informative way. If the export succeeds and the import doesn’t, importing it into another storage pool or even another LXD server could also be worth trying (if doing so, use the same storage driver as the original instance).
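A sketch of that export/import round trip, reporting which half failed. The function name, the backup path and the default copy name are placeholders, not anything LXD prescribes.

```shell
#!/bin/sh
# Sketch: export an instance and re-import it under a new name.
# Returns 1 if the export fails and 2 if the import fails.
roundtrip_backup() {
    vm="$1"
    copy="${2:-test-vm}"
    backup="./$vm-backup.tar.xz"

    # If this step fails, the source volume or pool is the first suspect.
    lxc export "$vm" "$backup" --optimized-storage || {
        echo "export of $vm failed" >&2
        return 1
    }

    # If only this step fails, retrying on another pool or server may help.
    lxc import "$backup" "$copy" || {
        echo "import as $copy failed" >&2
        return 2
    }
}
```

Usage would be roundtrip_backup <vm-name> test-vm, then checking the exit code and the error printed by lxc.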
Lastly, what exactly happens when attempting to start the instance? If the process just hangs, stracing the qemu process (if it is spawned) could be an option as well.
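If it does hang, a sketch like this could attach strace to the QEMU process. It assumes the instance name appears after -name on QEMU’s command line (worth confirming with ps first) and that sudo and strace are installed on the host; trace_qemu is a made-up helper name.

```shell
#!/bin/sh
# Sketch: attach strace to the QEMU process of an instance, if one exists.
# Assumption: the instance name follows "-name" on QEMU's command line.
trace_qemu() {
    vm="$1"
    pid="$(pgrep -f "qemu-system.* -name $vm " | head -n 1)"
    if [ -z "$pid" ]; then
        echo "no QEMU process found for $vm" >&2
        return 1
    fi
    # -f follows threads and children, -tt adds timestamps,
    # -o writes the trace to a file instead of the terminal.
    sudo strace -f -tt -p "$pid" -o "strace-$vm.log"
}
```

Run trace_qemu <vm-name> right after lxc start <vm-name> and stop it with Ctrl+C once the hang has been captured.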