How to restart a container that won't stop with "--force"

I have a container that I can’t stop or restart.

The container is listed as “RUNNING” but has lost its network and refuses to be stopped.

This is what I have in the logs:

Oct 07 18:21:30 lxdhost7 lxd.daemon[2627]: time="2023-10-07T18:21:30Z" level=error msg="Error calling 'lxd forknet" err="Failed to run: /snap/lxd/current/bin/lxd forknet info -- 14051 3: exit status 1 (Failed setns to container network namespace: No such file or directory)" instance=juju-f1f013-3 instanceType=container pid=14051 project=default

I’ve tried to stop it with “--force” to no avail.

lxc stop juju-f1f013-3 --force

I need some advice on how to bring it back.

Rebooting the host is not my first option, since many production-grade containers are still running on it. Restarting LXD with “systemctl restart lxd” isn’t what I’d prefer either, since as far as I know it would restart all containers.

What can I do?

I’ve tried using “nsenter”, but it won’t let me in.

nsenter -t 14051 --mount --uts --ipc --net --pid
nsenter: cannot open /proc/14051/ns/ipc: No such file or directory
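The missing /proc/14051/ns/ipc entry suggests the kernel no longer exposes that process's namespaces. Before retrying nsenter, it can help to check which namespace links under /proc/<pid>/ns still resolve. A minimal POSIX sh sketch, assuming it is run as root on the LXD host (as an unprivileged user, another process's links can look missing because of permissions); the function name check_ns is hypothetical:

```shell
# check_ns is a hypothetical helper, not part of LXD.
# For each namespace nsenter was asked to join, report whether the
# /proc/<pid>/ns/<type> link still resolves (test -e follows the link).
check_ns() {
    pid="$1"
    if [ ! -d "/proc/$pid" ]; then
        echo "pid $pid: no /proc entry"
        return 1
    fi
    for ns in ipc mnt net pid uts; do
        if [ -e "/proc/$pid/ns/$ns" ]; then
            echo "$ns: present"
        else
            echo "$ns: missing"
        fi
    done
}
```

For example, check_ns 14051 would show which of the five namespaces nsenter can no longer open.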

The host has free RAM, but I still get a memory error from nsenter:

free -h
              total        used        free      shared  buff/cache   available
Mem:          251Gi       211Gi        13Gi        12Mi        26Gi        38Gi
Swap:         8.0Gi       8.0Gi       0.0Ki


nsenter -t 14051 --pid
nsenter: fork failed: Cannot allocate memory
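The free -h output is not the whole story here. Swap is fully used (8.0Gi of 8.0Gi), and the fork(2) man page also lists an unrelated cause of ENOMEM: creating a child in a PID namespace whose init process has already terminated (see pid_namespaces(7)). Since nsenter --pid forks into the container's PID namespace, the error alone does not prove the host is short of memory. A minimal sketch for collecting the memory side of the picture from /proc; the helper name show_mem_pressure is hypothetical:

```shell
# show_mem_pressure is a hypothetical helper.
# Prints the /proc/meminfo fields most relevant to a failed fork(),
# plus the kernel's overcommit policy.
show_mem_pressure() {
    grep -E '^(MemAvailable|SwapFree|CommitLimit|Committed_AS):' /proc/meminfo
    # 0 = heuristic overcommit, 1 = always overcommit, 2 = strict accounting
    echo "vm.overcommit_memory = $(cat /proc/sys/vm/overcommit_memory 2>/dev/null || echo unknown)"
}
```

Under strict accounting (vm.overcommit_memory = 2), new allocations fail once Committed_AS would exceed CommitLimit, regardless of what free reports as available.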

The other containers on the host seem fine.

Using “kill” doesn’t remove the process either:

kill -9 14051
ps 14051
PID TTY      STAT   TIME COMMAND
14051 ?        Ss     0:58 [systemd]
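A process that stays listed after kill -9 is usually blocked inside the kernel or stuck partway through exiting; for example, a task in uninterruptible sleep (D state) does not act on SIGKILL until the kernel call it is blocked in returns. The brackets around [systemd] mean ps could not read a command line for the process and printed its executable name instead. A minimal sketch for a closer look, assuming root on the LXD host; the helper name proc_state is hypothetical:

```shell
# proc_state is a hypothetical helper.
# Shows the kernel's view of a process: name, state and parent PID from
# /proc/<pid>/status, and the kernel function it is waiting in (wchan).
proc_state() {
    pid="$1"
    if [ ! -d "/proc/$pid" ]; then
        echo "pid $pid: not found"
        return 1
    fi
    grep -E '^(Name|State|PPid):' "/proc/$pid/status"
    # wchan can read as 0 (not waiting, or hidden from unprivileged users)
    printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
}
```

For example, proc_state 14051. The State: line and the wchan value would be useful details to include in a bug report.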

I was forced to reboot the host in the end.

If you encounter this again, it would be useful to see the output of lxc monitor --pretty while you run lxc stop -f <instance>. Thanks.
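The suggested capture can be scripted so the monitor is already listening when the stop request is sent. A minimal sketch under stated assumptions: the helper name capture_stop_events, the default log path, the one-second pauses and the 60-second limit are all arbitrary choices, and timeout is the GNU coreutils command, used so that a hanging lxc stop cannot block the shell indefinitely:

```shell
# capture_stop_events is a hypothetical helper.
# Runs `lxc monitor --pretty` in the background while force-stopping an
# instance, so the daemon's events for the stop attempt are saved to a file.
capture_stop_events() {
    instance="$1"
    log="${2:-/tmp/lxc-monitor-$instance.log}"   # arbitrary default path
    lxc monitor --pretty >"$log" 2>&1 &
    monitor_pid=$!
    sleep 1                                      # let the monitor connect
    timeout 60 lxc stop "$instance" -f           # exit status 124 means it timed out
    status=$?
    sleep 1                                      # collect trailing events
    kill "$monitor_pid" 2>/dev/null
    echo "lxc stop exit status: $status; events in $log"
    return "$status"
}
```

For example, capture_stop_events juju-f1f013-3, then attach the resulting log file to the thread.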
