Zombie nvidia-containe(r) process hanging off lxd daemon

Recently I started noticing a zombie nvidia-containe(r) process hanging off the lxd daemon:

2821313 ?        Sl   290:37  \_ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
2114785 ?        Z      0:00      \_ [nvidia-containe] <defunct>
# cat /proc/2114785/status
Name:   nvidia-containe
State:  Z (zombie)
Tgid:   2114785
...

systemctl reload snap.lxd.daemon “fixes” it, but it keeps reappearing. I’m still running old RHEL8 and lxd 5.21 LTS, and there haven’t been any updates in a long time. What is this and where does it come from? Any ideas?

# lxd --version
5.21.3 LTS
# snap info lxd
name:      lxd
...
snap-id:      J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking:     5.21/stable
refresh-date: 2025-04-01
...
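
To see every defunct process together with the parent that has not reaped it, the /proc/<pid>/status check above can be generalised. This is a minimal sketch, assuming a Linux /proc filesystem; `list_zombies` is a hypothetical helper name, not part of LXD or snapd.

```shell
# Hypothetical helper: print "PID PPID NAME" for every zombie on the system.
# It reads the same /proc/<pid>/status fields shown in the post above.
list_zombies() {
    for status in /proc/[0-9]*/status; do
        # A process can vanish between the glob and the read, so errors are ignored.
        awk '/^Name:/  {name = $2}
             /^State:/ {state = $2}
             /^Pid:/   {pid = $2}
             /^PPid:/  {ppid = $2}
             END {if (state == "Z") print pid, ppid, name}' "$status" 2>/dev/null
    done
}
```

Running `list_zombies` on the affected host should show the nvidia-containe entry with the lxd daemon's PID in the second column, which confirms which process is expected to call wait() on it.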

That “[nvidia-containe] <defunct>” is a leftover nvidia-container-cli helper that LXD launches when it probes GPUs. In the libnvidia-container version bundled with LXD 5.21.3 the child process exits but never gets wait()-ed, so it stays as a harmless zombie under the lxd daemon.
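
The mechanism described here (a child exits while its parent never calls wait()) can be reproduced without LXD or NVIDIA tooling. The following is only an illustrative sketch on Linux: `demo_zombie_state` is a hypothetical name, and the "parent" is a plain `sleep` process standing in for a daemon that forgets to reap its child.

```shell
# Hypothetical demo: create a zombie on purpose and print its state.
demo_zombie_state() {
    pidfile=$(mktemp)
    # The inner shell starts a short-lived child, records the child's PID,
    # then exec()s into "sleep", a program that never calls wait().
    sh -c 'sleep 0.2 & echo "$!" > "$0"; exec sleep 3' "$pidfile" >/dev/null 2>&1 &
    parent=$!
    sleep 1                      # the child has exited, the parent is still alive
    child=$(cat "$pidfile")
    # Prints "Z" while the exited child is waiting to be reaped.
    awk '/^State:/ {print $2}' "/proc/$child/status"
    wait "$parent"               # let the demo parent finish
    rm -f "$pidfile"
}
```

Once the parent itself exits, the zombie is handed to init (or another subreaper), which normally reaps it. That matches the observation that reloading the daemon clears the entry.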

Fix it in two lines:

# pull the snap that has the patched libnvidia-container
sudo snap refresh lxd --channel=5.21/stable   # or 6.0/stable if you prefer

# restart the daemon so it uses the new toolkit
sudo systemctl restart snap.lxd.daemon

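No NVIDIA in your containers? You can also make sure NVIDIA runtime passthrough is off for each instance, which may avoid invoking the helper for that instance (nvidia.runtime is a per-instance key; <instance> is a placeholder for the instance name):

lxc config set <instance> nvidia.runtime false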

Either way, the “nvidia-containe” zombies stop appearing.

Could you please open an issue about this in the canonical/lxd repository on GitHub, in case LXD is missing a process wait call?

Opened: Zombie nvidia-containe(r) process hanging off lxd daemon (Issue #15796, canonical/lxd on GitHub)


sudo snap refresh lxd --channel=5.21/stable

snap “lxd” has no updates available

That makes sense, as I’m already tracking 5.21/stable.
