Hi,
We have recently spotted a weird error with LXD and the ubuntu:20.04 and ubuntu:22.04 containers. The problem is that the /var/run/nologin file is “eternally present” after bootup of the containers, making it impossible to use ssh or lxc exec to access the content in the container.
When using SSH, we get the famous “System is booting up” error because of the presence of the nologin file:
$ ssh 10.34.32.179
"System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8)."
Connection closed by 10.34.32.179 port 22
We have seen this on various versions of LXD (5.18 on Ubuntu and 5.0.2 on Debian Bookworm, my own machine). Here are the versions of the images where we’ve seen this:
Image versions used
slovdahl@desk:~$ lxc image info ubuntu:20.04
Fingerprint: 1ff1055f3820e9f25c3c49ad3a9dba2e6c585cb5de3440dfe83150ddfc4a643d
Size: 426.11MiB
Architecture: x86_64
Type: container
Public: yes
Timestamps:
    Created: 2023/10/11 00:00 UTC
    Uploaded: 2023/10/11 00:00 UTC
    Expires: 2025/05/29 00:00 UTC
    Last used: never
plundberg@delorean:~$ lxc image info ubuntu:22.04
Fingerprint: b948dd91cd5a8da89f6dcd4949d7189f064cf6d4dc5bd70b7f9b7aff1883babf
Size: 435.08MB
Architecture: x86_64
Type: container
Public: yes
Timestamps:
    Created: 2023/10/10 00:00 UTC
    Uploaded: 2023/10/10 00:00 UTC
    Expires: 2027/06/01 00:00 UTC
    Last used: never
Properties:
    os: ubuntu
    release: jammy
    version: 22.04
    architecture: amd64
    label: release
    serial: 20231010
    description: ubuntu 22.04 LTS amd64 (release) (20231010)
    type: squashfs
Aliases:
    - 22.04
    - 22.04/amd64
    - j
    - j/amd64
    - jammy
    - jammy/amd64
    - lts
    - lts/amd64
    - default
    - default/amd64
Cached: no
Auto update: disabled
Profiles: []
One of my colleagues thought that perhaps ~ubuntu-core-dev/ubuntu/+source/systemd could be affecting this, making the startup take longer? However, I saw /var/run/nologin still present six minutes after bootup, so if there was a timeout I would have expected it to expire by then.
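To make the "is it just slow, or stuck?" check repeatable, here is a minimal sketch of a polling loop. The helper name wait_for_nologin_gone and its defaults (360 seconds, matching the six minutes above) are my own assumptions, not anything shipped with LXD or systemd.

```shell
#!/bin/sh
# Hypothetical helper: poll until a nologin marker disappears or a timeout elapses.
#   $1: path to the marker file (default: /var/run/nologin)
#   $2: timeout in seconds      (default: 360)
#   $3: poll interval in seconds (default: 5)
# Returns 0 if the file went away, 1 if it was still there at the timeout.
wait_for_nologin_gone() {
    marker="${1:-/var/run/nologin}"
    timeout="${2:-360}"
    interval="${3:-5}"
    waited=0
    while [ -e "$marker" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "still present after ${waited}s: $marker"
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "gone after ${waited}s: $marker"
    return 0
}
```

Run from a root shell inside the container, a non-zero exit with the default timeout would match what I observed. A zero exit after a while would instead point at a slow boot rather than a stuck one.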
systemctl status output
$ lxc exec sure-macaw -- systemctl status
● sure-macaw
    State: starting
     Jobs: 21 queued
   Failed: 3 units
    Since: Mon 2023-10-16 06:08:57 UTC; 4min 24s ago
   CGroup: /
           ├─.lxc
           │ ├─904 systemctl status
           │ └─905 less
           ├─init.scope
           │ └─1 /sbin/init
           └─system.slice
             ├─systemd-networkd.service
             │ └─295 /lib/systemd/systemd-networkd
             ├─systemd-udevd.service
             │ └─94 /lib/systemd/systemd-udevd
             ├─cron.service
             │ └─329 /usr/sbin/cron -f -P
             ├─networkd-dispatcher.service
             │ └─335 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
             ├─snap-snapd-20092.mount
             │ └─104 snapfuse /var/lib/snapd/snaps/snapd_20092.snap /snap/snapd/20092 -o ro,nodev,allow_other,suid
             ├─systemd-journald.service
             │ └─54 /lib/systemd/systemd-journald
             ├─unattended-upgrades.service
             │ └─362 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
             ├─ssh.service
             │ └─358 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
             ├─snapd.service
             │ └─796 /usr/lib/snapd/snapd
             ├─snapd.seeded.service
             │ └─525 /usr/bin/snap wait system seed.loaded
             ├─rsyslog.service
             │ └─336 /usr/sbin/rsyslogd -n -iNONE
             ├─systemd-resolved.service
             │ └─297 /lib/systemd/systemd-resolved
             ├─dbus.service
             │ └─331 @dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
             └─systemd-logind.service
               └─340 /lib/systemd/systemd-logind
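The header of that output is probably the interesting part: the system is still in State: starting, with 21 jobs queued and 3 failed units. To compare containers quickly, the header fields can be pulled out with a small filter. This is only a sketch, and boot_summary is a hypothetical helper name of my own.

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl status` output on stdin and print the
# State/Jobs/Failed header fields on a single line.
boot_summary() {
    awk '
        $1 == "State:"  { state = $2 }   # e.g. "starting", "running", "degraded"
        $1 == "Jobs:"   { jobs = $2 }    # number of queued jobs
        $1 == "Failed:" { failed = $2 }  # number of failed units
        END { printf "state=%s jobs=%s failed=%s\n", state, jobs, failed }
    '
}
```

It could be fed with something like `lxc exec sure-macaw -- systemctl status --no-pager | boot_summary`. The natural follow-up would be `systemctl list-jobs` and `systemctl --failed` inside the container to see which jobs are queued and which units failed. As far as I understand, systemd-user-sessions.service is the unit that removes the nologin file, so its state seems worth checking too.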
Any other logs/diagnostics we could provide? I’ll happily help debug this, but I think we’d need a pointer in what direction to look.