I just purchased a couple of used Lenovo P510s to experiment with LXD. Unfortunately, once I create a cluster and launch a couple of containers, the workstations become utterly unusable. I was able to capture output from dmesg, and these are (I think) the problem logs (the “cut here” notation helps!). Hoping somebody can point me in the right direction.
Latest BIOS from Lenovo (5/24/2022 IIRC), latest Ubuntu (24.04.1) and latest LXD snap (5.21.2). Kernel is 6.8.0-48. (Workstation locked up at that point – so I have like 30 seconds, tops.)
Vanilla Ubuntu ran all day, no issues. The snap install and ZFS didn’t seem to cause issues, but I didn’t let it sit. I think it was either the cluster join or launching the containers that triggered the bad behavior.
Thanks for the quick reply. I’ll try with btrfs, and no fan (have no clue if I created one or not).
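(For my own notes, here’s roughly how I plan to check for a fan and set up btrfs; the network name is just what LXD typically picks when it creates one, and the pool name is a placeholder, so treat these as a sketch rather than my exact commands:)

$ lxc network list                      # a fan shows up as a managed bridge with bridge.mode=fan
$ lxc network show lxdfan0              # lxdfan0 is the name LXD tends to use for a fan; mine may differ
$ lxc storage create btrfs-pool btrfs   # loop-backed btrfs pool; "btrfs-pool" is just an example name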
Follow up questions:
Would I be better off using Ubuntu 22.04, or even jumping ahead to 24.10? This is a home project (I have admittedly weird hobbies), so I’m not tied to anything specifically.
I’m pretty sure that 90% of the users of this forum (including myself) share your hobby.
Yes, it will be better, but only if you don’t use the HWE kernel (which is the same kernel shipped with Ubuntu Noble). This is not an issue with Ubuntu Noble itself, but a kernel+ZFS incompatibility. Since Ubuntu 22.04 uses the older 5.15 kernel, it doesn’t have that problem and works reliably with ZFS.
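If you want to double-check which kernel a given install is running, something along these lines should do it (assuming a stock apt-based setup; the metapackage names below are the usual GA vs. HWE ones on 22.04):

$ uname -r                                                 # 5.15.x is the jammy GA kernel; a 6.x kernel means HWE
$ apt list --installed 2>/dev/null | grep linux-generic    # linux-generic = GA, linux-generic-hwe-22.04 = HWE
$ sudo apt install linux-generic                           # pulls in the GA kernel metapackage if it isn't there already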
Cool. That seems to be the route I’m moving towards. A bridge doesn’t communicate between the servers; the fan does. I’ll likely want to confine the default network though… 16 million IP addresses is a bit unwieldy! (240.0.0.0/8)
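From what I can tell the overlay range is a property of the managed network, so something like the following might let me rein it in (the fan.* key names are from the LXD bridge network docs; I haven’t tested whether a smaller overlay plays nicely with how the fan carves out per-host subnets):

$ lxc network show lxdfan0                 # current bridge.mode, fan.overlay_subnet, fan.underlay_subnet
$ lxc network create fanbr0 bridge.mode=fan fan.underlay_subnet=auto fan.overlay_subnet=240.0.0.0/8
                                           # fan.overlay_subnet is where the /8 (and its 16 million addresses) comes from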
I don’t have enough machines/network ports to get MicroOVN to work automagically (nor am I network savvy enough to have the patience to figure it out).
-Rob
I found that ZFS isn’t super stable/reliable on Ubuntu 22.04 (although it seems pretty good in 24.04). So I switched over to btrfs and turned off quotas (based on the LXD docs), since what I’m working with is VMs. As far as funky disk stuff goes, this seems to be stable: I haven’t seen anything resembling the “unable to find zvol” errors I was getting.
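For the record, “turned off quotas” was essentially the following; the mount path is an assumption based on a snap install with a pool literally named “default”, so adjust to your layout:

$ sudo btrfs quota disable /var/snap/lxd/common/lxd/storage-pools/default   # disables qgroup accounting on the pool’s filesystem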
What I’m now finding is that the Ubuntu Fan seems to have a bunch of “hiccups”. For instance:
Error: Action Failed get_task: Task e0c8dfc6-a50e-4ad9-474e-f412627cded1 result: Preparing apply spec: Preparing package nats-v2-migrate: Fetching package blob: Getting blob from inner blobstore: Getting blob from inner blobstore: Shelling out to bosh-blobstore-dav cli: Running command: 'bosh-blobstore-dav -c /var/vcap/bosh/etc/blobstore-dav.json get a630f86c-4678-4f3e-94de-2774ea4ca362 /var/vcap/data/tmp/bosh-blobstore-externalBlobstore-Get3073628143', stdout: 'Error running app - read tcp 240.4.0.20:48154->240.4.0.4:25250: read: connection reset by peer', stderr: '': exit status 1
I get random connection reset by peer messages. Note that on my single-host LXD setup (ZFS, Ubuntu 24.04, just a network bridge), I don’t think I’ve ever seen that occur.
My only assumption is that this is something with the fan. Thoughts?
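For what it’s worth, here’s how I’ve been poking at it so far, on the guess that the overlay/MTU is involved (the guest NICs come up at MTU 1450 rather than 1500, which I assume is the vxlan overhead; the network name and addresses are from my setup):

$ lxc network show lxdfan0          # fan.type, mtu, overlay/underlay subnets
$ ping -M do -s 1422 240.4.0.4      # 1422 bytes of payload + 28 bytes of headers = 1450; probes the path MTU across the fan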
Current versions below; this is a two-node cluster where hydra2 is identical:
$ uname -a
Linux hydra1 5.15.0-126-generic #136-Ubuntu SMP Wed Nov 6 10:38:22 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy
$ lxc version
Client version: 5.21.2 LTS
Server version: 5.21.2 LTS
I think I figured it out. (This is all automated to some degree, so sometimes it’s a process of discovery…)
If I let the BOSH VM assign its own IP, the routing isn’t quite correct for the Ubuntu fan. However, since it’s a managed network, I can configure everything to work via DHCP (I “sneakily” just set the ipv4.address to what is expected). That third route (the one beginning with 240.4.0.1) didn’t exist before I got sneaky.
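Concretely, the “sneaky” bit was just pinning the DHCP allocation on the instance’s NIC; “bosh-vm” is a placeholder for whatever the LXD instance is actually called, and the address is whatever BOSH expects:

$ lxc config device override bosh-vm eth0 ipv4.address=240.4.0.4   # copies the profile NIC onto the instance with a fixed DHCP lease
$ lxc config device show bosh-vm                                   # confirm the override took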
bosh/0:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc mq state UP group default qlen 1000
link/ether 00:16:3e:38:51:cb brd ff:ff:ff:ff:ff:ff
altname enp5s0
inet 240.4.0.4/8 metric 1024 brd 240.255.255.255 scope global dynamic eth0
valid_lft 3085sec preferred_lft 3085sec
bosh/0:~# ip route
default via 240.4.0.1 dev eth0 proto dhcp src 240.4.0.4 metric 1024
240.0.0.0/8 dev eth0 proto kernel scope link src 240.4.0.4 metric 1024
240.4.0.1 dev eth0 proto dhcp scope link src 240.4.0.4 metric 1024