Oops: Changed bridge to unmanaged. Now LXD crashes

ian-weisser · March 28, 2025, 5:26pm

I have LXD running on Ubuntu Core on bare metal on a small LAN.

It’s convenient for my workflow to let the LAN router assign all IP addresses on the network, so the Ubuntu Core machine has a bridge that works properly…

network:
    version: 2
    renderer: networkd
    ethernets:
        enp2s0:
            dhcp4: no
    bridges:
        br0:
            dhcp4: yes
            interfaces:
                - enp2s0

After setting up the netplan bridge, I recall choosing to use the existing br0 during lxd init.

However, there was a problem: LXD’s dnsmasq starts faster than the router. So on LAN powerup, devices saw a DHCP server and started leasing 10.91.113.* addresses…ignoring the real router and losing network connectivity.

I can work around that: disconnect that machine’s network cable for an hour, and the other machines then lease properly from the router instead. Eventually the Ubuntu Core machine’s bridge also leases its IP address from the router (good). I confirmed that dnsmasq is the culprit using /var/snap/lxd/common/lxd/networks/br0/dnsmasq.pid:

name: dnsmasq
args: [--keep-in-foreground, --strict-order, --bind-interfaces, --except-interface=lo,
  --pid-file=, --no-ping, --interface=br0, --dhcp-rapid-commit, --no-negcache, --quiet-dhcp,
  --quiet-dhcp6, --quiet-ra, --listen-address=10.91.113.1, --dhcp-no-override, --dhcp-authoritative,
  --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/br0/dnsmasq.leases, --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/br0/dnsmasq.hosts,
  --dhcp-range, '10.91.113.2,10.91.113.254,1h', '--listen-address=fd42:33d5:27a2:1616::1',
  --enable-ra, --dhcp-range, '::,constructor:br0,ra-stateless,ra-names', -s, lxd,
  --interface-name, '_gateway.lxd,br0', -S, /lxd/, --conf-file=/var/snap/lxd/common/lxd/networks/br0/dnsmasq.raw,
  -u, lxd, -g, lxd]
apparmor: lxd_dnsmasq-br0_</var/snap/lxd/common/lxd>
pid: 2896
uid: 0
gid: 0
set_groups: false
sysprocattr: null
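
For reference, the --dhcp-range argument in that dump gives the pool LXD’s dnsmasq hands out (10.91.113.2 to 10.91.113.254) with a 1h lease time, which would fit the hour-long wait. Here is a minimal Python sketch for checking whether a given lease came from that pool. The function name and the 192.168.1.50 sample address are made up for illustration:

```python
import ipaddress

# Range copied from the --dhcp-range argument in the dnsmasq dump.
LXD_RANGE_START = ipaddress.ip_address("10.91.113.2")
LXD_RANGE_END = ipaddress.ip_address("10.91.113.254")


def leased_by_lxd_dnsmasq(address: str) -> bool:
    """Return True if address lies inside the IPv4 pool served by LXD's dnsmasq."""
    ip = ipaddress.ip_address(address)
    if ip.version != 4:
        # The pool above is IPv4 only; IPv6 addresses never match it.
        return False
    return LXD_RANGE_START <= ip <= LXD_RANGE_END


if __name__ == "__main__":
    # 192.168.1.50 is a hypothetical router-issued address.
    for addr in ("10.91.113.57", "192.168.1.50"):
        print(addr, leased_by_lxd_dnsmasq(addr))
```

Any LAN device whose address passes this check got its lease from the Ubuntu Core machine rather than from the router.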

I mistakenly took this dnsmasq process to mean that my LXD init was wrong: that I had two bridges (one netplan, one LXD), apparently sharing the same name.

So I went into sudo lxc network edit br0 and found that the network config showed managed = "true". I set both to “false” and saved…

Ah, that was a mistake. Now ANY lxc command seems to crash the Ubuntu Core machine. No response from any ssh connection, and no further ssh attempts connect. Power-cycling brings the Ubuntu Core machine back up, available to ssh, and responsive to any command…except any lxc command which again promptly crashes the system.

What’s the smart way to restore LXD functionality on an Ubuntu Core system?
What’s the right way to prevent LXD’s dnsmasq from working across the bridge and hijacking the rest of the LAN in a DHCP environment?

If a reinstall of LXD or Ubuntu Core is needed, it would be great to preserve the container’s data without use of the lxc command.

tomp · March 31, 2025, 7:49am

Hi,

At which question in the lxd init process did you enter br0?

What does /var/snap/lxd/common/lxd/logs/lxd.log show?

ian-weisser · March 31, 2025, 3:26pm

Would you like to create a new local network bridge? (yes/no) [default=yes]: No
I recall answering “no” because I already had a br0 set up. After setup, I added br0 to the default profile.
Setup was two months ago. I might misremember, being human.
LXD and containers and networking worked properly for two months (exception: the dnsmasq papercut), until I changed that config.

Complete output of sudo less /var/snap/lxd/common/lxd/logs/lxd.log (sudo is required on Ubuntu Core)

time="2025-03-31T10:17:08-05:00" level=warning msg=" - Couldn't find the CGroup network priority controller, per-instance network priority will be ignored. Please use per-device limits.priority instead"

Only the one line. A few minutes after boot.

ian-weisser · April 7, 2025, 12:47am

Any suggestions on either…

Reverting the config, since the lxc command cannot be used.
Recovering data (or containers) from the pool --without use of the lxc command-- before reinstalling Ubuntu Core.

…would be welcome.

tomp · April 9, 2025, 1:09pm

I’m still not entirely following what you did. I think you’re saying you created a managed bridge network called br0, and added it to the default profile.

Or did you just specify br0 as the parent in the profile for a bridged NIC device?

The reason I ask is I can’t quite follow where LXD’s dnsmasq comes into it.

So I went into sudo lxc network edit br0 and found that the network config showed managed = "true". I set both to “false”

This isn’t a property you can set, so I think it’s a red herring, but perhaps it triggered some other underlying problem.

Anyway, onto:

Reverting the config, since the lxc command cannot be used.

Creating a DB patch file is a possible solution:

https://documentation.ubuntu.com/lxd/en/latest/debugging/#running-custom-queries-at-lxd-daemon-startup

You can inspect the current database (in read-only mode) with:

sudo sqlite3 /var/snap/lxd/common/lxd/database/global/db.bin
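
If the sqlite3 binary isn’t available on the Ubuntu Core host, the same inspection could be sketched in Python. This assumes db.bin is a plain SQLite file (as the command above implies) and that the network-related tables have names starting with "networks"; that naming is an assumption about the schema, not something confirmed in this thread. The function name is made up for illustration.

```python
import sqlite3

# Path quoted above; reading it requires root on Ubuntu Core.
DB_PATH = "/var/snap/lxd/common/lxd/database/global/db.bin"


def dump_network_tables(db_path: str) -> dict:
    """Return {table_name: [rows]} for every table whose name starts with 'networks'."""
    # mode=ro opens the file read-only, so this connection cannot modify it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name LIKE 'networks%' ORDER BY name"
            )
        ]
        # Table names come from sqlite_master itself, so quoting them is enough.
        return {
            name: conn.execute(f'SELECT * FROM "{name}"').fetchall()
            for name in names
        }
    finally:
        conn.close()
```

Calling dump_network_tables(DB_PATH) as root should show what LXD currently has recorded for br0, which helps in deciding what a patch file would need to revert.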

tomp · April 9, 2025, 1:12pm

One thing is for certain: LXD does not support managing bridges that are also managed by external systems (netplan in this case). So whilst you can use an existing bridge interface as the parent for a bridged NIC device (https://documentation.ubuntu.com/lxd/en/latest/reference/devices_nic/#nic-bridged), you should not be able to create a managed LXD bridge using the same name as an existing unmanaged bridge, as LXD will reconfigure the interface with a new static IP.

So it sounds like the two management systems are potentially fighting each other.