Very Odd Behavior: Adding more than 7 devices via `lxc config edit` results in heavy breakage on VMs?


I have discovered some VERY odd behavior on LXD (snapped).

When working with a pfSense VM (created via https://discuss.linuxcontainers.org/t/lxd-pfsense-vm-installation/12786), I have a total of six network interfaces and a disk definition. This works perfectly fine.

However, when I add a seventh NIC on a different bridge (the eighth device overall), the system totally explodes: it "forgets" that the root disk even exists and ends up in a PXE boot loop because it can't detect a hard drive.
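For context, the extra NIC was added with nothing more exotic than the usual device command, roughly like this (pfsense here is a stand-in for the actual VM name; the bridge is the one from my config below):

lxc config device add pfsense eth6 nic network=lxdbr213 name=eth6 hwaddr=00:16:3e:df:ea:06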

This is the config of the VM in the broken (non-booting) state:

architecture: x86_64
config:
  limits.cpu: "4"
  limits.memory: 4GB
  raw.qemu: -machine pc-q35-2.6
  security.secureboot: "false"
  volatile.cloud-init.instance-id: 811a6386-21e4-4298-83b5-9b15a618a7b3
  volatile.eth0.host_name: tape2da2bda
  volatile.eth1.host_name: tap9c4383bf
  volatile.eth2.host_name: tap4b04e580
  volatile.eth3.host_name: tapce87a570
  volatile.eth4.host_name: tap1d1251ad
  volatile.eth5.host_name: tap848cc5e4
  volatile.last_state.power: RUNNING
  volatile.uuid: 2bfa6377-365b-421a-bfaf-93264b96b043
  volatile.vsock_id: "22"
devices:
  eth0:
    hwaddr: 00:16:3e:df:ea:00
    name: eth0
    network: lxdbr0
    type: nic
  eth1:
    hwaddr: 00:16:3e:df:ea:01
    name: eth1
    network: lxdbr208
    type: nic
  eth2:
    hwaddr: 00:16:3e:df:ea:02
    name: eth2
    network: lxdbr209
    type: nic
  eth3:
    hwaddr: 00:16:3e:df:ea:03
    name: eth3
    network: lxdbr210
    type: nic
  eth4:
    hwaddr: 00:16:3e:df:ea:04
    name: eth4
    network: lxdbr211
    type: nic
  eth5:
    hwaddr: 00:16:3e:df:ea:05
    name: eth5
    network: lxdbr212
    type: nic
  eth6:
    hwaddr: 00:16:3e:df:ea:06
    name: eth6
    network: lxdbr213
    type: nic
  root:
    boot.priority: "15"
    path: /
    pool: zfs
    size: 32GB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

If we remove eth6 / lxdbr213 from the configuration, the system boots and detects the root disk correctly. If we add eth6 back in, though, the VM seems to forget every device after the seventh NIC (eth6), including the root disk, and that causes the problems.
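Removing it again is just the standard device removal, for anyone following along (again with pfsense standing in for my VM name):

lxc config device remove pfsense eth6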

On my system, this pfSense handles handoff and protection of web-facing services within the LXD network. Anything going to one of my containers from outside the network has to pass through the pfSense, and I need the seventh network adapter for the seventh public IP on this machine.

Unfortunately, SOMETHING about the environment refuses to let me use more than seven devices and silently drops anything beyond that once the extra device is added. I have no idea what's causing it.

The host system running LXD is Ubuntu 22.04 with the snapped version; `lxc version` shows both client and server on 5.0.2.

Do you see the same issue when using ubuntu:22.04 as the VM image?
I'm wondering whether this is something specific to BSD or more general.
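A rough way to check that, assuming the same bridges exist on your host and using test-vm as a throwaway name (the default profile already provides eth0 and root, so these six extra NICs bring the total to eight devices):

lxc init ubuntu:22.04 test-vm --vm
for br in lxdbr208 lxdbr209 lxdbr210 lxdbr211 lxdbr212 lxdbr213; do lxc config device add test-vm "nic-${br}" nic network="${br}"; done
lxc start test-vm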

Also, it would be worth trying the latest/stable channel of LXD (as long as you understand that you won’t be able to downgrade back to the 5.0/stable LTS channel you are on now), using:

sudo snap refresh lxd --channel=latest/stable

That channel comes with quite a few changes to how NICs are configured in QEMU, as well as a more recent version of QEMU.
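If you want to double-check which channel you are currently tracking before refreshing:

snap list lxd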

The issue happens before it even gets into the guest OS, at the QEMU/KVM level; it showed some odd log output there.
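For reference, the QEMU-level log for the VM can be pulled with something like this (pfsense again standing in for the actual VM name):

lxc info pfsense --show-log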

I haven't upgraded the LXD snap yet because this is a production environment. I may even nuke the VM and redo it entirely.

Sorry for the insanely slow reply; I've had a ton of things taking my attention these past few months with EOY and other cycles.

I saw this post yesterday that seems related:

https://github.com/lxc/incus/issues/397#issuecomment-1894673037

Once I removed the qemu machine parameter, the boot menu found the disk and the OS started.
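For anyone else hitting this, dropping the parameter amounts to something like the following (with pfsense as a stand-in for the VM name):

lxc config unset pfsense raw.qemu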

Yeah, I noticed that too; it booted up, but it still gave me some oddness.

I ended up saying "screw it" to managing the VM via LXD, put virt-manager on the box, and went back to the tried-and-true method of virtual machine management. That works without issue, so... shrugs

In any case, at least I have a workaround.