Hi, I have been playing with MicroCloud over the weekend as an experiment with three Raspberry Pi 4 boards. Unfortunately, it does not seem to work properly with my setup, and I am having difficulty debugging it further. As I do not feel I have enough information to open an issue yet, I am starting this conversation to describe my setup and issues, and hopefully get some pointers from the community on how to debug things.
First of all, the hardware. I am running three Raspberry Pi 4 Model B boards (8 GB of RAM each) with Ubuntu 22.04. For storage, each one boots the OS from a 64 GB microSD card and has an NVMe SSD attached over USB 3.0. They are connected over gigabit Ethernet to my network, with three different VLANs. The interfaces are as follows:
eth0: 10.10.0.x/24 # used for management
homelab: 10.10.10.x/24 # used in this case for mDNS
ovn: unconfigured # attached to subnet 10.10.11.0/24, with router listening on 10.10.11.1
I initially tried to initialize MicroCloud, but it failed because of missing kernel modules. Installing the kernel from -proposed solved that issue, and I was then able to initialize properly. I did not configure local storage, and I configured the external NVMe SSD for Ceph. I used the homelab interface for MicroCloud mDNS and configured OVN with the ovn interface, with the gateway set to 10.10.11.1/24 and an available range of 10.10.11.51-10.10.11.254.
I was able to start a 22.04 container instance, but for some reason it appeared to freeze: I could not access its console logs or terminal, or exec anything in it. Stopping the container also failed silently, even with --force. I was able to get rid of it by rebooting all the nodes and then deleting the now-stopped container.
I was then able to start a new container and use its terminal for a while. I tried running rockcraft in destructive mode in that container, and it seemed to work. I expected it to take a long time, so I let it run overnight. The following day it was still running, but the container was in a weird state. Once again, I cannot stop it or execute anything in it, and I can no longer start any other containers or VMs.
I do not have many logs to go on, but sudo snap logs lxd outputs this:
2023-11-27T13:53:35-05:00 lxd.daemon[2211]: time="2023-11-27T13:53:35-05:00" level=warning msg="Failed to retrieve network information via netlink" instance=u1 instanceType=container pid=3714 project=default
2023-11-27T13:53:35-05:00 lxd.daemon[2211]: time="2023-11-27T13:53:35-05:00" level=error msg="Error calling 'lxd forknet" err="Failed to run: /snap/lxd/current/bin/lxd forknet info -- 3714 3: exit status 1 (Failed setns to container network namespace: No such file or directory)" instance=u1 instanceType=container pid=3714 project=default
2023-11-27T13:53:40-05:00 lxd.daemon[2211]: time="2023-11-27T13:53:40-05:00" level=error msg="Failed to retrieve PID of executing child process" instance=u1 instanceType=container project=default
2023-11-27T13:53:42-05:00 lxd.daemon[2211]: time="2023-11-27T13:53:42-05:00" level=error msg="Failed to retrieve PID of executing child process" instance=u1 instanceType=container project=default
2023-11-27T13:56:53-05:00 lxd.daemon[2211]: time="2023-11-27T13:56:53-05:00" level=warning msg="Failed to retrieve network information via netlink" instance=u1 instanceType=container pid=3714 project=default
2023-11-27T13:56:53-05:00 lxd.daemon[2211]: time="2023-11-27T13:56:53-05:00" level=error msg="Error calling 'lxd forknet" err="Failed to run: /snap/lxd/current/bin/lxd forknet info -- 3714 3: exit status 1 (Failed setns to container network namespace: No such file or directory)" instance=u1 instanceType=container pid=3714 project=default
2023-11-27T13:56:55-05:00 lxd.daemon[2211]: time="2023-11-27T13:56:55-05:00" level=error msg="Failed to retrieve PID of executing child process" instance=u1 instanceType=container project=default
2023-11-27T13:57:31-05:00 lxd.daemon[2211]: time="2023-11-27T13:57:31-05:00" level=error msg="Failed to retrieve PID of executing child process" instance=u1 instanceType=container project=default
2023-11-27T13:57:40-05:00 lxd.daemon[2211]: time="2023-11-27T13:57:40-05:00" level=warning msg="Failed to retrieve network information via netlink" instance=u1 instanceType=container pid=3714 project=default
2023-11-27T13:57:40-05:00 lxd.daemon[2211]: time="2023-11-27T13:57:40-05:00" level=error msg="Error calling 'lxd forknet" err="Failed to run: /snap/lxd/current/bin/lxd forknet info -- 3714 3: exit status 1 (Failed setns to container network namespace: No such file or directory)" instance=u1 instanceType=container pid=3714 project=default
So this is where I am now: MicroCloud reports everything as operational, but the cluster is completely unusable. Any pointers on what I am doing wrong, or on things I could check to make this work, would be appreciated.