Hi, I’ve been having a problem with our LXD cluster lately.
Some nodes of the cluster hang, then recover on their own after several hours.
In /var/snap/lxd/common/lxd/logs/lxd.log I found the error “Dqlite: attempt 1: server 172.16.0.20:8443: no known leader”.
How can I find out why LXD throws the “no known leader” error? Is there a way to see the history of leader changes? I can see the current leader with lxd sql local "SELECT * FROM raft_nodes", but I don’t know how to see past leaders or the raft history.
Around the same time, I also see many errors like these:
time="2024-03-24T16:11:39Z" level=warning msg="Failed to retrieve network information via netlink" instance=shpc-49125-instance-LYN8YRcY instanceType=container pid=1829414 project=default
time="2024-03-24T16:11:39Z" level=error msg="Error calling 'lxd forknet" err="Failed to run: /snap/lxd/current/bin/lxd forknet info -- 1829414 3: exit status 1 (Failed setns to container network namespace: No such file or directory)" instance=shpc-49125-instance-LYN8YRcY instanceType=container pid=1829414 project=default
Once the cluster recovers, these errors stop appearing.
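To check whether the netlink/forknet errors start when the leader is lost and stop when it comes back, one option is to bucket the log lines by minute and compare the counts. The Python sketch below is only a rough helper: parse_line and timeline are hypothetical names, and it assumes every line in lxd.log (including the Dqlite “no known leader” one) uses the time=... level=... msg=... format quoted above.

```python
import re
from collections import Counter

# Hypothetical helper. Assumes logfmt-style lines such as
# level=warning or msg="quoted text", like the lxd.log lines above.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Split one logfmt-style log line into a dict of field -> string value."""
    fields = {}
    for key, value in PAIR_RE.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # drop the surrounding double quotes
        fields[key] = value
    return fields


def timeline(lines, patterns=("no known leader", "forknet", "netlink")):
    """Count lines containing each pattern, bucketed by the minute of their time= field."""
    counts = Counter()
    for line in lines:
        ts = parse_line(line).get("time")
        if ts is None:
            continue  # skip lines without a time= field
        minute = ts[:16]  # e.g. "2024-03-24T16:11"
        for pattern in patterns:
            if pattern in line:
                counts[(minute, pattern)] += 1
    return counts


def report(path):
    """Print per-minute counts, oldest first."""
    with open(path) as f:
        for (minute, pattern), n in sorted(timeline(f).items()):
            print(minute, pattern, n)
```

For example, report("/var/snap/lxd/common/lxd/logs/lxd.log") would print per-minute counts that can be lined up against the times the node hung. Rotated log files, if any, would need to be passed in separately.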
Any ideas on how to debug this?