I have a 3-node LXD cluster where the workload is currently negligible (only a single VM has been commissioned so far). However, I’m seeing repeated warnings for both non-leader nodes in the UI:
Offline cluster member: Failed to send heartbeat request: Put "https://:8443/internal/database": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Questions:
- Is there existing documentation or notes on investigating heartbeat timeouts in an LXD cluster?
- Any recommended starting points for debugging (networking, DB sync, certs, time drift, etc.)?
- Could this be harmless under low load, or does it indicate an underlying connectivity issue?
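On the networking starting point, a minimal first-pass probe along these lines can separate timeouts from refusals (a sketch only; the 10.0.0.x addresses are placeholders for the members' cluster.https_address values):

```shell
# Probe TCP reachability of each cluster member's HTTPS address.
# Uses bash's /dev/tcp pseudo-device; a refusal fails fast, while a
# silent drop (e.g. a firewall) runs into the 2-second timeout.
probe() {
  if timeout 2 bash -c "</dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 UNREACHABLE"
  fi
}

# Placeholder member addresses - substitute your own.
for member in 10.0.0.1 10.0.0.2 10.0.0.3; do
  probe "$member" 8443
done
```

Pairing this with a clock check (e.g. timedatectl on each node) also seems worthwhile, since TLS validation is sensitive to time drift.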
Environment:
- Cluster size: 3 nodes (1 leader, 2 non-leaders)
- Workload: minimal (1 VM)
- Errors are intermittent and infrequent
Any advice, troubleshooting steps, or relevant references would be appreciated. Thanks!
Hi, the URL https://:8443/internal/database in the error message looks odd. Can you please share the output of snap list? Can you please also share the output of lxc config show --target <cluster member> for each of the cluster members. You can get a list of cluster members by running lxc cluster list.
I made some configuration changes last week; the warnings now include the IPs of the two non-leader nodes, and they continue to increment in count despite being marked as resolved.
Failed to send heartbeat request: Put "https://94.228.70.112:8443/internal/database": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Failed to send heartbeat request: Put "https://94.228.70.111:8443/internal/database": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
$ snap list
Name Version Rev Tracking Publisher Notes
core20 20251031 2686 latest/stable canonical✓ base
core22 20251125 2216 latest/stable canonical✓ base
core24 20251210 1267 latest/stable canonical✓ base
lxd 5.21.4-9eb1368 36971 5.21/stable canonical✓ -
microceph 19.2.1+snap8295212ceb 1601 squid/stable canonical✓ -
microcloud 2.1.2-9cfe3d5 1977 2/stable canonical✓ -
microovn 24.03.6+snap22fe3c6c02 972 24.03/stable canonical✓ -
snapd 2.73 25935 latest/stable canonical✓ snapd
$ lxc config show --target mc01
config:
  acme.agree_tos: "true"
  acme.ca_url: https://acme-v02.api.letsencrypt.org/directory
  acme.domain: REDACTED
  acme.email: REDACTED
  cluster.https_address: 94.228.70.110:8443
  core.https_address: '[::]:8443'
  core.proxy_http: http://REDACTED:9999
  core.proxy_https: http://REDACTED:9999
  user.microcloud: 2.1.2
$ lxc config show --target mc02
config:
  acme.agree_tos: "true"
  acme.ca_url: https://acme-v02.api.letsencrypt.org/directory
  acme.domain: REDACTED
  acme.email: REDACTED
  cluster.https_address: 94.228.70.111:8443
  core.https_address: '[::]:8443'
  core.proxy_http: http://REDACTED:9999
  core.proxy_https: http://REDACTED:9999
  storage.backups_volume: local/backups
  storage.images_volume: local/images
  user.microcloud: 2.1.2
$ lxc config show --target mc03
config:
  acme.agree_tos: "true"
  acme.ca_url: https://acme-v02.api.letsencrypt.org/directory
  acme.domain: REDACTED
  acme.email: REDACTED
  cluster.https_address: 94.228.70.112:8443
  core.https_address: '[::]:8443'
  core.proxy_http: http://REDACTED:9999
  core.proxy_https: http://REDACTED:9999
  user.microcloud: 2.1.2
Thanks,
George
Have you unset the cluster.https_address configuration key at some point? Because that should not be possible through the LXD API post cluster setup.
What does microcloud status report? Can none of the members see each other, meaning do you see the same errors in the logs of the other LXD cluster members?
microcloud status
Status: HEALTHY
┌──────┬───────────────┬──────┬─────────────────┬────────────────────────┬────────┐
│ Name │ Address │ OSDs │ MicroCeph Units │ MicroOVN Units │ Status │
├──────┼───────────────┼──────┼─────────────────┼────────────────────────┼────────┤
│ mc01 │ 94.228.70.110 │ 1 │ mds,mgr,mon │ central,chassis,switch │ ONLINE │
│ mc02 │ 94.228.70.111 │ 1 │ mds,mgr,mon │ central,chassis,switch │ ONLINE │
│ mc03 │ 94.228.70.112 │ 1 │ mds,mgr,mon │ central,chassis,switch │ ONLINE │
└──────┴───────────────┴──────┴─────────────────┴────────────────────────┴────────┘
root@mc01:/home/george.horsley# journalctl -fu snap.lxd.daemon
Hence my confusion: the errors suggest that none of the nodes can communicate with each other.
I am not sure about unsetting the cluster.https_address key - I brought up a single node and then added the other two later on.
Currently the leader reports more of the same:
Jan 19 11:10:42 mc01 lxd.daemon[3382]: time="2026-01-19T11:10:42Z" level=warning msg="Failed heartbeat" err="Failed to send heartbeat request: Put \"https://94.228.70.112:8443/internal/database\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" remote="94.228.70.112:8443"
Jan 19 11:19:35 mc01 lxd.daemon[3382]: time="2026-01-19T11:19:35Z" level=warning msg="Failed heartbeat" err="Failed to send heartbeat request: Put \"https://94.228.70.111:8443/internal/database\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" remote="94.228.70.111:8443"
the other two nodes:
mc02 logs similar warnings infrequently:
Jan 19 10:32:07 mc02 lxd.daemon[8731]: time="2026-01-19T10:32:07Z" level=warning msg="Excluding offline member from DNS peers refresh" ID=2 address="94.228.70.112:8443" driver=bridge lastHeartbeat="2026-01-19 10:31:34.085979489 +0000 UTC" network=lxdfan0 project=default raftID=2
as does mc03:
Jan 19 10:30:51 mc03 lxd.daemon[222607]: time="2026-01-19T10:30:51Z" level=warning msg="Excluding offline member from DNS peers refresh" ID=1 address="94.228.70.111:8443" driver=bridge lastHeartbeat="2026-01-19 10:30:26.462210113 +0000 UTC" network=lxdfan0 project=default raftID=1
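For what it's worth, the warnings can be tallied to see which remote fails most often (a sketch; the sample lines below follow the format above, with the third one invented for illustration - in practice pipe the live journal through the same filter, e.g. journalctl -u snap.lxd.daemon | grep 'Failed heartbeat'):

```shell
# Count failed-heartbeat targets: extract the remote="..." field,
# then tally and sort by frequency.
cat <<'EOF' | grep -o 'remote="[^"]*"' | sort | uniq -c | sort -rn
time="2026-01-19T11:10:42Z" level=warning msg="Failed heartbeat" remote="94.228.70.112:8443"
time="2026-01-19T11:19:35Z" level=warning msg="Failed heartbeat" remote="94.228.70.111:8443"
time="2026-01-19T11:27:03Z" level=warning msg="Failed heartbeat" remote="94.228.70.112:8443"
EOF
```

If one member dominates the tally, that points at a per-node issue rather than a cluster-wide one.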
At which cadence/how often do you see those errors? I wonder if you can see something similar in snap logs microcloud on any of the cluster members mc01-03.
On average it is every couple of minutes. It varies though, which would indicate it works occasionally.
snap logs microcloud
The last entry for each node is from the 16th; nothing has been logged since.
Can you rule out that there is an issue on your infrastructure preventing the connections between the LXD cluster members? Maybe some rule affecting port 8443? Both LXD and the Micro* services do regular heartbeats, so if it's a general problem we should see similar messages in all of the services' logs.
Sorry for the slow reply,
Based on multiple packet captures taken on both nodes, we are confident there is nothing preventing network connectivity between the cluster members. Bidirectional TCP traffic on port 8443 is observed, TLS handshakes complete successfully, and the LXD API responds correctly.
Something else I just wondered whilst checking the logs you have provided: you are using a lxdfan0 network in LXD, but your microcloud status output indicates that you have OVN installed.
Is this intentional? Because it might be that there was an error during installation/setup, so MicroCloud fell back to FAN networking.
Do you still have the logs available from the MicroCloud installation? Especially the questionnaire would be interesting.
No, it wasn’t intentional to use the FAN network.
The lxdfan0 network was created during the initial setup. I’ve since removed the network.
This is currently a single-link cluster (one network interface handling cluster + storage traffic), so there is no dedicated storage network in this setup.
Unfortunately I don’t have the original MicroCloud installation logs or questionnaire output anymore.
It is clear that nodes are leaving and rejoining the cluster at a regular rate (~10 minutes). Occasionally, operations fail because the node hosting the instance has left the cluster at that moment.
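That cadence can be confirmed from the warning timestamps themselves (a sketch; the first two times come from the leader's log above, the third is illustrative - in practice extract them from journalctl -u snap.lxd.daemon):

```shell
# Print the gap, in seconds, between successive heartbeat warnings.
printf '%s\n' 11:10:42 11:19:35 11:31:02 |
awk -F: '{ t = $1*3600 + $2*60 + $3
           if (prev) print (t - prev) " seconds since previous warning"
           prev = t }'
```

Gaps clustering around 600 seconds would match the ~10 minute leave/rejoin pattern.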