Unable to start LXD cluster because dqlite certs expired

I haven’t used my test cluster for a while. I wanted to do some testing today, but the LXD daemon fails to start: dqlite cannot start because the certificate has expired.

These lines flood my logs until the daemon exits and the cycle starts over:

Oct 07 09:23:11 lxd-candidate lxd.daemon[2136]: time="2025-10-07T09:23:11+02:00" level=warning msg="Dqlite: attempt 10: server 192.168.3.197:8443: dial: Failed connecting to HTTP endpoint \"192.168.3.197:8443\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2025-10-07T09:23:11+02:00 is after 2025-07-16T23:59:59Z"
Oct 07 09:23:12 lxd-candidate lxd.daemon[2136]: 2025/10/07 09:23:12 http: TLS handshake error from 192.168.3.197:33786: remote error: tls: bad certificate
Oct 07 09:23:12 lxd-candidate lxd.daemon[2136]: time="2025-10-07T09:23:12+02:00" level=warning msg="Dqlite: attempt 11: server 192.168.3.197:8443: dial: Failed connecting to HTTP endpoint \"192.168.3.197:8443\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2025-10-07T09:23:12+02:00 is after 2025-07-16T23:59:59Z"

How do I get out of this loop? It is a test cluster, so no real data would be lost, only the considerable time needed to set it up again. But I find it scary that I am not able to do anything but watch these logs repeat themselves.

Hi @vosdev

Did you use a custom cluster certificate or an ACME-issued certificate with a shorter expiry than the default self-signed ones generated by LXD?

Please can you also provide the output of snap list lxd.

Thanks
Tom

Also, would you mind opening a GH issue about this so we can investigate a possible fix?

Thanks!

Now that you mention it, yes I did use a custom ACME wildcard certificate for the https UI endpoint.

root@lxd-candidate:~# snap list lxd
Name  Version      Rev    Tracking  Publisher   Notes
lxd   git-8f6b7a7  36090  6/edge    canonical✓  -

I can move this to a GH issue 🙂

Right, so this sounds like an issue where, if the cluster has been off for a period of time or is otherwise unable to renew the ACME certificate and the certificate then expires, the cluster cannot function sufficiently to allow the certificate to be auto-updated when it comes back online.

As each member of the LXD cluster knows what the cluster certificate should be, we should be able to continue trusting it for intra-cluster communication even if it has expired, so that the cluster keeps operating and the renewal can be applied.

When using a manual certificate, we may need to add a --force option to lxc cluster update-certificate to allow the update to proceed even if the client detects the cluster’s cert has expired.

You should be able to change the certificate and associated key file by replacing the following files on every member:

/var/snap/lxd/common/lxd/{cluster.crt,cluster.key}

And then reload LXD with sudo systemctl reload snap.lxd.daemon.
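
The manual replacement described above could be scripted roughly as follows on each member. This is only a minimal sketch, not an official procedure: the function name install_cluster_cert and the .bak backup suffix are made up for illustration, and it assumes the new cluster.crt and cluster.key have already been copied into a source directory on that member.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD): copy a new cluster.crt/cluster.key
# pair into the LXD directory, keeping a backup of the files being replaced.
# Usage: install_cluster_cert SRC_DIR [LXD_DIR]
install_cluster_cert() {
    src_dir="$1"
    # Default is the snap path mentioned in this thread.
    lxd_dir="${2:-/var/snap/lxd/common/lxd}"

    # Refuse to touch anything unless both new files are present.
    for f in cluster.crt cluster.key; do
        [ -f "$src_dir/$f" ] || { echo "missing $src_dir/$f" >&2; return 1; }
    done

    for f in cluster.crt cluster.key; do
        # Keep the old file as *.bak (suffix chosen for this sketch only).
        if [ -f "$lxd_dir/$f" ]; then
            cp -p "$lxd_dir/$f" "$lxd_dir/$f.bak" || return 1
        fi
        cp "$src_dir/$f" "$lxd_dir/$f" || return 1
    done

    # The private key should not be readable by other users.
    chmod 600 "$lxd_dir/cluster.key"
}
```

After running something like install_cluster_cert /root/new-cert as root on every member, the reload command above would still be needed to make LXD pick up the new files.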

Yes, right now the lxc cluster update-certificate command hangs.

But this means that this is also a problem if the certificate expired yesterday. I wonder what happens if the certificate expires while the cluster is running.
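
One way to notice an approaching expiry before it bites might be a small check with openssl. This is a rough sketch, assuming openssl is installed on the member; the function name cert_valid_for and the 30-day default are my own choices, not anything LXD provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD): succeed only if the certificate
# will still be valid DAYS days from now.
# Usage: cert_valid_for CERT_FILE [DAYS]
cert_valid_for() {
    cert="$1"
    days="${2:-30}"
    # "openssl x509 -checkend N" exits non-zero if the certificate expires
    # within the next N seconds. A missing or unreadable file also fails.
    openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null 2>&1
}

# Example (path taken from this thread):
# cert_valid_for /var/snap/lxd/common/lxd/cluster.crt 30 || echo "cluster cert expires soon"
```

Passing 0 as the number of days checks whether the certificate has already expired.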

This has been around for a long time, but previously hardly anyone used custom certificates. Now, with the UI, a lot of people do. 🙂

I will replace them manually, as we do with a standalone node. I thought some database changes had to happen and that this was why we had to use lxc cluster update-certificate.

Thank you, my (single-node) cluster is now up and running again 🙂 I will open a GitHub issue for this problem.

“I wonder what happens if the certificate expires while the cluster is running.”

I think this is something that needs to be tested hehe.

No DB changes are needed for the cluster certificate change.
