I haven’t used my test cluster for a while. I wanted to do some testing today, but the LXD daemon fails to start because dqlite cannot start: the cluster certificate has expired.
These lines spam my logs until the daemon exits and starts over:
Oct 07 09:23:11 lxd-candidate lxd.daemon[2136]: time="2025-10-07T09:23:11+02:00" level=warning msg="Dqlite: attempt 10: server 192.168.3.197:8443: dial: Failed connecting to HTTP endpoint \"192.168.3.197:8443\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2025-10-07T09:23:11+02:00 is after 2025-07-16T23:59:59Z"
Oct 07 09:23:12 lxd-candidate lxd.daemon[2136]: 2025/10/07 09:23:12 http: TLS handshake error from 192.168.3.197:33786: remote error: tls: bad certificate
Oct 07 09:23:12 lxd-candidate lxd.daemon[2136]: time="2025-10-07T09:23:12+02:00" level=warning msg="Dqlite: attempt 11: server 192.168.3.197:8443: dial: Failed connecting to HTTP endpoint \"192.168.3.197:8443\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2025-10-07T09:23:12+02:00 is after 2025-07-16T23:59:59Z"
How do I get out of this loop? It is a test cluster, so no real data would be lost, just a lot of time setting it up again. But I find it scary that I can't do anything except watch these logs repeat themselves.
Right, so this sounds like an issue where, if the cluster has been off for a while or for some other reason hasn’t been able to renew its ACME certificate, the certificate expires and the cluster can then no longer function well enough to auto-update the certificate when it comes back online.
As each member of the LXD cluster knows what the cluster certificate should be, we should be able to continue trusting it for intra-cluster communication even if it has expired. That would let the cluster keep operating so that the renewal can be applied.
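To illustrate the idea of trusting a known certificate regardless of its expiry date, here is a rough sketch in Python. It is not LXD's code (LXD is written in Go), and the function names are made up for this example. It only shows the client-side check: skip the normal chain and expiry validation, then accept the peer only if it presents exactly the certificate we already hold locally.

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint (hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned_cert(peer_der: bytes, pinned_pem: str) -> bool:
    """True if the peer presented exactly the certificate we already hold.

    Only the certificate bytes are compared. Validity dates are deliberately
    ignored, so an expired certificate still matches.
    """
    pinned_der = ssl.PEM_cert_to_DER_cert(pinned_pem)
    return hmac.compare_digest(fingerprint(peer_der), fingerprint(pinned_der))


def peer_matches_cluster_cert(host: str, port: int, pinned_pem: str) -> bool:
    """Connect to host:port and compare the served certificate to the pin."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Turn off the default chain, hostname and expiry checks.
    # Trust comes only from the fingerprint comparison below.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock) as tls:
            peer_der = tls.getpeercert(binary_form=True)
    return peer_der is not None and is_pinned_cert(peer_der, pinned_pem)
```

As a usage example, one could read the local cluster certificate (on a snap install this is assumed to be /var/snap/lxd/common/lxd/cluster.crt) and call `peer_matches_cluster_cert("192.168.3.197", 8443, pem_text)`. Intra-cluster traffic uses mutual TLS, so the server side would need the same relaxation when it checks the client certificate.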
When using a manual certificate, we may need to add a --force option to lxc cluster update-certificate to allow the update to proceed even if the client detects the cluster’s cert has expired.
Yes, right now the lxc cluster update-certificate command hangs.
But this means that this is also a problem if the certificate expired yesterday. I wonder what happens if the certificate expires while the cluster is running.
It’s been around for a long time, but previously hardly anyone used custom certificates. Now, with the UI, a lot of people do.
I will replace them manually, as we do with a standalone node. I thought some database work had to happen, and that this was why we had to use lxc cluster update-certificate.