I have a fresh install of an LXD 5.21.4 cluster on physical hosts, deployed with MicroCloud and a preseed file. The cluster uses an internal network:
lookup_subnet: 10.0.1.0/24
I am trying to obtain an ACME certificate using the acme.*
configuration keys and HAProxy, following the example config provided in the documentation (a trimmed sketch of the HAProxy part is included after the server configuration below):
# lxc config show
config:
acme.agree_tos: "true"
acme.ca_url: https://acme-staging-v02.api.letsencrypt.org/directory
acme.domain: node2.example.net
acme.email: my@email.com
cluster.https_address: 10.0.1.12:8443
core.https_address: '[::]:8443'
core.https_trusted_proxy: 10.0.1.12
instances.migration.stateful: "true"
network.ovn.northbound_connection: ssl:10.0.1.11:6641,ssl:10.0.1.13:6641,ssl:10.0.1.14:6641
user.microcloud: 2.1.1
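The HAProxy part follows the documented pattern, with my own domain and addresses filled in. Trimmed to the relevant bits, it looks roughly like this (a sketch, not a verbatim copy of my config; the frontend/backend names are arbitrary):
# Port 80: redirect the ACME domain to HTTPS so the HTTP-01 validation can follow it
frontend http-dispatcher
  bind :80
  mode http
  redirect scheme https code 301 if { hdr(host) -i node2.example.net }

# Port 443: plain TCP/SNI passthrough to the LXD API
frontend sni-dispatcher
  bind :443
  mode tcp
  tcp-request inspect-delay 5s
  tcp-request content accept if { req.ssl_hello_type 1 }
  use_backend lxd-nodes if { req.ssl_sni -i node2.example.net }

backend lxd-nodes
  mode tcp
  server node2 10.0.1.12:8443 check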
Here are my cluster nodes:
# lxc cluster list
+-------+------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| NAME | URL | ROLES | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE | MESSAGE |
+-------+------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| node1 | https://10.0.1.11:8443 | database-leader | x86_64 | default | | ONLINE | Fully operational |
| | | database | | | | | |
+-------+------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| node2 | https://10.0.1.12:8443 | database-standby | x86_64 | default | | ONLINE | Fully operational |
+-------+------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| node3 | https://10.0.1.13:8443 | database | x86_64 | default | | ONLINE | Fully operational |
+-------+------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| node4 | https://10.0.1.14:8443 | database | x86_64 | default | | ONLINE | Fully operational |
+-------+------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
When trying to (re-)issue the certificate (by changing the acme.* keys), I get the following error:
# lxc config unset acme.domain
# lxc config set acme.domain node2.example.net
Error: Failed to notify peer node1 at 10.0.1.11:8443: Put "https://10.0.1.11:8443/1.0": EOF
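To see what is going on, the log stream on the affected member can be followed live while the key changes, for example with:
# Stream log events, including debug messages, from the local LXD daemon
lxc monitor --type=logging --loglevel=debug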
The LXD log shows the following messages right after the updated configuration was sent to all cluster members (I am not copying that message because the config dump is too large):
location: node2
metadata:
context:
ip: 66.133.109.36:36851
url: /.well-known/acme-challenge/HbtigqBnLPHAq_yTU7M3Lq6tl5PlLmUokxGtdUf7c1Q
level: debug
message: Allowing untrusted GET
timestamp: "2025-09-11T14:29:27.609006324Z"
type: logging
location: node2
metadata:
context:
url: https://10.0.1.11:8443
level: debug
message: Connecting to a remote LXD over HTTPS
timestamp: "2025-09-11T14:29:27.621640394Z"
type: logging
location: node2
metadata:
context:
ip: 13.229.94.34:38162
url: /.well-known/acme-challenge/HbtigqBnLPHAq_yTU7M3Lq6tl5PlLmUokxGtdUf7c1Q
level: debug
message: Allowing untrusted GET
timestamp: "2025-09-11T14:29:28.844969483Z"
type: logging
location: node2
metadata:
context:
url: https://10.0.1.11:8443
level: debug
message: Connecting to a remote LXD over HTTPS
timestamp: "2025-09-11T14:29:28.857482946Z"
type: logging
location: node2
metadata:
context:
raftMembers: '[{{1 10.0.1.11:8443 voter} node1} {{2 10.0.1.13:8443 voter} node3}
{{3 10.0.1.14:8443 voter} node4} {{4 10.0.1.12:8443 stand-by} node2}]'
level: debug
message: Replace current raft nodes
timestamp: "2025-09-11T14:29:33.016031003Z"
type: logging
location: node2
metadata:
context:
fingerprint: 03b182c2dda539d788883af041f3989b90fe3dc5ee033f6f73abf7b2da9efdea
subject: CN=root@node1,O=LXD
level: debug
message: Matched trusted cert
timestamp: "2025-09-11T14:29:33.015816784Z"
type: logging
location: node2
metadata:
context:
local: 10.0.1.12:8443
name: dqlite
remote: 10.0.1.11:40906
level: info
message: Dqlite proxy stopped
timestamp: "2025-09-11T14:29:41.256171002Z"
type: logging
location: node2
metadata:
context:
listener: 1fbe1dd7-1e7c-4831-b204-8de975f1f8f5
local: 10.0.1.12:8443
remote: 10.0.1.11:40922
level: debug
message: Event listener server handler stopped
timestamp: "2025-09-11T14:29:41.256072234Z"
type: logging
location: node2
metadata:
context:
local: 10.0.1.12:48966
name: raft
remote: 10.0.1.11:8443
level: info
message: Dqlite proxy stopped
timestamp: "2025-09-11T14:29:41.256458036Z"
type: logging
location: node2
metadata:
context:
err: 'Failed to notify peer node1 at 10.0.1.11:8443: Put "https://10.0.1.11:8443/1.0":
EOF'
level: error
message: Failed to notify other members about config change
timestamp: "2025-09-11T14:29:41.256484546Z"
type: logging
location: node2
metadata:
context: {}
level: debug
message: 'Dqlite: EOF detected: call exec-sql (budget 9.999970569s): receive: header:
EOF'
timestamp: "2025-09-11T14:29:41.256742606Z"
type: logging
location: node2
metadata:
context: {}
level: debug
message: 'Dqlite: network connection lost: write tcp 10.0.1.12:48958->10.0.1.11:8443:
write: broken pipe'
timestamp: "2025-09-11T14:29:41.257002785Z"
type: logging
location: node2
metadata:
context: {}
level: debug
message: 'Dqlite: network connection lost: write tcp 10.0.1.12:48958->10.0.1.11:8443:
write: broken pipe'
timestamp: "2025-09-11T14:29:41.257156986Z"
type: logging
location: node2
metadata:
context: {}
level: debug
message: 'Dqlite: network connection lost: write tcp 10.0.1.12:48958->10.0.1.11:8443:
write: broken pipe'
timestamp: "2025-09-11T14:29:41.257275028Z"
type: logging
This is followed by many similar “broken pipe” messages.
Repeated attempts to re-issue the certificate show similar errors, but for different nodes.
With LXD 5.21.3 (same fresh setup), the same process finishes successfully. Here is the relevant part of the LXD log for comparison:
location: node2
metadata:
context:
ip: 34.208.39.181:60360
url: /.well-known/acme-challenge/_KQ9NCVOIqpQTPbitr5lbj1pJ389oR3yhAzqxcDMRxw
level: debug
message: Allowing untrusted GET
timestamp: "2025-09-11T13:53:28.561107051Z"
type: logging
location: node2
metadata:
context:
url: https://10.0.1.11:8443
level: debug
message: Connecting to a remote LXD over HTTPS
timestamp: "2025-09-11T13:53:28.573394067Z"
type: logging
location: node2
metadata:
context:
fingerprint: 306bea6f374d7803d10abaaa7a33f75d4f644241166f22aa777e086adeebbb24
subject: CN=root@node1,O=LXD
level: debug
message: Matched trusted cert
timestamp: "2025-09-11T13:53:32.046090206Z"
type: logging
location: node2
metadata:
context:
fingerprint: 306bea6f374d7803d10abaaa7a33f75d4f644241166f22aa777e086adeebbb24
ip: 10.0.1.11:55870
method: PUT
protocol: cluster
url: /1.0/cluster/certificate
level: debug
message: Handling API request
timestamp: "2025-09-11T13:53:32.046171888Z"
type: logging
location: node2
metadata:
context: {}
level: info
message: 'http: TLS handshake error from 10.0.1.11:34562: remote error: tls: bad
certificate'
timestamp: "2025-09-11T13:53:34.178952287Z"
type: logging
location: node1
metadata:
context:
err: 'Failed to send heartbeat request: Put "https://10.0.1.12:8443/internal/database":
tls: failed to verify certificate: x509: certificate signed by unknown authority'
remote: 10.0.1.12:8443
level: warning
message: Failed heartbeat
timestamp: "2025-09-11T13:53:34.176864386Z"
type: logging
location: node1
metadata:
context:
err: 'Failed to send heartbeat request: Put "https://10.0.1.13:8443/internal/database":
tls: failed to verify certificate: x509: certificate signed by unknown authority'
remote: 10.0.1.13:8443
level: warning
message: Failed heartbeat
timestamp: "2025-09-11T13:53:34.911198392Z"
type: logging
I guess the “tls: failed to verify certificate” errors are due to the use of the staging Let’s Encrypt certificate (I had already exhausted the rate limit for my domain while trying to debug the problem).
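To double-check which certificate a member is actually serving at that point (i.e. whether the staging certificate has already been installed), the issuer can be inspected from another host, for example:
# Show issuer and validity of the certificate currently presented on the LXD API port
openssl s_client -connect 10.0.1.12:8443 -servername node2.example.net </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer -subject -dates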
Also, is there a way to force re-issuing the ACME certificate without changing the acme.*
keys? I noticed that after a while, changing the keys no longer triggers a certificate renewal.
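For reference, this is the cycle I have been using to trigger a renewal, which eventually stops having any visible effect:
# Toggle acme.domain to force a new ACME order
lxc config unset acme.domain
lxc config set acme.domain node2.example.net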