Failing to set up DNS with OVN network zone AXFR transfer

Hello everyone,

I’m trying to set up an authoritative secondary DNS server (using NSD) inside a container on a MicroCloud cluster. My goal is to have this container pull the zone information from LXD’s built-in DNS server via an AXFR transfer for an OVN network. I’m running into a connectivity issue and would appreciate some guidance.

My Environment:

  • LXD cluster managed by MicroCloud (4 nodes)
  • A default OVN network is configured as follows:
    default | ovn | YES | 10.94.219.1/24

Configuration Steps:

  1. I have created the forward and reverse network zones (see the command sketch after this list for roughly how they were set up):
    # lxc network zone list
+------------------------+-------------+---------+
|          NAME          | DESCRIPTION | USED BY |
+------------------------+-------------+---------+
| 219.94.10.in-addr.arpa |             | 1       |
+------------------------+-------------+---------+
| lxd.utn.ac.cr          |             | 1       |
+------------------------+-------------+---------+
  
  2. The zones are configured with a peer (ns1), which has a static IP of 10.94.219.5:
    # lxc network zone show lxd.utn.ac.cr
config:
  dns.nameservers: ns1.lxd.utn.ac.cr
  peers.ns1.address: 10.94.219.5
  
  3. I have enabled LXD’s built-in DNS server to listen on the OVN network’s gateway IP and a non-standard port:
    # lxc config show
config:
  core.dns_address: 10.94.219.1:8853
  
  4. Inside the ns1 container, NSD is configured to request the zone transfer from the OVN gateway:
    # cat /etc/nsd/nsd.conf.d/server.conf
server:
  ip-address: 10.94.219.5

zone:
  name: "lxd.utn.ac.cr"
  request-xfr: AXFR 10.94.219.1@8853 NOKEY

zone:
  name: "219.94.10.in-addr.arpa"
  request-xfr: AXFR 10.94.219.1@8853 NOKEY
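
For reference, the zones were created and attached to the network with commands roughly like the following (a reconstruction from the output above, not the exact shell history):

    lxc network zone create lxd.utn.ac.cr
    lxc network zone create 219.94.10.in-addr.arpa
    lxc network zone set lxd.utn.ac.cr dns.nameservers=ns1.lxd.utn.ac.cr
    lxc network zone set lxd.utn.ac.cr peers.ns1.address=10.94.219.5
    lxc network zone set 219.94.10.in-addr.arpa peers.ns1.address=10.94.219.5
    lxc network set default dns.zone.forward=lxd.utn.ac.cr
    lxc network set default dns.zone.reverse.ipv4=219.94.10.in-addr.arpa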
  

The Problem:

The zone transfer fails. When I query the ns1 server, I get a SERVFAIL response, as expected since it never received the zone data.

    # host ns1.lxd.utn.ac.cr 10.94.219.5
Host ns1.lxd.utn.ac.cr not found: 2(SERVFAIL)
  

To diagnose this, I checked the network connectivity from the ns1 container to the OVN gateway IP on the AXFR port. The connection fails (it times out with no response).

    # lxc exec ns1 -- nc -zv 10.94.219.1 8853
(Command hangs and eventually fails)
  

This appears to be the root of the problem: the container cannot reach the OVN gateway IP on the port specified in core.dns_address.
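
For reference, two quick checks on the host (standard tools, nothing LXD-specific) can show whether the gateway address exists on the host at all and whether anything is listening on the port:

    # is the OVN gateway IP configured on the host?
    ip addr | grep 10.94.219.1
    # is anything listening on the DNS port?
    ss -lntp | grep 8853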

Logs (sudo journalctl -u snap.lxd.daemon -f):

Oct 22 14:58:27 m1 lxd.daemon[333282]: time="2025-10-22T14:58:27Z" level=error msg="Failed to bind TCP DNS address \"10.94.219.1:8853\": listen tcp 10.94.219.1:8853: bind: cannot assign requested address"
Oct 22 14:58:30 m1 lxd.daemon[333103]: => LXD is ready

If I change the config so it listens on any IP:
config:
cluster.https_address: 10.100.27.21:8443
core.dns_address: :8853

and add peers.local.address: 127.0.0.1 to test from the MicroCloud node itself:

lxc network zone show lxd.utn.ac.cr
name: lxd.utn.ac.cr
description: ""
config:
  dns.nameservers: ns1.lxd.utn.ac.cr
  peers.local.address: 127.0.0.1
  peers.ns1.address: 10.94.219.5
used_by:
- /1.0/networks/default
project: ""

dig @127.0.0.1 lxd.utn.ac.cr AXFR -p 8853

; <<>> DiG 9.18.39-0ubuntu0.24.04.1-Ubuntu <<>> @127.0.0.1 lxd.utn.ac.cr AXFR -p 8853
; (1 server found)
;; global options: +cmd
lxd.utn.ac.cr. 3600 IN SOA lxd.utn.ac.cr. ns1.lxd.utn.ac.cr. 1761145829 120 60 86400 30
lxd.utn.ac.cr. 300 IN NS ns1.lxd.utn.ac.cr.
default.gw.lxd.utn.ac.cr. 300 IN A 10.94.219.1
cont-04.lxd.utn.ac.cr. 300 IN A 10.94.219.7
ns1.lxd.utn.ac.cr. 300 IN A 10.94.219.5

The LXD DNS server itself seems to work, but the zone transfer to the ns1 container still does not happen, and it does not work from the external network either.

Any help or clarification on the correct architecture for this would be greatly appreciated.

Thank you

Hi, most likely the virtual address of the OVN network isn’t reachable from the host running LXD (which is expected), so LXD cannot bind to this address to spawn the DNS server.

In your logs (snap logs lxd) you should see something like this:

2025-10-23T07:47:29Z lxd.daemon[2091]: time="2025-10-23T07:47:29Z" level=error msg="Failed to bind TCP DNS address \"10.138.128.1:1234\": listen tcp 10.138.128.1:1234: bind: cannot assign requested address"

This explains the various timeouts/failed connections you see.

Make sure to bind the DNS to an address available on each of the LXD cluster members. Then ensure that the traffic leaving the 10.94.219.0/24 network from your ns1 instance is routed properly so that it can reach the address the DNS is listening on.
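
For example, something along these lines sets a per-member bind address (the member names and all addresses except the first one are placeholders here):

lxc config set core.dns_address=10.100.27.21:8853 --target m1
lxc config set core.dns_address=10.100.27.22:8853 --target m2
lxc config set core.dns_address=10.100.27.23:8853 --target m3
lxc config set core.dns_address=10.100.27.24:8853 --target m4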

Thanks for responding, I will check out your recommendations.

I am running into the same issue, only with routable IPv6 addresses while trying to set up an IPv6-only network. The IPv6 network in question is routed via the LXD BGP server. Internet connectivity is fine, as is access to two other internal IPv6 subnets. No node in the cluster runs dnsmasq, and there is no sign in the logs of any attempt to start it. The cluster is running LXD 5.21 latest/stable.

I did get network zones working on an independent single server, so I’m pretty sure my settings are correct, as is the routing on my networks. That server does have dnsmasq running and is on LXD version 6.5 (6/latest/stable).

I am actually very pleased to find this thread. It reassures me that I’m probably not doing something dumb on this one. :grinning:

You are right
Oct 23 22:22:25 m1 lxd.daemon[20559]: time="2025-10-23T22:22:25Z" level=error msg="Failed to bind TCP DNS address \"10.94.219.1:8853\": listen tcp 10.94.219.1:8853: bind: cannot assign requested address"

The IP, which is the gateway IP of the OVN network, is not accessible from the host. Should that network be reachable from the nodes in a healthy cluster, or is this expected?

Regarding “Make sure to bind the DNS to an address available on each of the LXD cluster members”:

Does that refer to the NSD DNS, i.e. the ns1 VM? Isn’t the IP 10.94.219.5 supposed to live on all members of the cluster, since it is an OVN network? Or is there something I’m not understanding?

Regarding “Then ensure that the traffic leaving the 10.94.219.0/24 network from your ns1 instance is routed properly so that it can reach the address the DNS is listening on”:

Ping works:
ping 10.94.219.1
PING 10.94.219.1 (10.94.219.1) 56(84) bytes of data.
64 bytes from 10.94.219.1: icmp_seq=1 ttl=254 time=4.28 ms
64 bytes from 10.94.219.1: icmp_seq=2 ttl=254 time=1.02 ms

but I cannot access port 8853:

nc -vz 10.94.219.1 8853

I have added the address 127.0.0.1 as a peer to see if the LXD DNS works:
lxc network zone show lxd.utn.ac.cr
name: lxd.utn.ac.cr
description: ""
config:
  dns.nameservers: ns1.lxd.utn.ac.cr
  peers.local.address: 127.0.0.1
  peers.ns1.address: 10.94.219.5

dig @127.0.0.1 lxd.utn.ac.cr AXFR -p 8853

; <<>> DiG 9.18.39-0ubuntu0.24.04.2-Ubuntu <<>> @127.0.0.1 lxd.utn.ac.cr AXFR -p 8853
; (1 server found)
;; global options: +cmd
lxd.utn.ac.cr. 3600 IN SOA lxd.utn.ac.cr. ns1.lxd.utn.ac.cr. 1761265447 120 60 86400 30
lxd.utn.ac.cr. 300 IN NS ns1.lxd.utn.ac.cr.
default.gw.lxd.utn.ac.cr. 300 IN A 10.94.219.1
ns1.lxd.utn.ac.cr. 300 IN A 10.94.219.5

If you have documentation on this, I would appreciate it.

Congratulations on your progress

I am using MicroCloud version 2.1.1 LTS.

It’s the IP of the virtual router from within the virtual network. By default your host doesn’t have a route to it. You can check with the ip route command.
What you might be able to reach (depending on routing) is the external-facing IP of the virtual router of your OVN network. You can retrieve this address using the following command:

lxc network get <network name> volatile.network.ipv4.address

It’s the address the router of your OVN network uses on the UPLINK network.
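
For example, on one of the hosts:

ip route get 10.94.219.1    # shows how the host would route to the OVN gateway (typically via its default gateway, which does not reach it)
lxc network get default volatile.network.ipv4.address    # the OVN router's address on the UPLINK network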

core.dns_address is a cluster-member-specific setting. You have to set it for each of your cluster members, as it instructs LXD to bind to a specific address/port on that specific member. You can check by running lxc config show on any of the other cluster members; if you haven’t configured it there, it should be unset.
Please take note of this sentence in the docs:

Note that in a LXD cluster, the address may be different on each cluster member.
https://documentation.ubuntu.com/lxd/latest/howto/network_zones/#enable-the-built-in-dns-server

This is because you can of course ping the internal-facing IP of your network’s virtual router from within the network, but this doesn’t mean you can do the same from the LXD host itself, as it doesn’t know how to reach this network (no route).

You have confirmed that LXD cannot bind to this address (see the error message you posted), so you also cannot reach the port, as there is nothing listening on that address/port.

Thank you very much for responding.

I executed lxc config set core.dns_address <NODE_IP>:8853 on each node in the cluster.

Then I added the IP of the external NSD server as a peer in the network zone:

lxc network zone show lxd.utn.ac.cr
name: lxd.utn.ac.cr
description: ""
config:
  dns.nameservers: ns1.lxd.utn.ac.cr
  peers.ns1.address: 10.10.28.4 # this is the external NSD server
used_by:
- /1.0/networks/default
project: ""

In the NSD server config:

cat /etc/nsd/nsd.conf.d/server.conf
server:
  ip-address: 10.10.28.4

zone:
  name: "lxd.utn.ac.cr"
  request-xfr: AXFR ip-node-1-cluster@8853 NOKEY
  request-xfr: AXFR ip-node-2-cluster@8853 NOKEY
  request-xfr: AXFR ip-node-3-cluster@8853 NOKEY
  request-xfr: AXFR ip-node-4-cluster@8853 NOKEY

zone:
  name: "219.94.10.in-addr.arpa"
  request-xfr: AXFR ip-node-1-cluster@8853 NOKEY
  request-xfr: AXFR ip-node-2-cluster@8853 NOKEY
  request-xfr: AXFR ip-node-3-cluster@8853 NOKEY
  request-xfr: AXFR ip-node-4-cluster@8853 NOKEY

journalctl -f
The zones are synchronized:
Oct 24 14:09:06 base nsd[2635]: zone lxd.utn.ac.cr serial 1761314834 is updated to 1761314946
Oct 24 14:10:07 base nsd[2635]: zone 219.94.10.in-addr.arpa serial 1761314888 is updated to 1761315007

and dig resolves all the records against the IPs of the 4 nodes:
dig @ip-node-x-cluster lxd.utn.ac.cr AXFR -p 8853

dig @10.100.27.21 lxd.utn.ac.cr AXFR -p 8853

; <<>> DiG 9.18.39-0ubuntu0.24.04.2-Ubuntu <<>> @10.100.27.21 lxd.utn.ac.cr AXFR -p 8853
; (1 server found)
;; global options: +cmd
lxd.utn.ac.cr. 3600 IN SOA lxd.utn.ac.cr. ns1.lxd.utn.ac.cr. 1761315103 120 60 86400 30
lxd.utn.ac.cr. 300 IN NS ns1.lxd.utn.ac.cr.
default.gw.lxd.utn.ac.cr. 300 IN A 10.94.219.1
cont-02.lxd.utn.ac.cr. 300 IN A 10.94.219.4
cont-03.lxd.utn.ac.cr. 300 IN A 10.94.219.6
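
As a side note, if remote control is enabled for NSD (control-enable: yes in the remote-control section of nsd.conf), the transfer state can also be inspected or a refresh forced from the NSD side, for example:

nsd-checkconf /etc/nsd/nsd.conf        # validate the configuration
nsd-control zonestatus lxd.utn.ac.cr   # show serial and transfer state for the zone
nsd-control transfer lxd.utn.ac.cr     # trigger an AXFR attempt immediately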

My question is: is this correct? Is this the recommended architecture? Can the DNS server (BIND or NSD) be external to the cluster?

And if so: dig returns the internal IPs of the containers, so how do I access the services from outside? I am trying to understand the practical use case. I know port forwarding is possible; in this case it would point to the floating IP, I suppose.

lxc network show default
name: default
description: ""
type: ovn
managed: true
status: Created
config:
  bridge.mtu: "1442"
  dns.zone.forward: lxd.utn.ac.cr
  dns.zone.reverse.ipv4: 219.94.10.in-addr.arpa
  ipv4.address: 10.94.219.1/24
  ipv4.nat: "true"
  network: UPLINK
  volatile.network.ipv4.address: 10.100.28.150

lxc network forward show default 10.100.28.150
listen_address: 10.100.28.150
location: ""
description: ""
config: {}
ports:
- description: ""
  protocol: tcp
  listen_port: "80"
  target_port: "80"
  target_address: 10.94.219.4 # this is a container with nginx
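
For reference, a forward like that can be created with something along these lines (the listen address must be one the uplink network can allocate):

lxc network forward create default 10.100.28.150
lxc network forward port add default 10.100.28.150 tcp 80 10.94.219.4 80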

I have two terminals open on one of my cluster’s nodes. On one of them I start LXD log monitoring:

$ lxc monitor --pretty --loglevel=info --type=logging

On the second terminal, first I cleared the current setting for core.dns_address, then added it again:

$ lxc config set core.dns_address=
$ lxc config set code.dns_addres=10.219.58.1:2053

There's no output.

But on the other terminal, this error immediately pops up:

ERROR  [2025-10-24T17:04:05Z] Failed to bind TCP DNS address "10.219.58.1:2053": listen udp 10.219.58.1:2053: bind: cannot assign requested address 

Then I created a new project and a new OVN-type network within that project, and went through the same sequence without creating an instance or a network zone. I got the same error.

This is not a configuration error. On the other hand, other than this thread, I have not seen anything at all reported regarding this particular scenario. Every single reference to this error message claims there is a port number collision. That’s easy to test: change the port number, and of course the same error pops up.

I created a GitHub issue.

I hope code was just a typo.

Where is the address 10.219.58.1 allocated? Does it exist on every cluster member? What does the following command return on your cluster hosts?

ip a | grep 10.219.58

In my setup, I have created a managed bridge carrying the core.dns_address address, without a parent network, just to make sure that this IP address exists on every host while the bridges stay isolated and do not cause any conflict. Then each host can use this (local) address for DNS zone transfers.

Alternatively, if the 10.219.58.* subnet is connected to a shared interface on your cluster nodes, you need to configure different (static) IP addresses on this network for each host and set core.dns_address accordingly for each cluster member: it is a member-specific key, just like cluster.https_address and core.https_address.
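
For example, such an isolated per-member bridge could be created roughly like this (a sketch; the bridge and member names are placeholders, the address/port are the ones from your error message):

lxc network create dnsbr0 --target m1
lxc network create dnsbr0 --target m2
lxc network create dnsbr0 --target m3
lxc network create dnsbr0 --target m4
lxc network create dnsbr0 ipv4.address=10.219.58.1/24 ipv4.nat=false ipv6.address=none
# then on each member: lxc config set core.dns_address=10.219.58.1:2053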

Thank you very much, this is extremely helpful!

That is exactly how I have it configured. By setting the core.dns_address with the --target flag for each node’s real IP, my external BIND server is now able to successfully perform AXFR zone transfers from all members of the cluster. I really appreciate the clarification.

This brings up a conceptual question for me, though, and I’m curious about your experience and the intended scenarios for this feature.

My external BIND server (which lives on my LAN, e.g., 10.10.2.2) can now successfully fetch all the real container IP addresses from the overlay networks (10.94.219.x).

However, none of my other clients on that same LAN (like an app server at 10.50.1.20) can actually route to or access those container IPs. Those networks are completely private to the MicroCloud cluster. The only way for my LAN clients to access services is by using the OVN floating IPs (e.g., 10.100.28.150) that live on the uplink.

Given this, what is the intended functionality or common scenario for having an external BIND server pull these private, non-routable IPs? In what situations has this design proven useful for you?

In my specific case, I want to resolve DNS over multiple OVN networks. It is possible to connect OVN networks using peer relationships, but this does not enable DNS resolution across different networks. After publishing all OVN DNS zones to an external DNS server (e.g., BIND), one can use this DNS server on the uplink network using the key dns.nameservers, and this should provide DNS resolution on all OVN networks that use this uplink network.
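
For example (using the names from earlier in this thread), the external BIND server at 10.10.2.2 can be configured on the uplink so that the OVN networks using that uplink pick it up:

lxc network set UPLINK dns.nameservers=10.10.2.2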