Many thanks for the detailed explanations!
Unfortunately, our servers do not have a spare physical interface with external connectivity, hence I am looking for workarounds.
Each of our cluster nodes has two NICs:
- One NIC, say `eth0`, is assigned a static (public) IPv4 address and provides external internet connectivity.
- The other NIC, say `eth1`, is connected to a 100 Gbit switch that is not connected to the outside network (it only links the cluster nodes).
Currently I have assigned the second NIC to the MicroCloud internal traffic (LXD + Ceph + OVN) by configuring static IPs on one of the VLANs:
```yaml
network:
  version: 2
  ethernets:
    eth1:
      dhcp4: false
      dhcp6: false
  vlans:
    lxd-vlan10:
      id: 10
      link: eth1
      mtu: 9600
      addresses:
        - 10.0.1.x/24
```
Since I cannot use the physical NIC `eth0` for the OVN uplink network, I would need to use either `eth1` (with a different VLAN) or a bridge.
I tried creating another VLAN interface on `eth1` and using it for the OVN uplink. The setup process accepted it, but my containers could not reach the external network (`ping 8.8.8.8` failed). I guess this is because my second network is isolated; the containers could ping each other without problems.
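For reference, that isolation can be checked directly on a node before involving OVN at all; the interface name below is a placeholder for the second (uplink) VLAN, not my actual config:

```shell
# Check whether the isolated VLAN has any path to the outside.
# "lxd-vlan20" is a placeholder name for the second (uplink) VLAN interface.
ip route get 8.8.8.8 oif lxd-vlan20   # "Network is unreachable" on an isolated VLAN
ping -c 1 -I lxd-vlan20 8.8.8.8       # should see 100% packet loss here
```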
If I create a bridge, then (I guess) I need to use the same gateway as for the external interface `eth0`, but I do not have any spare (public) IP addresses to assign to `ipv4.ovn.ranges` (which, as I understand it, need to be on the same subnet as the gateway).
Hence I was interested in the above-mentioned tutorial that used an LXD-managed bridge as the OVN uplink interface, without using any physical interface. This is the only solution that worked for me so far: the containers could ping each other as well as the external network.
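For anyone following along, the setup is essentially of this shape (a sketch, not a verbatim transcript of the tutorial; the addresses are the ones from my config further down):

```shell
# On an LXD cluster, member-specific config is staged per member first,
# then the network is actually created cluster-wide:
lxc network create lxdbr0 --target node1
lxc network create lxdbr0 --target node2
lxc network create lxdbr0 --target node3
lxc network create lxdbr0 --target node4
lxc network create lxdbr0 \
    ipv4.address=10.0.2.1/24 ipv4.nat=true ipv4.dhcp=false \
    ipv4.ovn.ranges=10.0.2.101-10.0.2.254
# Then create the OVN network that uses the bridge as its uplink:
lxc network create default --type=ovn network=lxdbr0
```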
I have now tested the failover for virtual switches, and so far I have not noticed any problems. Here is what I tried:
My configuration:
```text
# lxc network show lxdbr0 --target node1
name: lxdbr0
description: ""
type: bridge
managed: true
status: Created
config:
  ipv4.address: 10.0.2.1/24
  ipv4.dhcp: "false"
  ipv4.nat: "true"
  ipv4.ovn.ranges: 10.0.2.101-10.0.2.254
  ipv6.address: fd42:69a:afe:aafa::1/64
  ipv6.nat: "true"
used_by:
- /1.0/networks/default
- /1.0/networks/ovn1
locations:
- node1
- node2
- node3
- node4
```
Note that DHCP is disabled; it does not look like it is needed here. I guess OVN itself assigns the IP addresses to the virtual switches. Just like in the original tutorial, the bridge does not have a parent (physical) interface.
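One way I can see the address that OVN picked from `ipv4.ovn.ranges` is via the volatile key that LXD records on the OVN network (it matches the value shown in the config below):

```shell
# The active gateway address allocated from ipv4.ovn.ranges is stored
# as a volatile key on the OVN network:
lxc network get default volatile.network.ipv4.address
```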
Here is my OVN network, which is used in the `default` profile:
```text
# lxc network show default
name: default
description: ""
type: ovn
managed: true
status: Created
config:
  bridge.mtu: "1500"
  ipv4.address: 10.128.134.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:5964:943:bad9::1/64
  ipv6.nat: "true"
  network: lxdbr0
  volatile.network.ipv4.address: 10.0.2.101
  volatile.network.ipv6.address: fd42:69a:afe:aafa:216:3eff:fe20:e45a
used_by:
- /1.0/instances/c1
- /1.0/instances/c2
- /1.0/instances/c3
- /1.0/instances/c4
- /1.0/profiles/default
locations:
- node1
- node2
- node3
- node4
```
I can only ping `10.0.2.101` from `node3`, so I guess this means that the virtual switch for the OVN network `default` is currently running on this node.
```text
root@node3:~# ping -c 3 10.0.2.101
PING 10.0.2.101 (10.0.2.101) 56(84) bytes of data.
64 bytes from 10.0.2.101: icmp_seq=1 ttl=254 time=0.922 ms
64 bytes from 10.0.2.101: icmp_seq=2 ttl=254 time=0.759 ms
64 bytes from 10.0.2.101: icmp_seq=3 ttl=254 time=0.677 ms

--- 10.0.2.101 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2019ms
rtt min/avg/max/mdev = 0.677/0.786/0.922/0.101 ms
```
Now, while on `node3`, I turn off the MicroOVN switch service:
```text
root@node3:~# microovn disable switch
Service switch disabled
```
After that, the virtual switch becomes unreachable from `node3`:
```text
root@node3:~# ping -c 3 10.0.2.101
PING 10.0.2.101 (10.0.2.101) 56(84) bytes of data.
From 10.0.2.1 icmp_seq=1 Destination Host Unreachable
From 10.0.2.1 icmp_seq=2 Destination Host Unreachable
From 10.0.2.1 icmp_seq=3 Destination Host Unreachable

--- 10.0.2.101 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2024ms
pipe 3
```
But now I can ping it from `node1`:
```text
root@node1:~# ping -c 3 10.0.2.101
PING 10.0.2.101 (10.0.2.101) 56(84) bytes of data.
64 bytes from 10.0.2.101: icmp_seq=1 ttl=254 time=2.07 ms
64 bytes from 10.0.2.101: icmp_seq=2 ttl=254 time=0.691 ms
64 bytes from 10.0.2.101: icmp_seq=3 ttl=254 time=0.750 ms

--- 10.0.2.101 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.691/1.169/2.067/0.635 ms
```
So I guess that means the failover is working?
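To double-check which node took over, I believe the OVN southbound database can be inspected directly (assuming the MicroOVN snap exposes `ovn-sbctl` as `microovn.ovn-sbctl`; the `cr-` prefix marks OVN's chassisredirect gateway port):

```shell
# List chassis and their port bindings; the chassis shown with the
# cr-... (chassisredirect) Port_Binding currently hosts the gateway.
microovn.ovn-sbctl show
```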
All containers on `node1`, `node2`, and `node4` can ping each other and the external network (`8.8.8.8`).
Only the containers on `node3` lost network connectivity. I do not know whether this is to be expected.
Sorry for the long reply.