Advertising OVN networks over BGP

Hey,

I managed to successfully advertise my OVN networks over BGP, but traffic is not routed from the host onto the OVN network.

<removed old information, new information in the first reply>

Some information about my setup for nmezhenskyi

I run a 3-node LXD 6.4 cluster whose nodes share a physical layer 2 network.

This L2 network is 192.168.3.0/24 and has a BGP-capable router at 192.168.3.1; the three nodes use:

  • 192.168.3.61
  • 192.168.3.62
  • 192.168.3.63

The goal is to create OVN networks, each with a private IPv4 /24 (somewhere in 172.16.0.0/12) and a public IPv6 /64. The BGP-capable router will handle NAT and simply route the public IPv6 /64; LXD will not do any NAT at all.

We have Canonical k8s with Cilium/BGP running the exact same setup :slight_smile: fully routed dual stack, no NAT.

I have tried both physical and managed bridge uplinks for the OVN networks. My preference is physical, as it allows me to also use the uplink for my host. The use of physical uplinks is also assumed by the OVN setup guide.

The nodes have two physical links: one for the guest uplink and the other for management, LXD and Ceph cluster traffic. This is so that traffic spikes on one do not impact the other. I wouldn’t want my guests to lose network access if Ceph is doing some rebalancing, or my storage to be impacted if a client fully utilizes the uplink :slight_smile: .

As this is still a test setup I have not given the nodes IPv6 yet, but of course each node will get a static IPv6 when we move to production. I couldn’t get things to work, so I simplified and temporarily reverted to IPv4 only.

network:
  version: 2
  ethernets:
    enp5s0:
      dhcp4: false
    enp6s0:
      dhcp4: false
  bridges:

    ## Guest uplink
    br0:
      dhcp4: false
      interfaces: [ enp5s0 ]

    ## Management, LXD and Ceph
    br1: 
      dhcp4: false
      interfaces: [ enp6s0 ]
      addresses:
        - 192.168.3.61/24
      routes:
        - to: default
          via: 192.168.3.1

Hi @vosdev, I’ll let @nikita-mezhenskyi reply regarding the BGP setup. However, one thing I noted in your setup notes is that you don’t need to set up a br0 bridge to use as the OVN uplink network; instead, you can specify enp5s0 as the OVN uplink interface directly.

LXD will then create an OVS bridge for you, connect enp5s0 to it, and connect that bridge to OVN.

Otherwise, with your current setup, the connectivity is established as follows:

enp5s0 <-> br0 bridge <-veth pair-> OVS bridge <-> OVN

By specifying enp5s0 as the parent in your physical network definition in LXD, you can remove the br0 bridge and the veth pair, so it becomes:

enp5s0 <-> OVS bridge <-> OVN
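
For example, a minimal sketch of that approach (assuming node names node1..node3 and an uplink network called uplink0; any gateway, route or OVN range options would go on the final create):

lxc network create uplink0 --type=physical parent=enp5s0 --target=node1
lxc network create uplink0 --type=physical parent=enp5s0 --target=node2
lxc network create uplink0 --type=physical parent=enp5s0 --target=node3
lxc network create uplink0 --type=physical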

Thanks! I will keep that in mind. I started out with a single-interface setup, so a bridge was the only option because physical requires the interface to be fully unmanaged from the OS. If letting LXD fully handle the physical interface means better latency and less complexity, then I’m all for it!

The OVN setup docs are based on a physical-type uplink, but at the same time they mention that they assume you use an unmanaged bridge setup.

Therefore, you must specify either an unmanaged bridge interface or an unused physical interface as the parent for the physical network that is used for OVN uplink. The instructions assume that you are using a manually created unmanaged bridge. See How to configure network bridges for instructions on how to set up this bridge.

And I have also tried it with a managed bridge setup, like so:

root@node1:~# lxc network create uplink0 --type=bridge --target node1
Network uplink0 pending on member node1
root@node1:~# lxc network create uplink0 --type=bridge --target node2
Network uplink0 pending on member node2
root@node1:~# lxc network create uplink0 --type=bridge --target node3
Network uplink0 pending on member node3
root@node1:~# lxc network set uplink0 bridge.external_interfaces enp6s0 --target node1
root@node1:~# lxc network set uplink0 bridge.external_interfaces enp6s0 --target node2
root@node1:~# lxc network set uplink0 bridge.external_interfaces enp6s0 --target node3
root@node1:~# lxc network create uplink0 --type=bridge \
  ipv4.address=192.168.3.1/24 \
  ipv4.nat=false \
  ipv6.address=none
Network uplink0 created

It seems there are three roads to Rome :slight_smile:

Also, the following isn’t mentioned in the OVN setup guide:

physical requires the interface to be fully unmanaged from the OS.


Yes, the benefit of using a (managed or unmanaged) bridge is that you can have the host present with an IP on the uplink network, which can be useful if you don’t have a spare physical port or VLAN/bond interface.


@vosdev I see that you have set up a BGP listener on the LXD host (192.168.3.197). Could you try doing that on a cluster node instead?

For example, on 192.168.3.61 you could try this config:

lxc config set core.bgp_address=192.168.3.61
lxc config set core.bgp_asn=65197
lxc config set core.bgp_routerid=192.168.3.61

Then, update your BGP router’s (192.168.3.1) config to connect to the BGP listener on 192.168.3.61.

Please let me know whether this setup routes traffic to your OVN network correctly.

@vosdev please show the output of lxc network show ovn25, as in your original post.

Your OVN network is not showing any volatile address keys, which would represent the IPs of the OVN router on your uplink network.

This looks problematic to me.

Hey, sorry for the confusion :frowning: . The information in the original post was from a different, standalone test node (192.168.3.197).

All three nodes in my cluster have BGP set up and successfully announce every LXD network configured on the cluster.

root@node1:~# lxc config show --target node1 | grep bgp
  core.bgp_address: 192.168.3.61
  core.bgp_asn: "65061"
  core.bgp_routerid: 192.168.3.61
root@node1:~# lxc config show --target node2 | grep bgp
  core.bgp_address: 192.168.3.62
  core.bgp_asn: "65061"
  core.bgp_routerid: 192.168.3.62
root@node1:~# lxc config show --target node3 | grep bgp
  core.bgp_address: 192.168.3.63
  core.bgp_asn: "65061"
  core.bgp_routerid: 192.168.3.63
root@node1:~# lxc query /internal/testing/bgp
{
        "peers": [
                {
                        "address": "192.168.3.1",
                        "asn": 65000,
                        "count": 1,
                        "holdtime": 0,
                        "password": ""
                }
        ],
        "prefixes": [
                {
                        "nexthop": "192.168.3.5",
                        "owner": "network_17",
                        "prefix": "172.16.5.0/24"
                },
                {
                        "nexthop": "192.168.3.6",
                        "owner": "network_19",
                        "prefix": "172.16.6.0/24"
                },
                {
                        "nexthop": "0.0.0.0",
                        "owner": "network_12",
                        "prefix": "192.168.3.0/24"
                }
        ],
        "server": {
                "address": "192.168.3.61",
                "asn": 65061,
                "router_id": "192.168.3.61",
                "running": true
        }
}
root@node1:~# lxc network ls
+----------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
|   NAME   |   TYPE   | MANAGED |      IPV4       |           IPV6            | DESCRIPTION | USED BY |  STATE  |
+----------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| br-int   | bridge   | NO      |                 |                           |             | 0       |         |
+----------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| enp5s0   | physical | NO      |                 |                           |             | 0       |         |
+----------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| enp6s0   | physical | NO      |                 |                           |             | 0       |         |
+----------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| lxdovn12 | bridge   | NO      |                 |                           |             | 0       |         |
+----------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| ovn5     | ovn      | YES     | 172.16.5.1/24   |                           |             | 4       | CREATED |
+----------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| ovn6     | ovn      | YES     | 172.16.6.1/24   | fd42:11ef:adee:d29f::1/64 |             | 0       | CREATED |
+----------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
| uplink0  | bridge   | YES     | 192.168.3.60/24 | none                      |             | 2       | CREATED |
+----------+----------+---------+-----------------+---------------------------+-------------+---------+---------+
root@node1:~# lxc network show ovn5
name: ovn5
description: ""
type: ovn
managed: true
status: Created
config:
  bridge.mtu: "1442"
  ipv4.address: 172.16.5.1/24
  ipv6.nat: "true"
  network: uplink0
  volatile.network.ipv4.address: 192.168.3.5
used_by:
- /1.0/instances/c1
- /1.0/instances/c2
- /1.0/instances/c3
- /1.0/profiles/ovn5
locations:
- node2
- node3
- node1
project: default
root@node1:~# lxc network show uplink0
name: uplink0
description: ""
type: bridge
managed: true
status: Created
config:
  bgp.peers.opnsense.address: 192.168.3.1
  bgp.peers.opnsense.asn: "65000"
  ipv4.address: 192.168.3.60/24
  ipv4.dhcp.ranges: 192.168.3.26-192.168.3.49
  ipv4.nat: "false"
  ipv4.ovn.ranges: 192.168.3.5-192.168.3.25
  ipv4.routes: 172.16.0.0/16
  ipv6.address: none
used_by:
- /1.0/networks/ovn5
- /1.0/networks/ovn6
locations:
- node1
- node2
- node3
project: default

(btw why is it also announcing the 192.168.3.0/24 network? :frowning: )

Please let me know whether this setup routes traffic to your OVN network correctly.

This volatile IPv4 address 192.168.3.5 is not reachable anywhere on my network, not even from the LXD hosts themselves.

Your OVN network is not showing any volatile address keys, which would represent the IPs of the OVN router on your uplink network.

This looks problematic to me.

The OVN networks on my 3-node cluster do have this volatile address, but it’s not reachable anywhere.

root@node1:~# ping 192.168.3.5
PING 192.168.3.5 (192.168.3.5) 56(84) bytes of data.
From 192.168.3.61 icmp_seq=1 Destination Host Unreachable
From 192.168.3.61 icmp_seq=2 Destination Host Unreachable

The uplink0 interface only has 192.168.3.60/24, as configured in the LXD uplink0 network, not the 192.168.3.5 or 192.168.3.6 addresses for ovn5 and ovn6:

root@node1:~# ip a show uplink0
8: uplink0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:11:cb:75 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.60/24 scope global uplink0
       valid_lft forever preferred_lft forever

I hope this clears up the confusion with my .197 node from the first post. If there is any more information you require, I’ll be happy to provide it!


That’s because you’re using a managed bridge network for the uplink with bgp.peers settings; see:

https://documentation.ubuntu.com/lxd/latest/howto/network_bgp/#configure-next-hop-bridge-only

Oh! Another reason for going back to a bare physical link.

This is because you’re using a managed bridge network as the uplink.

This is arguably a bug in LXD, because it’s not an intended configuration to combine OVN BGP announcements and managed bridge networks.

Let’s consider the configuration you have:

  1. Uplink gateway IP 192.168.3.1.
  2. 3x LXD cluster members connected to the uplink network via enp6s0 using the br1 bridge, with subnet 192.168.3.0/24 (I think) and each member having a unique IP on that network (e.g. 192.168.3.61).
  3. A managed private bridge called uplink0 that also has the range 192.168.3.0/24, with every cluster member having the same address 192.168.3.60.
  4. This will likely cause conflicting automatic routes on your LXD hosts, as you effectively have two routes to 192.168.3.0/24, one via the external uplink interface br0 and one into the managed bridge uplink0 (see the quick check sketched after this list).
  5. The managed bridge uplink0 announces routes to its own subnet using its peer source address as the next hop (because each cluster member should have its own address on the remote router’s network).
  6. The OVN network is then connected to the uplink0 private bridge, which causes its routes to be announced with the OVN network’s volatile address on its uplink (uplink0 rather than br1) as the next hop.
  7. Because of the use of the private managed bridge, neighbour resolution cannot occur, and so the OVN network’s volatile addresses remain unreachable from the br1 network.
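
To confirm point 4, a quick check on each host (just a sketch; the subnet and interface names are the ones from this thread):

ip route show 192.168.3.0/24   # two entries here (one per bridge) indicates the conflict
ip -br addr show               # shows which interfaces carry addresses in that subnet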

Are enp5s0 and enp6s0 connected to the same physical network?

My recommendation would be to simplify.

If enp5s0 and enp6s0 are both connected to the same physical network segment, then you have a couple of choices:

  1. Dedicated management/cluster on one interface, and an unnumbered interface exclusively for OVN (no netplan bridges needed). Then create an uplink physical network in LXD that uses that unnumbered interface. LXD will set up an OVS bridge for you and connect the unnumbered interface and OVN together.
  2. Create a netplan bond of enp5s0 and enp6s0 for redundancy and then create a netplan bridge on top of that (ideally an OVS bridge, to avoid the additional veth pair interconnect) and use it both for management/cluster and as an OVN uplink (see the netplan sketch after this list). Then create an uplink physical network in LXD that uses that netplan bridge. LXD will detect the bridge and connect it to OVN, either by connecting directly to the OVS bridge or by setting up an OVS bridge and using a veth pair to connect to the native Linux bridge.
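
For reference, a rough netplan sketch of option 2 (the bond mode, names and example address are assumptions; adapt to your switch setup):

network:
  version: 2
  ethernets:
    enp5s0: {}
    enp6s0: {}
  bonds:
    bond0:
      interfaces: [ enp5s0, enp6s0 ]
      parameters:
        mode: active-backup      # assumption; use 802.3ad if your switch does LACP
  bridges:
    br0:
      # openvswitch: {}          # uncomment to make this an OVS bridge (needs OVS on the host)
      interfaces: [ bond0 ]
      addresses:
        - 192.168.3.61/24
      routes:
        - to: default
          via: 192.168.3.1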

I think you nailed it! All seven points are spot on.

I will re-create it with a physical uplink, using a netplan OVS bridge, and come back to you :slight_smile:

  1. This will likely cause conflicting automatic routes on your LXD hosts, as you effectively have two routes to 192.168.3.0/24, one via the external uplink interface br0 and one into the managed bridge uplink0.

Yes

Are enp5s0 and enp6s0 connected to the same physical network?

Yes, but one is 1Gbit and the other is 10Gbit. The 10Gbit link is for Ceph and the 1Gbit link would be for OVN and node management. Ideally I would also get LXD to use the 10Gbit link for instance migration, but that’s something to figure out later. (Some instances will run on local ZFS.)

For now the goal is to get things to work. Thank you so much for the help so far, it’s been very detailed!


:heart_eyes: It works: both IPv4 and IPv6 are fully routed. Using the right bridge, it was actually pretty quick to get everything working. I spent more time fixing IPv6 BGP and firewall issues on my router than setting up both OVN and LXD.

The reason for using a managed bridge was a conversation I had with @edlerd on GitHub, where it came up as the “preferred” way, but I have now reverted that decision and gone with a bridge created in netplan.

This is arguably a bug in LXD, because it’s not an intended configuration to combine OVN BGP announcements and managed bridge networks.

Maybe this is something that can be blocked.

Thank you so much @tomp for your assistance! It seems the guide was enough after all.

I think a good addition to the current guide would be some guidance on a NAT-free setup and a pointer to the BGP guide, plus some notes on getting everything up and running with microovn. Your notes regarding the ups and downs of bridges and openvswitch bridges in netplan are also valuable information that could be added :slight_smile:

I might move from microovn to a local OVN install in the future so that I can work with an openvswitch-based bridge directly in netplan.

Below is a write-up with my complete setup, my config and the commands used to get this to work.

I have three nodes sharing an L2 network (192.168.3.0/24): node1, node2 and node3.

  • enp5s0 (1Gbit, bridged, for management, OVN and LXD API)
  • enp6s0 (10Gbit, Ceph + LXD Cluster)

To prevent routing issues, the 10Gbit link has a subnet that is not routed on this L2 network.

Example netplan for node1:
network:
  version: 2
  ethernets:
    enp5s0: {}

    # 10Gbit; Ceph + LXD Cluster
    enp6s0:
      dhcp4: false
      dhcp6: false
      accept-ra: false
      addresses:
        - 10.0.0.61/24 # Local use only

  bridges:

    # 1Gbit; Management, OVN, LXD API
    br-enp5s0:
      #openvswitch:
      #  protocols: [OpenFlow13, OpenFlow14, OpenFlow15]
      #  external-ids:
      #    iface-id: node1
      interfaces:
        - enp5s0
      addresses:
        - 192.168.3.61/24
      routes:
        - to: default
          via: 192.168.3.1

Because I am currently running OVN via the microovn snap, I am not able to make an Open vSwitch bridge out of this bridge, because openvswitch cannot run both on my host and inside the snap; therefore I am using a standard Linux bridge.

I have reserved 172.16.0.0/16 and 2001:db8:1234:abc0::/60 for this LXD cluster’s OVN networks.

Set up OVN using microovn:

snap install microovn
microovn cluster bootstrap
microovn cluster add node2
microovn cluster add node3
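
Note that each microovn cluster add prints a join token, which you then use on the new node itself (a sketch of that step; install the snap on node2/node3 first, and the token below is just a placeholder):

microovn cluster join <token printed by the add command>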

Configure LXD for use with this OVN:

lxc config set network.ovn.ca_cert="$(cat /var/snap/microovn/common/data/pki/cacert.pem)" \
    network.ovn.client_cert="$(cat /var/snap/microovn/common/data/pki/client-cert.pem)" \
    network.ovn.client_key="$(cat /var/snap/microovn/common/data/pki/client-privkey.pem)"

lxc config set network.ovn.northbound_connection=ssl:192.168.3.61:6641,ssl:192.168.3.62:6641,ssl:192.168.3.63:6641

@nikita-mezhenskyi the PKI steps are currently not in the docs and it’s hard to get the formatting right, so I think these commands would be a good addition to the documentation. It might also be good to mention that network.ovn.northbound_connection accepts comma-separated values.

Next up, creating the uplink on each node:

lxc network create uplink0 --type=physical parent=br-enp5s0 --target=node1
lxc network create uplink0 --type=physical parent=br-enp5s0 --target=node2
lxc network create uplink0 --type=physical parent=br-enp5s0 --target=node3

lxc network create uplink0 --type=physical \
   ipv4.ovn.ranges=192.168.3.5-192.168.3.25 \
   ipv4.gateway=192.168.3.1/24 \
   ipv4.routes=172.16.0.0/16 \
   ipv6.ovn.ranges=2001:db8:1234:3::5-2001:db8:1234:3::25 \
   ipv6.gateway=2001:db8:1234:3::1/64 \
   ipv6.routes=2001:db8:1234:abc0::/60 \
   dns.nameservers=1.1.1.1,2606:4700:4700::1111 \
   bgp.peers.opnsense.address=192.168.3.1 \
   bgp.peers.opnsense.asn=65000
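
A quick sanity check at this point (nothing special, just the standard commands):

lxc network list
lxc network show uplink0

The uplink network should report status Created with all three members listed under locations.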

And now you’re able to create the virtual cluster networks:

lxc network create ovn1 --type=ovn \
  ipv4.address=172.16.1.1/24 \
  ipv4.nat="false" \
  ipv6.address=2001:db8:1234:abc1::1/64 \
  ipv6.nat="false" \
  network=uplink0
lxc network create ovn2 --type=ovn \
  ipv4.address=172.16.2.1/24 \
  ipv4.nat="false" \
  ipv6.address=2001:db8:1234:abc2::1/64 \
  ipv6.nat="false" \
  network=uplink0
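
For reference, attaching a test container is then just a matter of pointing it at one of the new networks (the image alias is an assumption; any image works):

lxc launch ubuntu:24.04 c1 --network ovn1
lxc list c1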

Now we need to configure LXD’s BGP:

lxc config set core.bgp_asn=65060

lxc config set core.bgp_address=192.168.3.61 --target node1
lxc config set core.bgp_address=192.168.3.62 --target node2
lxc config set core.bgp_address=192.168.3.63 --target node3

lxc config set core.bgp_routerid=192.168.3.61 --target node1
lxc config set core.bgp_routerid=192.168.3.62 --target node2
lxc config set core.bgp_routerid=192.168.3.63 --target node3

Here’s a command that gives you some debug data for the BGP setup if you run into any issues (it works with --target):

lxc query /internal/testing/bgp

And finally, some changes were required on my router (OPNsense):

  1. Enable multi-protocol BGP on your peers to allow IPv6 announcements over the IPv4 unicast session.
  2. Add 172.16.0.0/16 to the outbound NAT rules to enable NAT. By default, OPNsense does not NAT traffic for networks that are dynamically learned rather than locally configured.
  3. Allow traffic to 172.16.0.0/16 and 2001:db8:1234:abc0::/60 in the inbound firewall rules on your VLAN’s interface.
Working result of a guest container (c1) connected to ovn1:
root@c1:~# ip a show eth0
16: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1442 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:3e:a2:2e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.16.1.2/24 metric 100 brd 172.16.1.255 scope global dynamic eth0
       valid_lft 3142sec preferred_lft 3142sec
    inet6 2001:db8:1234:abc1:216:3eff:fe3e:a22e/64 scope global mngtmpaddr noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe3e:a22e/64 scope link 
       valid_lft forever preferred_lft forever
root@c1:~# ping google.com -c 1
PING google.com (2a00:1450:400e:801::200e) 56 data bytes
64 bytes from ams17s10-in-x0e.1e100.net (2a00:1450:400e:801::200e): icmp_seq=1 ttl=118 time=7.30 ms

--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.299/7.299/7.299/0.000 ms
root@c1:~# ping google.com -c 1 -4
PING google.com (142.251.36.46) 56(84) bytes of data.
64 bytes from ams17s12-in-f14.1e100.net (142.251.36.46): icmp_seq=1 ttl=118 time=11.3 ms

--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 11.302/11.302/11.302/0.000 ms

Yes I think this is a known issue that @fnordahl is aware of.

Excellent, glad you got it sorted :slight_smile:
