Network debugging

This page presents a collection of techniques for interrogating your cloud’s virtual networking system (OVN). Whether prompted by sheer interest or by necessity (an issue has arisen), this page will assist you in looking into the internals of your cloud’s networking layer.

Note: This page was inspired by upstream OVN documentation. Many OVN troubleshooting techniques can be applied equally to a Sunbeam environment.

Accessing OVN databases

There are four containers in each ovn-central pod:

  • Northd
  • Northbound database
  • Southbound database
  • the charm itself

These containers can each be accessed with Juju over SSH. Once connected, start a Bash shell and create aliases for accessing the OVN tooling:

For the Northbound DB container:

juju ssh -m openstack --container ovn-nb-db-server ovn-central/0

Set up aliases:

bash
alias ovn-nbctl='ovn-nbctl --db=ssl:127.0.0.1:6641 -c /etc/ovn/cert_host -p /etc/ovn/key_host -C /etc/ovn/ovn-central.crt'

For the Southbound DB container:

juju ssh -m openstack --container ovn-sb-db-server ovn-central/0

Set up aliases:

bash
alias ovn-sbctl='ovn-sbctl --db=ssl:127.0.0.1:6642 -c /etc/ovn/cert_host -p /etc/ovn/key_host -C /etc/ovn/ovn-central.crt'
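Aliases are only expanded in interactive shells, so for a script it may be more convenient to wrap the same options in shell functions. The following is a minimal sketch: the function names ovn_ssl_opts, nbctl, and sbctl are hypothetical, while the certificate paths and ports are copied from the aliases above.

```shell
# Hypothetical helper: print the SSL options shared by both aliases above.
ovn_ssl_opts() {
    printf '%s' "-c /etc/ovn/cert_host -p /etc/ovn/key_host -C /etc/ovn/ovn-central.crt"
}

# Hypothetical wrappers. Port 6641 is the Northbound DB and 6642 the
# Southbound DB, as in the aliases above. The options are deliberately
# left unquoted so that they are split into separate arguments.
nbctl() {
    ovn-nbctl --db=ssl:127.0.0.1:6641 $(ovn_ssl_opts) "$@"
}

sbctl() {
    ovn-sbctl --db=ssl:127.0.0.1:6642 $(ovn_ssl_opts) "$@"
}
```

With these defined, nbctl show inside the Northbound DB container should behave like the aliased ovn-nbctl show.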

Querying OVN databases

Assuming a single-node install with all the defaults, and a guest created with sunbeam launch, the cloud will contain a demo-network, an external-network, a demo-router, and a guest.

These are some of the entities that are present from an OpenStack perspective:

openstack server list --all-projects
+--------------------------------------+-----------+--------+-------------------------------------------+--------+---------+
| ID                                   | Name      | Status | Networks                                  | Image  | Flavor  |
+--------------------------------------+-----------+--------+-------------------------------------------+--------+---------+
| 6c446cb5-4934-401a-917d-e3bc215c0b64 | rapid-owl | ACTIVE | demo-network=10.20.20.138, 192.168.122.83 | ubuntu | m1.tiny |
+--------------------------------------+-----------+--------+-------------------------------------------+--------+---------+

openstack network list
+--------------------------------------+------------------+--------------------------------------+
| ID                                   | Name             | Subnets                              |
+--------------------------------------+------------------+--------------------------------------+
| 3f9bc3b1-2520-4658-85f0-545a69e8b06a | demo-network     | 17e394f9-e12c-4f31-a269-62ddf3308fc8 |
| 856fe9e3-60bf-4177-bb8b-831f68bb55c0 | external-network | 14c63eaf-eeb7-476d-a99d-0a05f6a674f8 |
+--------------------------------------+------------------+--------------------------------------+

openstack subnet list
+--------------------------------------+-----------------+--------------------------------------+------------------+
| ID                                   | Name            | Network                              | Subnet           |
+--------------------------------------+-----------------+--------------------------------------+------------------+
| 14c63eaf-eeb7-476d-a99d-0a05f6a674f8 | external-subnet | 856fe9e3-60bf-4177-bb8b-831f68bb55c0 | 10.20.20.0/24    |
| 17e394f9-e12c-4f31-a269-62ddf3308fc8 | demo-subnet     | 3f9bc3b1-2520-4658-85f0-545a69e8b06a | 192.168.122.0/24 |
+--------------------------------------+-----------------+--------------------------------------+------------------+

openstack router list
+--------------------------------------+-------------+--------+-------+----------------------------------+
| ID                                   | Name        | Status | State | Project                          |
+--------------------------------------+-------------+--------+-------+----------------------------------+
| 5c300bae-bf1f-4773-ac98-1d71c23e1bc7 | demo-router | ACTIVE | UP    | b8c896d15bb247448edd2d97f7d99f1f |
+--------------------------------------+-------------+--------+-------+----------------------------------+

openstack port list
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                            | Status |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+
| 418c3e5d-87fa-467c-b1c1-b9832fa1e752 |      | fa:16:3e:09:d4:a6 | ip_address='192.168.122.2', subnet_id='17e394f9-e12c-4f31-a269-62ddf3308fc8'  | DOWN   |
| 56a18b9e-07d4-4249-b28b-b6446961a587 |      | fa:16:3e:23:60:97 | ip_address='10.20.20.239', subnet_id='14c63eaf-eeb7-476d-a99d-0a05f6a674f8'   | ACTIVE |
| 98835e99-8ab5-4cd3-8b17-207e15538c03 |      | fa:16:3e:2d:6e:82 |                                                                               | DOWN   |
| ae7b9a8e-48e8-4c3a-9ef0-710ccba00776 |      | fa:16:3e:70:93:8c | ip_address='192.168.122.1', subnet_id='17e394f9-e12c-4f31-a269-62ddf3308fc8'  | ACTIVE |
| cd9f7cce-77cb-4fae-ae1c-94964248d8d5 |      | fa:16:3e:00:53:35 | ip_address='10.20.20.138', subnet_id='14c63eaf-eeb7-476d-a99d-0a05f6a674f8'   | N/A    |
| d8174cec-c5ae-4bd0-abb4-9420c3b87e76 |      | fa:16:3e:dd:8f:4d | ip_address='192.168.122.83', subnet_id='17e394f9-e12c-4f31-a269-62ddf3308fc8' | ACTIVE |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------+--------+

To make the structure in OVN more readable, it helps to label the above ports. Firstly, there are clearly two ports related to the rapid-owl guest:

openstack port set --name rapid-owl-internal d8174cec-c5ae-4bd0-abb4-9420c3b87e76
openstack port set --name rapid-owl-floating cd9f7cce-77cb-4fae-ae1c-94964248d8d5

Similarly, there are two ports connected to the demo-router:

openstack port set --name demo-router-internal ae7b9a8e-48e8-4c3a-9ef0-710ccba00776
openstack port set --name demo-router-floating 56a18b9e-07d4-4249-b28b-b6446961a587

This leaves two ports unaccounted for. By showing the details of these ports, we see that they are used internally for guest metadata:

openstack port show -c device_id -c device_owner -c network_id 418c3e5d-87fa-467c-b1c1-b9832fa1e752
+--------------+----------------------------------------------+
| Field        | Value                                        |
+--------------+----------------------------------------------+
| device_id    | ovnmeta-3f9bc3b1-2520-4658-85f0-545a69e8b06a |
| device_owner | network:distributed                          |
| network_id   | 3f9bc3b1-2520-4658-85f0-545a69e8b06a         |
+--------------+----------------------------------------------+

openstack port show -c device_id -c device_owner -c network_id 98835e99-8ab5-4cd3-8b17-207e15538c03
+--------------+----------------------------------------------+
| Field        | Value                                        |
+--------------+----------------------------------------------+
| device_id    | ovnmeta-856fe9e3-60bf-4177-bb8b-831f68bb55c0 |
| device_owner | network:distributed                          |
| network_id   | 856fe9e3-60bf-4177-bb8b-831f68bb55c0         |
+--------------+----------------------------------------------+
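In both outputs above, the device_id is the string ovnmeta- followed by the network ID. A small sketch of how that pattern could be used to recognise a metadata port and recover its network (metadata_network_id is a hypothetical helper name, and the pattern is assumed from these two examples only):

```shell
# Print the network ID embedded in a metadata port's device_id,
# or return non-zero if the device_id does not start with "ovnmeta-".
metadata_network_id() {
    case "$1" in
        ovnmeta-*) printf '%s\n' "${1#ovnmeta-}" ;;
        *) return 1 ;;   # not a metadata port
    esac
}

# Example with the device_id of the first port shown above.
metadata_network_id "ovnmeta-3f9bc3b1-2520-4658-85f0-545a69e8b06a"
```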

Note: The two metadata ports are marked as DOWN and the guest's floating IP port is in an N/A state. In both cases, this is normal and does not indicate a problem.

These entities are reflected in the configuration of the Northbound DB.

ovn-nbctl show
switch 7fd2fe36-74b6-41a4-9005-d521d2a9a0fd (neutron-3f9bc3b1-2520-4658-85f0-545a69e8b06a) (aka demo-network)
    port d8174cec-c5ae-4bd0-abb4-9420c3b87e76 (aka rapid-owl-internal)
        addresses: ["fa:16:3e:dd:8f:4d 192.168.122.83"]
    port 418c3e5d-87fa-467c-b1c1-b9832fa1e752
        type: localport
        addresses: ["fa:16:3e:09:d4:a6 192.168.122.2"]
    port ae7b9a8e-48e8-4c3a-9ef0-710ccba00776 (aka demo-router-internal)
        type: router
        router-port: lrp-ae7b9a8e-48e8-4c3a-9ef0-710ccba00776
switch 31f5c4f7-725b-4313-86a5-2b5c47d4f03a (neutron-856fe9e3-60bf-4177-bb8b-831f68bb55c0) (aka external-network)
    port 98835e99-8ab5-4cd3-8b17-207e15538c03
        type: localport
        addresses: ["fa:16:3e:2d:6e:82"]
    port 56a18b9e-07d4-4249-b28b-b6446961a587 (aka demo-router-floating)
        type: router
        router-port: lrp-56a18b9e-07d4-4249-b28b-b6446961a587
    port provnet-f5363a0a-8963-4271-a844-e545ba5f931b
        type: localnet
        addresses: ["unknown"]
router 1a6ddfff-8a1e-45a6-bdf8-6f13e7c5d8f9 (neutron-5c300bae-bf1f-4773-ac98-1d71c23e1bc7) (aka demo-router)
    port lrp-ae7b9a8e-48e8-4c3a-9ef0-710ccba00776
        mac: "fa:16:3e:70:93:8c"
        networks: ["192.168.122.1/24"]
    port lrp-56a18b9e-07d4-4249-b28b-b6446961a587
        mac: "fa:16:3e:23:60:97"
        networks: ["10.20.20.239/24"]
        gateway chassis: [microk8s06.maas]
    nat aba8126c-612d-4de5-9445-6aacb813714a
        external ip: "10.20.20.138"
        logical ip: "192.168.122.83"
        type: "dnat_and_snat"
    nat cf7cfd04-ebfa-4407-b14e-1d43f999e233
        external ip: "10.20.20.239"
        logical ip: "192.168.122.0/24"
        type: "snat"
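The output above suggests a naming pattern: a Neutron network or router with a given ID appears in OVN as neutron-<ID>, and a router port as lrp-<ID>. The sketch below turns OpenStack IDs into the corresponding OVN names. The helper names are hypothetical, and the pattern is inferred from this output rather than from a specification.

```shell
# Logical switch or logical router name for a Neutron network or router ID.
ovn_name_for_neutron_id() {
    printf 'neutron-%s\n' "$1"
}

# Logical router port name for a Neutron router port ID.
ovn_name_for_router_port() {
    printf 'lrp-%s\n' "$1"
}

# Possible usage inside the Northbound DB container (not run here):
#   ovn-nbctl lsp-list "$(ovn_name_for_neutron_id 3f9bc3b1-2520-4658-85f0-545a69e8b06a)"
#   ovn-nbctl lr-nat-list "$(ovn_name_for_neutron_id 5c300bae-bf1f-4773-ac98-1d71c23e1bc7)"
```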

Over in the Southbound DB, the chassis for this deployment can be examined:

ovn-sbctl show
Chassis microk8s06.maas
    hostname: microk8s06.maas
    Encap geneve
        ip: "10.177.200.18"
        options: {csum="true"}
    Port_Binding "d8174cec-c5ae-4bd0-abb4-9420c3b87e76"
    Port_Binding cr-lrp-56a18b9e-07d4-4249-b28b-b6446961a587

The flows can also be listed:

ovn-sbctl lflow-list
...

Capturing and tracing an ingress packet

The example below captures and then traces an ICMP echo request packet destined for a guest. The first step is to capture an echo request packet, which can be done with the tcpdump command. In this example, there is a single-node install with access to the guests available from the installation node. The guest's floating IP address is 10.20.20.138. The routes on the node show that traffic for this subnet will be routed to br-ex.

ip route | grep '10.20.20.0/24'
10.20.20.0/24 dev br-ex proto kernel scope link src 10.20.20.1
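If the interface name is needed in a script, it could be pulled out of the route output rather than read by eye. A minimal sketch, where route_dev is a hypothetical helper that prints the word following dev on the first matching line:

```shell
# Read "ip route" style output on stdin and print the interface name
# that follows the "dev" keyword on the first line that has one.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Example using the route line shown above.
printf '%s\n' '10.20.20.0/24 dev br-ex proto kernel scope link src 10.20.20.1' | route_dev

# Possible usage on the node (not run here):
#   ip route get 10.20.20.138 | route_dev
```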

Listen on the br-ex interface, filter for echo request packets (an ICMP type of 8, which is the first byte of the ICMP header), and store the captured packets in a file for later use:

Window 1:

sudo tcpdump -i br-ex "icmp[0] == 8" -w ping.pcap

Window 2:

ping -c3 10.20.20.138

The ping.pcap file should now contain the echo requests generated by the ping command. To use them with the OVS trace utility (ofproto/trace), the pcap file needs to be converted to hex with the ovs-pcap utility. At the time of writing, this command is included in the openstack-hypervisor snap but is not exposed. However, it can still be run directly:

/snap/openstack-hypervisor/current/usr/bin/ovs-pcap ping.pcap > ping.hex

The ping.hex file will contain three entries, one for each echo request. For this example, only the first is needed:

IN_PORT="br-ex"
BRIDGE="br-ex"
PACKET=$(head -1 ping.hex)
sudo openstack-hypervisor.ovs-appctl ofproto/trace $BRIDGE in_port="$IN_PORT" $PACKET

If all is well the last rule in the output should end with:

...
65. reg15=0x3,metadata=0x2, priority 100, cookie 0x3d326af3
    output:2

This shows that the packet was sent out of OpenFlow port number 2. This corresponds to the intended guest (see “Resolving OpenFlow port numbers” below).
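As a sanity check on the hex line passed to the trace above, a few header fields can be decoded by hand. The sketch below assumes an untagged Ethernet frame carrying IPv4 with a 20-byte header (no VLAN tag, no IP options). The function names hex_field, hex_to_mac, and describe_packet are hypothetical.

```shell
# hex_field PACKET OFFSET LENGTH
# Print LENGTH bytes of PACKET (a hex string), starting at byte OFFSET.
hex_field() {
    start=$(( $2 * 2 + 1 ))
    end=$(( ($2 + $3) * 2 ))
    printf '%s\n' "$1" | cut -c "${start}-${end}"
}

# Insert ":" between each pair of hex digits, for example
# fa163e236097 becomes fa:16:3e:23:60:97.
hex_to_mac() {
    printf '%s\n' "$1" | sed 's/../&:/g; s/:$//'
}

# Print selected fields of one packet, using ofproto/trace field names.
describe_packet() {
    echo "dl_dst=$(hex_to_mac "$(hex_field "$1" 0 6)")"     # bytes 0-5
    echo "dl_src=$(hex_to_mac "$(hex_field "$1" 6 6)")"     # bytes 6-11
    echo "eth_type=0x$(hex_field "$1" 12 2)"                # 0x0800 is IPv4
    echo "nw_proto=$(( 0x$(hex_field "$1" 23 1) ))"         # 1 is ICMP
    echo "icmp_type=$(( 0x$(hex_field "$1" 34 1) ))"        # 8 is echo request
}

# Possible usage (not run here):
#   describe_packet "$(head -1 ping.hex)"
```

For a captured echo request, eth_type should be 0x0800, nw_proto should be 1, and icmp_type should be 8.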

Tracing a hypothetical ingress packet

By default, a guest launched in the demo project will respond to an echo request.

ping -q -c3 10.20.20.138
PING 10.20.20.138 (10.20.20.138) 56(84) bytes of data.

--- 10.20.20.138 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2045ms
rtt min/avg/max/mdev = 0.351/0.472/0.692/0.155 ms

This request can be simulated using ovs-appctl. Sunbeam installs this utility as part of the openstack-hypervisor snap, and it can be accessed via openstack-hypervisor.ovs-appctl:

sudo openstack-hypervisor.ovs-appctl --help
ovs-appctl, for querying and controlling Open vSwitch daemon
...

To simulate the echo request above, some information needs to be gathered. Since the packet enters OVS via the br-ex bridge, the first step is to gather the MAC and IP address of the bridge:

ip address show br-ex
48: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 46:fc:d8:8d:05:49 brd ff:ff:ff:ff:ff:ff
    inet 10.20.20.1/24 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fe80::44fc:d8ff:fe8d:549/64 scope link
       valid_lft forever preferred_lft forever

BR_EX_MAC="46:fc:d8:8d:05:49"
BR_EX_IP="10.20.20.1"
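Rather than copying the two values by hand, they could be extracted from the command output. A minimal sketch, assuming the output layout shown above (link_mac and link_ipv4 are hypothetical helper names):

```shell
# Read "ip address show" output on stdin and print the MAC address
# from the first "link/ether" line.
link_mac() {
    awk '$1 == "link/ether" { print $2; exit }'
}

# Read "ip address show" output on stdin and print the first IPv4
# address, with the prefix length removed.
link_ipv4() {
    awk '$1 == "inet" { sub(/\/.*/, "", $2); print $2; exit }'
}

# Possible usage on the node (not run here):
#   BR_EX_MAC=$(ip address show br-ex | link_mac)
#   BR_EX_IP=$(ip address show br-ex | link_ipv4)
```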

OpenFlow assigns each port a number, so the next step is to find the number that has been assigned to the br-ex port on the br-ex bridge:

sudo openstack-hypervisor.ovs-vsctl get Interface br-ex ofport
65534
PORT_BR_EX=65534

Next, gather data about the destination of the request. The IP address that was pinged earlier was 10.20.20.138:

GUEST_FLOATING_IP="10.20.20.138"

The demo-router is going to handle this traffic, so the destination MAC address in this case is actually the MAC address of the demo-router's port on the external network:

openstack port list --router demo-router
+--------------------------------------+----------------------+-------------------+------------------------------------------------------------------------------+--------+
| ID                                   | Name                 | MAC Address       | Fixed IP Addresses                                                           | Status |
+--------------------------------------+----------------------+-------------------+------------------------------------------------------------------------------+--------+
| 56a18b9e-07d4-4249-b28b-b6446961a587 | demo-router-floating | fa:16:3e:23:60:97 | ip_address='10.20.20.239', subnet_id='14c63eaf-eeb7-476d-a99d-0a05f6a674f8'  | ACTIVE |
| ae7b9a8e-48e8-4c3a-9ef0-710ccba00776 | demo-router-internal | fa:16:3e:70:93:8c | ip_address='192.168.122.1', subnet_id='17e394f9-e12c-4f31-a269-62ddf3308fc8' | ACTIVE |
+--------------------------------------+----------------------+-------------------+------------------------------------------------------------------------------+--------+

ROUTER_EXT_MAC="fa:16:3e:23:60:97"

Since this is going to trace a single packet, information about the type of packet is needed. In this case, it is the echo request that is part of the ping. An IPv4 ICMP echo request has an icmp_type of 8 and an icmp_code of 0. Lastly, nw_ttl needs to be set high enough to accommodate the number of hops needed. In this case, 64 is a reasonable value.

Putting this all together:

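# Build the flow description as one comma-separated string so that it is
# passed to ofproto/trace as a single argument.
FLOW="icmp,in_port=$PORT_BR_EX"
FLOW="$FLOW,dl_src=$BR_EX_MAC,dl_dst=$ROUTER_EXT_MAC"
FLOW="$FLOW,nw_src=$BR_EX_IP,nw_dst=$GUEST_FLOATING_IP"
FLOW="$FLOW,nw_ttl=64,icmp_type=8,icmp_code=0"
sudo openstack-hypervisor.ovs-appctl ofproto/trace br-ex "$FLOW"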

This produces a large amount of output - details of how the packet is traversing the OpenFlow rules - but the important piece is at the end:

...
65. reg15=0x3,metadata=0x2, priority 100, cookie 0x3d326af3
    output:2

This shows that the packet was sent out of OpenFlow port number 2. This corresponds to the intended guest (see “Resolving OpenFlow port numbers” below).

Finally, delete the security group rule that is permitting ICMP traffic and check that the trace command now drops the traffic.

openstack security group list --project demo
+--------------------------------------+---------+------------------------+----------------------------------+------+
| ID                                   | Name    | Description            | Project                          | Tags |
+--------------------------------------+---------+------------------------+----------------------------------+------+
| 00aed662-f303-47fa-82a7-86cde90a4ee1 | default | Default security group | b8c896d15bb247448edd2d97f7d99f1f | []   |
+--------------------------------------+---------+------------------------+----------------------------------+------+

openstack security group rule list --ingress --protocol icmp 00aed662-f303-47fa-82a7-86cde90a4ee1
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| ID                                   | IP Protocol | Ethertype | IP Range  | Port Range | Direction | Remote Security Group | Remote Address Group |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+
| 33237298-6052-45d9-9a7e-1fee0a7587b7 | icmp        | IPv4      | 0.0.0.0/0 |            | ingress   | None                  | None                 |
+--------------------------------------+-------------+-----------+-----------+------------+-----------+-----------------------+----------------------+

openstack security group rule delete 33237298-6052-45d9-9a7e-1fee0a7587b7

This time the trace command ends with:

...
44. ip,reg0=0x200/0x200,reg15=0x3,metadata=0x2, priority 2001, cookie 0x5eeee244
    drop
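When the trace is run repeatedly, for example before and after a security group change, it can help to reduce the output to its final verdict. The sketch below assumes that the deciding action appears on a line of its own as either output:N or drop, as in the excerpts above. Real trace output contains other lines that this simple filter ignores. The name trace_verdict is hypothetical.

```shell
# Read ofproto/trace output on stdin and print the last line that,
# once leading whitespace is removed, is exactly "output:<number>" or "drop".
# Returns non-zero if no such line is found.
trace_verdict() {
    awk '
        { sub(/^[ \t]+/, "") }
        /^(output:[0-9]+|drop)$/ { last = $0 }
        END { if (last != "") print last; else exit 1 }
    '
}

# Possible usage on the node (not run here):
#   sudo openstack-hypervisor.ovs-appctl ofproto/trace br-ex "<flow>" | trace_verdict
```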

Resolving OpenFlow port numbers

When looking at OpenFlow rules or tracing a packet, the ports are given numbers. These are the OpenFlow port numbers. For example, to find what port 2 corresponds to:

sudo openstack-hypervisor.ovs-vsctl find interface ofport=2 | grep -E "^name"
name                : tapd8174cec-c5

Often the first part of the corresponding port’s UUID is included in the name of the device. This enables it to be traced back:

openstack port list | grep d8174cec-c5
| d8174cec-c5ae-4bd0-abb4-9420c3b87e76 | rapid-owl-internal   | fa:16:3e:dd:8f:4d | ip_address='192.168.122.83', subnet_id='17e394f9-e12c-4f31-a269-62ddf3308fc8' | ACTIVE |
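The two steps above can be combined in a script. In the example, the device name is tap followed by the first 11 characters of the port ID. The sketch below assumes that pattern holds, which, as noted, is often but not always the case. The helper names are hypothetical.

```shell
# Strip the "tap" prefix from a device name to get the leading part
# of the port ID.
tap_to_port_prefix() {
    printf '%s\n' "${1#tap}"
}

# Build the expected tap device name from a full port ID:
# "tap" plus the first 11 characters of the ID.
port_to_tap() {
    printf 'tap%s\n' "$(printf '%s' "$1" | cut -c 1-11)"
}

# Example with the values shown above.
tap_to_port_prefix "tapd8174cec-c5"
port_to_tap "d8174cec-c5ae-4bd0-abb4-9420c3b87e76"
```

The prefix printed by tap_to_port_prefix can be passed to grep against the openstack port list output, as in the command above.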