Containers can ping the internet (ICMP) but TCP hangs / fails silently (apt, curl, etc.)

Hello everyone,

I am running a MicroCloud cluster with LXD using OVN networking. My containers can ping external IPv4 and IPv6 hosts, but all TCP connections fail (for example curl, apt update, and HTTPS).
apt hangs on IPv6 addresses unless forced to use IPv4, and even then the TCP connection times out.

Setup Details

  • MicroCloud cluster (4 nodes)
  • OVN managed default network: 10.80.186.0/24
  • Public uplink network: 87.62.82.105/29
  • Uplink configured in LXD:
name: UPLINK
type: physical
ipv4.gateway: 87.62.82.105/29
ipv4.ovn.ranges: 87.62.82.108-87.62.82.110
dns.nameservers: 1.1.1.1,8.8.8.8

Inside a container:


# containers (using ES-2 as the test instance)

lxc list
+------------+---------+---------------------+-----------------------------------------------+-----------+-----------+----------+
|    NAME    |  STATE  |        IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS | LOCATION |
+------------+---------+---------------------+-----------------------------------------------+-----------+-----------+----------+
| ES-1       | RUNNING | 10.80.186.7 (eth0)  | fd42:5531:4632:7640:216:3eff:fe03:7f48 (eth0) | CONTAINER | 0         | flap     |
+------------+---------+---------------------+-----------------------------------------------+-----------+-----------+----------+
| ES-2       | RUNNING | 10.80.186.8 (eth0)  | fd42:5531:4632:7640:216:3eff:febb:4174 (eth0) | CONTAINER | 0         | roll     |
+------------+---------+---------------------+-----------------------------------------------+-----------+-----------+----------+

# apt update (also tried forcing IPv4; same result)
0% [Connecting to archive.ubuntu.com (2620:2d:4000:1::103)]

root@ES-2:~# ping -c 3 -v google.com
ping: sock4.fd: 3 (socktype: SOCK_RAW), sock6.fd: 4 (socktype: SOCK_RAW), hints.ai_family: AF_UNSPEC

ai->ai_family: AF_INET, ai->ai_canonname: 'google.com'
PING google.com (142.251.9.113) 56(84) bytes of data.
64 bytes from rc-in-f113.1e100.net (142.251.9.113): icmp_seq=1 ident=2414 ttl=105 time=30.6 ms
64 bytes from rc-in-f113.1e100.net (142.251.9.113): icmp_seq=2 ident=2414 ttl=105 time=30.4 ms
64 bytes from rc-in-f113.1e100.net (142.251.9.113): icmp_seq=3 ident=2414 ttl=105 time=29.0 ms

--- google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 28.974/29.980/30.587/0.716 ms

root@ES-2:~# curl -v google.com --max-time 5
* Host google.com:80 was resolved.
* IPv6: 2a00:1450:4025:c03::8a, 2a00:1450:4025:c03::65, 2a00:1450:4025:c03::8b, 2a00:1450:4025:c03::64
* IPv4: 142.251.9.113, 142.251.9.139, 142.251.9.138, 142.251.9.102, 142.251.9.101, 142.251.9.100
*   Trying 142.251.9.113:80...
*   Trying [2a00:1450:4025:c03::8a]:80...
* ipv4 connect timeout after 2499ms, move on!
*   Trying 142.251.9.139:80...
* ipv6 connect timeout after 2399ms, move on!
*   Trying [2a00:1450:4025:c03::65]:80...
* ipv4 connect timeout after 1249ms, move on!
*   Trying 142.251.9.138:80...
* ipv6 connect timeout after 1198ms, move on!
*   Trying [2a00:1450:4025:c03::8b]:80...
* ipv4 connect timeout after 624ms, move on!
*   Trying 142.251.9.102:80...
* ipv6 connect timeout after 598ms, move on!
*   Trying [2a00:1450:4025:c03::64]:80...
* ipv4 connect timeout after 311ms, move on!
*   Trying 142.251.9.101:80...
* Connection timed out after 5000 milliseconds
* Closing connection
curl: (28) Connection timed out after 5000 milliseconds


lxc network list
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
|   NAME    |   TYPE   | MANAGED |      IPV4      |           IPV6            | DESCRIPTION | USED BY |  STATE  |
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| UPLINK    | physical | YES     |                |                           |             | 1       | CREATED |
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| br69      | bridge   | NO      |                |                           |             | 0       |         |
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| br-int    | bridge   | NO      |                |                           |             | 0       |         |
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| default   | ovn      | YES     | 10.80.186.1/24 | fd42:5531:4632:7640::1/64 |             | 12      | CREATED |
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| eno8303   | physical | NO      |                |                           |             | 0       |         |
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| eno8403   | physical | NO      |                |                           |             | 1       |         |
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| enp69s0f0 | physical | NO      |                |                           |             | 0       |         |
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| enp69s0f1 | physical | NO      |                |                           |             | 2       |         |
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| lxdovn1   | bridge   | NO      |                |                           |             | 0       |         |
+-----------+----------+---------+----------------+---------------------------+-------------+---------+---------+

lxc network show default
name: default
description: ""
type: ovn
managed: true
status: Created
config:
  bridge.mtu: "1442"
  ipv4.address: 10.80.186.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:5531:4632:7640::1/64
  ipv6.nat: "true"
  network: UPLINK
  volatile.network.ipv4.address: 87.62.82.108
used_by:
- /1.0/instances/ES-1
- /1.0/instances/ES-2
- /1.0/instances/caddy-lb
- /1.0/instances/caddy-ws-1
- /1.0/instances/caddy-ws-2
- /1.0/instances/grafana
- /1.0/instances/mariadb
- /1.0/instances/php-fpm
- /1.0/instances/prometheus
- /1.0/instances/redis
- /1.0/profiles/default
- /1.0/profiles/php-fpm
locations:
- flap
- roll
- tail
- wing
project: default

lxc network show UPLINK
name: UPLINK
description: ""
type: physical
managed: true
status: Created
config:
  dns.nameservers: 1.1.1.1,8.8.8.8
  ipv4.gateway: 87.62.82.105/29
  ipv4.ovn.ranges: 87.62.82.108-87.62.82.110
  ipv4.routes: 87.62.82.108/32
  volatile.last_state.created: "false"
used_by:
- /1.0/networks/default
locations:
- flap
- roll
- tail
- wing
project: default

Question

What could cause ICMP to work but all TCP traffic to the Internet to fail when using MicroCloud + OVN NAT?

Could it be MTU? I tried adjusting it, with no luck.

Is there any additional OVN configuration needed on routed uplinks for TCP flows, or does this indicate an uplink routing/filtering issue outside of LXD/OVN?

Hi,

I strongly suspect this is an MTU/MSS mismatch issue. I’ve debugged the exact same symptom in a KVM/QEMU setup using OVN.

Since OVN uses the Geneve protocol, it encapsulates the inner packet inside UDP, which increases the packet size. When traffic crosses from a logical port to the physical port (or vice versa), that extra overhead causes an MTU mismatch.
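
You can see that overhead in the config above: the OVN network’s bridge.mtu is 1442, presumably a 1500-byte physical MTU minus 58 bytes of headroom for the Geneve/UDP encapsulation. A quick, non-authoritative way to check whether only full-sized packets are being dropped is a don’t-fragment ping from inside a container; the payload sizes below assume that 1442-byte MTU plus the usual 28 bytes of IPv4/ICMP headers:

# fills the packet exactly to the 1442-byte interface MTU (1414 + 28 = 1442)
root@ES-2:~# ping -M do -s 1414 -c 3 archive.ubuntu.com

# a clearly smaller probe for comparison
root@ES-2:~# ping -M do -s 1200 -c 3 archive.ubuntu.com

If the small probe gets replies while the full-sized one blackholes, something on the path is dropping packets that only just fit the logical MTU once they are encapsulated.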

The quickest way to solve this is MSS clamping. You can verify the drop point by running tcpdump on the decapsulating port: if ICMP makes it through but TCP does not, that is where the drop is happening (small ICMP echoes fit under the reduced MTU, while full-sized TCP segments do not).
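
As a sketch of that capture (the interface name is a guess; substitute whatever carries the uplink and the overlay on the chassis holding the network’s gateway address, 87.62.82.108):

# watch the uplink side for the container's TCP attempts: SYNs leaving with no SYN-ACK
# coming back, or large packets never appearing, mark the drop point
tcpdump -ni enp69s0f1 'host 142.251.9.113 and tcp port 80'

# watch the Geneve overlay between cluster members (Geneve uses UDP port 6081)
tcpdump -ni enp69s0f1 'udp port 6081'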

Applying MSS clamping on the interface (subtracting the L3 and Geneve overhead, roughly 60-70 bytes, from the maximum MTU) should fix it. The LXD team may have a built-in solution, but this manual fix works well in the meantime.
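
A minimal sketch of that clamp, assuming the 1442-byte logical MTU from the config above (1402 for IPv4 after 40 bytes of IP+TCP headers, 1382 for IPv6 after 60 bytes). Where exactly you apply it depends on the topology, but running it inside an affected container is an easy way to test the theory:

# inside the container: clamp the MSS advertised on outgoing SYNs
iptables  -t mangle -A OUTPUT -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1402
ip6tables -t mangle -A OUTPUT -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1382

# on a Linux hop that forwards the traffic, clamping to the path MTU is more general
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

If curl and apt start working with the clamp in place, that confirms the MSS/MTU diagnosis, and you can then fix the MTUs properly instead of keeping the rule around.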

If you want to go deeper, check out the “Physical VLAN MTU Issues” section in ovn-architecture(7). There is a way to bypass encapsulation for outbound traffic using localnet ports, basically avoiding the MTU problem entirely if your topology supports it.
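
For reference, this is roughly what that section describes, expressed with generic OVN names. LXD already manages its own localnet port for the UPLINK network, so treat this purely as an illustration of the mechanism, not something to run against the MicroCloud databases:

# attach a localnet port to a logical switch so traffic to/from the physical network
# leaves via the local provider bridge instead of being Geneve-encapsulated
ovn-nbctl lsp-add public public-localnet
ovn-nbctl lsp-set-type public-localnet localnet
ovn-nbctl lsp-set-addresses public-localnet unknown
ovn-nbctl lsp-set-options public-localnet network_name=physnet

# map that network name to a real bridge on each chassis
ovs-vsctl set open_vswitch . external-ids:ovn-bridge-mappings=physnet:br-provider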
