Suspected race condition error in systemd startup order

Ubuntu Version:

Ubuntu 24.04.3 LTS (GNU/Linux 6.14.0-1013-oracle x86_64) (Oracle Cloud image)

Desktop Environment (if applicable):

N/A (headless server)

Problem Description:

I’m setting up a WireGuard VPN server with NAT and forwarding rules applied via `iptables-persistent` (netfilter-persistent). After reboot, the VPN interface `wg0` comes up, but NAT masquerade rules and forwarding rules are **not consistently applied**, causing traffic from the VPN subnet (10.10.0.0/24) to fail.

WireGuard itself works once the rules are manually reapplied. The root cause appears to be systemd starting wg-quick@wg0.service before netfilter-persistent.service has loaded the firewall rules.

Expected behavior:

  • NAT and forwarding rules are loaded before the WireGuard interface comes up.
  • VPN clients can route traffic through the server immediately after boot.

Observed behavior:

  • After a reboot, health checks report missing NAT and forwarding rules.
  • Temporary fixes include manually running wg-quick down/up and reapplying NAT/forwarding rules.

Relevant System Information:

  • iptables-persistent version: 1.0.14
  • netfilter-persistent version: 1.0.14
  • WireGuard kernel module: 5.15.0-76-generic
  • Oracle Cloud Ubuntu image with default InstanceServices firewall rules

Screenshots or Error Messages:

Chain POSTROUTING (policy ACCEPT)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE 0 -- * ens3 10.10.0.0/24 0.0.0.0/0
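
A quick way to test for that rule from a shell is `iptables -C`, which only checks whether a rule exists and changes nothing. This is a minimal sketch, assuming the subnet and the `ens3` interface from the listing above; the function names and the `IPTABLES` override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether the source-specific MASQUERADE rule is loaded.
# Assumptions: 10.10.0.0/24 and ens3 come from the listing above;
# IPTABLES is a hypothetical override so the logic can be dry-run.
IPTABLES="${IPTABLES:-iptables}"

nat_rule_present() {
    # -C (--check) exits 0 if a matching rule exists, non-zero otherwise.
    "$IPTABLES" -t nat -C POSTROUTING -s 10.10.0.0/24 -o ens3 -j MASQUERADE 2>/dev/null
}

report_nat_rule() {
    if nat_rule_present; then
        echo "NAT masquerade rule present"
    else
        echo "NAT masquerade rule missing"
    fi
}
```

Source the file as root and call `report_nat_rule` right after boot to see whether the rule was restored.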

Health check log excerpt after reboot:

=== WireGuard VPN Health Check ===
✅ wg0 interface exists
❌ wg0 missing IP — attempting reapply via wg-quick
❌ NAT masquerade rule missing — adding source-specific rule
   ↳ NAT rule added and saved
✅ Forwarding rule present
✅ IPv4 forwarding enabled
✅ systemd dependency OK
=== Health check complete ===

What I’ve Tried:

  1. Verified that /etc/iptables/rules.v4 contains the correct NAT/forwarding rules.
  2. Ensured netfilter-persistent.service is enabled.
  3. Added systemd override for wg-quick@wg0.service:
[Unit]
After=netfilter-persistent.service
Requires=netfilter-persistent.service
  4. Saved firewall rules with sudo netfilter-persistent save and reloaded the systemd daemon.
  5. Created a health-check script that reapplies rules if missing.

Result: With these changes, the system behaves correctly once the health-check script has run, but without the script the rules are still applied inconsistently at boot, which suggests a race condition in the systemd startup order.


You could try studying the boot sequence chart to identify possible race conditions. The following commands generate it as an SVG:

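pattern="boot"
report="systemd_analyze__sequence_${pattern}"
sudo systemd-analyze plot >"${report}_plot.svg"
# eog is the GNOME image viewer; on a headless server, copy the SVG to a desktop machine and open it there
eog "${report}_plot.svg"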

I don’t know enough to say whether you need to add a dependency on one of the following:

  • ufw.service
  • network-pre.target
  • systemd-networkd-wait-online.service
  • networking.service
  • sockets.target
  • NetworkManager.service
  • networkd-dispatcher.service
  • network-online.target
  • openvpn.service

I created a drop-in dependency for an unrelated issue, ensuring that a secondary disk was fully mounted before the login screen is displayed:

[Unit]
RequiresMountsFor=/DB001_F2

at the location

  • /etc/systemd/system/lightdm.service.d/override.conf

You can create a similar one for the condition you require to be met.
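
For the WireGuard case, a similar drop-in might be written as sketched below. This is only an illustration: `write_override` and its optional root-prefix argument are hypothetical helpers, and the unit names are the ones from the original post.

```shell
#!/bin/sh
# Sketch: write a drop-in that orders wg-quick@wg0 after netfilter-persistent.
# The optional first argument is a root prefix (hypothetical, handy for trying
# the function against a scratch directory); empty means the real /etc.
write_override() {
    root="${1:-}"
    dir="$root/etc/systemd/system/wg-quick@wg0.service.d"
    mkdir -p "$dir" || return 1
    # After= sets the start order; Requires= pulls the other unit in.
    cat > "$dir/override.conf" <<'EOF'
[Unit]
After=netfilter-persistent.service
Requires=netfilter-persistent.service
EOF
}
```

Called as root with no argument it writes under /etc; run `systemctl daemon-reload` afterwards so systemd picks up the change.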


You can also look at the following report to see the “heavy hitters” in terms of startup time during boot, which might hint at which service or target should be referenced in a custom systemd dependency.

pattern="boot"
report="systemd_analyze__timing_${pattern}"
# blame lists units ordered by how long they took to initialize
sudo systemd-analyze blame >"${report}.txt"
more "${report}.txt"

Or you could create an “rc.local” file containing any specialized command sequence to run at boot time. In your case, possibly:

wg-quick down wg0
{commands to restart iptables/nftables for your circumstance}
wg-quick up wg0

My own “/etc/rc.local” file deals with an unrelated issue, and works just fine!

#!/bin/bash

/DB001_F2/Oasis/bin/HW_Admin__Power_SetFreqCPU.sh --default --service

Hi @ericmarceau — thank you very much for the detailed suggestions and for taking the time to share your diagnostic approach.

In my particular case, it turned out to be a startup race condition between wg-quick@wg0.service and netfilter-persistent.service (iptables restore). I addressed it by creating a systemd override file at:

/etc/systemd/system/wg-quick@wg0.service.d/override.conf

with the following contents:

[Unit]
After=netfilter-persistent.service
Requires=netfilter-persistent.service
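
To confirm that systemd actually picked up the ordering, the unit's effective `After=` list can be inspected with `systemctl show`. The following is a rough sketch; `has_unit` and `ordering_in_effect` are names invented here, not part of any tool.

```shell
#!/bin/sh
# has_unit LIST UNIT: succeed if UNIT appears as a whole word in the
# space-separated LIST (the format systemctl show prints for After=).
has_unit() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# ordering_in_effect UNIT DEP: succeed if UNIT is ordered after DEP.
ordering_in_effect() {
    after=$(systemctl show -p After --value "$1") || return 1
    has_unit "$after" "$2"
}
```

For example, after `systemctl daemon-reload`: `ordering_in_effect wg-quick@wg0.service netfilter-persistent.service && echo "ordering OK"`.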

I also added a lightweight verification script (/usr/local/bin/wgcheck.sh) that runs at boot via cron (@reboot) to confirm that WireGuard, NAT, and forwarding rules are healthy — it automatically repairs the configuration if anything’s missing.
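
The script itself is not reproduced here. As a purely hypothetical sketch of what two of its checks might look like (IPv4 forwarding and the wg0 address, matching the health check log in the first post), with all function names and the `IP_FORWARD_FILE` override being assumptions:

```shell
#!/bin/sh
# Hypothetical sketch of two health checks; not the actual wgcheck.sh.
# IP_FORWARD_FILE is overridable only so the check can be tried on a test file.
IP_FORWARD_FILE="${IP_FORWARD_FILE:-/proc/sys/net/ipv4/ip_forward}"

ipv4_forwarding_enabled() {
    # The kernel exposes the setting as a single "0" or "1".
    [ "$(cat "$IP_FORWARD_FILE" 2>/dev/null)" = "1" ]
}

iface_has_ipv4() {
    # ip prints one line per IPv4 address; empty output means none assigned
    # (or the interface does not exist).
    [ -n "$(ip -4 -o addr show dev "$1" 2>/dev/null)" ]
}

run_checks() {
    if ipv4_forwarding_enabled; then
        echo "OK: IPv4 forwarding enabled"
    else
        echo "FAIL: IPv4 forwarding disabled"
    fi
    if iface_has_ipv4 wg0; then
        echo "OK: wg0 has an IPv4 address"
    else
        echo "FAIL: wg0 missing IP"
    fi
}
```

Calling `run_checks` only reports; any repair step (such as re-running wg-quick) would be added where the FAIL branches are.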

Since applying these changes, the system has been stable across multiple reboots, and WireGuard comes up cleanly every time.

Your suggestions about systemd-analyze and rc.local are excellent — I’ll keep those in mind for future troubleshooting. Thanks again for contributing your insights!
