Title | Definition of network-online.target |
Internal ID | FO020 |
Status | Pending Review |
Author | @slyon |
Abstract
This specification formalizes the definition of “online” in the context of systemd’s network-online.target, so that we can implement a common behavior across the distro and decide on the online state of given services based on this definition.
Rationale
Certain applications require an “online” network connection before they can be started. They achieve this by ordering their systemd service After=network-online.target and pulling in the Wants=network-online.target dependency. This is generally discouraged, as any application should try to handle (changing) network conditions dynamically, but it is unavoidable in some cases.
Pulling network-online.target into the boot transaction can lead to a delayed boot sequence and to confusion due to the wide variety of definitions of an “online”/“up”/“up-and-running” network (link-layer up, IPv4/6 assigned, global route available, DHCP responded, DNS resolved, LAN reachable, WAN/internet reachable, …).
In cloud images network-online.target is pulled in automatically, as cloud-config.service and cloud-final.service define a Wants=network-online.target dependency.
As an example, the nfs-server.service (src:nfs-utils) needs the network to be “online” and DNS set up for name resolution, in order to mount the user’s NFS share, defined by hostname (LP: #1918141).
Specification
Status quo
Currently, network-online.target depends on systemd-networkd-wait-online.service for networkd status and/or on NetworkManager-wait-online.service for NetworkManager status. Using networkd and NetworkManager at the same time is discouraged, as there are slight differences (like differing definitions of “online”) which can lead to confusion (e.g. LP: #19516).
In case of networkd, “/lib/systemd/systemd-networkd-wait-online” (using default parameters) defines the “network-online” logic, while “/usr/bin/nm-online -s” defines that logic in case NetworkManager is in use.
systemd-networkd-wait-online’s logic [systemd-networkd-wait-online (8)]:
- By default, it will wait for all links it is aware of, which are managed by systemd-networkd and are configured as “RequiredForOnline=yes” (the default) to be fully configured or failed, and for at least one link to be online. Here, “online” means that the link’s operational state is equal to or higher than “degraded” (i.e. has a link-local IP). By default the loopback interface is ignored.
- The default timeout is 120 seconds. Once it is hit, it logs an error “Failed to start Wait for Network to be Configured” and marks the “systemd-networkd-wait-online.service” as failed; booting continues (after the delay) without the networking being “online”
- The operational status is one of the following:
  - Missing: the device is missing
  - Off: the device is powered down
  - No-carrier: the device is powered up, but it does not yet have a carrier
  - Dormant: the device has a carrier, but is not yet ready for normal traffic
  - Degraded-carrier: for bond or bridge master, one of the bonding or bridge slave network interfaces is in off, no-carrier, or dormant state
  - Carrier: the link has a carrier, or for bond or bridge master, all bonding or bridge slave network interfaces are enslaved to the master
  - Degraded: the link has carrier and addresses valid on the local link configured
  - Enslaved: the link has carrier and is enslaved to bond or bridge master network interface
  - Routable: the link has carrier and routable address configured
- The setup status is one of the following:
  - Pending: udev is still processing the link, we don’t yet know if we will manage it
  - Failed: networkd failed to manage the link
  - Configuring: in the process of retrieving configuration or configuring the link
  - Configured: link configured successfully
  - Unmanaged: networkd is not handling the link
  - Linger: the link is gone, but has not yet been dropped by networkd
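The default waiting logic described above can be sketched in Python. This is only an illustration of the rules as listed here, not systemd's actual implementation: the Link class and the networkd_default_online function are hypothetical names, and treating a "pending" link as not yet settled is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class OperState(IntEnum):
    # Ordered from lowest to highest, following the list above.
    MISSING = 0
    OFF = 1
    NO_CARRIER = 2
    DORMANT = 3
    DEGRADED_CARRIER = 4
    CARRIER = 5
    DEGRADED = 6
    ENSLAVED = 7
    ROUTABLE = 8


@dataclass
class Link:
    name: str
    oper_state: OperState
    setup_state: str  # "pending", "failed", "configuring", "configured", ...
    required_for_online: bool = True  # RequiredForOnline=yes is the default
    is_loopback: bool = False


def networkd_default_online(links):
    """Return True once the default wait condition described above holds."""
    # Only links managed by networkd and required for online are considered;
    # the loopback interface is ignored by default.
    relevant = [
        link for link in links
        if not link.is_loopback
        and link.setup_state != "unmanaged"
        and link.required_for_online
    ]
    # Every relevant link must be fully configured or failed.
    settled = all(l.setup_state in ("configured", "failed") for l in relevant)
    # At least one link must have reached "degraded" or a higher state.
    any_online = any(l.oper_state >= OperState.DEGRADED for l in relevant)
    return settled and any_online
```

For example, a host with a routable eth0 is reported as online, but adding a second required link that is still "configuring" makes the check fail until that link settles.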
nm-online’s logic [nm-online (1)], using the -s/--wait-for-startup parameter:
- Startup is considered complete once NetworkManager has activated (or attempted to activate) every auto-activate connection (autoconnect=true) which is available given the current network state. By default, connections have the ipv4.may-fail and ipv6.may-fail properties set to yes; this means that NetworkManager waits for one of the two address families to complete configuration before considering the connection activated.
- The default timeout is 30 seconds
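The effect of the may-fail properties can be sketched as a small Python predicate. This is a simplified model, not NetworkManager code: the function name connection_activated and its parameters are hypothetical, and the behavior for may-fail set to "no" (that family must succeed) is stated here as an assumption beyond the default case described above.

```python
def connection_activated(ipv4_done, ipv6_done,
                         ipv4_may_fail=True, ipv6_may_fail=True):
    """Model when a connection counts as activated for startup purposes."""
    # A family that is not allowed to fail must complete configuration.
    if not ipv4_may_fail and not ipv4_done:
        return False
    if not ipv6_may_fail and not ipv6_done:
        return False
    # With both may-fail properties set to yes (the default), one completed
    # address family is enough.
    return ipv4_done or ipv6_done
```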
Ubuntu’s definition of “online”
We consider a system to be “online” when ALL of the following conditions are met:
- all non-optional interfaces MUST be up on link layer (optional: no in netplan sense)
  - including completion of IPv6 RA/link-local and/or IPv4 link-local if enabled on the interface, except if those are explicitly marked as “optional” (see “MUST NOT” section below)
- at least one interface MUST be up on the link layer and have received layer 3 (IP) configuration
  - incl. IP address of at least one address family and corresponding routes (ignoring IPv6/IPv4 link local addresses), c.f. systemd’s RequiredFamilyForOnline= configuration
- there MUST be a default route for at least one configured address family
- discovery of the default routes for all other configured address families MUST have been attempted (succeeded or failed), including routes provided via DHCP, IPv6 RA, OSPF, BGP, … (if available/enabled)
- DNS MUST be configured
The status of “online” MUST NOT be delayed or blocked by the following:
- link status or configuration of interfaces that are marked optional (optional: yes in netplan sense)
- address sources that are defined as “optional” for an interface (optional-addresses in netplan sense, e.g. [ipv4-ll, dhcp6])
- configuration of a default route for an address family of which no interfaces have addresses defined
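The MUST and MUST NOT conditions above can be read as a single predicate. The following Python sketch is one possible interpretation, not an existing implementation: the Iface class, its field names and the ubuntu_online function are hypothetical, and letting any interface (optional or not) satisfy the "at least one interface" condition is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Iface:
    name: str
    optional: bool = False   # netplan "optional" flag
    link_up: bool = False    # up on the link layer
    # Link-local sources still in progress, e.g. {"ipv4-ll", "ipv6-ra"}
    pending_link_local: set = field(default_factory=set)
    # Sources listed under netplan's optional-addresses
    optional_addresses: set = field(default_factory=set)
    # Families ("ipv4", "ipv6") with a non-link-local address and routes
    l3_families: set = field(default_factory=set)


def ubuntu_online(ifaces, default_route_families, discovery_pending,
                  dns_configured):
    """Evaluate the definition of "online" given above (hypothetical model)."""
    # Optional interfaces never delay or block the online state.
    for iface in (i for i in ifaces if not i.optional):
        if not iface.link_up:
            return False
        # Link-local completion blocks only if not marked as optional.
        if iface.pending_link_local - iface.optional_addresses:
            return False
    # At least one interface must have layer 3 configuration.
    configured = set()
    for iface in ifaces:
        if iface.link_up:
            configured |= iface.l3_families
    if not configured:
        return False
    # A default route must exist for at least one configured family.
    if not (configured & default_route_families):
        return False
    # Default route discovery for the other configured families must have
    # finished; families without any address are ignored.
    if (configured - default_route_families) & discovery_pending:
        return False
    return dns_configured
```

Because only configured address families are inspected, a pending IPv6 default route does not block a host that has only IPv4 addresses defined.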
Reaching of network-online.target:
- A “wait-online daemon” should be running in the background, checking for the definition of online, according to this specification.
  - This can be either a (modified) version of “systemd-networkd-wait-online”, “nm-online”, a new daemon listening to netlink like “netplan-wait-online”, or a combination of those
- The wait-online service should exit with a success return code once the “online” state as described in this spec is reached; otherwise it shall keep running indefinitely while the “online” state is not yet reached.
  - This blocks the starting of services pulling in network-online.target via a Wants= or Requires= dependency on purpose, in cases where networking is not available
  - This has the potential to block the whole boot process, if services pull in the network-online.target and sort After=network-online.target and Before=multi-user.target (or a similar higher level target, even indirectly through other service dependencies or starting order) at the same time. Such services need to be identified and fixed, as the network being down should never delay or block the overall boot process. It should only block services that actually depend on the networking being “online”, while continuing to boot any other services and reaching the final target in parallel.
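A minimal Python sketch of the two ideas above follows: a wait loop that only exits (successfully) once “online” is reached, and a check that flags units whose ordering could block the boot. All names (wait_online, unit_deps, may_block_boot) are hypothetical. Polling is used for brevity, while a real daemon would more likely react to netlink events, and the unit check only looks at direct dependencies in a single unit file and ignores list resets through empty assignments.

```python
import time


def wait_online(is_online, poll_interval=1.0, sleep=time.sleep):
    """Block until is_online() returns True, then return exit code 0."""
    # There is deliberately no timeout: the loop runs indefinitely while
    # the "online" state is not reached.
    while not is_online():
        sleep(poll_interval)
    return 0


def unit_deps(unit_text):
    """Collect the dependency and ordering lists of one unit file."""
    deps = {"After": set(), "Before": set(), "Wants": set(), "Requires": set()}
    for line in unit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in deps:
            deps[key.strip()].update(value.split())
    return deps


def may_block_boot(unit_text):
    """Flag units that wait for the network and are ordered before
    multi-user.target at the same time."""
    deps = unit_deps(unit_text)
    target = "network-online.target"
    pulls_in = target in (deps["Wants"] | deps["Requires"])
    return (pulls_in
            and target in deps["After"]
            and "multi-user.target" in deps["Before"])
```

A unit flagged by may_block_boot is a candidate for the fix described above, for example by dropping its Before=multi-user.target ordering if that is not actually needed.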
Q/A (from previous discussions)
- (@slyon) What about WiFi (on Desktop)?
  - Those interfaces SHOULD be marked as “optional: true” or not be defined at all (“renderer: NetworkManager” for all interfaces), so they can be ignored by the waiting logic (@slyon)
- (@xnox) What if network connectivity drops and gets re-established? Should we bounce the network-online.target (aka restart it)? We can declare units to be restarted when network-online.target is restarted, if they are otherwise incapable of dynamically detecting networking loss and networking resumption themselves.
  - Fix this in a “stage 2” attempt; focus on fixing network-online.target bringup for now (@vorlon)
- (~any upstream) Do we really want to adopt Ubuntu’s new definition of “online”?
  - It doesn’t have to be adopted upstream for us to be making Ubuntu better for our users. The goal is to get this agreed with the systemd upstream community, but that should not be a blocker. (@vorlon)
- We should not try to change systemd-networkd-wait-online’s definition of “online” but only extend the tool (upstream, if possible) using “STATE_TAGS”, allowing Ubuntu’s state of online to be defined and reached from an external definition (e.g. netplan.io YAML). (@slyon)
- (@rbasak) Some packages can be configured differently. For example, “named” serving DNS authoritatively bound to particular interfaces might need to wait until the “network is up”. But for “named” configured as a local recursive resolver, it’s the opposite.
  - We can only handle a package’s default configuration case here. There will always be cases where if you configure one thing in one place, you’re expected/required to configure another thing in another place to match. (@rbasak)
- (@vorlon) Does the Desktop Team need input on the changes to the definition of nm-online?
- (@vorlon) If we are using NM, which has support for detecting captive portals, is it a requirement that we have gotten through the captive portal?
  - IMO we should not block on captive portals, as there isn’t really any way to get through those during boot. (@slyon)
Further information
- https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
- Packaging policy discussion: After=network-online.target
- https://warthogs.atlassian.net/browse/FR-10
- https://www.freedesktop.org/software/systemd/man/systemd-networkd-wait-online.service.html
- https://developer.gnome.org/NetworkManager/stable/nm-online.html
- https://refspecs.linuxbase.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/facilname.html