| Project     | LXD         |
|-------------|-------------|
| Status      | Drafting    |
| Author(s)   | @tomparrott |
| Approver(s) | @egelinas   |
| Release     | 6.x         |
| Internal ID | LX077       |
## Abstract
Add support for using Bluefield DPU (Data Processing Unit) NIC cards for acceleration of LXD OVN networks.
## Rationale
LXD already supports accelerating OVN networking flows when using an SR-IOV card that is compatible with switchdev mode. See SR-IOV hardware acceleration.
Existing SR-IOV OVN acceleration:
However, in this mode the LXD host(s) still need to run the `ovn-controller` and have access to an Open vSwitch OVN integration bridge on each host.
With OVN DPU acceleration it becomes possible to shift these components onto the DPU card that is attached to the LXD host. This provides both additional offloading of work away from the LXD host(s) as well as improved security/isolation, because there are fewer services running on the LXD host(s).
The Bluefield 2 card is a NIC and a separate ARM computer combined. It is connected to the LXD host using the PCIe bus.
On the LXD host the network interfaces from the card do not represent the physical ports on the card.
Instead there are physical functions (PFs) and associated virtual functions (VFs) which are joined to associated PF and VF “representor” interfaces on the DPU card itself. In this way packets can flow between the host and the DPU card, and the “representor” ports can then be connected to bridges on the DPU card to have their flows offloaded to the NIC.
In this scenario the LXD host(s) will only see the SR-IOV PF and VF interfaces and will pass them to the instances as needed. LXD will still communicate with the OVN northbound and southbound database services, but they do not necessarily need to be running on the same host.
Proposed DPU OVN acceleration:
## Specification

### Instance NIC connectivity
1. LXD needs to be told how to communicate with the OVN southbound database
Currently LXD is told how to communicate with the OVN northbound database via the server setting `network.ovn.northbound_connection`. To communicate with the OVN southbound database, it takes the address from the `ovs-vsctl get open_vswitch . external_ids:ovn-remote` command. This works because each Open vSwitch chassis must already be configured to communicate with the OVN southbound database, so there was no need to duplicate that configuration.
However, in DPU acceleration mode Open vSwitch will not be running on the LXD host(s), so a new setting is needed; this is proposed as `network.ovn.southbound_connection`.
2. The `ovn` NIC type will need to have a new `acceleration` mode value
Currently `ovn` NICs support two acceleration modes: `sriov` and `vdpa`.
We will need to extend this option to support a proposed `dpu` mode in order to indicate that a specific instance NIC should use the DPU acceleration mode.
3. The associated `ovn` network will need to have a new per-member setting to indicate the physical function (PF) interface
Currently, when an `ovn` NIC has `acceleration` enabled, the candidate VFs are selected by requiring their PF interface(s) to be connected to the host's OVN integration bridge.
Set up OVS by enabling hardware offload and adding the PF NIC to the integration bridge (normally called `br-int`):

```
ovs-vsctl set open_vswitch . other_config:hw-offload=true
systemctl restart openvswitch-switch
ovs-vsctl add-port br-int enp9s0f0np0
```
This is problematic because the operator may want to use the PF interface for some other purpose (perhaps for host networking), and requiring it be added to the OVN integration bridge is restrictive.
So when using the `acceleration` setting we need a better way for LXD to know which VFs from which PFs are candidates for use with instance `ovn` NICs.
The proposal is to add a per-member setting to `ovn` networks, named `parent.<mode>`, which will indicate which PFs to use for VF allocation when `ovn` NICs connected to that network have `acceleration` enabled in that mode.
This way it will be possible for `ovn` NICs in the same network to use a mixture of acceleration modes.
E.g. `lxc network set ovn1 parent.dpu=enp130s0f0np0`
4. The DPU card will need to be configured (by the operator)
On the DPU card, Open vSwitch will need to be configured (by the operator) as follows:
- Enable hardware offload:

  ```
  ovs-vsctl set open_vswitch . other_config:hw-offload=true
  ```
- Store the DPU card's serial number.
  This will be automatically synced to the OVN southbound database's `chassis` table.
  It can be retrieved using `lspci -vv`, looking for the `[SN] Serial number` value in the `Capabilities: [48] Vital Product Data` section.

  ```
  ovs-vsctl set open_vswitch . \
      external_ids:ovn-cms-options=card_serial_number=<DPU_SERIAL_NUMBER>
  ```
- Connect it to OVN:
  The steps from https://documentation.ubuntu.com/lxd/en/latest/howto/network_ovn_setup/#set-up-a-lxd-cluster-on-ovn need to be followed to connect the DPU to OVN.
5. Instance `ovn` NIC start procedure
When LXD starts an instance with an `ovn` NIC configured with `acceleration=dpu`, it will consult the NIC's `ovn` network settings and look for the `parent.dpu` PF for that cluster member.
If none is found then acceleration cannot be used, so the instance should fail to start.
If a matching PF is found, LXD will need to check whether there is a free VF, and if not, try to activate VFs by modifying `/sys/class/net/<PF_Interface>/device/sriov_numvfs` as it does today for `sriov` mode.
LXD will then also need to instruct OVN to schedule the logical switch port on the associated DPU card and connect the VF’s representor port to the integration bridge (br-int) on the DPU card.
This can be done as follows:
- On the LXD host, parse `/sys/class/net/<PF interface>/device/uevent` and extract the `PCI_SLOT_NAME` setting.
- Get the DPU card's serial number using `lspci -s <PCI_SLOT_NAME> -vv`.
  I've not found a good way to extract this in machine-readable format, but we are looking to get the `[SN] Serial number` value from the `Capabilities: [48] Vital Product Data` section.
  This data is also available in `/sys/class/net/<PF interface>/device/vpd` but would need to be decoded somehow; see `hexdump -C /sys/class/net/<PF_Interface>/device/vpd`.
- Consult the OVN southbound database's `chassis` table to find the matching ovn-controller chassis running on the DPU card. E.g.:

  ```
  ovn-sbctl find chassis \
      external_ids:ovn-cms-options="card_serial_number\=<DPU_SERIAL_NUMBER>"
  ```
- LXD will then need to get the PF interface's MAC address.
- The logical switch port then needs to be configured by LXD as follows:

  ```
  ovn-nbctl set logical_switch_port <LOGICAL_SWITCH_PORT_NAME> \
      requested-chassis="<OVN_DPU_CHASSIS_NAME>" \
      options:"vif-plug\:representor\:pf-mac"="<PF_MAC_ADDRESS>" \
      options:"vif-plug\:representor\:vf-num"=<VF_NUMBER> \
      options:"vif-plug-type"=representor \
      options:"vif-plug-mtu-request"=1500
  ```
If all is well, the VF interface on the host will now be connected to the OVN logical switch port by way of the VF’s representor port on the DPU card having been connected to the OVN integration bridge on the DPU card and being scheduled by OVN on the DPU’s ovn-controller chassis.
At this point the VF interface on the host can be passed into the instance as usual.
### Uplink network connectivity
The above gets instances connected to the OVN network using a DPU.
However, it does not add support for using a DPU card's physical port to provide external uplink connectivity for `ovn` networks. Nor does it actually allow an `ovn` network to be created, because LXD requires a functional uplink network to exist on the LXD host.
To support this, the proposal is to add a new type of network called `dpu` that can be used as an `ovn` network uplink. This new `dpu` network type would be very basic and would only contain IP settings (similar to the `physical` network type today, but without the `parent` or `vlan` settings).
It would require the operator to configure an Open vSwitch bridge on the DPU card, connect the physical port(s) to it, and then set up the bridge-to-uplink provider name mappings:

```
ovs-vsctl set open_vswitch . \
    external-ids:ovn-bridge-mappings=<uplinkNetName>:<bridgeName>
```
## API changes
An API extension will be added called `ovn_nic_acceleration_dpu` to indicate support for the new `network.ovn.southbound_connection` server setting, the new `dpu` value for the `ovn` NIC `acceleration` option, and the new `dpu` network type.
## CLI changes
None
## Database changes
None
## Upgrade handling
None
## Further information
Diagrams reproduced and modified with the permission of @fnordahl from a presentation given at Open Infrastructure Summit.