RTX 5070 PCIe link speed stuck at Gen 1 on Ubuntu 24.04 — open kernel modules 580.x

Ubuntu version:
24.04.4 LTS

Kernel:
6.17.0-14-generic

Desktop environment:
LightDM + GNOME

Problem description:
RTX 5070 (GB205, 10de:2f04) runs at PCIe Gen 1 (2.5GT/s) on every boot under the open kernel module driver.

Relevant system info:

  • CPU: Intel Core i7-13700K (Raptor Lake)
  • Motherboard: Gigabyte Z790 AORUS Elite AX Rev 1.1, BIOS FN (latest available)
  • GPU 0: RTX 5070 12GB (HP OEM, 10de:2f04) @ 01:00.0 — primary PCIe x16 slot (CPU-direct lanes, Gen 5 capable)
  • GPU 1: RTX 4060 Ti 16GB @ 06:00.0 — secondary slot, display output
  • Driver: NVIDIA-Linux-x86_64-580.95.05 (originally nvidia-driver-580-open 580.126.09 via apt; downgraded to 580.95.05 with the .run installer for testing and not yet reverted)
  • CUDA: 12.8 + 13.0 (primary is 13.0, 12.8 installed in a pyenv-virtualenv)
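Since the .run downgrade replaced the apt package, it may be worth confirming which module flavour and version is actually loaded. Below is a minimal Python sketch of that check. It assumes the first line of /proc/driver/nvidia/version contains "Open Kernel Module" for the open modules and omits it for the proprietary ones. That is my understanding of how recent drivers label themselves, not something verified against 580.x specifically. `parse_nvrm` is a hypothetical helper name.

```python
import re
from pathlib import Path

VERSION_FILE = Path("/proc/driver/nvidia/version")
# Matches dotted driver versions such as 580.95.05 (not the "x86_64" token).
VERSION_RE = re.compile(r"\b(\d+\.\d+(?:\.\d+)?)\b")


def parse_nvrm(first_line: str) -> tuple[str, bool]:
    """Return (driver version, is_open_module) from the NVRM version line.

    Assumption: open modules report "Open Kernel Module" in this line.
    """
    m = VERSION_RE.search(first_line)
    if m is None:
        raise ValueError(f"no version found in {first_line!r}")
    return m.group(1), "Open Kernel Module" in first_line


if __name__ == "__main__" and VERSION_FILE.exists():
    line = VERSION_FILE.read_text().splitlines()[0]
    version, is_open = parse_nvrm(line)
    kind = "open" if is_open else "proprietary"
    print(f"loaded driver {version} ({kind} kernel module)")
```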

What lspci shows:
```
# Root port (00:01.0)
LnkCap: Speed 16GT/s, Width x16
LnkCap2: Supported Link Speeds: 2.5-16GT/s ← BIOS was capped at Gen 4 during testing
LnkCtl2: Target Link Speed: 16GT/s

# GPU endpoint (01:00.0)
LnkCap: Speed 32GT/s, Width x16
LnkCap2: Supported Link Speeds: 2.5-32GT/s
LnkSta: Speed 2.5GT/s (downgraded)
LnkCtl2: Target Link Speed: 16GT/s ← driver correctly sets this on 580.95.05
```
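The same speed fields can also be read as raw register values, which makes it easier to compare before and after a retrain. This is a hypothetical Python helper that decodes them. It assumes 16-bit values read as root with `setpci -s 01:00.0 CAP_EXP+12.w` (Link Status) and `setpci -s 01:00.0 CAP_EXP+30.w` (Link Control 2). Bits 3:0 of both registers use the same speed encoding (1 = 2.5GT/s through 5 = 32GT/s). Bits 9:4 of Link Status hold the negotiated width.

```python
import sys

# Speed code (bits 3:0) shared by Link Status "Current Link Speed" and
# Link Control 2 "Target Link Speed"; the code equals the PCIe generation.
SPEED_GT_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0, 6: 64.0}
GEN_BY_SPEED = {speed: code for code, speed in SPEED_GT_S.items()}


def decode_speed(reg: int) -> float:
    """Return the link speed in GT/s encoded in bits 3:0 of a register value."""
    code = reg & 0xF
    if code not in SPEED_GT_S:
        raise ValueError(f"reserved speed code {code:#x}")
    return SPEED_GT_S[code]


def decode_width(lnksta: int) -> int:
    """Return the negotiated link width from bits 9:4 of Link Status."""
    return (lnksta >> 4) & 0x3F


def describe(lnksta: int, lnkctl2: int) -> str:
    """Summarise current speed/width (LnkSta) and target speed (LnkCtl2)."""
    cur = decode_speed(lnksta)
    tgt = decode_speed(lnkctl2)
    return (f"current {cur} GT/s (Gen {GEN_BY_SPEED[cur]}) x{decode_width(lnksta)}, "
            f"target {tgt} GT/s (Gen {GEN_BY_SPEED[tgt]})")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 decode_link.py <LnkSta hex> <LnkCtl2 hex>
    print(describe(int(sys.argv[1], 16), int(sys.argv[2], 16)))
```

For example, a Link Status value of 0x1101 decodes to 2.5 GT/s at x16, and a Link Control 2 value of 0x0004 decodes to a 16 GT/s target.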

With BIOS set to Auto (Gen 5 ceiling restored), nvidia-smi reports pcie.link.gen.max = 5 and pcie.link.gen.current = 1.
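NVIDIA GPUs can lower their link speed when idle, so it may help to log these two values over time and compare idle against load. Below is a small Python sketch around nvidia-smi's `--query-gpu` mode. It assumes nvidia-smi is on PATH. `parse_csv` and `downgraded` are hypothetical helper names.

```python
import shutil
import subprocess

FIELDS = "index,pci.bus_id,pcie.link.gen.current,pcie.link.gen.max"


def _to_int(field: str):
    """Convert a numeric field, mapping values such as "[N/A]" to None."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_csv(text: str) -> list[dict]:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output for FIELDS."""
    rows = []
    for line in text.strip().splitlines():
        idx, bus_id, cur, mx = line.split(",")
        rows.append({"index": _to_int(idx), "bus_id": bus_id.strip(),
                     "gen_current": _to_int(cur), "gen_max": _to_int(mx)})
    return rows


def downgraded(rows: list[dict]) -> list[str]:
    """Bus IDs whose current PCIe generation is below the reported maximum."""
    return [r["bus_id"] for r in rows
            if r["gen_current"] is not None and r["gen_max"] is not None
            and r["gen_current"] < r["gen_max"]]


def query() -> list[dict]:
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return parse_csv(out)


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    rows = query()
    for r in rows:
        print(f"GPU {r['index']} {r['bus_id']}: "
              f"Gen {r['gen_current']} / max Gen {r['gen_max']}")
    print("below max:", downgraded(rows) or "none")
```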

Failure sequence:
On 580.126.09, the driver writes 32GT/s to LnkCtl2, Gen 5 equalization fails (Phy32Sta: EquComplete-), and the link falls back to Gen 1 instead of Gen 4.

On 580.95.05, the driver correctly sets a 16GT/s target, but the link still initializes at Gen 1 and cannot be recovered with a setpci retrain: two consecutive retrain passes both leave it at 2.5GT/s.
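For anyone reproducing this, here is roughly what a retrain pass looks like at the register level. This is a hypothetical sketch, not the exact commands used here. It assumes the target speed is written to the root port's (00:01.0) Link Control 2, and that retraining is requested through the Retrain Link bit (bit 5) of the root port's Link Control register, where the PCIe spec places that bit for a downstream port. The script only builds and prints setpci commands using setpci's `value:mask` write syntax. Nothing is written to the device.

```python
import shlex

RETRAIN_LINK = 1 << 5  # Link Control (CAP_EXP+0x10) bit 5: Retrain Link


def _check_gen(gen: int) -> None:
    if not 1 <= gen <= 5:
        raise ValueError("gen must be 1-5")


def with_target_speed(lnkctl2: int, gen: int) -> int:
    """Replace bits 3:0 (Target Link Speed) of a Link Control 2 value.

    This is the read-modify-write equivalent of the masked setpci write
    below; for Gen 1-5 the speed code equals the generation number.
    """
    _check_gen(gen)
    return (lnkctl2 & ~0xF) | gen


def retrain_commands(root_port: str, gen: int) -> list[list[str]]:
    """setpci invocations for one retrain pass on a downstream port."""
    _check_gen(gen)
    return [
        # value:mask form only touches the masked bits (Target Link Speed).
        ["setpci", "-s", root_port, f"CAP_EXP+30.w={gen:04x}:000f"],
        # Set Retrain Link without modifying the rest of Link Control.
        ["setpci", "-s", root_port,
         f"CAP_EXP+10.w={RETRAIN_LINK:04x}:{RETRAIN_LINK:04x}"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands for a Gen 4 target on root port 00:01.0.
    for cmd in retrain_commands("00:01.0", 4):
        print(shlex.join(cmd))
```

The root port's LnkCtl2 in the lspci output already reads 16GT/s, so on this system the first command would be a no-op. Only the Retrain Link write would change anything.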

What I’ve tried (all unsuccessful):

  • BIOS-side PCIe speed lock to Gen 4: on 580.126.09 the driver overrides LnkCtl2; on 580.95.05 the lock has no effect
  • NVreg_EnablePCIeGen3=1 (nvidia module parameter): no effect
  • setpci retrain at runtime (both before and after the display manager starts): no effect on either driver version
  • systemd service (sysfs polling + two retrain passes): ran correctly, but the link didn't move
  • Removing pci=nomsi from the kernel command line: no effect
  • pcie_aspm=off kernel parameter: still in place, no change to PCIe link speed
  • BIOS update: FN is already the latest available for this board
  • Downgrade to 580.95.05: driver behavior improved (no longer forces 32GT/s), but the link is still stuck at Gen 1
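The polling half of a service like the one above could look something like this minimal Python sketch. It is not the service I actually ran, and the retrain step is left out. It assumes the standard sysfs attributes `current_link_speed` and `max_link_speed` under /sys/bus/pci/devices/. On recent kernels their contents read like "2.5 GT/s PCIe", while older kernels omit the " PCIe" suffix. `parse_speed`, `read_link`, and `watch` are hypothetical names.

```python
import re
import sys
import time
from pathlib import Path

SYSFS = Path("/sys/bus/pci/devices")
SPEED_RE = re.compile(r"(\d+(?:\.\d+)?)\s*GT/s")
GEN_BY_SPEED = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5, 64.0: 6}


def parse_speed(text: str) -> float:
    """Extract GT/s from a sysfs value such as '2.5 GT/s PCIe'."""
    m = SPEED_RE.search(text)
    if m is None:
        raise ValueError(f"unrecognised link speed: {text!r}")
    return float(m.group(1))


def read_link(bdf: str) -> tuple[float, float]:
    """(current, max) link speed in GT/s for a device such as '0000:01:00.0'."""
    dev = SYSFS / bdf
    return (parse_speed((dev / "current_link_speed").read_text()),
            parse_speed((dev / "max_link_speed").read_text()))


def watch(bdf: str, interval: float = 1.0, samples: int = 10) -> None:
    """Print the negotiated link generation once per interval."""
    for _ in range(samples):
        cur, mx = read_link(bdf)
        print(f"{bdf}: Gen {GEN_BY_SPEED.get(cur, '?')} ({cur} GT/s), "
              f"max Gen {GEN_BY_SPEED.get(mx, '?')}")
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    watch(sys.argv[1])  # e.g. python3 watch_link.py 0000:01:00.0
```

Reading sysfs needs no root privileges, so this can run alongside a GPU workload to see whether the link ever leaves 2.5 GT/s.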

[This is a confirmed upstream bug tracked at https://github.com/NVIDIA/open-gpu-kernel-modules/issues/1010. That issue was filed against 590.x, but I'm experiencing the same behavior on 580.126.09 and 580.95.05.]

Has anyone experienced and/or resolved a PCIe Gen 1 fallback on a similar system? Any and all help is much appreciated.

This is my first post to this forum, and I’m new to Linux as a whole.
Apologies if my methodology is a bit unorthodox or irrational; I'm learning as I go. Glad to be here with you all :face_savoring_food: