GPU Passthrough with the GPU device

Hello Folks,

I’m trying to pass a GPU through to my LXD VM and am having great difficulty. Here are the steps I tried:

  • Blacklisted the host drivers:
    • nvidia
    • snd_hda_intel
  • Passed the GPU using CDI notation
    • This fails with a warning that the GPU cannot be found
  • Passed the GPU directly by PCI address
  • Passed the GPU by vendor and product ID
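One way to confirm that blacklisting nvidia and snd_hda_intel actually freed the card is to check which kernel driver is bound to it and which IOMMU group it sits in. Below is a minimal Python sketch that reads the standard sysfs entries /sys/bus/pci/devices/<address>/driver and .../iommu_group. The function name pci_device_status is hypothetical, and the sysfs_root parameter exists only so the function can be tested against a fake directory tree.

```python
import os
from pathlib import Path


def pci_device_status(address: str, sysfs_root: str = "/sys") -> dict:
    """Report the bound kernel driver and IOMMU group of a PCI device.

    `address` uses the full domain:bus:device.function form, e.g. "0000:01:00.0".
    """
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / address
    if not dev.exists():
        raise FileNotFoundError(f"no such PCI device: {dev}")

    # "driver" is a symlink to .../drivers/<name>; it is absent if no driver is bound.
    driver_link = dev / "driver"
    driver = os.path.basename(os.readlink(driver_link)) if driver_link.is_symlink() else None

    # "iommu_group" is a symlink to .../iommu_groups/<n>; it is absent without an IOMMU.
    group_link = dev / "iommu_group"
    group = os.path.basename(os.readlink(group_link)) if group_link.is_symlink() else None

    # Every device listed in the same group has to be handed to the VM (or left unbound) together.
    peers = []
    if group is not None:
        peers_dir = Path(sysfs_root) / "kernel" / "iommu_groups" / group / "devices"
        if peers_dir.is_dir():
            peers = sorted(p.name for p in peers_dir.iterdir())

    return {"driver": driver, "iommu_group": group, "group_devices": peers}
```

On the host, pci_device_status("0000:01:00.0") should not report nvidia as the driver if the blacklist worked. The group_devices list shows what else shares the group, for example the card's audio function, which commonly sits at 01:00.1.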

The last two attempts “worked” in the sense that LXD accepts the GPU device, but when I start the VM it just hangs in the “Running” state. I tried to get more information with lxc info --show-log gpu-test, but the output is more or less blank. I also cannot SSH into the VM or reach it through the LXD agent.

Any ideas on how I can troubleshoot this? Here’s my VM config:

config:
  image.architecture: amd64
  image.description: ubuntu 22.04 LTS amd64 (release) (20241002)
  image.label: release
  image.os: ubuntu
  image.release: jammy
  image.serial: "20241002"
  image.type: disk-kvm.img
  image.version: "22.04"
  limits.cpu: "4"
  limits.memory: 8GiB
  security.secureboot: "false"
  volatile.base_image: 5a63bc87974e61c631567c0d171fefffb33ebb7525b8295672ef0c5bf2cbd898
  volatile.cloud-init.instance-id: 8cff43d3-0875-4b24-a059-b244145f78b8
  volatile.eth0.hwaddr: 00:16:3e:87:4b:cb
  volatile.last_state.power: STOPPED
  volatile.uuid: 7b3d4750-da34-43b1-8f29-5bf5e3658a24
  volatile.uuid.generation: 7b3d4750-da34-43b1-8f29-5bf5e3658a24
  volatile.vsock_id: "1433741272"
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  nvidia-gpu:
    pci: "01:00.0"
    type: gpu
  root:
    path: /
    pool: nvme
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Additional information:

  • LXD 6.1
  • NVIDIA RTX 4000-series GPU
  • AMD IOMMU enabled:
    [    1.253430] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
  • Display output comes from the AMD iGPU in the CPU, so the NVIDIA GPU should be idle

In the end, it turned out to be a firmware configuration issue. In my case I have a 24 GB GPU, so to pass it through I needed to increase the 64-bit MMIO aperture size in the OVMF firmware. See https://edk2.groups.io/g/discuss/topic/ovmf_resource_assignment/59340711 for details.

The fix was adding this QEMU option to the VM config: raw.qemu: -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
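A rough sketch of where a value like 65536 can come from follows. PCI BARs are power-of-two sized and naturally aligned, so a 24 GB card that exposes its whole memory through one BAR needs a 32 GiB BAR, which the firmware's default aperture evidently could not hold here. The BAR sizes in the example (32 GiB plus 32 MiB) are assumptions for illustration, not values read from this card. The largest-first packing with power-of-two rounding is a heuristic, not how OVMF actually assigns resources. The helper names mmio64_aperture_mb, raw_qemu_value and lxc_set_argv are hypothetical.

```python
MiB = 1 << 20
GiB = 1 << 30


def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def mmio64_aperture_mb(bar_sizes_bytes: list[int]) -> int:
    """Heuristic X-PciMmio64Mb value (in MiB) for a set of 64-bit BARs.

    Assumption: power-of-two BARs packed largest-first leave no alignment gaps,
    so their sum, rounded up to a power of two for headroom, is enough.
    """
    if not bar_sizes_bytes:
        raise ValueError("need at least one BAR size")
    for size in bar_sizes_bytes:
        if size <= 0 or size & (size - 1):
            raise ValueError(f"BAR size {size} is not a power of two")
    total_mib = -(-sum(bar_sizes_bytes) // MiB)  # ceiling division
    return next_power_of_two(total_mib)


def raw_qemu_value(aperture_mb: int) -> str:
    """QEMU argument that hands the aperture size to OVMF through fw_cfg."""
    return f"-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string={aperture_mb}"


def lxc_set_argv(instance: str, aperture_mb: int) -> list[str]:
    """argv for `lxc config set <instance> raw.qemu=<value>`."""
    return ["lxc", "config", "set", instance, "raw.qemu=" + raw_qemu_value(aperture_mb)]


# Assumed BARs: 32 GiB for the card's memory plus a 32 MiB BAR.
# 32 GiB + 32 MiB = 32800 MiB, which rounds up to 65536 MiB (64 GiB).
print(lxc_set_argv("gpu-test", mmio64_aperture_mb([32 * GiB, 32 * MiB])))
```

The resulting argv can be passed to subprocess.run on the host, or the same lxc config set command can be typed by hand. Either way, restart the VM so QEMU is launched with the new option.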