No /proc/driver/nvidia/gpus directory in container

Hello,

On my Arch Linux host I have the following directory:

$ ls /proc/driver/nvidia
capabilities  gpus  params  patches  registry  suspend  suspend_depth  version  warnings

However, inside the container the gpus subdirectory (along with several other entries) is missing:

# ls /proc/driver/nvidia
params  registry  version

nvidia-smi does run inside the container:


$ lxc exec first -- nvidia-smi
Sun May  5 13:42:25 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650        Off |   00000000:05:00.0 Off |                  N/A |
| 35%   39C    P8             N/A /   75W |       6MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

My LXD container config is:

$ lxc config show first
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 24.04 LTS amd64 (release) (20240423)
  image.label: release
  image.os: ubuntu
  image.release: noble
  image.serial: "20240423"
  image.type: squashfs
  image.version: "24.04"
  nvidia.runtime: "true"
  volatile.base_image: c9fba5728bfe168aa73084b94deab3dd3a1e349b5f7e0b5e5a8e945899cb0378
  volatile.cloud-init.instance-id: 80da8d75-db2a-46f5-91ae-2536a353674c
  volatile.eth0.host_name: vethf773d754
  volatile.eth0.hwaddr: 00:16:3e:d5:07:be
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 1fbe510d-2e6b-43d9-a1c1-89cc7d39631f
  volatile.uuid.generation: 1fbe510d-2e6b-43d9-a1c1-89cc7d39631f
devices:
  mygpu:
    type: gpu
ephemeral: false
profiles:
- default
stateful: false
description: ""

How do I get the gpus directory in the container?

A workaround is mentioned here. It suggests creating the symlink manually with mkdir -p /proc/driver/nvidia/gpus && ln -s /dev/nvidia0 /proc/driver/nvidia/gpus/0000:01:00.0, or with a systemd unit:

$ cat /etc/systemd/system/fix-gpu-passthrough.service

[Unit]
Description=Creates Symlink required for LXC/Nvidia to Docker passthrough
Before=docker.service

[Service]
User=root
Group=root
ExecStart=/bin/bash -c 'mkdir -p /proc/driver/nvidia/gpus && ln -s /dev/nvidia0 /proc/driver/nvidia/gpus/0000:01:00.0'
Type=oneshot

[Install]
WantedBy=multi-user.target
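
One detail about the workaround: it hardcodes the PCI address 0000:01:00.0, while the nvidia-smi output above reports the GPU at Bus-Id 00000000:05:00.0. A minimal sketch for deriving the directory name from nvidia-smi is below. The helper name to_proc_bus_id is hypothetical, and it assumes the entries under /proc/driver/nvidia/gpus use a 4-digit PCI domain with lowercase hex digits, as in the workaround's path. I have not confirmed that the symlink itself works.

```shell
#!/bin/sh
# Hypothetical helper: convert the bus ID format printed by nvidia-smi
# (8-hex-digit domain, e.g. 00000000:05:00.0) to the 4-hex-digit domain
# format assumed for /proc/driver/nvidia/gpus entries (e.g. 0000:05:00.0).
to_proc_bus_id() {
    # Lowercase any hex letters (assumption: /proc uses lowercase).
    id=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case "$id" in
        # 8-digit domain: drop the first four characters.
        ????????:??:??.?) printf '%s\n' "${id#????}" ;;
        # Anything else is passed through unchanged.
        *) printf '%s\n' "$id" ;;
    esac
}

# Print one converted bus ID per GPU, but only if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader |
    while IFS= read -r bus_id; do
        to_proc_bus_id "$bus_id"
    done
fi
```

For the Bus-Id shown above (00000000:05:00.0) the helper yields 0000:05:00.0, which would replace 0000:01:00.0 in the symlink path.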