NVIDIA/ffmpeg problem

Fedora 38. LXD installed from the snap, latest channel.
snap run lxd --version shows 5.16.
NVIDIA GTX 750. I checked the NVDEC/NVENC support matrix and it is able to do HEVC transcoding (it was doing it in my old installation).

The GPU seems to be passed through to the container, but it is not working.

On the host:

$ nvidia-smi
Fri Aug  4 09:19:57 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 750         Off | 00000000:03:00.0  On |                  N/A |
| 23%   35C    P8               1W /  38W |    264MiB /  1024MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1644      G   /usr/libexec/Xorg                           132MiB |
|    0   N/A  N/A      2735      G   xfwm4                                         0MiB |
|    0   N/A  N/A      2768      G   /usr/lib64/firefox/firefox                  125MiB |
+---------------------------------------------------------------------------------------+

and

$ nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.13.5
commit: 6b8589dcb4dead72ab64f14a5912886e6165c079

In the Ubuntu container:

$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 750         Off | 00000000:03:00.0  On |                  N/A |
| 23%   34C    P8               1W /  38W |    264MiB /  1024MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

The configuration of the Ubuntu container:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu jammy amd64 (20230803_07:42)
  image.os: Ubuntu
  image.release: jammy
  image.serial: "20230803_07:42"
  image.type: squashfs
  image.variant: default
  nvidia.driver.capabilities: all
  nvidia.runtime: "true"
  raw.idmap: |-
    uid 1000 106
    gid 1000 113
  volatile.base_image: 8251b8e289dfb189517859d775d70947e302d4c2e312f2edaba64679ebee8ab0
  volatile.cloud-init.instance-id: e74d2090-49ff-452c-bdab-e5d964f09600
  volatile.eth0.host_name: vethcd654849
  volatile.eth0.hwaddr: 00:16:3e:b6:97:48
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":106},{"Isuid":true,"Isgid":false,"Hostid":1000,"Nsid":106,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1000107,"Nsid":107,"Maprange":999999893},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":113},{"Isuid":false,"Isgid":true,"Hostid":1000,"Nsid":113,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1000114,"Nsid":114,"Maprange":999999886}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":106},{"Isuid":true,"Isgid":false,"Hostid":1000,"Nsid":106,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1000107,"Nsid":107,"Maprange":999999893},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":113},{"Isuid":false,"Isgid":true,"Hostid":1000,"Nsid":113,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1000114,"Nsid":114,"Maprange":999999886}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 36f3db49-4e7d-4b48-a3c1-296fd4ffe7b5
  volatile.uuid.generation: 36f3db49-4e7d-4b48-a3c1-296fd4ffe7b5
devices:
  gpu:
    type: gpu
  media:
    path: /mnt/media/
    source: /home/miguel/Media/
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Everything looks fine, but when I try decoding with ffmpeg (NVIDIA-enabled build), it reports that no capable hardware is found:

[hevc @ 0x5640c85d9680] Hardware is lacking required capabilities
[hevc @ 0x5640c85d9680] Failed setup for format cuda: hwaccel initialisation returned error.
Impossible to convert between the formats supported by the filter 'Parsed_setparams_0' and the filter 'auto_scale_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
Conversion failed!

This same configuration was working on my old Arch Linux machine.

So, what am I missing/doing wrong?
EDIT: I’ve tried on my AlmaLinux 9.2 machine, with the same result.
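
One way to narrow down the ffmpeg error above is to check, inside the container, whether the NVIDIA video libraries are visible to the dynamic linker, since nvidia-smi does not depend on them. This is only a minimal sketch: it assumes the usual library names libnvcuvid.so (NVDEC) and libnvidia-encode.so (NVENC), and the function name check_video_libs is hypothetical.

```shell
#!/bin/sh
# Reads a library listing (for example the output of `ldconfig -p`) on stdin
# and reports whether the NVDEC and NVENC user-space libraries appear in it.
# Returns 0 when both are present, 1 when at least one is missing.
check_video_libs() {
    listing=$(cat)
    status=0
    for lib in libnvcuvid.so libnvidia-encode.so; do
        if printf '%s\n' "$listing" | grep -qF "$lib"; then
            echo "found: $lib"
        else
            echo "missing: $lib"
            status=1
        fi
    done
    return $status
}
```

Inside the container this could be run as ldconfig -p | check_video_libs. Running ffmpeg -hide_banner -hwaccels additionally lists the acceleration methods the ffmpeg build was compiled with.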

Hi @kinu, could you please try adding the GPU by explicitly specifying its DRM ID and see if this makes any difference:

lxc config device rm {container} gpu
lxc config device add {container} gpu gpu id=0

or by its PCI address:

lxc config device add {container} gpu gpu pci=00000000:03:00.0

We made some changes to this in the past (https://github.com/canonical/lxd/pull/11799), so this is just to rule out any regression that might have been introduced.
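
The remove and add steps above can be wrapped in a small helper so that both selectors are tried the same way. This is a sketch, not an official procedure: the function name readd_gpu and the LXC override variable are hypothetical, and the device is assumed to be named gpu as in the configuration shown earlier.

```shell
#!/bin/sh
# Replace an instance's GPU device with one pinned to a specific card.
# LXC defaults to the real client; set LXC=echo to print the arguments
# of each call instead of changing anything (dry run).
readd_gpu() {
    instance=$1   # container name
    selector=$2   # for example id=0 or pci=<address>
    "${LXC:-lxc}" config device remove "$instance" gpu &&
    "${LXC:-lxc}" config device add "$instance" gpu gpu "$selector" &&
    "${LXC:-lxc}" config device show "$instance"
}
```

For example, readd_gpu mycontainer id=0 removes the existing device, adds it back pinned to DRM card 0, and prints the resulting device list for review.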

Hello @jpelizaeus. I tried both, with the same result. nvidia-smi works, but ffmpeg is unable to detect the hardware.

EDIT: Would using the stable channel make any difference?

Have you rebooted the instance after you set the two nvidia.* config keys? I just checked the docs and those are not live-updatable: https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#nvidia-and-cuda-configuration
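
A quick way to confirm the two keys and apply them is to print their current values and then restart the instance. The helper below is only a sketch; its name restart_with_nvidia_keys and the LXC override variable are hypothetical.

```shell
#!/bin/sh
# Print the two nvidia.* keys of an instance, then restart it so that the
# values take effect. Set LXC=echo for a dry run that only prints arguments.
restart_with_nvidia_keys() {
    instance=$1
    for key in nvidia.runtime nvidia.driver.capabilities; do
        "${LXC:-lxc}" config get "$instance" "$key" || return 1
    done
    "${LXC:-lxc}" restart "$instance"
}
```

With the configuration posted above, the two lxc config get calls would be expected to print true and all before the restart.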


Yes, of course. I also ran snap restart lxd, just in case, and then rebooted the computer; none of that helped. I suspect this is related to the RHEL 9 family, because I’m seeing the same problem on AlmaLinux too.