/dev/nvidia-caps keeps disappearing after reboot

YamiYukiSenpai · July 20, 2025, 4:10am

Ubuntu Version: 24.04 LTS

Desktop Environment (if applicable): Server

Problem Description:

/dev/nvidia-caps doesn’t show up immediately after a reboot, which prevents Jellyfin from starting.

Relevant System Information:

RTX 3060 Ti
- Planned to use nvidia-patch

Screenshots or Error Messages:

Jul 20 03:58:02 skarletsky nextcloud.apache[3880]: #13 /var/snap/nextcloud/49338/nextcloud/extra-apps/audioplayer/appinfo/register_command.php(20): OC\AppFramework\DependencyInjection\DIContainer->query()
Jul 20 03:58:02 skarletsky nextcloud.apache[3880]: #14 /snap/nextcloud/49338/htdocs/lib/private/Console/Application.php(114): require('...')
Jul 20 03:58:02 skarletsky nextcloud.apache[3880]: #15 /snap/nextcloud/49338/htdocs/console.php(81): OC\Console\Application->loadCommands()
Jul 20 03:58:02 skarletsky nextcloud.apache[3880]: #16 /snap/nextcloud/49338/htdocs/occ(33): require_once('...')
Jul 20 03:58:02 skarletsky docker.dockerd[2138]: time="2025-07-20T03:58:02.660312477Z" level=error msg="failed to start container" container=31a0712084da67409359303b2a6ce8f2889d7a96b1feae2ec96b7465678cfc83 error="error gathering device information while adding custom device \"/dev/nvidia-caps\": no such file or directory"
Jul 20 03:58:02 skarletsky nextcloud.apache[3880]: #17 {main}
Jul 20 03:58:02 skarletsky nextcloud.apache[3971]: An unhandled exception has been thrown:
Jul 20 03:58:02 skarletsky nextcloud.apache[3971]: TypeError: OCA\audioplayer\Db\DbMapper::__construct(): Argument #1 ($userId) must be of type string, null given in /var/snap/nextcloud/49338/nextcloud/extra-apps/audioplayer/lib/Db/DbMapper.php:21
Jul 20 03:58:02 skarletsky nextcloud.apache[3971]: Stack trace:
--
Jul 20 03:58:46 skarletsky kernel: audit: type=1326 audit(1752983926.552:679): auid=1000 uid=1000 gid=1000 ses=1 subj=snap.docker.docker pid=10065 comm="docker" exe="/snap/docker/3265/bin/docker" sig=0 arch=c000003e syscall=434 compat=0 ip=0x48084e code=0x50000
Jul 20 03:58:46 skarletsky kernel: audit: type=1400 audit(1752983926.801:680): apparmor="DENIED" operation="capable" class="cap" profile="snap.nextcloud.nextcloud-fixer" pid=10177 comm="php" capability=7  capname="setuid"
Jul 20 03:58:46 skarletsky kernel: audit: type=1400 audit(1752983926.801:681): apparmor="DENIED" operation="capable" class="cap" profile="snap.nextcloud.nextcloud-fixer" pid=10177 comm="php" capability=6  capname="setgid"
Jul 20 03:58:47 skarletsky kernel: audit: type=1400 audit(1752983927.453:682): apparmor="DENIED" operation="bind" class="net" profile="snap.docker.docker" pid=10065 comm="docker" family="unix" sock_type="stream" protocol=0 requested="bind" denied="bind" addr="@docker_cli_dcb679943f112a06b898bf18935e1ad9"
Jul 20 03:58:47 skarletsky docker.dockerd[2138]: time="2025-07-20T03:58:47.583062109Z" level=error msg="Handler for POST /v1.48/containers/31a0712084da67409359303b2a6ce8f2889d7a96b1feae2ec96b7465678cfc83/start returned error: error gathering device information while adding custom device \"/dev/nvidia-caps\": no such file or directory" spanID=34f91ba501d52399 traceID=c6057526a8be9261652242a1efd2fb82
Jul 20 03:58:48 skarletsky kernel: audit: type=1400 audit(1752983928.056:683): apparmor="DENIED" operation="capable" class="cap" profile="snap.nextcloud.nextcloud-fixer" pid=10274 comm="php" capability=7  capname="setuid"
Jul 20 03:58:48 skarletsky kernel: audit: type=1400 audit(1752983928.056:684): apparmor="DENIED" operation="capable" class="cap" profile="snap.nextcloud.nextcloud-fixer" pid=10274 comm="php" capability=6  capname="setgid"

What I’ve Tried:

I tried both with and without the patch.
I installed both nvidia-driver-575-open-server and nvidia-driver-570-open-server.

cesebe · July 20, 2025, 3:52pm

Are you running Ubuntu inside a Docker container?
The NVIDIA kernel modules are heavy, which is probably why the device isn’t available immediately.
Maybe you can create your own systemd unit that gives priority to loading the NVIDIA modules,
like this:

[Unit]
Description=Nvidia Kernel Modules
After=network.target local-fs.target
Before=containerd.service docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-container-cli --load-kmods info
TimeoutStopSec=10
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target containerd.service docker.service
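
If the modules load but the device node still appears late, another option might be to wait for the node before starting the container. The following is a minimal sketch, not something posted in the thread: the helper name `wait_for_path`, the 30-second default timeout, and the container name `jellyfin` in the commented example are all assumptions. Also note that the log lines above appear to come from the Docker snap (`docker.dockerd`), whose systemd unit is probably named `snap.docker.dockerd.service` rather than `docker.service`, so the `Before=` ordering may need adjusting.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): poll until a path exists
# or a timeout expires.
# Usage: wait_for_path TARGET [TIMEOUT_SECONDS]
wait_for_path() {
    target=$1
    timeout=${2:-30}   # assumed default of 30 seconds
    elapsed=0
    while [ ! -e "$target" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s waiting for $target" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$target is present after ${elapsed}s"
}

# Example call from a wrapper script (container name is an assumption):
# wait_for_path /dev/nvidia-caps 60 && docker start jellyfin
```

The function returns non-zero on timeout, so a wrapper script or an `ExecStartPre=` line can fail cleanly instead of starting the container without the device.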

YamiYukiSenpai · July 22, 2025, 12:51pm

I have a systemd service, similar to the one you posted, that generates a YAML file for CDI:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

I was using it with Podman a while back. The problem started after I had to reinstall Ubuntu Server. I’ve just recreated that systemd service, so hopefully that fixes it.

I’ll use the one you suggested if that doesn’t fix my problem (I haven’t had to reboot yet).
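
For reference, the CDI regeneration step described in this post could be wrapped in a small script for a oneshot unit to run at boot. This is only a sketch, assuming the NVIDIA Container Toolkit is installed: the function name `refresh_cdi` and the `NVIDIA_CTK` and `CDI_SPEC` override variables are hypothetical, while `nvidia-ctk cdi generate` and `nvidia-ctk cdi list` are the toolkit's own subcommands.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the thread): regenerate the CDI spec
# and list the devices it exposes.
NVIDIA_CTK=${NVIDIA_CTK:-nvidia-ctk}          # override hook, an assumption
CDI_SPEC=${CDI_SPEC:-/etc/cdi/nvidia.yaml}    # path used in the post above

refresh_cdi() {
    if ! command -v "$NVIDIA_CTK" >/dev/null 2>&1; then
        echo "$NVIDIA_CTK not found; is the NVIDIA Container Toolkit installed?" >&2
        return 127
    fi
    # Make sure the target directory exists before writing the spec.
    mkdir -p "$(dirname "$CDI_SPEC")" || return 1
    "$NVIDIA_CTK" cdi generate --output="$CDI_SPEC" || return 1
    # Print the device names found in the generated spec.
    "$NVIDIA_CTK" cdi list
}

# Example (needs root to write under /etc/cdi):
# refresh_cdi
```

Listing the devices afterwards gives a quick check that the spec was written and the GPU was detected at that point in the boot.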

system · August 27, 2025, 1:10am

This topic was automatically closed after 37 days. New replies are no longer allowed.