Thanks to the Ubuntu Discourse forum team for giving me the space to slow down, read, and think about my problem. I really needed that. It can be so frustrating when so many demands are put on ourselves and our computers. I will describe what fixed my problem, then follow up with an analysis of how I got there.
- Edit the initramfs-tools modules file:
  sudo nano /etc/initramfs-tools/modules
- Add amdgpu on a new line at the end of the file.
- Update the initramfs to apply the change:
  sudo update-initramfs -u
- Reboot
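For anyone who would rather script the edit than open nano, here is a minimal sketch of the same step. The function name ensure_module is my own invention, not part of initramfs-tools. It appends the module name only if it is not already listed, so running it twice does no harm.

```shell
# ensure_module MODULE [FILE]
# Hypothetical helper: append MODULE to an initramfs-tools modules file
# unless a line with exactly that name is already there.
# FILE defaults to the path edited in the steps above.
ensure_module() {
    module="$1"
    file="${2:-/etc/initramfs-tools/modules}"

    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    if grep -qxF -- "$module" "$file" 2>/dev/null; then
        echo "$module already listed in $file"
        return 0
    fi

    # If the file is non-empty and does not end in a newline, add one first
    # so the module name lands on its own line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi

    printf '%s\n' "$module" >> "$file"
    echo "added $module to $file"
}
```

Because /etc/initramfs-tools/modules is owned by root, the function would need to run in a root shell. After sudo update-initramfs -u, running lsinitramfs /boot/initrd.img-$(uname -r) | grep amdgpu should list the module if it made it into the image (that path assumes the default Ubuntu initrd location).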
How I got there
I needed to analyze the output of journalctl -b | grep -iE "gdm|amdgpu|drm".
Here’s my log and markup for anyone looking for a pattern:
##
## kernel begins loading the amdgpu driver
##
Jul 24 15:00:30 greenlake kernel: [drm] amdgpu kernel modesetting enabled.
Jul 24 15:00:30 greenlake kernel: amdgpu: vga_switcheroo: detected switching method \_SB_.PCI0.GP17.VGA_.ATPX handle
Jul 24 15:00:30 greenlake kernel: amdgpu: ATPX version 1, functions 0x00000801
Jul 24 15:00:30 greenlake kernel: amdgpu: ATPX Hybrid Graphics
Jul 24 15:00:30 greenlake kernel: amdgpu: Virtual CRAT table created for CPU
Jul 24 15:00:30 greenlake kernel: amdgpu: Topology: Add CPU node
Jul 24 15:00:30 greenlake kernel: amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
Jul 24 15:00:30 greenlake kernel: [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x7480 0xF111:0x0007 0xC1).
Jul 24 15:00:30 greenlake kernel: [drm] register mmio base: 0x90C00000
Jul 24 15:00:30 greenlake kernel: [drm] register mmio size: 1048576
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 0 <soc21_common>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 1 <gmc_v11_0>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 2 <ih_v6_0>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 3 <psp>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 4 <smu>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 5 <dm>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 6 <gfx_v11_0>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 7 <sdma_v6_0>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 8 <vcn_v4_0>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: detected ip block number 10 <mes_v11_0>
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ACPI VFCT table present but broken (too short #2),skipping
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
Jul 24 15:00:31 greenlake kernel: amdgpu: ATOM BIOS: 113-BRT125778.001
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
Jul 24 15:00:31 greenlake kernel: [drm] GPU posting now...
Jul 24 15:00:31 greenlake kernel: [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
Jul 24 15:00:31 greenlake kernel: [drm] Detected VRAM RAM=8176M, BAR=8192M
Jul 24 15:00:31 greenlake kernel: [drm] RAM width 128bits GDDR6
Jul 24 15:00:31 greenlake kernel: [drm] amdgpu: 8176M of VRAM memory ready
Jul 24 15:00:31 greenlake kernel: [drm] amdgpu: 31045M of GTT memory ready.
Jul 24 15:00:31 greenlake kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
Jul 24 15:00:31 greenlake kernel: [drm] PCIE GART of 512M enabled (table at 0x00000081FEB00000).
Jul 24 15:00:31 greenlake kernel: [drm] Loading DMUB firmware via PSP: version=0x07002800
Jul 24 15:00:31 greenlake kernel: [drm] Found VCN firmware Version ENC: 1.19 DEC: 7 VEP: 0 Revision: 0
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: reserve 0x1300000 from 0x81fc000000 for PSP TMR
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000035, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x00525800 (82.88.0)
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
Jul 24 15:00:31 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
Jul 24 15:00:31 greenlake kernel: [drm] Display Core v3.2.316 initialized on DCN 3.2.1
Jul 24 15:00:31 greenlake kernel: [drm] DP-HDMI FRL PCON supported
Jul 24 15:00:31 greenlake kernel: [drm] DMUB hardware initialized: version=0x07002800
Jul 24 15:00:31 greenlake kernel: snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
##
## GDM starts too early, the driver is still getting set up
##
Jul 24 15:00:32 greenlake systemd[1]: Starting gdm.service - GNOME Display Manager...
Jul 24 15:00:32 greenlake systemd[1]: Started gdm.service - GNOME Display Manager.
Jul 24 15:00:32 greenlake gdm-launch-environment][2305]: pam_unix(gdm-launch-environment:session): session opened for user gdm(uid=120) by (uid=0)
Jul 24 15:00:32 greenlake systemd-logind[1876]: New session c1 of user gdm.
Jul 24 15:00:32 greenlake (systemd)[2352]: pam_unix(systemd-user:session): session opened for user gdm(uid=120) by gdm(uid=0)
Jul 24 15:00:33 greenlake systemd[2352]: drkonqi-coredump-cleanup.timer - Cleanup lingering KCrash metadata was skipped because of an unmet condition check (ConditionPathExistsGlob=/var/lib/gdm3/.cache/kcrash-metadata/*.ini).
Jul 24 15:00:33 greenlake systemd[2352]: drkonqi-coredump-cleanup.service - Cleanup lingering KCrash metadata was skipped because of an unmet condition check (ConditionPathExistsGlob=/var/lib/gdm3/.cache/kcrash-metadata/*.ini).
Jul 24 15:00:33 greenlake systemd[1]: Started session-c1.scope - Session c1 of User gdm.
##
## GDM tries to start a Wayland login, but it fails
##
Jul 24 15:00:33 greenlake /usr/libexec/gdm-wayland-session[2448]: dbus-daemon[2448]: [session uid=120 pid=2448] Activating service name='org.freedesktop.systemd1' requested by ':1.2' (uid=120 pid=2471 comm="/usr/libexec/gnome-session-binary --autostart /usr" label="unconfined")
Jul 24 15:00:33 greenlake /usr/libexec/gdm-wayland-session[2448]: dbus-daemon[2448]: [session uid=120 pid=2448] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Jul 24 15:00:33 greenlake kernel: amdgpu: HMM registered 8176MB device memory
Jul 24 15:00:33 greenlake kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Jul 24 15:00:33 greenlake kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
Jul 24 15:00:33 greenlake kernel: amdgpu: Virtual CRAT table created for GPU
Jul 24 15:00:33 greenlake kernel: amdgpu: Topology: Add dGPU node [0x7480:0x1002]
Jul 24 15:00:33 greenlake kernel: kfd kfd: amdgpu: added device 1002:7480
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 32
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
Jul 24 15:00:33 greenlake kernel: [drm] ring gfx_32768.1.1 was added
Jul 24 15:00:33 greenlake kernel: [drm] ring compute_32768.2.2 was added
Jul 24 15:00:33 greenlake kernel: [drm] ring sdma_32768.3.3 was added
Jul 24 15:00:33 greenlake kernel: [drm] ring gfx_32768.1.1 ib test pass
Jul 24 15:00:33 greenlake kernel: [drm] ring compute_32768.2.2 ib test pass
Jul 24 15:00:33 greenlake kernel: [drm] ring sdma_32768.3.3 ib test pass
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: amdgpu: Using BOCO for runtime pm
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: [drm] Registered 4 planes with drm panic
Jul 24 15:00:33 greenlake kernel: [drm] Initialized amdgpu 3.61.0 for 0000:03:00.0 on minor 1
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Jul 24 15:00:33 greenlake kernel: [drm] pre_validate_dsc:1601 MST_DSC dsc precompute is not needed
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: enabling device (0006 -> 0007)
Jul 24 15:00:33 greenlake kernel: [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x15BF 0xF111:0x0005 0xC1).
Jul 24 15:00:33 greenlake kernel: [drm] register mmio base: 0x90500000
Jul 24 15:00:33 greenlake kernel: [drm] register mmio size: 524288
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 0 <soc21_common>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 1 <gmc_v11_0>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 2 <ih_v6_0>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 3 <psp>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 4 <smu>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 5 <dm>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 6 <gfx_v11_0>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 7 <sdma_v6_0>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 8 <vcn_v4_0>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: detected ip block number 10 <mes_v11_0>
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: Fetched VBIOS from VFCT
Jul 24 15:00:33 greenlake kernel: amdgpu: ATOM BIOS: 113-PHXGENERIC-001
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: vgaarb: deactivate vga console
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
Jul 24 15:00:33 greenlake kernel: [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: VRAM: 2048M 0x0000008000000000 - 0x000000807FFFFFFF (2048M used)
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
Jul 24 15:00:33 greenlake kernel: [drm] Detected VRAM RAM=2048M, BAR=2048M
Jul 24 15:00:33 greenlake kernel: [drm] RAM width 128bits DDR5
Jul 24 15:00:33 greenlake kernel: [drm] amdgpu: 2048M of VRAM memory ready
Jul 24 15:00:33 greenlake kernel: [drm] amdgpu: 31045M of GTT memory ready.
Jul 24 15:00:33 greenlake kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
Jul 24 15:00:33 greenlake kernel: [drm] PCIE GART of 512M enabled (table at 0x000000807FD00000).
Jul 24 15:00:33 greenlake kernel: [drm] Loading DMUB firmware via PSP: version=0x08005000
Jul 24 15:00:33 greenlake kernel: [drm] Found VCN firmware Version ENC: 1.19 DEC: 7 VEP: 0 Revision: 0
Jul 24 15:00:33 greenlake /usr/libexec/gdm-wayland-session[2448]: dbus-daemon[2448]: [session uid=120 pid=2448] Activating service name='ca.desrt.dconf' requested by ':1.2' (uid=120 pid=2471 comm="/usr/libexec/gnome-session-binary --autostart /usr" label="unconfined")
Jul 24 15:00:33 greenlake /usr/libexec/gdm-wayland-session[2448]: dbus-daemon[2448]: [session uid=120 pid=2448] Successfully activated service 'ca.desrt.dconf'
Jul 24 15:00:33 greenlake gdm-launch-environment][2305]: pam_unix(gdm-launch-environment:session): session closed for user gdm
Jul 24 15:00:33 greenlake gdm3[2270]: Gdm: GdmDisplay: Session never registered, failing
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: reserve 0x4000000 from 0x8078000000 for PSP TMR
Jul 24 15:00:33 greenlake gdm3[2270]: Gdm: on_display_added: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
Jul 24 15:00:33 greenlake gdm3[2270]: Gdm: Child process -2416 was already dead.
##
## The driver is not ready, and the Wayland login screen fails to start
##
Jul 24 15:00:33 greenlake gdm3[2270]: Gdm: GdmDisplay: Session never registered, failing
Jul 24 15:00:33 greenlake kernel: amdgpu 0000:c4:00.0: amdgpu: reserve 0x4000000 from 0x8078000000 for PSP TMR
Jul 24 15:00:33 greenlake gdm3[2270]: Gdm: on_display_added: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
Jul 24 15:00:33 greenlake gdm3[2270]: Gdm: Child process -2416 was already dead.
Jul 24 15:00:33 greenlake gdm3[2270]: Gdm: GdmDisplay: Session never registered, failing
Jul 24 15:00:33 greenlake gdm3[2270]: Gdm: on_display_removed: assertion 'GDM_IS_REMOTE_DISPLAY (display)' failed
Jul 24 15:00:33 greenlake gdm3[2270]: Gdm: Child process -2416 was already dead.
Jul 24 15:00:33 greenlake gdm-launch-environment][2650]: pam_unix(gdm-launch-environment:session): session opened for user gdm(uid=120) by (uid=0)
##
## From here on, GDM falls back to X11
##
Jul 24 15:00:33 greenlake systemd-logind[1876]: New session c2 of user gdm.
Jul 24 15:00:33 greenlake systemd[1]: Started session-c2.scope - Session c2 of User gdm.
Jul 24 15:00:33 greenlake /usr/libexec/gdm-x-session[2661]: (--) Log file renamed from "/var/lib/gdm3/.local/share/xorg/Xorg.pid-2661.log" to "/var/lib/gdm3/.local/share/xorg/Xorg.0.log"
Jul 24 15:00:33 greenlake /usr/libexec/gdm-x-session[2661]: X.Org X Server 1.21.1.11
Jul 24 15:00:33 greenlake /usr/libexec/gdm-x-session[2661]: X Protocol Version 11, Revision 0
... trunc ...
##
## The GPU finishes loading way too late
##
Jul 24 15:00:38 greenlake /usr/libexec/gdm-x-session[4150]: (==) Automatically adding devices
Jul 24 15:00:38 greenlake /usr/libexec/gdm-x-session[4150]: (==) Automatically enabling devices
Jul 24 15:00:38 greenlake /usr/libexec/gdm-x-session[4150]: (==) Automatically adding GPU devices
Jul 24 15:00:38 greenlake /usr/libexec/gdm-x-session[4150]: (==) Automatically binding GPU devices
This is where the initramfs fix comes into play. We added amdgpu to /etc/initramfs-tools/modules, so the graphics driver now loads and initializes the hardware very early in the boot process, before GDM starts. This eliminated the race condition we observed in the journalctl log.
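If you want to check your own boot for the same pattern without reading the whole log, here is a rough sketch. The function name check_gdm_race and its output wording are made up for illustration. It assumes the journal lines arrive in chronological order and in the default journalctl format shown above, and it treats the last "[drm] Initialized amdgpu" line as the point where the driver is ready, which is a simplification on a machine with two GPUs like mine.

```shell
# check_gdm_race: read journal lines on stdin and report whether
# gdm.service was started before the last "[drm] Initialized amdgpu" line.
# Hypothetical helper; it compares line order because the timestamps only
# have one-second resolution.
check_gdm_race() {
    awk '
        # First time systemd starts GDM: remember line number and time (field 3).
        /Starting gdm\.service/ && !gdm { gdm = NR; gdm_time = $3 }
        # Every GPU prints this once; keep the last one seen.
        /\[drm\] Initialized amdgpu/    { drv = NR; drv_time = $3 }
        END {
            if (!gdm || !drv) {
                print "inconclusive: missing gdm or amdgpu line"
                exit
            }
            if (gdm < drv)
                print "race: gdm started at " gdm_time ", amdgpu initialized at " drv_time
            else
                print "ok: amdgpu initialized at " drv_time ", gdm started at " gdm_time
        }
    '
}
```

Usage would be journalctl -b | check_gdm_race. On the log above it should report a race, since GDM starts at 15:00:32 and the first amdgpu device is only initialized at 15:00:33.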
An LLM helped me build a “timeline of failure” for the log. Between that timeline and the initramfs fix, I do not know that I could have gotten there as quickly on my own. Today was pretty demanding, and some of these problems can be cryptic. Thanks again to the community for hosting a space to talk about these problems.