AMD GPU crashing on Ubuntu 25.04: ring gfx_0.0.0 timeout and reset failure

Ubuntu Version: 25.04

Desktop Environment (if applicable): GNOME 48

Problem Description: When playing games (it happens across various titles, so the specific game doesn’t matter), the graphics crash. My screen turns black, then the main screen is mirrored onto the second one, and both show graphical artifacts. After that I’m kicked back to the login screen and can log in again. I’m dual booting, and these problems never occurred under Windows 11. There is no consistent pattern in how long I’ve been playing or what I’m doing when it happens: sometimes it works for a while, sometimes it crashes quickly.

Relevant System Information:
CPU: Ryzen 5 5600
GPU: RX 5700XT THICC III
RAM: 4x8GB 3200MHz Corsair Vengeance
Motherboard: MSI B450 Tomahawk Max
Secure Boot, Resizable BAR and Above 4G Decoding enabled.

Screenshots or Error Messages:
This is from journalctl when the crash happened.

Jun 17 06:19:56 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: Dumping IP State
Jun 17 06:19:56 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: Dumping IP State Completed
Jun 17 06:19:56 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=8651349, emitted seq=8651351
Jun 17 06:19:56 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: Process information: process main pid 240437 thread pyrogenesi:cs0 pid 240568
Jun 17 06:19:56 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Jun 17 06:19:56 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: Ring gfx_0.0.0 reset failure
Jun 17 06:19:56 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: GPU reset begin!
Jun 17 06:19:56 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: BACO reset
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: GPU reset succeeded, trying to resume
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: [drm] PCIE GART of 512M enabled (table at 0x00000081FEE00000).
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: [drm] VRAM is lost due to GPU reset!
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: PSP is resuming...
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: reserve 0x900000 from 0x81fd000000 for PSP TMR
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: SMU is resuming...
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: use vbios provided pptable
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: SMU is resumed successfully!
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 8
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 8
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 8
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
Jun 17 06:19:59 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: amdgpu: GPU reset(10) succeeded!
Jun 17 06:19:59 andrei-argeanu-MS-7C02 gnome-shell[26963]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.
Jun 17 06:19:59 andrei-argeanu-MS-7C02 gnome-shell[26859]: meta_wayland_buffer_process_damage: assertion 'buffer->resource' failed
Jun 17 06:19:59 andrei-argeanu-MS-7C02 0ad_0ad.desktop[240437]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.
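For anyone comparing several of these crashes, here is a minimal Python sketch that pulls the ring name and the two sequence numbers out of a timeout line like the one above. The function name and the reading of "emitted minus signaled" as the number of submissions still outstanding are my own assumptions, not something the driver documents in this log.

```python
import re

# Matches the amdgpu ring-timeout message in the form it has in the excerpt above.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)

def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, outstanding), or None for other lines."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    # Assumption: the difference is the number of jobs submitted but not finished.
    return match.group("ring"), signaled, emitted, emitted - signaled

# The timeout line from the journal excerpt above.
SAMPLE = (
    "Jun 17 06:19:56 andrei-argeanu-MS-7C02 kernel: amdgpu 0000:28:00.0: "
    "amdgpu: ring gfx_0.0.0 timeout, signaled seq=8651349, emitted seq=8651351"
)

print(parse_ring_timeout(SAMPLE))
```

Feeding it the output of journalctl -k line by line would give one tuple per timeout, which makes it easier to see whether it is always the gfx ring that hangs.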

These are the Mesa packages I have installed:

libegl-mesa0/plucky,now 25.0.3-1ubuntu2 amd64 [installed,automatic]
libgl1-mesa-dri/plucky,now 25.0.3-1ubuntu2 amd64 [installed,automatic]
libgl1-mesa-dri/plucky,now 25.0.3-1ubuntu2 i386 [installed,automatic]
libglu1-mesa/plucky,now 9.0.2-1.1build1 amd64 [installed,automatic]
libglx-mesa0/plucky,now 25.0.3-1ubuntu2 amd64 [installed,automatic]
libglx-mesa0/plucky,now 25.0.3-1ubuntu2 i386 [installed,automatic]
libosmesa6/plucky,now 25.0.3-1ubuntu2 amd64 [installed,automatic]
libosmesa6/plucky,now 25.0.3-1ubuntu2 i386 [installed,automatic]
mesa-libgallium/plucky,now 25.0.3-1ubuntu2 amd64 [installed,automatic]
mesa-libgallium/plucky,now 25.0.3-1ubuntu2 i386 [installed,automatic]
mesa-utils-bin/plucky,now 9.0.0-2 amd64 [installed,automatic]
mesa-utils/plucky,now 9.0.0-2 amd64 [installed,automatic]
mesa-vdpau-drivers/plucky,now 25.0.3-1ubuntu2 amd64 [installed,automatic]
mesa-vdpau-drivers/plucky,now 25.0.3-1ubuntu2 i386 [installed,automatic]
mesa-vulkan-drivers/plucky,now 25.0.3-1ubuntu2 amd64 [installed,automatic]
mesa-vulkan-drivers/plucky,now 25.0.3-1ubuntu2 i386 [installed,automatic]

What I’ve Tried: Nothing. I have no idea how to go about fixing or diagnosing this.


Found this. Could be useful
https://www.reddit.com/r/linux_gaming/comments/1hixrrq/elden_ring_crash_and_the_log_said_radvamdgpu_the/

Similar problem but I’m not entirely sure if that’s the right fix.

I happened to find this in the Gentoo wiki:
https://wiki.gentoo.org/wiki/AMDGPU#Frequent_and_Sporadic_Crashes

In /sys/class/drm/card1/device, running cat pp_dpm_sclk I get:

0: 300Mhz 
1: 800Mhz *
2: 2200Mhz 

Running cat pp_dpm_mclk I get:

0: 100Mhz 
1: 500Mhz 
2: 625Mhz 
3: 875Mhz *

Now 875 MHz makes sense for my graphics card’s memory clock: it’s double data rate, so the effective clock is 1750 MHz, and times 8 gives the 14 Gbps data rate. All good.
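The same arithmetic as a small Python sketch. The function names are hypothetical, and the 256-bit bus width used for the total bandwidth figure is my assumption about the RX 5700 XT, not something stated in this thread.

```python
def data_rate_gbps(reported_mclk_mhz, ddr_factor=2, multiplier=8):
    """Per-pin data rate, following the reasoning above: clock x 2 (DDR) x 8."""
    effective_mhz = reported_mclk_mhz * ddr_factor
    return effective_mhz * multiplier / 1000

def total_bandwidth_gb_s(rate_gbps, bus_width_bits=256):
    """Total memory bandwidth in GB/s. The 256-bit bus width is an assumption."""
    return rate_gbps * bus_width_bits / 8

rate = data_rate_gbps(875)          # 875 MHz is the state marked * in pp_dpm_mclk
print(rate)                         # 14.0
print(total_bandwidth_gb_s(rate))   # 448.0
```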

The problem is pp_dpm_sclk, which I can only assume is the GPU core clock: state 2 at 2200 MHz is quite a bit higher than what my GPU can handle. TechPowerUp reports 1810 MHz base, 1935 MHz game, and 2035 MHz boost clocks. So I can only assume that my GPU is unintentionally getting overclocked, fails, and produces the artifacting and errors I’m getting.

Question is: how do I go about fixing this? Assuming that’s what my problem is.

Note that the asterisk (*) marks the currently used value on your system, so your GPU was running at 800 MHz at that point in time …

Yep, I guess that makes sense. I thought the other values were upper and lower limits. Either way, trying now with ETS2 it crashes instantly, so anything remotely intensive makes the GPU crash. I wrote a basic script to log pp_dpm_sclk to a text file every 2 seconds, and these are the last values it recorded before the GPU crash.
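A logger along these lines could look roughly like the Python sketch below. It is not the script that was actually used; the function names and the output file name are made up, and the card1 path is taken from earlier in the thread and may be a different card index on other systems.

```python
import os
import re
import time
from pathlib import Path

# Path as used earlier in the thread; the card index may differ on other systems.
SCLK_PATH = Path("/sys/class/drm/card1/device/pp_dpm_sclk")

# One state per line, e.g. "1: 2115Mhz *"; the asterisk marks the active state.
ACTIVE_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*Mhz\s*\*\s*$", re.IGNORECASE)

def active_state(text):
    """Return (state_index, mhz) for the line marked with '*', or None."""
    for line in text.splitlines():
        match = ACTIVE_RE.match(line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None

def log_forever(path=SCLK_PATH, out_file="sclk.log", interval=2.0):
    """Append a timestamped reading every `interval` seconds until interrupted."""
    with open(out_file, "a") as log:
        while True:
            try:
                state = active_state(path.read_text())
            except OSError:
                state = None  # the read can fail while the GPU is resetting
            log.write(f"{time.strftime('%H:%M:%S')} {state}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a possible crash
            time.sleep(interval)
```

Calling log_forever() in a terminal before starting the game leaves the last readings in sclk.log. The fsync after every line is there so that the final entries are not lost if the session dies.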

Clock speeds
0: 300Mhz 
1: 1910Mhz *
2: 2200Mhz 
Clock speeds
0: 300Mhz 
1: 1620Mhz *
2: 2200Mhz 
Clock speeds
0: 300Mhz 
1: 2115Mhz *
2: 2200Mhz 
Clock speeds
0: 300Mhz 
1: 2115Mhz *
2: 2200Mhz 
Clock speeds
0: 300Mhz 
1: 2115Mhz *
2: 2200Mhz 
Clock speeds
0: 300Mhz 
1: 2115Mhz *
2: 2200Mhz 
Clock speeds
0: 300Mhz 
1: 2115Mhz *
2: 2200Mhz 
Clock speeds
0: 300Mhz 
1: 2115Mhz *
2: 2200Mhz 
Clock speeds
Clock speeds
0: 300Mhz 
1: 800Mhz *
2: 2200Mhz 

That’s 90 MHz over boost and 180 MHz over game clock. Why does it clock my GPU past what it’s capable of? And what am I supposed to do about it?

Edit: I don’t understand. I can run FurMark 2 with no problems in Ubuntu, but when I try to play ETS2 or Half-Life 2 it crashes nearly instantly.

I have a different GPU, an RDNA3 one, and I’ve experienced my fair share of timeout issues, so I thought I might be able to provide some advice. I went through a long series of troubleshooting steps, but what I think ultimately helped in my case was updating to Mesa 25.1 or newer. Normally I’m not the biggest fan of using lots of PPAs or third-party repos, but the kisak-mesa fresh PPA has always been solid. Unfortunately, it currently doesn’t support Ubuntu 25.04, so instead I’ve switched to the one provided by Ernst Persson, Mesa Almost Stable. It has been working great in my experience. Right now it’s on Mesa 25.1.3, and it tends to update quicker than kisak’s PPA. However, some might consider frequent Mesa updates a double-edged sword: they usually fix bugs but might introduce new ones. If you use your PC for gaming, then generally I think it’s better to have it more up to date.

If updating Mesa doesn’t help or you’d rather not use a PPA, then you could try setting this kernel parameter: amdgpu.ppfeaturemask=0xfffd7fff
This disables STUTTER_MODE and GFXOFF, which are GPU power-saving features, while still allowing the typical clock, voltage and power adjustments. Normally you’re supposed to use amdgpu.ppfeaturemask=0xffffffff if you want overclock/undervolt support, but that parameter basically enables everything, including those two power-saving features. Speaking of GPU configuration, I recommend you use LACT. It has worked more reliably in my experience than CoreCtrl or adjusting things manually.

It’s probably also a good idea to check whether ASPM is disabled in your BIOS for your GPU’s PCIe slot. It’s another power-saving feature, and I’ve read that Linux and ASPM don’t always play nicely together; it can be somewhat buggy depending on the hardware or firmware. Disabling “Spread Spectrum” could potentially improve stability as well, but I don’t think it affects the GPU nearly as much as the CPU and RAM.

Regarding that Gentoo wiki page about GPU clocks being too high: personally, I don’t think that’s the cause of the issues, since the same boosting happens on Windows as well. Modern GPUs have an opportunistic boost algorithm where, as long as they’re not power or temperature limited, they boost as high as they can. They almost always run higher than the advertised clock speeds. It doesn’t hurt if you still want to lower them, though; it reduces temps and arguably improves stability.

Another parameter you could try is this: amdgpu.ppfeaturemask=0xfffd3fff
I’ve seen this one suggested here and there. It also disables STUTTER_MODE and GFXOFF, but disables OVERDRIVE (manual OC/UV support) as well. Might or might not help.
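To see how these mask values differ, here is a small Python sketch that decodes them. The bit values for OVERDRIVE, GFXOFF and STUTTER_MODE are as I remember them from the kernel’s amd_shared.h header, so treat them as an assumption and check them against your kernel source. The helper names are made up for illustration.

```python
# Bit values as recalled from the kernel's PP_FEATURE_MASK enum (amd_shared.h).
# Assumption: verify against the header of the kernel you are actually running.
FEATURE_BITS = {
    "OVERDRIVE": 0x4000,
    "GFXOFF": 0x8000,
    "STUTTER_MODE": 0x20000,
}

def disabled_features(mask):
    """Names of the features above whose bit is cleared in `mask`."""
    return sorted(name for name, bit in FEATURE_BITS.items() if not mask & bit)

def build_mask(disable):
    """Start from 'everything enabled' and clear the bits of the named features."""
    mask = 0xFFFFFFFF
    for name in disable:
        mask &= ~FEATURE_BITS[name]
    return mask

for value in (0xFFFFFFFF, 0xFFFD7FFF, 0xFFFD3FFF):
    print(hex(value), disabled_features(value))

print(hex(build_mask(["GFXOFF", "STUTTER_MODE"])))  # 0xfffd7fff
```

With these bit values, 0xfffd7fff comes out as everything enabled except GFXOFF and STUTTER_MODE, and 0xfffd3fff additionally clears OVERDRIVE, which matches the descriptions above.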

Unfortunately none of this seems to work. I’ve also tried capping the GPU clocks and undervolting, to no avail, and disabling XMP gave no better results. I don’t have an option for ASPM in the BIOS. I tried Ubuntu 24.04 LTS on another drive, and similar problems occur there as well. I think I might just have to go back to Windows. Nothing seems to work properly.

Damn. That is unfortunate. I’m not exactly sure what the cause is. These ring timeout crashes are usually hard to diagnose. But I’ll just throw out random troubleshooting suggestions and see what sticks.

You could try updating your motherboard’s BIOS. The latest version was released back in April of this year. Sometimes there are improvements, for example to PCIe link signaling, or added features such as ASPM control. Alternatively, you could test these kernel parameters: pcie_aspm=off and/or pcie_port_pm=off
I’ve read these parameters have helped with other cards, namely Nvidia ones.
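After editing the boot configuration and rebooting, it is worth confirming that the parameters actually reached the kernel. A minimal Python sketch, assuming the usual /proc/cmdline interface; the helper names and the sample command line are hypothetical, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params

def read_cmdline(path="/proc/cmdline"):
    """Parse the command line the running kernel was booted with."""
    with open(path) as handle:
        return parse_cmdline(handle.read())

# Hypothetical example line, only to show the shape of the result.
EXAMPLE = "BOOT_IMAGE=/vmlinuz root=UUID=abc ro quiet pcie_aspm=off amdgpu.ppfeaturemask=0xfffd7fff"

params = parse_cmdline(EXAMPLE)
for name in ("pcie_aspm", "pcie_port_pm", "amdgpu.ppfeaturemask"):
    print(name, "->", params.get(name, "not set"))
```

On the real system, read_cmdline() replaces the example string. A parameter that shows up as "not set" usually means the bootloader configuration was not regenerated before rebooting.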

I noticed you have a motherboard which only supports PCIe 3.0 while your GPU is PCIe 4.0. You could try manually setting the link version for the GPU slot from “auto” to gen 3.0.
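To check what the link has actually negotiated, the PCI sysfs attributes current_link_speed and max_link_speed can be read. Below is a minimal Python sketch with hypothetical helper names; the device address 0000:28:00.0 is taken from the journal excerpt earlier in the thread. One caveat I am not fully sure about: the GPU function may sit behind an on-card bridge, so the value shown for it may not reflect the slot link, and the parent bridge may need checking as well.

```python
import re
from pathlib import Path

# Transfer rate per lane (GT/s) for each PCIe generation.
GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}

def pcie_gen(link_speed_text):
    """Map sysfs text such as '8.0 GT/s PCIe' to a generation number, or None."""
    match = re.match(r"\s*([\d.]+)\s*GT/s", link_speed_text)
    if match is None:
        return None
    return GEN_BY_GTS.get(float(match.group(1)))

def read_link(device="0000:28:00.0"):
    """Read current and maximum link speed for one PCI device from sysfs."""
    base = Path("/sys/bus/pci/devices") / device
    return {
        name: (base / name).read_text().strip()
        for name in ("current_link_speed", "max_link_speed")
    }

print(pcie_gen("8.0 GT/s PCIe"))   # 3
print(pcie_gen("16.0 GT/s PCIe"))  # 4
```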

It’ll hurt performance, but you could briefly test with Resizable BAR and Above 4G Decoding disabled. I believe those features didn’t originally ship with B450 series motherboards; they were added after the fact when the RX 6000 series launched. I could be wrong tho.

A good thing to check is your GPU’s power cables. It’s better to use two separate cables instead of a single cable with daisy chain aka pigtail connectors.

If I recall correctly, since kernel 6.12 “3D_FULL_SCREEN” became the default power profile for AMD GPUs. While that has definitely helped with overall performance and stability, you could instead try setting your GPU’s performance level to “high”, which makes the clocks run at their highest power state, notably the GPU’s SOC clock. It might not apply to RDNA 1, but having the SOC clock run higher could help with VRAM stability. Keep in mind that power-profiles-daemon will probably override your performance level and change it back to automatic.
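The performance level is exposed through the amdgpu sysfs file power_dpm_force_performance_level. Here is a minimal Python sketch of setting it, assuming the card1 path used earlier in the thread and root privileges. The function name is made up, and the list of levels is only the subset relevant here.

```python
from pathlib import Path

# Subset of the levels the driver accepts; the profile_* levels are left out.
VALID_LEVELS = {"auto", "low", "high", "manual"}

def set_performance_level(level, device_dir="/sys/class/drm/card1/device"):
    """Write `level` to power_dpm_force_performance_level and read it back."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported performance level: {level!r}")
    path = Path(device_dir) / "power_dpm_force_performance_level"
    path.write_text(level)          # needs root on a real system
    return path.read_text().strip()
```

Running set_performance_level("high") as root and then reading pp_dpm_sclk again should show whether the setting took effect. If it silently goes back to auto, power-profiles-daemon is the likely culprit, as mentioned above.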

If your GPU allows it, you could try increasing the voltage in small increments, like 10-15 mV.

Ok, I found something that seems promising: another person experiencing crashes with the same GPU as you, and they apparently managed to resolve their issue.
https://www.reddit.com/r/linuxquestions/comments/1lbbiwm/amd_radeon_rx_5700_xt_irregular_crashes_only/