AMD iGPU crashing after recent update

Ubuntu Version:
Ubuntu 24.04.3 LTS

Desktop Environment (if applicable):
GNOME

Problem Description:
Although I have seen chatter around AMD’s iGPU on L1T, my unit has been working for 3+ months without any issue. So I still hope this is related to one of the updates applied over the last 7-10 days. Unless the funny word “degradation” is in order.

This is related to MS Edge (dev) running MS Teams as an app. The system just hangs (cursor, keyboard…), although one time I was in a call and could still hear other people. A few times (I am getting the impression that it helps if I don’t move the mouse pointer) I managed to get away with a few seconds of frozen screen. But most of the time it goes: 5-15 seconds of frozen screen → black screen → session killed. One time (today) it managed to stay on the black screen (responsive, can type, but not tty).

From the perspective of Teams (yes, every time it happened, Teams was involved), it doesn’t need to be a call (video, or even audio). Simply moving it from minimized (yes, I forgot the word), or even switching to a different chat can cause this.

I tried looking through journalctl but, for some odd reason, didn’t find anything for that time period (I started suspecting a faulty M.2 drive). So I decided to keep journalctl -f running in the background to try to capture the problem. Which I did. And which I’m sharing.
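For anyone who wants to do the same kind of background capture, here is a minimal sketch, assuming Python 3 and systemd's journalctl are available. It follows the kernel journal and keeps only GPU-related lines. The function names and the match pattern are my own illustration, not something from the driver or from journalctl itself.

```python
import re
import subprocess
from typing import Iterable, Iterator

# Assumed filter: the kinds of messages quoted in this thread mention
# "amdgpu" or "[drm". Adjust the pattern if your driver logs differently.
GPU_PATTERN = re.compile(r"amdgpu|\[drm", re.IGNORECASE)


def filter_gpu_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that look GPU-related, without trailing newlines."""
    for line in lines:
        if GPU_PATTERN.search(line):
            yield line.rstrip("\n")


def follow_kernel_journal() -> Iterator[str]:
    """Follow kernel messages (journalctl -k -f) and yield GPU-related lines.

    This blocks until interrupted, so run it in a spare terminal.
    """
    proc = subprocess.Popen(
        ["journalctl", "-k", "-f", "--no-pager"],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    try:
        yield from filter_gpu_lines(proc.stdout)
    finally:
        proc.terminate()
```

Looping over follow_kernel_journal() and appending each line to a file should leave a record even when the session gets killed, provided the script runs outside the graphical session (for example over SSH or in a tty).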

I know that maybe “go buy a dGPU, pleb” would be an option (I am already considering it), but the system is strictly for work, and the iGPU was more than enough for my line of work.

P.S. I do realize that this may better be posted as a bug report, but I couldn’t find where to do so (hopefully a moderator will address this question :wink: )

Relevant System Information:
Kernel: Linux 6.14.0-29-generic
Mobo: MSI X870 Tomahawk (not the newest BIOS, but let’s leave this aside)
CPU: Ryzen 9 9900X
GPU: no discrete GPU, using iGPU

Screenshots or Error Messages:

Sep 04 16:44:08 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771)
Sep 04 16:44:08 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu:  in process msedge pid 7090 thread msedge:cs0 pid 7119
Sep 04 16:44:08 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 0x1b (UTCL2)
Sep 04 16:44:08 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00701430
Sep 04 16:44:08 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Sep 04 16:44:08 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu:          MORE_FAULTS: 0x0
Sep 04 16:44:08 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu:          WALKER_ERROR: 0x0
Sep 04 16:44:08 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Sep 04 16:44:08 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu:          MAPPING_ERROR: 0x0
Sep 04 16:44:08 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu:          RW: 0x0
Sep 04 16:44:19 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu: Dumping IP State
Sep 04 16:44:19 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu: Dumping IP State Completed
Sep 04 16:44:19 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
Sep 04 16:44:19 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu: Dumping IP State
Sep 04 16:44:19 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu: Dumping IP State Completed
Sep 04 16:44:19 amdpc kernel: amdgpu 0000:6e:00.0: amdgpu: ring gfx_0.1.0 timeout, but soft recovered
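To compare reports like the one above across machines, it can help to pull out just the process, address and status of each page fault. The following is a rough sketch, assuming the log lines keep the wording shown above; parse_page_faults and the regular expressions are hypothetical helpers, and the status value is kept as a raw hex string rather than decoded.

```python
import re
from typing import Dict, Iterable, List, Optional

FAULT_RE = re.compile(
    r"\[gfxhub\] page fault \(src_id:(\d+) ring:(\d+) vmid:(\d+) pasid:(\d+)\)"
)
PROCESS_RE = re.compile(r"in process (\S+) pid (\d+) thread (\S+) pid (\d+)")
ADDRESS_RE = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")
STATUS_RE = re.compile(r"GCVM_L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")


def parse_page_faults(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Group amdgpu page-fault messages into one dict per fault."""
    faults: List[Dict[str, object]] = []
    current: Optional[Dict[str, object]] = None
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            # A new "[gfxhub] page fault" line starts a new record.
            current = {"vmid": int(match.group(3)), "pasid": int(match.group(4))}
            faults.append(current)
            continue
        if current is None:
            continue  # ignore detail lines that precede any fault header
        match = PROCESS_RE.search(line)
        if match:
            current["process"] = match.group(1)
            current["pid"] = int(match.group(2))
            continue
        match = ADDRESS_RE.search(line)
        if match:
            current["address"] = match.group(1)
            continue
        match = STATUS_RE.search(line)
        if match:
            current["status"] = match.group(1)
    return faults
```

Feeding it the saved journal output gives a quick list of which process triggered each fault, which is the part that differs between the reports in this thread.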

I run into very similar issues when using Google Meet on Chromium. I am using an AMD Ryzen 9 7950X CPU with iGPU. The problems started last week, and it happened again today:

[90542.563452] amdgpu 0000:6b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32790)
[90542.563458] amdgpu 0000:6b:00.0: amdgpu:  in process chromium pid 52522 thread chromium:cs0 pid 52545
[90542.563461] amdgpu 0000:6b:00.0: amdgpu:   in page starting at address 0x000098993f800000 from client 0x1b (UTCL2)
[90542.563463] amdgpu 0000:6b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101430
[90542.563465] amdgpu 0000:6b:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
[90542.563466] amdgpu 0000:6b:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[90542.563468] amdgpu 0000:6b:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[90542.563469] amdgpu 0000:6b:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[90542.563471] amdgpu 0000:6b:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[90542.563472] amdgpu 0000:6b:00.0: amdgpu: 	 RW: 0x0
[90552.831863] amdgpu 0000:6b:00.0: amdgpu: Dumping IP State
[90552.832944] amdgpu 0000:6b:00.0: amdgpu: Dumping IP State Completed
[90552.832995] amdgpu 0000:6b:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
[90552.832997] amdgpu 0000:6b:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
[90552.833407] amdgpu 0000:6b:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
[90721.648522] amdgpu 0000:6b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:1 pasid:32790)
[90721.648529] amdgpu 0000:6b:00.0: amdgpu:  in process chromium pid 52522 thread chromium:cs0 pid 52545
[90721.648531] amdgpu 0000:6b:00.0: amdgpu:   in page starting at address 0x0000a0a13d989000 from client 0x1b (UTCL2)
[90721.648534] amdgpu 0000:6b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00101430
[90721.648536] amdgpu 0000:6b:00.0: amdgpu: 	 Faulty UTCL2 client ID: SQC (data) (0xa)
[90721.648538] amdgpu 0000:6b:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[90721.648539] amdgpu 0000:6b:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[90721.648541] amdgpu 0000:6b:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[90721.648542] amdgpu 0000:6b:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[90721.648543] amdgpu 0000:6b:00.0: amdgpu: 	 RW: 0x0
[90732.028240] amdgpu 0000:6b:00.0: amdgpu: Dumping IP State
[90732.029317] amdgpu 0000:6b:00.0: amdgpu: Dumping IP State Completed
[90732.029327] amdgpu 0000:6b:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
[90732.029328] amdgpu 0000:6b:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
[90732.039334] amdgpu 0000:6b:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=148537, emitted seq=148538
[90732.039337] amdgpu 0000:6b:00.0: amdgpu: Process information: process gnome-shell pid 4321 thread gnome-shel:cs0 pid 4355
[90732.039339] amdgpu 0000:6b:00.0: amdgpu: Starting gfx_0.1.0 ring reset
[90732.183374] amdgpu 0000:6b:00.0: amdgpu: Ring gfx_0.1.0 reset failed
[90732.183379] amdgpu 0000:6b:00.0: amdgpu: GPU reset begin!
[90732.242579] amdgpu 0000:6b:00.0: amdgpu: MODE2 reset
[90732.249551] amdgpu 0000:6b:00.0: amdgpu: GPU reset succeeded, trying to resume

This is on Arch Linux, kernel 6.16.7. It seems that 6.16.5 had some amdgpu-related patches merged. I am downgrading to 6.16.4 now to see if the problems go away.

The browser usually runs in a user session. Did you start journalctl with the --user option to actually get session logs instead of system ones?

Didn’t know that. Although I confirmed this precise entry in the journal a few more times after this post was created.

I can guess that --user will give a more precise picture of what is happening with Edge, but I generally see this as an app that manages to crash the GNOME user session, rather than the app itself being faulty.

But, on the other hand, I narrowed down the problem to MS Edge by trying/switching to Chromium. Have been running it for close to a week now, and zero problems.

Hm. Mine is the other way around. MS Edge (dev) is crashing the session, and after moving to Chromium (praying that Teams would work normally there), I had zero problems.

Well, the --user option gives you a session-only log for the logged-in user. Entries in this log usually do not show up in the system log at all, and it should also cover gnome-session/gnome-shell itself (beyond the apps)…
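To look at both sides of a hang, one option is to query the kernel log and the user journal for the same time window. Below is a small sketch, assuming Python 3 and journalctl; journal_commands and dump_window are hypothetical names, and the timestamps in the usage note are only illustrative. Note that -k implies the current boot, so this only works if the machine has not been rebooted since the hang.

```python
import subprocess
from typing import List


def journal_commands(since: str, until: str) -> List[List[str]]:
    """Build journalctl commands for the kernel log and the user journal."""
    window = ["--since", since, "--until", until, "--no-pager"]
    return [
        # Kernel messages (where the amdgpu page faults show up).
        ["journalctl", "-k", *window],
        # Session log of the logged-in user (browser, gnome-shell, ...).
        ["journalctl", "--user", *window],
    ]


def dump_window(since: str, until: str) -> None:
    """Print both logs for the given window, one after the other."""
    for command in journal_commands(since, until):
        print("$ " + " ".join(command))
        result = subprocess.run(command, capture_output=True, text=True)
        print(result.stdout)
```

For example, dump_window("16:43:00", "16:46:00") would show what each journal recorded around a hang at 16:44 today.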

I also got crashes on Google Meet / Chrome.

[270379.486354] amdgpu 0000:c4:00.0: [drm] *ERROR* [CRTC:83:crtc-1] flip_done timed out
[270380.510316] [drm:do_aquire_global_lock.isra.0 [amdgpu]] *ERROR* [CRTC:83:crtc-1] hw_done or flip_done timed out
[270390.750358] amdgpu 0000:c4:00.0: [drm] *ERROR* flip_done timed out
[270390.750369] amdgpu 0000:c4:00.0: [drm] *ERROR* [CRTC:83:crtc-1] commit wait timed out
[270400.990223] amdgpu 0000:c4:00.0: [drm] *ERROR* flip_done timed out
[270400.990237] amdgpu 0000:c4:00.0: [drm] *ERROR* [PLANE:52:plane-2] commit wait timed out
[270401.392996] ------------[ cut here ]------------
[270401.393002] WARNING: CPU: 3 PID: 1381 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:9447 amdgpu_dm_commit_planes+0x11ca/0x1740 [amdgpu]
[270401.393387] Modules linked in: ath11k_pci ath11k typec_displayport snd_usb_audio snd_usbmidi_lib snd_ump usbhid tls xt_conntrack xt_MASQUERADE bridge stp llc xt_set ip_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat nf_tables xfrm_user xfrm_algo ccm michael_mic rfcomm snd_seq_dummy snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device overlay cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 qrtr_mhi snd_soc_dmic snd_soc_ps_mach snd_ps_pdm_dma snd_sof_amd_acp70 snd_sof_amd_acp63 snd_sof_amd_vangogh amdgpu snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_soc_acpi_amd_match snd_amd_sdw_acpi soundwire_amd soundwire_generic_allocation soundwire_bus intel_rapl_msr amd_atl snd_soc_sdca intel_rapl_common snd_soc_core qrtr snd_hda_codec_realtek snd_compress amdxcp snd_hda_codec_generic ac97_bus drm_panel_backlight_quirks snd_hda_scodec_component snd_pcm_dmaengine
[270401.393462]  drm_buddy snd_hda_codec_hdmi edac_mce_amd uvcvideo qmi_helpers snd_rpl_pci_acp6x drm_ttm_helper videobuf2_vmalloc snd_hda_intel btusb snd_acp_pci ttm kvm_amd mac80211 uvc snd_intel_dspcfg btrtl snd_acp_legacy_common drm_exec videobuf2_memops snd_pci_acp6x snd_intel_sdw_acpi drm_suballoc_helper spd5118 btintel kvm videobuf2_v4l2 cfg80211 snd_pci_acp5x snd_hda_codec drm_display_helper snd_hda_core btbcm videobuf2_common btmtk snd_rn_pci_acp3x irqbypass snd_ctl_led snd_hwdep amd_pmf think_lmi videodev snd_acp_config libarc4 rapl cec amdxdna rc_core bluetooth amdtee snd_pcm mc firmware_attributes_class snd_soc_acpi thinkpad_acpi i2c_algo_bit mhi gpu_sched k10temp wmi_bmof i2c_piix4 snd_timer ccp i2c_smbus snd_pci_acp3x nvram nxp_nci_i2c nxp_nci amd_sfh nci tee input_leds nfc amd_pmc joydev mac_hid serio_raw sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 dm_crypt hid_multitouch snd polyval_clmulni hid_generic nvme soundcore ucsi_acpi polyval_generic
[270401.393566]  ghash_clmulni_intel nvme_core r8169 i2c_hid_acpi typec_ucsi sha256_ssse3 video sparse_keymap sha1_ssse3 i2c_hid psmouse thunderbolt nvme_auth realtek typec platform_profile wmi hid aesni_intel crypto_simd cryptd [last unloaded: ath11k]
[270401.393592] CPU: 3 UID: 0 PID: 1381 Comm: systemd-logind Not tainted 6.14.0-1011-oem #11-Ubuntu
[270401.393596] Hardware name: LENOVO 21MECTO1WW/21MECTO1WW, BIOS R2LET33W (1.14 ) 04/18/2025
[270401.393598] RIP: 0010:amdgpu_dm_commit_planes+0x11ca/0x1740 [amdgpu]
[270401.393896] Code: 9d fa ff ff 31 c9 48 85 d2 0f 85 c3 fe ff ff 44 88 75 80 e9 bf f9 ff ff 31 c9 48 85 d2 0f 85 1d f9 ff ff e9 7b f9 ff ff 0f 0b <0f> 0b e9 e6 fe ff ff 0f 0b e9 fe fe ff ff 48 8b 45 88 be 01 00 00
[270401.393899] RSP: 0018:ffffb578817f7278 EFLAGS: 00010002
[270401.393901] RAX: 0000000000000246 RBX: 0000000000000246 RCX: 0000000000000000
[270401.393903] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[270401.393904] RBP: ffffb578817f7370 R08: 0000000000000000 R09: 0000000000000002
[270401.393905] R10: 0000000000000000 R11: 0000000000000000 R12: ffff93a90aec4a88
[270401.393906] R13: 0000000000000000 R14: ffff93a905cb6000 R15: ffff93a8ca7efe00
[270401.393908] FS:  000079e93b8284c0(0000) GS:ffff93b721b80000(0000) knlGS:0000000000000000
[270401.393909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[270401.393911] CR2: 00007077cbf63a58 CR3: 00000001384c1000 CR4: 0000000000f50ef0
[270401.393913] PKRU: 55555554
[270401.393914] Call Trace:
[270401.393916]  <TASK>
[270401.393926]  amdgpu_dm_atomic_commit_tail+0xab9/0x12c0 [amdgpu]
[270401.394210]  commit_tail+0xca/0x1b0
[270401.394218]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394225]  drm_atomic_helper_commit+0x132/0x160
[270401.394228]  drm_atomic_commit+0xad/0xf0
[270401.394233]  ? __pfx___drm_printfn_info+0x10/0x10
[270401.394239]  drm_client_modeset_commit_atomic+0x200/0x240
[270401.394246]  drm_client_modeset_commit_locked+0x5b/0x170
[270401.394250]  __drm_fb_helper_restore_fbdev_mode_unlocked+0x86/0x100
[270401.394256]  drm_fb_helper_set_par+0x2f/0x50
[270401.394258]  fb_set_var+0x249/0x4a0
[270401.394265]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394267]  ? xas_load+0x17/0x100
[270401.394276]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394281]  fbcon_blank+0x28e/0x350
[270401.394289]  do_unblank_screen+0xc5/0x1d0
[270401.394296]  vt_k_ioctl+0x4fb/0x590
[270401.394300]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394302]  ? security_capable+0x44/0x80
[270401.394308]  vt_ioctl+0x77/0xa10
[270401.394311]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394312]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394314]  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[270401.394320]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394322]  ? syscall_exit_to_user_mode+0x38/0x1d0
[270401.394328]  tty_ioctl+0x4d6/0x910
[270401.394332]  ? __seccomp_filter+0x368/0x570
[270401.394337]  ? __pfx_i_callback+0x10/0x10
[270401.394344]  __x64_sys_ioctl+0xa4/0xe0
[270401.394349]  x64_sys_call+0x131e/0x2650
[270401.394355]  do_syscall_64+0x7e/0x170
[270401.394359]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394361]  ? call_rcu+0x34/0x50
[270401.394365]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394367]  ? mntput_no_expire+0x51/0x270
[270401.394372]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394374]  ? putname+0x60/0x80
[270401.394378]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394380]  ? do_renameat2+0x1b7/0x670
[270401.394387]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394389]  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[270401.394391]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394393]  ? syscall_exit_to_user_mode+0x38/0x1d0
[270401.394395]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394397]  ? do_syscall_64+0x8a/0x170
[270401.394399]  ? syscall_exit_to_user_mode+0x38/0x1d0
[270401.394401]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394403]  ? do_syscall_64+0x8a/0x170
[270401.394405]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394406]  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[270401.394409]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394410]  ? syscall_exit_to_user_mode+0x38/0x1d0
[270401.394413]  ? __pfx_flush_tlb_func+0x10/0x10
[270401.394418]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394420]  ? __flush_smp_call_function_queue+0x99/0x430
[270401.394425]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394427]  ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[270401.394429]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394430]  ? irqentry_exit_to_user_mode+0x2d/0x1d0
[270401.394433]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394435]  ? irqentry_exit+0x43/0x50
[270401.394437]  ? srso_alias_return_thunk+0x5/0xfbef5
[270401.394438]  ? sysvec_call_function+0x57/0xc0
[270401.394441]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[270401.394445] RIP: 0033:0x79e93b724ded
[270401.394449] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[270401.394450] RSP: 002b:00007ffd888940c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[270401.394453] RAX: ffffffffffffffda RBX: 000000000000001c RCX: 000079e93b724ded
[270401.394454] RDX: 0000000000000000 RSI: 0000000000004b3a RDI: 000000000000001c
[270401.394455] RBP: 00007ffd88894110 R08: 00007ffd888940c0 R09: 0000000000000007
[270401.394456] R10: 000056bba4fbee70 R11: 0000000000000246 R12: 000056bba4f7c500
[270401.394457] R13: 00007ffd88894210 R14: 000056bba4f7c500 R15: 0000000000000000
[270401.394462]  </TASK>
[270401.394463] ---[ end trace 0000000000000000 ]---
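Most of the trace above is noise: entries prefixed with "?" are addresses the kernel's stack unwinder found on the stack but could not confirm as part of the call chain. A minimal sketch for keeping only the confirmed frames follows, assuming the dmesg format shown above; reliable_frames is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Matches "[ time ]  [? ]symbol+0xoff/0xsize"; group 1 is set for "?" entries.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def reliable_frames(lines: Iterable[str]) -> List[str]:
    """Return the function names between <TASK> and </TASK> without a "?" prefix."""
    frames: List[str] = []
    in_trace = False
    for line in lines:
        if "</TASK>" in line:
            in_trace = False
            continue
        if "<TASK>" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.match(line)
        if match and match.group(1) is None:
            frames.append(match.group(2))
    return frames
```

Applied to the trace above, this should reduce it to the chain running from amdgpu_dm_atomic_commit_tail down through the fbcon and vt ioctl functions to the syscall entry, which is easier to quote in a bug report.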

Did a reboot after today’s updates, and for the first time the cursor hung during a meeting in Chrome → MS Teams. It didn’t crash the session, but it made me question the whole idea of installing updates.

I have opened journalctl --user -f and will be monitoring.
