The GPU hung again after removing the external firmware and relying on the Linux kernel. Below is a list of what I’ve included in this post; please let me know what else I can grab to help with this.
Edit: I ran sudo apt upgrade linux-firmware, but after a reboot syslog still shows this message: [drm] GuC firmware (70.45.2) is recommended, but only (70.44.1) was found in xe/bmg_guc_70.bin
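For anyone checking the same thing, here is a minimal sketch of how the two versions in that message can be pulled out of the log for comparison. guc_versions is a hypothetical helper name (not something shipped with the driver), and the pattern assumes the message wording stays exactly as quoted above.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<recommended> <found>" for each GuC version-mismatch message.
# Assumes the exact wording of the xe message quoted above.
guc_versions() {
    sed -nE 's/.*GuC firmware \(([0-9.]+)\) is recommended, but only \(([0-9.]+)\) was found.*/\1 \2/p'
}

# Example usage (not run here):
#   sudo dmesg | guc_versions
#   grep -F 'GuC firmware' /var/log/syslog | guc_versions
```

If the two numbers still differ after the upgrade, the installed linux-firmware package presumably does not ship the recommended GuC yet; apt policy linux-firmware shows which package version is installed.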
Post contents:
- syslog where the hang starts
- syslog with the messages that repeat until all the CPU cores hang
- syslog showing GT1 and xe at boot
- lspci -k output for the iGPU and GPU
- Generic info: uname -r; lsb_release -a
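A rough sketch of how syslog excerpts like these can be narrowed down to the xe driver, assuming the default /var/log/syslog location. xe_kernel_lines is a hypothetical helper name. It matches only lines that start with the "xe <address>:" prefix, so continuation lines and stack traces that lack that prefix are not included.

```shell
# Hypothetical helper: print only the kernel lines logged by the xe driver
# for one PCI device.
#   $1 = PCI address, e.g. 0000:03:00.0
#   $2 = log file, e.g. /var/log/syslog (assumed default location)
# grep -F treats the pattern as a fixed string, so the dots and colons
# in the address are matched literally.
xe_kernel_lines() {
    grep -F -- "kernel: xe $1:" "$2"
}

# Example usage (not run here):
#   xe_kernel_lines 0000:03:00.0 /var/log/syslog
```

This also drops unrelated lines from other services that are interleaved with the kernel messages.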
Edit: Another section of the syslog, where the device is declared wedged:
2026-02-28T18:07:15.917269-06:00 server kernel: ---[ end trace 0000000000000000 ]---
2026-02-28T18:07:15.917270-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: reset failed (-ETIMEDOUT)
2026-02-28T18:07:15.917270-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:03:00.0 as wedged.
2026-02-28T18:07:15.917270-06:00 server kernel: IOCTLs and executions are blocked. Only a rebind may clear the failure
2026-02-28T18:07:15.917271-06:00 server kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new
2026-02-28T18:07:15.917271-06:00 server kernel: xe 0000:03:00.0: [drm] device wedged, needs recovery
Where the hang starts:
2026-02-25T06:58:12.821975-06:00 server kernel: Lockdown: mdadm: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
2026-02-25T07:02:32.830014-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: TLB invalidation fence timeout, seqno=883778 recv=883777
2026-02-25T07:02:35.134044-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: TLB invalidation fence timeout, seqno=883779 recv=883777
2026-02-25T07:02:36.002005-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: Force wake domain 5 failed to ack wake (-ETIMEDOUT) reg[0xd58] = 0x0
2026-02-25T07:02:36.002019-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: Force wake domain 12 failed to ack wake (-ETIMEDOUT) reg[0xd74] = 0x0
2026-02-25T07:02:36.002027-06:00 server kernel: ------------[ cut here ]------------
2026-02-25T07:02:36.002029-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: Forcewake domains 0x1020 failed to acknowledge awake request
2026-02-25T07:02:36.002947-06:00 server kernel: WARNING: CPU: 15 PID: 1266353 at drivers/gpu/drm/xe/xe_force_wake.c:205 xe_force_wake_get+0x2e5/0x310 [xe]
The messages that repeat in the logs until all the CPU cores hang:
2026-02-28T03:21:38.985184-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
2026-02-28T03:21:39.002166-06:00 server kernel: message repeated 349 times: [ xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]]
2026-02-28T03:21:39.002167-06:00 server kernel: systemd[1]: Starting systemd-journald.service - Journal Service...
2026-02-28T03:21:39.002172-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
2026-02-28T03:21:39.008167-06:00 server kernel: message repeated 126 times: [ xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]]
2026-02-28T03:21:39.008168-06:00 server kernel: systemd-journald[1331188]: Collecting audit messages is disabled.
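Because the syslog daemon collapses duplicates into "message repeated N times" lines, the number of reset attempts is larger than the number of lines shown. Here is a rough sketch for totalling them, assuming the exact wording in the excerpt above; count_reset_attempts is a hypothetical helper name.

```shell
# Hypothetical helper: read syslog text on stdin and print the total number
# of "trying reset from guc_exec_queue_timedout_job" messages.
# A plain line counts as 1; a "message repeated N times" line counts as N.
count_reset_attempts() {
    awk '
        /trying reset from guc_exec_queue_timedout_job/ {
            if (match($0, /message repeated [0-9]+ times/)) {
                # "message repeated " is 17 characters and " times" is 6,
                # so the digits sit between those two pieces of the match.
                total += substr($0, RSTART + 17, RLENGTH - 23)
            } else {
                total += 1
            }
        }
        END { print total + 0 }
    '
}

# Example usage (not run here):
#   count_reset_attempts < /var/log/syslog
```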
GT1 and xe at boot:
2026-02-28T06:37:59.726877-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: Using GuC firmware from xe/bmg_guc_70.bin version 70.44.1
2026-02-28T06:37:59.726870-06:00 server sbkeysync[1178]: alphanumeric
2026-02-28T06:37:59.726878-06:00 server kernel: xe 0000:03:00.0: [drm] GuC firmware (70.45.2) is recommended, but only (70.44.1) was found in xe/bmg_guc_70.bin
2026-02-28T06:37:59.726878-06:00 server kernel: xe 0000:03:00.0: [drm] Consider updating your linux-firmware pkg or downloading from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
2026-02-28T06:37:59.726880-06:00 server sbkeysync[1178]: alphanumeric
2026-02-28T06:37:59.726879-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: Using HuC firmware from xe/bmg_huc.bin version 8.2.10
2026-02-28T06:37:59.726884-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs1 fused off
2026-02-28T06:37:59.726884-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs3 fused off
2026-02-28T06:37:59.726885-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs4 fused off
2026-02-28T06:37:59.726885-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs5 fused off
2026-02-28T06:37:59.726884-06:00 server sbkeysync[1178]: alphanumeric
2026-02-28T06:37:59.726886-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs6 fused off
2026-02-28T06:37:59.726888-06:00 server sbkeysync[1178]: alphanumeric
2026-02-28T06:37:59.726887-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs7 fused off
2026-02-28T06:37:59.726891-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vecs2 fused off
2026-02-28T06:37:59.726892-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vecs3 fused off
2026-02-28T06:37:59.726891-06:00 server sbkeysync[1178]: alphanumeric
2026-02-28T06:37:59.726896-06:00 server sbkeysync[1178]: alphanumeric
2026-02-28T06:37:59.726892-06:00 server kernel: xe 0000:03:00.0: [drm] Registered 4 planes with drm panic
2026-02-28T06:37:59.726899-06:00 server kernel: [drm] Initialized xe 1.1.0 for 0000:03:00.0 on minor 1
2026-02-28T06:37:59.726900-06:00 server kernel: xe 0000:03:00.0: [drm] Cannot find any crtc or sizes
2026-02-28T06:37:59.726900-06:00 server kernel: xe 0000:03:00.0: [drm] Using mailbox commands for power limits
2026-02-28T06:37:59.726907-06:00 server kernel: xe 0000:03:00.0: [drm] PL2 is supported on channel 0
2026-02-28T06:37:59.726900-06:00 server sbkeysync[1178]: alphanumeric
2026-02-28T06:37:59.726908-06:00 server kernel: Creating 4 MTD partitions on "xe.nvm.768":
2026-02-28T06:37:59.726909-06:00 server kernel: 0x000000000000-0x000000001000 : "xe.nvm.768.DESCRIPTOR"
2026-02-28T06:37:59.726909-06:00 server kernel: 0x000000001000-0x00000054e000 : "xe.nvm.768.GSC"
2026-02-28T06:37:59.726910-06:00 server kernel: 0x00000054e000-0x00000074e000 : "xe.nvm.768.OptionROM"
2026-02-28T06:37:59.726910-06:00 server sbkeysync[1178]: alphanumeric
2026-02-28T06:37:59.726910-06:00 server kernel: 0x00000074e000-0x00000075e000 : "xe.nvm.768.DAM"
2026-02-28T06:37:59.726914-06:00 server kernel: xe 0000:03:00.0: [drm] Cannot find any crtc or sizes
2026-02-28T06:37:59.726915-06:00 server kernel: snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops intel_audio_component_bind_ops [xe])
The iGPU and GPU devices after boot:
user@server:/var/log$ lspci -k
00:02.0 Display controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
	DeviceName: Onboard - Video
	Subsystem: Gigabyte Technology Co., Ltd Raptor Lake-S GT1 [UHD Graphics 770]
	Kernel driver in use: i915
	Kernel modules: i915, xe
03:00.0 VGA compatible controller: Intel Corporation Device e20b
	Subsystem: ASRock Incorporation Device 6021
	Kernel driver in use: xe
	Kernel modules: xe
Generic info:
user@server:/var/log$ uname -r
6.17.0-14-generic
user@server:/var/log$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.3 LTS
Release: 24.04
Codename: noble