Intel iGPU Using Both i915 and xe Kernel Modules, Is This an Issue? How to Resolve?

Ubuntu Version: 24.04.3 LTS

Kernel: 6.17.0-14-generic

Problem description: I had been running into a hanging issue with the iGPU and thought I’d fixed it through the steps noted below, plus a BIOS update. The box ran fine for…over a month. I did a reboot of my server yesterday and I had another iGPU hang today, so something seems to have reverted. I checked the iGPU’s info, below, and noticed that while it’s still being forced to use the xe driver, it shows as having both the i915 and xe kernel modules. Is this potentially part of my issue with it hanging? How do I remove the i915 kernel module?

GT1’s information:

user@server:/$ lspci -k
00:02.0 Display controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
DeviceName: Onboard - Video
Subsystem: Gigabyte Technology Co., Ltd Raptor Lake-S GT1 [UHD Graphics 770]
Kernel driver in use: xe
Kernel modules: i915, xe
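
For reference, lspci -k reports two different things: "Kernel driver in use" is the driver actually bound to the device, while "Kernel modules" lists every module on disk that declares support for that device ID. Below is a minimal sketch that condenses the output to one line per device so the two are easy to compare. The function name summarise_lspci_k is made up for illustration, and the sample input is trimmed from the output above.

```shell
# Summarise `lspci -k` text read on stdin: one line per device showing the
# bound driver and the modules that merely declare support for it.
summarise_lspci_k() {
  awk '
    function emit() {
      if (dev != "") printf "%s driver=%s modules=%s\n", dev, drv, mods
    }
    # A device line starts with a PCI address such as 00:02.0 or 0000:00:02.0
    /^([0-9a-fA-F]+:)+[0-9a-fA-F]+\.[0-9] / {
      emit(); dev = $1; drv = "none"; mods = "none"; next
    }
    /Kernel driver in use:/ { sub(/.*Kernel driver in use: */, ""); drv = $0 }
    /Kernel modules:/ {
      sub(/.*Kernel modules: */, ""); gsub(/, */, ","); mods = $0
    }
    END { emit() }
  '
}

# Sample taken from this post; on a real system use: lspci -k | summarise_lspci_k
summarise_lspci_k <<'EOF'
00:02.0 Display controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
        Kernel driver in use: xe
        Kernel modules: i915, xe
EOF
```

A device showing more than one module but a single driver in use only means both modules could claim it, not that both are bound at once.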

Original GT1 (UHD-770) hanging issue:

Motherboard: Gigabyte Technology Co., Ltd. B760M D3H

CPU: Intel(R) Core™ i5-13600K

BIOS version: F10 (latest available)

syslog snippet (note it says ‘xe’ in relation to GT1):

2026-01-01T08:39:43.970793-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: TLB invalidation fence timeout, seqno=57583 recv=57582
2026-01-01T08:39:47.142869-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: Force wake domain 5 failed to ack wake (-ETIMEDOUT) reg[0xd58] = 0x0
2026-01-01T08:39:47.142880-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: Force wake domain 12 failed to ack wake (-ETIMEDOUT) reg[0xd74] = 0x0

GPU and iGPU info (before forcing xe on the iGPU):

user@server:/$ lspci -k | grep -EA3 'VGA|3D|Display'
00:02.0 Display controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
DeviceName: Onboard - Video
Subsystem: Gigabyte Technology Co., Ltd Raptor Lake-S GT1 [UHD Graphics 770]
Kernel driver in use: i915

03:00.0 VGA compatible controller: Intel Corporation Device e20b
Subsystem: ASRock Incorporation Device 6021
Kernel driver in use: xe
Kernel modules: xe

Forcing iGPU to use Xe:

GRUB_CMDLINE_LINUX_DEFAULT="i915.force_probe=!a780 xe.force_probe=a780"
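
The two options work as a pair: the "!" prefix tells i915 not to probe the device with ID a780, and the xe option tells xe to claim it. Here is a small sketch that builds the pair from a device ID. The helper name force_probe_args is hypothetical, and the commented commands are the usual Ubuntu steps, not something verified on this machine.

```shell
# Build the kernel command-line options that hand one PCI device ID
# from i915 over to xe. $1 is the 4-hex-digit device ID.
force_probe_args() {
  printf 'i915.force_probe=!%s xe.force_probe=%s\n' "$1" "$1"
}

force_probe_args a780

# After placing the result in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub:
#   sudo update-grub     # regenerate /boot/grub/grub.cfg
#   cat /proc/cmdline    # after the reboot, confirm the options are present
```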

I am just an old nosey by-stander who keeps his brain working by asking questions.

Where did you get the Xe driver from? I have Intel UHD CML GT2 graphics. It uses the i915 driver that is built into the Linux kernel. That could be why you are seeing the i915 driver module shown as present.

How do you get rid of it? Delete the Linux kernel. The i915 driver is built into the kernel. It does not need to be installed. It cannot be removed.

This is what google AI says about the Intel UHD 770

For Intel UHD Graphics 770 (Alder Lake) on Linux, you should use the open-source Mesa drivers (specifically the iris Gallium3D driver) combined with a recent Linux kernel (5.16 or newer recommended). The driver is built into the kernel and Mesa, so no proprietary installation is needed, but ensure your system is updated to get proper support for the 12th-gen hardware.

google AI also says this:

  1. Vulkan: Install vulkan-intel or the equivalent package in your distribution (e.g., mesa-vulkan-drivers).
  2. Avoid: Do not install xserver-xorg-video-intel (the legacy intel DDX), as it can cause performance issues or freezes on modern hardware.

You are not using the X server, are you? Do you have mesa-vulkan-drivers installed? I see nothing about Intel Xe drivers. How? Why? Xe?

Regards

I have the same, but it works, so I’m not changing anything.
> Kernel driver in use: i915
> Kernel modules: i915, xe

Initially the Xe driver was not complete, and now there is a newer Xe2 for newer Intel video cards/chips. You may be able to force the change with boot parameters specific to your PCI ID. Probably only worthwhile for Intel Arc or very new chips.
https://www.kernel.org/doc/html/v6.8/gpu/rfc/xe.html

To see the PCI ID (9a49 in my case):

fred@dell5310:~$ lspci -nn|grep VGA
0000:00:02.0 VGA compatible controller [0300]: Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] [8086:9a49] (rev 03)
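
If you want just the device ID for a force_probe parameter, it is the second half of the [vendor:device] pair. A rough sketch follows; pci_device_ids is a made-up helper name, and the sample line is the one above.

```shell
# Print the device half of the [vendor:device] pair for every line of
# `lspci -nn` output read on stdin.
pci_device_ids() {
  sed -n 's/.*\[[0-9a-fA-F]\{4\}:\([0-9a-fA-F]\{4\}\)\].*/\1/p'
}

# Sample line from this post; on a real system: lspci -nn | grep VGA | pci_device_ids
pci_device_ids <<'EOF'
0000:00:02.0 VGA compatible controller [0300]: Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] [8086:9a49] (rev 03)
EOF
```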

I’ve no issues with more questions!

I installed the xe driver using the below links:

GPU driver install link: https://dgpu-docs.intel.com/driver/client/overview.html

HWE kernel installation link: https://canonical-kernel-docs.readthedocs-hosted.com/latest/reference/hwe-kernels/#installing-a-hwe-kernel

Before installing the Battlemage Intel GPU, I was just using the Linux kernel’s i915 driver for the iGPU. Installing the standalone GPU and associated xe driver is what has created the issue. My assumption is xe is trying to communicate with the iGPU and hanging because the iGPU retains some configuration from the i915 driver.

Maybe there’s a second path - removing the xe module and driver from the iGPU and reverting it back to i915? But in that case I need to figure out how to get the xe module to stop trying to communicate with the iGPU. The xe module/driver started doing this all on its own (recall the iGPU was still using the i915 driver when xe originally started hanging on it).

The xe driver has been in the kernel for a long time (look for xe.ko on your disk) … is there any advantage to using it from an external source instead? (With all the drawbacks: you have to take care of security maintenance yourself, manage the integration and conflicts with the integrated one, etc.)
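
A quick way to look for the in-kernel module is sketched below. The helper name find_kernel_module is hypothetical; on Ubuntu the file is normally compressed, so the pattern also matches names like xe.ko.zst.

```shell
# List on-disk copies of a kernel module.
# $1 = module name, $2 = directory to search (default: running kernel's tree)
find_kernel_module() {
  find "${2:-/lib/modules/$(uname -r)}" -type f -name "$1.ko*" 2>/dev/null
}

# Typical use on the affected machine (not run here):
#   find_kernel_module xe
#   modinfo -F filename xe    # path of the module modprobe would load
```

If modinfo points into /lib/modules/<kernel version>/kernel/, the module came with the kernel package rather than from a separate install.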

Ubuntu LTS didn’t recognize the Intel B580 GPU on bootup, so that’s why I went down the route of installing the driver manually.

Are you suggesting I remove the manually-installed drivers from Intel and then reboot to try to get the Linux kernel to recognize the GPU? (And remove the grub line forcing the iGPU to use the xe driver?)

Edit: I’m wondering if, since I’m running a server install, I needed to migrate to the HWE kernel and then reboot? Instead what I actually did was I migrated to the HWE kernel and then installed the Intel drivers.

Edit 2: Why did I migrate to the HWE kernel? Here’s the note from Intel’s driver site (the B580 is a Battlemage architecture) and Canonical:

Ubuntu 25.10 provides native support for Lunar Lake, Battlemage, and Panther Lake. To support these GPUs on Ubuntu 24.04, your system must be running the hardware enablement (HWE) kernel. By default, Ubuntu Desktop 24.04 tracks the HWE stack. However, if your system is instead using the general availability (GA) kernel, you must switch to the HWE kernel before proceeding with the Client GPU installation.

By default, Ubuntu Desktop installations of 24.04 default to tracking the HWE stack. Server installations will default to the general availability (GA) kernel and provide the HWE kernel as an option.
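
To check whether the running kernel is already new enough, a simple major.minor comparison is sufficient. This is only a sketch: kernel_at_least is a made-up name, and the 6.17 threshold is the HWE version discussed in this thread, not a value taken from Intel's documentation.

```shell
# Succeed if a kernel release string is at least a given "major.minor".
# $1 = release (e.g. the output of `uname -r`), $2 = minimum, e.g. 6.17
kernel_at_least() {
  rel=${1%%-*}              # 6.17.0-14-generic -> 6.17.0
  major=${rel%%.*}
  rest=${rel#*.}
  minor=${rest%%.*}
  min_major=${2%%.*}
  min_minor=${2#*.}
  [ "$major" -gt "$min_major" ] ||
    { [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; }
}

# Release strings below are examples; on the server use "$(uname -r)".
kernel_at_least 6.17.0-14-generic 6.17 && echo "6.17.0-14-generic is new enough"
kernel_at_least 6.14.0 6.17 || echo "6.14.0 is older than 6.17"
```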

Was that with kernel 6.17 already installed?

Well, if the xe.ko module included in 6.17 does not yet support your card because it is too new, stay with what works for the moment, but with (or shortly after) the 26.04 release there should be a Linux 7.0 HWE kernel available for 24.04 that should include the latest xe driver, so you will not need to maintain the external driver yourself anymore …

I added a couple edits in the comment you replied to. Could you review those to see if that maybe answers why the GPU wasn’t detected on boot? I’d just like to understand root cause before diving back in and I’m thinking that might be it.

> Was that with kernel 6.17 already installed?

No, the kernel on original install was 6.14.

Right, the original kernel that 24.04 server ships with is 6.14; your card is likely newer than that and would not be supported by the xe driver included there, which is why there is the suggestion to update to the HWE kernel (6.17) …

But did you test with 6.17 without installing the external driver first?

The paste you provided above looks like it should just be supported out of the box after updating to that kernel version … If that isn’t/wasn’t the case then using the external driver might indeed make sense (at least until 7.0 HWE becomes available) …

No, the Intel driver has been installed the whole time. I’ve never had 6.17 without the external driver. So I think next steps are:

  1. Remove everything that was installed from the guide found in the GPU driver install link (several comments above)
  2. Remove the grub line forcing the iGPU to use the xe driver
    1. GRUB_CMDLINE_LINUX_DEFAULT="i915.force_probe=!a780 xe.force_probe=a780"
  3. Reboot

Does this sound right? For the uninstalls, do you agree with the below method? Or suggest something else?

Install commands (from the link):

sudo apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc
sudo apt-get install -y intel-media-va-driver-non-free libmfx-gen1 libvpl2 libvpl-tools libva-glx2 va-driver-all vainfo

Uninstall commands (my best guess):

sudo apt-get remove -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc
sudo apt-get remove -y intel-media-va-driver-non-free libmfx-gen1 libvpl2 libvpl-tools libva-glx2 va-driver-all vainfo

I’d use sudo apt purge … instead of remove (just to be sure all configs of the packages you remove get purged as well), but yeah, this is essentially what I would do to test it … If it does not work you can of course re-install them until the kernel supports it on its own …

Ok, so here’s what I’ll run:

sudo apt purge -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc
sudo apt purge -y intel-media-va-driver-non-free libmfx-gen1 libvpl2 libvpl-tools libva-glx2 va-driver-all vainfo
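
Before purging, it can help to see which of these packages are actually installed and to do a dry run first. A minimal sketch: installed_only is a hypothetical helper that filters dpkg-query output, and the two package names in the comments are just examples from the list above.

```shell
# Keep only packages that dpkg reports as fully installed.
# Expects "name<TAB>status" lines on stdin, as produced by:
#   dpkg-query -W -f='${Package}\t${Status}\n' pkg1 pkg2 ...
installed_only() {
  awk -F '\t' '$2 == "install ok installed" { print $1 }'
}

# On the real system (not run here):
#   dpkg-query -W -f='${Package}\t${Status}\n' libze1 intel-gsc 2>/dev/null | installed_only
#   sudo apt-get -s purge libze1 intel-gsc   # -s simulates; nothing is removed
```

The simulated run also shows any other packages apt would remove alongside, which is worth checking with va-driver-all in the list.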

And for the grub changes, in /etc/default/grub I’ll change this line:

GRUB_CMDLINE_LINUX_DEFAULT="initcall_blacklist=simpledrm_platform_driver_init i915.force_probe=!a780 xe.force_probe=a780"

to:

GRUB_CMDLINE_LINUX_DEFAULT=""
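
The edit only takes effect once the GRUB config is regenerated and the machine rebooted, so it is worth checking the live command line afterwards. A small sketch; has_force_probe is a made-up helper name.

```shell
# Succeed if a kernel command line (passed as $1) still contains
# any "<driver>.force_probe=" option.
has_force_probe() {
  case " $1 " in
    *".force_probe="*) return 0 ;;
    *) return 1 ;;
  esac
}

# After `sudo update-grub` and a reboot (not run here):
#   has_force_probe "$(cat /proc/cmdline)" && echo "force_probe is still active"
```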

FYI it’ll be a couple hours before I can do this. I’ll reply back here if it goes south or if it all just works, either way. Thank you so much for taking the time to get me to this point!

I went through the steps noted above and rebooted. Both the iGPU and GPU are recognized and I confirmed grub no longer has the additional cmdline inputs. Here are some lspci outputs, note that the iGPU (GT1) now shows the i915 driver being used (as expected) and still shows both the i915 and xe modules as available. I’m assuming that’s fine?

user@server:~$ lspci -k | grep -EA3 'VGA|3D|Display'
00:02.0 Display controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
        DeviceName: Onboard - Video
        Subsystem: Gigabyte Technology Co., Ltd Raptor Lake-S GT1 [UHD Graphics 770]
        Kernel driver in use: i915
--
03:00.0 VGA compatible controller: Intel Corporation Device e20b
        Subsystem: ASRock Incorporation Device 6021
        Kernel driver in use: xe
        Kernel modules: xe
user@server:~$ lspci -k
00:02.0 Display controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
        DeviceName: Onboard - Video
        Subsystem: Gigabyte Technology Co., Ltd Raptor Lake-S GT1 [UHD Graphics 770]
        Kernel driver in use: i915
        Kernel modules: i915, xe
03:00.0 VGA compatible controller: Intel Corporation Device e20b
        Subsystem: ASRock Incorporation Device 6021
        Kernel driver in use: xe
        Kernel modules: xe

The iGPU hangs can take a while to manifest, so I won’t know it’s fixed for a week or more. But for now it looks like everything checks out. Definitely let me know if there’s something else you’d like to see. Otherwise, I’ll check back in after a week or so.

Again, thank you for taking the time to walk through this with me.

The iGPU hung again after removing the external driver packages and relying on the Linux kernel. Below is a list of what I’ve included in this post; please let me know what else I can grab to help with this.

Edit: I ran sudo apt upgrade linux-firmware but after a reboot syslog is still showing the message [drm] GuC firmware (70.45.2) is recommended, but only (70.44.1) was found in xe/bmg_guc_70.bin

Post contents:

  • syslog where the hang starts
  • syslog with the repeating messages until the CPU cores all hang
  • syslog showing GT1 and xe at boot
  • Generic info: lspci -k; uname -r; lsb_release -a

Edit: Another section of the syslog:

2026-02-28T18:07:15.917269-06:00 server kernel: ---[ end trace 0000000000000000 ]---
2026-02-28T18:07:15.917270-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: reset failed (-ETIMEDOUT)
2026-02-28T18:07:15.917270-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:03:00.0 as wedged.
2026-02-28T18:07:15.917270-06:00 server kernel: IOCTLs and executions are blocked. Only a rebind may clear the failure
2026-02-28T18:07:15.917271-06:00 server kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new
2026-02-28T18:07:15.917271-06:00 server kernel: xe 0000:03:00.0: [drm] device wedged, needs recovery

Where the hang starts:

2026-02-25T06:58:12.821975-06:00 server kernel: Lockdown: mdadm: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
2026-02-25T07:02:32.830014-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: TLB invalidation fence timeout, seqno=883778 recv=883777
2026-02-25T07:02:35.134044-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: TLB invalidation fence timeout, seqno=883779 recv=883777
2026-02-25T07:02:36.002005-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: Force wake domain 5 failed to ack wake (-ETIMEDOUT) reg[0xd58] = 0x0
2026-02-25T07:02:36.002019-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: Force wake domain 12 failed to ack wake (-ETIMEDOUT) reg[0xd74] = 0x0
2026-02-25T07:02:36.002027-06:00 server kernel: ------------[ cut here ]------------
2026-02-25T07:02:36.002029-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: Forcewake domains 0x1020 failed to acknowledge awake request
2026-02-25T07:02:36.002947-06:00 server kernel: WARNING: CPU: 15 PID: 1266353 at drivers/gpu/drm/xe/xe_force_wake.c:205 xe_force_wake_get+0x2e5/0x310 [xe]

The messages the logs repeat until all the CPU cores hang:

2026-02-28T03:21:38.985184-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
2026-02-28T03:21:39.002166-06:00 server kernel: message repeated 349 times: [ xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]]
2026-02-28T03:21:39.002167-06:00 server kernel: systemd[1]: Starting systemd-journald.service - Journal Service...
2026-02-28T03:21:39.002172-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]
2026-02-28T03:21:39.008167-06:00 server kernel: message repeated 126 times: [ xe 0000:03:00.0: [drm] GT1: trying reset from guc_exec_queue_timedout_job [xe]]
2026-02-28T03:21:39.008168-06:00 server kernel: systemd-journald[1331188]: Collecting audit messages is disabled.

GT1 and xe at boot:

2026-02-28T06:37:59.726877-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: Using GuC firmware from xe/bmg_guc_70.bin version 70.44.1
2026-02-28T06:37:59.726870-06:00 server sbkeysync[1178]:     alphanumeric
2026-02-28T06:37:59.726878-06:00 server kernel: xe 0000:03:00.0: [drm] GuC firmware (70.45.2) is recommended, but only (70.44.1) was found in xe/bmg_guc_70.bin
2026-02-28T06:37:59.726878-06:00 server kernel: xe 0000:03:00.0: [drm] Consider updating your linux-firmware pkg or downloading from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
2026-02-28T06:37:59.726880-06:00 server sbkeysync[1178]:     alphanumeric
2026-02-28T06:37:59.726879-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: Using HuC firmware from xe/bmg_huc.bin version 8.2.10
2026-02-28T06:37:59.726884-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs1 fused off
2026-02-28T06:37:59.726884-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs3 fused off
2026-02-28T06:37:59.726885-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs4 fused off
2026-02-28T06:37:59.726885-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs5 fused off
2026-02-28T06:37:59.726884-06:00 server sbkeysync[1178]:     alphanumeric
2026-02-28T06:37:59.726886-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs6 fused off
2026-02-28T06:37:59.726888-06:00 server sbkeysync[1178]:     alphanumeric
2026-02-28T06:37:59.726887-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vcs7 fused off
2026-02-28T06:37:59.726891-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vecs2 fused off
2026-02-28T06:37:59.726892-06:00 server kernel: xe 0000:03:00.0: [drm] GT1: vecs3 fused off
2026-02-28T06:37:59.726891-06:00 server sbkeysync[1178]:     alphanumeric
2026-02-28T06:37:59.726896-06:00 server sbkeysync[1178]:     alphanumeric
2026-02-28T06:37:59.726892-06:00 server kernel: xe 0000:03:00.0: [drm] Registered 4 planes with drm panic
2026-02-28T06:37:59.726899-06:00 server kernel: [drm] Initialized xe 1.1.0 for 0000:03:00.0 on minor 1
2026-02-28T06:37:59.726900-06:00 server kernel: xe 0000:03:00.0: [drm] Cannot find any crtc or sizes
2026-02-28T06:37:59.726900-06:00 server kernel: xe 0000:03:00.0: [drm] Using mailbox commands for power limits
2026-02-28T06:37:59.726907-06:00 server kernel: xe 0000:03:00.0: [drm] PL2 is supported on channel 0
2026-02-28T06:37:59.726900-06:00 server sbkeysync[1178]:     alphanumeric
2026-02-28T06:37:59.726908-06:00 server kernel: Creating 4 MTD partitions on "xe.nvm.768":
2026-02-28T06:37:59.726909-06:00 server kernel: 0x000000000000-0x000000001000 : "xe.nvm.768.DESCRIPTOR"
2026-02-28T06:37:59.726909-06:00 server kernel: 0x000000001000-0x00000054e000 : "xe.nvm.768.GSC"
2026-02-28T06:37:59.726910-06:00 server kernel: 0x00000054e000-0x00000074e000 : "xe.nvm.768.OptionROM"
2026-02-28T06:37:59.726910-06:00 server sbkeysync[1178]:     alphanumeric
2026-02-28T06:37:59.726910-06:00 server kernel: 0x00000074e000-0x00000075e000 : "xe.nvm.768.DAM"
2026-02-28T06:37:59.726914-06:00 server kernel: xe 0000:03:00.0: [drm] Cannot find any crtc or sizes
2026-02-28T06:37:59.726915-06:00 server kernel: snd_hda_intel 0000:04:00.0: bound 0000:03:00.0 (ops intel_audio_component_bind_ops [xe])
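
To pull the firmware mismatch out of a noisy boot log like the one above, something along these lines should work. It is a sketch only: guc_mismatch is a hypothetical helper, and it matches the exact wording of the message shown in this log.

```shell
# From kernel log text on stdin, print "recommended found file" for each
# "GuC firmware (X) is recommended, but only (Y) was found in FILE" line.
guc_mismatch() {
  sed -n 's/.*GuC firmware (\([0-9.]*\)) is recommended, but only (\([0-9.]*\)) was found in \(.*\)$/\1 \2 \3/p'
}

# Sample line from the log above; on a real system: sudo dmesg | guc_mismatch
guc_mismatch <<'EOF'
2026-02-28T06:37:59.726878-06:00 server kernel: xe 0000:03:00.0: [drm] GuC firmware (70.45.2) is recommended, but only (70.44.1) was found in xe/bmg_guc_70.bin
EOF
```

No output means the kernel did not log a mismatch for any firmware file.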

The iGPU and GPU devices after boot:

user@server:/var/log$ lspci -k
00:02.0 Display controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
        DeviceName: Onboard - Video
        Subsystem: Gigabyte Technology Co., Ltd Raptor Lake-S GT1 [UHD Graphics 770]
        Kernel driver in use: i915
        Kernel modules: i915, xe
03:00.0 VGA compatible controller: Intel Corporation Device e20b
        Subsystem: ASRock Incorporation Device 6021
        Kernel driver in use: xe
        Kernel modules: xe

Generic info:

user@server:/var/log$ uname -r
6.17.0-14-generic
user@server:/var/log$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.3 LTS
Release:        24.04
Codename:       noble

While I was putting together the above reply, I noticed that the hang in syslog refers to the device as GT1 (the iGPU), but uses the “03:00.0” address of the GPU. Is xe confusing the iGPU and GPU with each other?
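
One way to check which device the errors belong to is to count them by the driver name and PCI address that prefix each kernel message. The sketch below does that; errors_per_device is a made-up name. As a hedged reading rather than a confirmed diagnosis: the address identifies the device, so lines prefixed "xe 0000:03:00.0" concern the discrete card, and the "GT1" label in them may simply be the driver's name for a unit on that card rather than the "Raptor Lake-S GT1" iGPU.

```shell
# Count "*ERROR*" kernel log lines per driver and PCI address.
# Reads syslog-style text on stdin; prints "<count> <driver> <address>".
errors_per_device() {
  awk '/\*ERROR\*/ {
         for (i = 1; i < NF; i++)
           if ($i == "kernel:") {
             key = $(i + 1) " " $(i + 2)
             sub(/:$/, "", key)      # drop the colon after the address
             n[key]++
             break
           }
       }
       END { for (k in n) print n[k], k }' | sort
}

# Sample lines from this thread; on the server: errors_per_device < /var/log/syslog
errors_per_device <<'EOF'
2026-02-25T07:02:32.830014-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: TLB invalidation fence timeout, seqno=883778 recv=883777
2026-02-25T07:02:36.002005-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: Force wake domain 5 failed to ack wake (-ETIMEDOUT) reg[0xd58] = 0x0
EOF
```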

Edit: Am I getting to the point of nuking it and starting over? Is there any way to nuke it without completely nuking it (i.e., leave my NFS share links, LVM volumes, etc.)?

Edit 2: I found another section in the syslog that points to re-flashing the firmware on the GPU (not the iGPU). I added it to the reply above, plus in this reply below. Do you have a walkthrough you can point me to for re-flashing the GPU’s firmware?

Another section of the syslog:

2026-02-28T18:07:15.917269-06:00 server kernel: ---[ end trace 0000000000000000 ]---
2026-02-28T18:07:15.917270-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* GT1: reset failed (-ETIMEDOUT)
2026-02-28T18:07:15.917270-06:00 server kernel: xe 0000:03:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:03:00.0 as wedged.
2026-02-28T18:07:15.917270-06:00 server kernel: IOCTLs and executions are blocked. Only a rebind may clear the failure
2026-02-28T18:07:15.917271-06:00 server kernel: Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new
2026-02-28T18:07:15.917271-06:00 server kernel: xe 0000:03:00.0: [drm] device wedged, needs recovery
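
The "Only a rebind may clear the failure" line refers to unbinding the device from its driver and binding it again, which can be done through the standard PCI driver files in sysfs. The sketch below only prints the commands rather than running them; rebind_commands is a hypothetical helper, and whether a rebind actually recovers this card is an assumption that has not been tested here. Anything using the GPU would need to be stopped first.

```shell
# Print (do not run) the sysfs writes that unbind and rebind a PCI device.
# $1 = driver name, $2 = full PCI address including the domain
rebind_commands() {
  printf 'echo %s | sudo tee /sys/bus/pci/drivers/%s/unbind\n' "$2" "$1"
  printf 'echo %s | sudo tee /sys/bus/pci/drivers/%s/bind\n' "$2" "$1"
}

rebind_commands xe 0000:03:00.0
```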

Edit 3: Trying to use fwupd. It sees the GPU, but then says it’s not supported.

user@server:~$ sudo fwupdmgr get-devices
├─Graphics Card:
│     Device ID:          3743975ad7f64f8d6575a9ae49fb3a8856fe186f
│     Summary:            Discrete Graphics Card
│     Current version:    21.1137
│     Vendor:             Intel (MEI:0x8086)
│     GUIDs:              87d90ca5-3495-4559-8105-3fbfa37b8b79
│                         ed2e910e-c152-54ce-9c8e-3a8a93a37900 ← MEI\VEN_8086&DEV_E20B
│                         12a13ad3-3c76-5a50-b9f0-0f5342b6cbea ← MEI\VEN_8086&DEV_E20B&SUBSYS_18496021
│                         f05aae4f-acd9-57d4-8a7a-202c2124f3af ← MEI\VEN_8086&DEV_E20B&PART_FWCODE
│                         7d8bcbcc-d981-5b03-8491-3891f89796cf ← MEI\VEN_8086&DEV_E20B&SUBSYS_18496021&PART_FWCODE
│     Device Flags:       • Internal device
│                         • Updatable
│                         • System requires external power source
│                         • Signed Payload

Trying to flash the GPU:

user@server:~$ sudo fwupdmgr get-updates
Devices with no available firmware updates:
 • Graphics Card
 • SSD 980 PRO 2TB
 • UEFI Device Firmware
 • UEFI Device Firmware
 • UEFI Device Firmware
 • UEFI dbx
No updatable devices
user@server:~$ sudo fwupdmgr refresh
Updating lvfs
Downloading…             [***************************            ]
Successfully downloaded new metadata: 0 local devices supported
user@server:~$ sudo fwupdmgr reinstall 3743975ad7f64f8d6575a9ae49fb3a8856fe186f
No releases found

Hi, I just had another freeze after doing everything in the above two replies. Any chance you can review and see if you have any additional guidance? I appreciate the help.

Edit: Did some more work, detailed below. I mean, is this really what I need to be doing? Why isn’t sudo apt upgrade linux-firmware grabbing this?

I went to https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git per the syslog snippet above, downloaded bmg_guc_70.bin, then moved it to /lib/firmware/xe. I left the existing bmg_guc_70.bin.zst and just renamed it to bmg_guc_70_backup.bin.zst. I then rebooted and now this is what I see:

user@server:/lib/firmware/xe$ sudo dmesg | grep -i guc
[    5.931433] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.36.0
[    5.934881] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[    5.934887] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[    5.935261] i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
[    6.302945] xe 0000:03:00.0: [drm] GT0: Using GuC firmware from xe/bmg_guc_70.bin version 70.58.0
[    6.413629] xe 0000:03:00.0: [drm] GT1: Using GuC firmware from xe/bmg_guc_70.bin version 70.58.0
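
The manual swap described above can be sketched as a small function that keeps the old compressed blob under the same backup name used here. swap_firmware is a made-up name, the download path in the comment is hypothetical, and the initramfs step is an assumption that only matters if xe is loaded from the initramfs.

```shell
# Back up an existing compressed firmware blob and drop in a new one.
# $1 = new firmware file (e.g. bmg_guc_70.bin)
# $2 = firmware directory (e.g. /lib/firmware/xe)
swap_firmware() {
  name=$(basename "$1")
  stem=${name%.bin}
  # Rename foo.bin.zst to foo_backup.bin.zst so it is no longer picked up
  if [ -f "$2/$name.zst" ]; then
    mv "$2/$name.zst" "$2/${stem}_backup.bin.zst"
  fi
  cp "$1" "$2/$name"
}

# As root on the server (not run here; the source path is hypothetical):
#   swap_firmware /home/user/bmg_guc_70.bin /lib/firmware/xe
#   update-initramfs -u    # assumption: needed only if xe loads from the initramfs
#   reboot
```

Reverting is the reverse: delete the uncompressed file and rename the backup to its original name.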

We’ll see if this fixes it, but I’m still unclear why my linux-firmware and fwupdmgr don’t seem to be updating the GPU’s firmware.

Can you try the firmware mentioned here and comment on the bug, please? Attach a kernel log showing the error if it still shows.

Make sure to remove all non-standard packages and drivers installed from the internet.

Last night I manually installed bmg_guc_70.bin version 70.58.0 and there were no freezes overnight. Do you want me to replace it with the one you linked in your reply?

Juergh, I replied in the bug thread you linked. Here’s the direct link to my comment:

https://bugs.launchpad.net/ubuntu/+source/linux-signed-oem-6.8/+bug/2085434/comments/14

Long story short my bmg_guc_70.bin version 70.58.0 crashed, so I installed your .deb and it immediately crashed again. I attached my kern.log in that bug report comment.

Just to be clear, that bug report calls out the i915 GPU hanging while mine is an xe GPU (Battlemage B580).

Edit: And to your comment about removing all non-standard packages, etc., you can see the commands I ran several comments up (the comment with the sudo apt purge commands).

Edit 2: Duplicating my updated comment in the other thread here. The stability was awful with this setup, so I’ve reverted the iGPU to using xe. It seems more stable, but I doubt it’s 100%. Here’s what I’ve done:

In /etc/default/grub I’ve added:

GRUB_CMDLINE_LINUX_DEFAULT="initcall_blacklist=simpledrm_platform_driver_init i915.force_probe=!a780 xe.force_probe=a780"

And then ran:

user@server:/var/log$ sudo grub-mkconfig -o /boot/grub/grub.cfg

And here’s the firmware situation. What’s going on with GT0?

user@server:/var/log$ sudo dmesg | grep -i guc
[ 6.343016] xe 0000:00:02.0: [drm] GT0: Using GuC firmware from i915/tgl_guc_70.bin version 70.44.1
[ 6.552491] xe 0000:03:00.0: [drm] GT0: Using GuC firmware from xe/bmg_guc_70.bin version 70.58.0
[ 6.661820] xe 0000:03:00.0: [drm] GT1: Using GuC firmware from xe/bmg_guc_70.bin version 70.58.0

Juergh, with your .deb installed it’s crashing pretty often. Can you please review my logs and let me know additional things to try? And if you don’t have any suggestions right now, can you tell me how to revert your .deb? Things were more stable before I installed it.