Feb 2024 Update: NVIDIA Grace MGX + Canonical Support

Continuing the discussion from NVIDIA Grace MGX + Canonical Support:

As of February 2024, the following are the recommended OS/kernel choices for both pre-production and production samples of Grace MGX systems, including Grace-Grace and Grace-Hopper based systems.

Recommendations for Grace MGX Systems

| Timing | Partners & Early MGX Customers | General MGX Consumers |
|---|---|---|
| Today | Ubuntu 22.04.4 LTS: select the “HWE kernel” installation option and then migrate to the linux-nvidia HWE[1] kernel (see instructions below). | N/A |
| Future (April/May 2024) | Install Ubuntu 24.04 LTS[3] | Ubuntu Server 24.04 LTS certified hardware, once your server is publicly listed on ubuntu.com/certified[2] |

Instructions:
Migrating to a linux-nvidia HWE kernel:

  1. Install Ubuntu 22.04 with the HWE kernel option.
  2. After install and reboot, install the linux-nvidia-hwe or linux-nvidia-64k-hwe[3] kernel:
    $ sudo apt update
    $ sudo apt install linux-nvidia-hwe-22.04

    – or –

    $ sudo apt install linux-nvidia-64k-hwe-22.04

  3. Reboot:
    $ sudo reboot
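
After the reboot, it can be useful to confirm which kernel flavour actually booted. The following is a minimal sketch, not an official procedure: the helper name `kernel_flavour` is hypothetical, and the release-string suffixes it matches (`-nvidia`, `-nvidia-64k`, `-generic`, `-generic-64k`) are assumptions based on Ubuntu's usual kernel naming.

```shell
#!/bin/sh
# Classify a kernel release string (as printed by `uname -r`) by flavour.
# The suffixes matched here are assumptions, not taken from the post.
kernel_flavour() {
    case "$1" in
        *-nvidia-64k)  echo "nvidia-64k" ;;
        *-nvidia)      echo "nvidia" ;;
        *-generic-64k) echo "generic-64k" ;;
        *-generic)     echo "generic" ;;
        *)             echo "other" ;;
    esac
}

release="$(uname -r)"
echo "Running kernel: $release ($(kernel_flavour "$release"))"
# A page size of 65536 indicates a 64k-page kernel; 4096 indicates 4k pages.
echo "Page size: $(getconf PAGESIZE) bytes"
```

If the reported flavour is still `generic`, the linux-nvidia kernel was most likely not selected at boot, and the installed kernels and bootloader default are worth checking.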

Footnotes:
[1] Today the HWE version is 6.5, and in August 2024 it will transition to the kernel version underpinning 24.04.
[2] Depending on server certification and partner program, see below for more information.
[3] The arm64+largemem installer (64k kernel) is recommended for performance when running GPU workloads.

About Ubuntu Server Certification Programs:

  • Learn more here.

  • Make your hardware stand out. Talk to us about certifying your products. Learn more about becoming certified.

For those seeking more technical details…

NVIDIA has posted a list of kernel patches and configs for Grace systems here:
https://docs.nvidia.com/grace-patch-config-guide.pdf

As of linux-nvidia 6.5.0-1013 and linux (generic) 6.5.0-21.21~22.04.1, all of the listed patches[1] are available in both kernel flavors except "tpm_tis-spi: Add hardware wait polling", which is currently only in the -nvidia tree. Ubuntu 24.04 LTS will be based on Linux v6.8, and all kernel flavors will include all of the currently listed patches[1].

Of the listed kernel config settings, here are the relevant differences:

| Kernel Config | NVIDIA-Recommended | 6.5 -generic | 6.5 -nvidia |
|---|---|---|---|
| CONFIG_NR_CPUS | 512 | 256[2] | 512 |
| CONFIG_SPI_TEGRA210_QUAD | module | module | built-in |
| CONFIG_TCG_TIS_SPI | module | module | built-in |
| CONFIG_CPU_FREQ_DEFAULT_GOV | PERFORMANCE | ONDEMAND | PERFORMANCE |
| CONFIG_PREEMPT_NONE | Yes | No | Yes |

All other recommended kernel configs are the same in the -generic and -nvidia kernel flavors.
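
To see how a running system compares with the settings listed above, the kernel config that Ubuntu installs under /boot can be inspected. This is a rough sketch under stated assumptions: the helper name `config_value` is hypothetical, and `CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` is assumed to be the full option name behind the shorthand used for the default governor. In the config file, `m` means module and `y` means built-in.

```shell
#!/bin/sh
# Print the value of one option from a kernel config file,
# or "not set" when the option is absent or commented out.
config_value() {
    # $1 = config file, $2 = option name (e.g. CONFIG_NR_CPUS)
    value="$(sed -n "s/^$2=//p" "$1" | head -n 1)"
    echo "${value:-not set}"
}

config_file="/boot/config-$(uname -r)"
if [ -r "$config_file" ]; then
    for opt in CONFIG_NR_CPUS CONFIG_SPI_TEGRA210_QUAD CONFIG_TCG_TIS_SPI \
               CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE CONFIG_PREEMPT_NONE; do
        echo "$opt=$(config_value "$config_file" "$opt")"
    done
else
    echo "No readable config at $config_file"
fi
```

The output is one `OPTION=value` line per setting, which can be compared by eye against the NVIDIA-recommended values.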

In general, the -nvidia optimized kernel incorporates Grace support patches sooner than the -generic kernel. The -nvidia kernel may also include other out-of-tree features from NVIDIA, such as support for NVIDIA GPUDirect Storage in Secure Boot.

[1] As of the December 18, 2023 version “05” of NVIDIA’s Grace Patch Config Guide
[2] Ubuntu 24.04 will bump this to 512 for -generic kernels as well.

Hi all,

This is still the recommended OS/kernel for Grace systems. Work is ongoing to deliver the best experience for Grace users on 24.04 LTS, and you can expect another blog post within the coming months. For now, it is important to note this behaviour that Grace users are seeing on the Ubuntu 24.04 6.8 kernel. The relevant fix is described in the Launchpad link and will be included in future releases.

Thanks,
Ahmed

Click here for our June 2024 update.