Feb 2024 Update: NVIDIA Grace MGX + Canonical Support

Continuing the discussion from NVIDIA Grace MGX + Canonical Support:

As of February 2024, these are the OS and kernel recommendations for both pre-production and production samples of Grace MGX systems, including Grace-Grace and Grace-Hopper based systems.

Recommendations for Grace MGX Systems

| Timing | Partners & Early MGX Customers | General MGX Consumers |
|---|---|---|
| Today | Ubuntu 22.04.4 LTS: select the “HWE kernel” installation option, then migrate to the linux-nvidia HWE [1] kernel (see instructions below). | N/A |
| Future (April/May 2024) | Install Ubuntu 24.04 LTS [3] | Ubuntu Server 24.04 LTS certified hardware, once your server is publicly listed on ubuntu.com/certified [2] |

Instructions:
Migrating to a linux-nvidia HWE kernel:

  1. Install Ubuntu 22.04 with the HWE kernel option.
  2. After install and reboot, install the linux-nvidia-hwe or linux-nvidia-64k-hwe [3] kernel:
    $ sudo apt update
    $ sudo apt install linux-nvidia-hwe-22.04

    – or –

    $ sudo apt install linux-nvidia-64k-hwe-22.04

  3. Reboot:
    $ sudo reboot
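
After the reboot, it can be worth confirming that the system actually came up on the intended kernel. The following is only a sketch: the `kernel_flavor` and `page_size_label` helpers are hypothetical names, and the suffix patterns assume Ubuntu's usual release naming (for example a `uname -r` ending in `-nvidia` or `-nvidia-64k`). A 64k kernel should report a page size of 65536 bytes.

```shell
#!/bin/sh
# Classify a kernel release string (the output of `uname -r`) by its flavor suffix.
# The suffix patterns are assumptions based on Ubuntu's usual naming.
kernel_flavor() {
    case "$1" in
        *-nvidia-64k)  echo "nvidia-64k" ;;
        *-nvidia)      echo "nvidia" ;;
        *-generic-64k) echo "generic-64k" ;;
        *-generic)     echo "generic" ;;
        *)             echo "other" ;;
    esac
}

# Map a page size in bytes to a short label.
page_size_label() {
    case "$1" in
        4096)  echo "4k" ;;
        65536) echo "64k" ;;
        *)     echo "other" ;;
    esac
}

# Report what the running system is actually using.
release="$(uname -r)"
bytes="$(getconf PAGESIZE)"
echo "Kernel: $release (flavor: $(kernel_flavor "$release"))"
echo "Page size: $bytes bytes ($(page_size_label "$bytes"))"
```

If the flavor still shows as generic, the previous kernel may still be the default boot entry.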

Footnotes:
[1] Today the HWE version is 6.5, and in August 2024 it will transition to the kernel version underpinning Ubuntu 24.04.
[2] Depends on server certification and partner program; see below for more information.
[3] The arm64+largemem installer (64k kernel) is recommended for performance when running GPU workloads.

About Ubuntu Server Certification Programs:

  • Learn more here.

  • Make your hardware stand out. Talk to us about certifying your products. Learn more about becoming certified.

For those seeking more technical details…

NVIDIA has posted a list of kernel patches and configs for Grace systems here:
https://docs.nvidia.com/grace-patch-config-guide.pdf

As of linux-nvidia 6.5.0-1013 and linux (generic) 6.5.0-21.21~22.04.1, all of the listed patches[1] are available in both kernel flavors except "tpm_tis-spi: Add hardware wait polling", which is currently only in the -nvidia tree. Ubuntu 24.04 LTS will be based on Linux v6.8, and all kernel flavors will include all of the currently listed patches[1].
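
To check whether a running -nvidia kernel is at least the 6.5.0-1013 build mentioned above, something like the following could be used. This is a rough sketch, not an official check: `release_base` and `version_at_least` are hypothetical helper names, the comparison relies on GNU `sort -V` ordering, and it assumes the `uname -r` string carries the same `6.5.0-NNNN` number quoted in this post.

```shell
#!/bin/sh
# Strip the flavor suffix from a release string, e.g.
# "6.5.0-1013-nvidia-64k" -> "6.5.0-1013" (suffix patterns are assumptions).
release_base() {
    echo "$1" | sed -E 's/-(nvidia|generic)(-64k)?$//'
}

# Succeed if version $1 is greater than or equal to version $2,
# using GNU sort's version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum -nvidia build taken from the post above.
min_nvidia="6.5.0-1013"
base="$(release_base "$(uname -r)")"
if version_at_least "$base" "$min_nvidia"; then
    echo "$base is at or above $min_nvidia"
else
    echo "$base is older than $min_nvidia"
fi
```

The threshold only makes sense for the -nvidia flavor; the -generic kernel uses a different numbering (6.5.0-21 in this post).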

Of the listed kernel config settings, here are the relevant differences:

| Kernel Config | NVIDIA-Recommended | 6.5 -generic | 6.5 -nvidia |
|---|---|---|---|
| CONFIG_NR_CPUS | 512 | 256 [2] | 512 |
| CONFIG_SPI_TEGRA210_QUAD | module | module | built-in |
| CONFIG_TCG_TIS_SPI | module | module | built-in |
| CONFIG_CPU_FREQ_DEFAULT_GOV | PERFORMANCE | ONDEMAND | PERFORMANCE |
| CONFIG_PREEMPT_NONE | Yes | No | Yes |

All other recommended kernel configs are the same in the -generic and -nvidia kernel flavors.
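
To see how an installed kernel compares against the table above, the settings can be read from the kernel's config file. A minimal sketch follows, assuming the config is at `/boot/config-$(uname -r)` as on a standard Ubuntu install; `config_value` is a hypothetical helper name. Note that the default governor appears in the config file as `CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y` (or the `_ONDEMAND` equivalent) rather than as a single symbol with a value, and that `m` means module while `y` means built-in.

```shell
#!/bin/sh
# Print the value of a kernel config symbol from a config file:
# "y", "m", a literal value, or "not set" if the symbol is absent or disabled.
config_value() {
    # $1 = path to config file, $2 = symbol name such as CONFIG_NR_CPUS
    line="$(grep -E "^$2=" "$1" | head -n 1 || true)"
    if [ -n "$line" ]; then
        echo "${line#*=}"
    else
        echo "not set"
    fi
}

# Assumed location of the running kernel's config on Ubuntu.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    for sym in CONFIG_NR_CPUS CONFIG_SPI_TEGRA210_QUAD CONFIG_TCG_TIS_SPI \
               CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE CONFIG_PREEMPT_NONE; do
        printf '%s=%s\n' "$sym" "$(config_value "$cfg" "$sym")"
    done
else
    echo "No readable kernel config at $cfg"
fi
```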

In general, the -nvidia optimized kernel incorporates Grace support patches sooner than the -generic kernel. The -nvidia kernel may also include other out-of-tree features from NVIDIA, such as support for NVIDIA GPUDirect Storage in Secure Boot.

[1] As of the December 18, 2023 version “05” of NVIDIA’s Grace Patch Config Guide
[2] Ubuntu 24.04 will bump this to 512 for -generic kernels as well.