As of February 2024, the following are the recommended OS/kernel choices for both pre-production and production samples of Grace MGX systems, including Grace-Grace and Grace-Hopper based systems.
Recommendations for Grace MGX Systems
| Timing | Partners & Early MGX Customers | General MGX Consumers |
|---|---|---|
| Today | Ubuntu 22.04.4 LTS, select the “HWE kernel” installation option and then migrate to the linux-nvidia HWE[1] kernel. [see instructions below] | N/A |
| Future (April/May 2024) | Install Ubuntu 24.04 LTS[3] | Ubuntu Server 24.04 LTS certified hardware once your server is publicly listed on ubuntu.com/certified[2] |
Instructions:
Migrating to a linux-nvidia HWE kernel:
1. Install Ubuntu 22.04 with the HWE kernel option.
2. After install and reboot, install the linux-nvidia-hwe or linux-nvidia-64k-hwe[3] kernel.
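The migration step can be sketched as a small shell helper. This is only an illustration: the helper name `nvidia_hwe_pkg` is hypothetical, and the package names are copied verbatim from the text above. The archive may publish the metapackages under a release-suffixed name, so confirm the exact name with `apt-cache policy` before installing.

```shell
#!/bin/sh
# Sketch only: package names are taken verbatim from the instructions above.
# Assumption: your archive may append a release suffix to these names.

# Print the kernel metapackage for the requested page size ("4k" or "64k").
nvidia_hwe_pkg() {
    case "$1" in
        4k)  echo "linux-nvidia-hwe" ;;
        64k) echo "linux-nvidia-64k-hwe" ;;
        *)   echo "usage: nvidia_hwe_pkg 4k|64k" >&2; return 1 ;;
    esac
}

# Typical use on the freshly installed system (left commented out so that
# sourcing this file changes nothing):
#   pkg=$(nvidia_hwe_pkg 64k)
#   apt-cache policy "$pkg"     # confirm the name exists in your archive
#   sudo apt update && sudo apt install "$pkg"
#   sudo reboot
#   uname -r                    # check which kernel flavor is now running
```

The 64k variant matches the footnote's recommendation for GPU workloads; the 4k variant is the default page size.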
Footnotes:
[1] Today the HWE version is 6.5, and in August 2024 it will transition to the kernel version underpinning 24.04.
[2] Depending on server certification and partner program, see below for more information.
[3] The arm64+largemem installer (64k kernel) is recommended for performance when running GPU workloads.
As of linux-nvidia 6.5.0-1013 and linux (generic) 6.5.0-21.21~22.04.1, all of the listed patches[1] are available in both kernel flavors except "tpm_tis-spi: Add hardware wait polling", which is currently only in the -nvidia tree. Ubuntu 24.04 LTS will be based on Linux v6.8, and all kernel flavors will include all of the currently listed patches[1].
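One way to check whether a running 6.5 kernel is at or past the versions named above is to parse the `uname -r` release string. The sketch below assumes Ubuntu's usual `<base>-<abi>-<flavor>` release format (for example `6.5.0-1013-nvidia`), and the function name `has_grace_patch_baseline` is hypothetical. It does not account for the "tpm_tis-spi" exception in the -generic flavor.

```shell
#!/bin/sh
# Return 0 if a `uname -r` style release string is at or past the baselines
# named in the text (6.5.0-1013 for -nvidia, 6.5.0-21 for -generic),
# 1 if it is older, and 2 if the series or flavor is not covered here.
has_grace_patch_baseline() {
    rel=$1
    base=${rel%%-*}      # e.g. 6.5.0
    rest=${rel#*-}       # e.g. 1013-nvidia
    abi=${rest%%-*}      # e.g. 1013
    flavor=${rest#*-}    # e.g. nvidia or nvidia-64k
    [ "$base" = "6.5.0" ] || return 2
    case "$flavor" in
        nvidia*)  [ "$abi" -ge 1013 ] ;;
        generic*) [ "$abi" -ge 21 ] ;;
        *)        return 2 ;;
    esac
}

# Usage: has_grace_patch_baseline "$(uname -r)" && echo "baseline met"
```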
Of the listed kernel config settings, here are the relevant differences:
| Kernel Config | NVIDIA-Recommended | 6.5 -generic | 6.5 -nvidia |
|---|---|---|---|
| CONFIG_NR_CPUS | 512 | 256[2] | 512 |
| CONFIG_SPI_TEGRA210_QUAD | module | module | built-in |
| CONFIG_TCG_TIS_SPI | module | module | built-in |
| CONFIG_CPU_FREQ_DEFAULT_GOV | PERFORMANCE | ONDEMAND | PERFORMANCE |
| CONFIG_PREEMPT_NONE | Yes | No | Yes |
All other recommended kernel configs are the same in the -generic and -nvidia kernel flavors.
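To see how an installed kernel compares on these five options, the values can be read from the config file Ubuntu ships under `/boot`. The sketch below is a minimal illustration with hypothetical function names. It assumes the standard Kconfig file format, where `=y` means built-in, `=m` means module, and a disabled option has no `SYMBOL=` line. The default governor is checked through `CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE`, the per-governor symbol that corresponds to the PERFORMANCE entry in the table.

```shell
#!/bin/sh
# Print the value of one Kconfig symbol from a kernel config file:
# "y", "m", a number or string, or "not set" when no SYMBOL= line exists.
kconfig_value() {
    file=$1 sym=$2
    val=$(sed -n "s/^${sym}=//p" "$file" | head -n 1)
    echo "${val:-not set}"
}

# Report the five options from the table for a given config file.
report_grace_kconfig() {
    for sym in CONFIG_NR_CPUS CONFIG_SPI_TEGRA210_QUAD CONFIG_TCG_TIS_SPI \
               CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE CONFIG_PREEMPT_NONE; do
        printf '%s=%s\n' "$sym" "$(kconfig_value "$1" "$sym")"
    done
}

# Usage: report_grace_kconfig "/boot/config-$(uname -r)"
```

The helper only prints values and leaves the comparison to the reader, since the table shows that the -nvidia flavor deliberately differs from the recommendation (built-in rather than module) for two of the options.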
In general, the -nvidia optimized kernel incorporates Grace support patches sooner than the -generic kernel. The -nvidia kernel may also include other out-of-tree features from NVIDIA, such as support for NVIDIA GPUDirect Storage in Secure Boot.
[1] As of the December 18, 2023 version “05” of NVIDIA’s Grace Patch Config Guide.
[2] Ubuntu 24.04 will bump this to 512 for -generic kernels as well.
This is still the recommended OS/kernel for Grace systems. Work is ongoing to deliver the best experience for Grace users on 24.04 LTS, and you can expect another blog post in the coming months. For now, it is important to note this behaviour that Grace users are seeing on the Ubuntu 24.04 6.8 kernel. The relevant fix is described in the Launchpad link and will be included in future releases.