Installing Nvidia drivers on a GPU-enabled (Nvidia Tesla) EC2 instance (G4DN)

carlos-bravo · November 28, 2023, 7:44am

AWS provides GPU-enabled instance types for workloads that require GPU compute power. G4DN instances are powered by an Nvidia Tesla T4 GPU. This guide will walk you through the driver installation process, including CUDA for machine learning workloads.

Launch your instance

Launch your Ubuntu 22.04 instance using either the AWS CLI or the web console. Make sure the root volume has at least 30GB, as the driver installation requires a significant amount of disk space. You will need more space if you plan to train or run ML models later.
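If you go the AWS CLI route, the launch could look roughly like the sketch below. The AMI ID, key pair name and security group ID are hypothetical placeholders, g4dn.xlarge is just one size in the G4DN family, and /dev/sda1 is assumed to be the root device name of the Ubuntu AMI you pick, so check your AMI before running it.

```shell
# Hypothetical placeholders: replace all three with your own values.
AMI_ID="ami-xxxxxxxxxxxxxxxxx"   # an Ubuntu 22.04 AMI in your region
KEY_NAME="my-key"                # an existing EC2 key pair
SG_ID="sg-xxxxxxxxxxxxxxxxx"     # a security group you control

# Launch one G4DN instance with a 30GB gp3 root volume.
# Assumption: the AMI's root device is /dev/sda1.
launch_g4dn() {
  aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type g4dn.xlarge \
    --key-name "$KEY_NAME" \
    --security-group-ids "$SG_ID" \
    --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=30,VolumeType=gp3}' \
    --count 1
}

# Usage: launch_g4dn
```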

SSH access is required, so make sure to either open port 22 or enable SSM to access the machine through Session Manager.
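As a rough illustration, either access path can be wrapped in a small helper. The key file path, host name and instance ID below are hypothetical; `ubuntu` is the default user on official Ubuntu AMIs, and the Session Manager route assumes the Session Manager plugin is installed locally and the instance has an SSM-enabled instance profile.

```shell
# Connect over SSH (needs port 22 open in the security group).
# $1 = path to the private key, $2 = public DNS name or IP of the instance.
connect_ssh() {
  ssh -i "$1" "ubuntu@$2"
}

# Connect through Session Manager (no inbound port needed).
# $1 = instance ID.
connect_ssm() {
  aws ssm start-session --target "$1"
}

# Usage (hypothetical values):
#   connect_ssh ~/.ssh/my-key.pem ec2-xx-xx-xx-xx.compute-1.amazonaws.com
#   connect_ssm i-xxxxxxxxxxxxxxxxx
```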

Install the Nvidia driver

First, check if the GPU is present with the following command:

sudo lshw -c video

If you are using the correct instance type (G4DN in this case), the output should list the Nvidia Tesla T4 GPU as UNCLAIMED, meaning no driver is bound to it yet.
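If you want to script this check instead of reading the output by eye, something like the following sketch could work. The function name `gpu_claim_state` is made up for this example, and it assumes the usual lshw layout where each device starts with a `*-display` header line (with UNCLAIMED appended when no driver is bound) followed by a `vendor:` line.

```shell
# Reads `lshw -c video` output on stdin and prints one of:
#   unclaimed  - an Nvidia device is present but no driver is bound
#   claimed    - an Nvidia device is present and a driver is bound
#   absent     - no Nvidia device was found
gpu_claim_state() {
  awk '
    # A new device section starts; remember whether it is unclaimed.
    /^ *\*-/ { unclaimed = ($0 ~ /UNCLAIMED/) }
    # The vendor line tells us whether this section is the Nvidia GPU.
    tolower($0) ~ /vendor: *nvidia/ {
      found = 1
      state = (unclaimed ? "unclaimed" : "claimed")
    }
    END { print (found ? state : "absent") }
  '
}

# Usage: sudo lshw -c video | gpu_claim_state
```

Before the driver is installed this should print unclaimed on a G4DN instance; the same helper can be reused after the reboot.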

Now, let’s install the Nvidia driver:

sudo apt update && sudo apt install -y nvidia-headless-535-server nvidia-utils-535-server

NOTE: Since we are using a headless server (no desktop), the headless driver is sufficient. If you are running this in a full desktop environment (Amazon WorkSpaces or your own EC2 desktop), use nvidia-driver-535 instead.
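The headless/desktop choice from the note can be captured in a tiny helper if you provision several machines. This is only a sketch: `driver_packages` is a made-up name, and it simply reproduces the package names used in this guide for a given driver branch.

```shell
# Print the driver packages for a branch and environment.
# $1 = driver branch (e.g. 535), $2 = "headless" (default) or "desktop".
driver_packages() {
  case "${2:-headless}" in
    headless) echo "nvidia-headless-$1-server nvidia-utils-$1-server" ;;
    desktop)  echo "nvidia-driver-$1" ;;
    *)        echo "unknown mode: $2" >&2; return 1 ;;
  esac
}

# Usage (word splitting of the output is intentional here):
#   sudo apt install -y $(driver_packages 535 headless)
```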

After the installation, reboot the instance:

sudo reboot

After the reboot, check that the driver was installed properly:

sudo lshw -c video

The Tesla T4 should no longer be listed as UNCLAIMED, which means the driver has claimed the device.
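Another quick sanity check is to confirm that the nvidia kernel module is actually loaded. The sketch below reads /proc/modules, where each line starts with a module name; the function name and the optional file argument are only there for illustration.

```shell
# Succeeds if the nvidia kernel module is listed as loaded.
# $1 = modules file to inspect (defaults to /proc/modules).
nvidia_module_loaded() {
  grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

# Usage:
#   if nvidia_module_loaded; then echo "driver loaded"; else echo "driver NOT loaded"; fi
```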

You can also perform an additional test by typing:

nvidia-smi

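This should display the Nvidia GPU information, including a CUDA version in the top-right corner. Note that this is the highest CUDA version the installed driver supports; it does not mean the CUDA toolkit itself is installed. If you need the toolkit (for example, to compile with nvcc), visit the Nvidia website and download a CUDA version no newer than the one shown there.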
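To pull that CUDA version out in a script, you could parse the nvidia-smi header. This is a minimal sketch that assumes the header contains the text "CUDA Version: X.Y", as it does for the 535 driver series; `driver_cuda_version` is a made-up helper name.

```shell
# Reads nvidia-smi output on stdin and prints the CUDA version
# reported in the header (the maximum version the driver supports).
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage:
#   nvidia-smi | driver_cuda_version
#
# For the GPU name and driver version in a script-friendly form:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
```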

Now, you are ready to start using the GPU.