📢 Call for testing nemotron-3-nano snap

Hi all,

As part of our ongoing effort to expand the inference snaps ecosystem, we are excited to release the nemotron-3-nano inference snap.

Nemotron 3 Nano is a lightweight, high-performance Large Language Model (LLM) trained by NVIDIA. It is specifically built for both reasoning and non-reasoning tasks, making it an excellent choice for efficient on-device AI in embedded and desktop environments.

To install the nemotron-3-nano snap, run:

sudo snap install nemotron-3-nano

As with our other inference snaps, this package automatically detects your hardware and deploys the most suitable model weights and runtime for your system.

⚡ Optimizations

This snap is optimized for inference using NVIDIA GPUs on both ARM and x86 platforms. This includes systems such as the NVIDIA DGX Spark as well as machines with discrete NVIDIA graphics cards.

The snap also includes optimizations for a wide range of CPUs. We expect workable performance on most high-end CPUs.

▶️ Getting Started

Once installed, you can start a chat client in the terminal for quick conversations:

nemotron-3-nano chat

You can also interact with the model via the OpenAI-compatible API; its address is shown by nemotron-3-nano status. For a list of all available commands, run nemotron-3-nano --help.
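To script against the OpenAI-compatible API, a standard chat-completions request body works. A minimal sketch using only the Python standard library is below; the endpoint path (/v1/chat/completions) and the localhost address in the comment are assumptions based on the OpenAI API convention, so check the address reported by nemotron-3-nano status before sending anything.

```python
import json
import urllib.request

# Request body following the OpenAI chat completions convention.
# The model name "nemotron-3-nano" is an assumption; use whatever
# name the snap's API actually reports for the loaded model.
payload = {
    "model": "nemotron-3-nano",
    "messages": [{"role": "user", "content": "Summarise what a snap package is."}],
    "stream": False,
}
body = json.dumps(payload).encode("utf-8")
print(body.decode("utf-8"))

# To send the request, point urllib at the address shown by
# `nemotron-3-nano status` (the URL below is a placeholder):
# req = urllib.request.Request(
#     "http://localhost:8080/v1/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request shape works with any OpenAI-compatible client library by overriding its base URL to point at the local server.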

To explore the available engines and verify which optimization is being used on your hardware, you can run nemotron-3-nano list-engines.

Another good way to start is by using Open WebUI as described in this guide.

For more information, visit the documentation.

📯 Feedback

We are looking for feedback on performance and compatibility across different hardware setups. Please share your experience here or via GitHub Discussions. If you encounter any bugs, feel free to report them to help us improve the experience for the community!


Ran fine on CPU with an AMD Ryzen AI 9 HX 370 (32GB unified memory). It did stall on part 5-of-6 while downloading the model, but restarting the snap install solved that.


I see about 50% CPU usage alongside the GPU usage below (idle is 10W, so the model draws about 50W on the GPU), and yet I still have GPU memory free. The model seems to take only about 8GB of GPU RAM but around 70GB of normal RAM.

| NVIDIA-SMI 580.126.09    Driver Version: 580.126.09    CUDA Version: 13.0 |
| 0  NVIDIA GeForce RTX 5060 Ti            60W / 180W    9685MiB / 16311MiB |
| 0  N/A  N/A  96072  C  llama-server                               8792MiB |

ENGINE       VENDOR         DESCRIPTION                            COMPAT
nvidia-gpu*  Canonical Ltd  Use CUDA for inference on Nvidia GPUs  yes
cpu          Canonical Ltd  Use CPU                                yes

Just tested this on my DGX Spark, both via the chat feature and by using the OpenAI API with Cline in VSCode, and it worked seamlessly!
