Hi all,
As part of our ongoing effort to expand the inference snaps ecosystem, we are excited to release the nemotron-3-nano inference snap.
Nemotron 3 Nano is a lightweight, high-performance Large Language Model (LLM) trained by NVIDIA. It is specifically built for both reasoning and non-reasoning tasks, making it an excellent choice for efficient on-device AI in embedded and desktop environments.
To install the nemotron-3-nano snap, run:
sudo snap install nemotron-3-nano
As with our other inference snaps, this package automatically detects your hardware and deploys the most suitable model weights and runtime for your system.
Optimizations
This snap is optimized for inference on NVIDIA GPUs on both ARM and x86 platforms, including workstations such as the NVIDIA DGX Spark as well as systems with discrete NVIDIA graphics cards.
The snap also includes optimizations for a wide range of CPUs, and we expect solid performance on most high-end processors.
Getting Started
Once installed, you can start a chat client in the terminal for quick conversations:
nemotron-3-nano chat
You can also interact with the model through its OpenAI-compatible API; run nemotron-3-nano status to view the endpoint. For all available commands, run nemotron-3-nano --help.
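As a sketch of what talking to the OpenAI-compatible API looks like, here is a minimal Python example using only the standard library. The base URL, port, and model name below are assumptions for illustration; check nemotron-3-nano status for the actual endpoint on your system.

```python
import json
import urllib.request

# Assumption: the snap serves an OpenAI-compatible endpoint locally.
# Confirm the real address and port with `nemotron-3-nano status`.
BASE_URL = "http://localhost:8080/v1/chat/completions"

# Standard OpenAI-style chat completion payload.
# The model name here is an assumption; use the name your snap reports.
payload = {
    "model": "nemotron-3-nano",
    "messages": [
        {"role": "user", "content": "Summarize what an inference snap is."}
    ],
}

request = urllib.request.Request(
    BASE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To send the request once the snap is installed and running:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
# print(reply["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client library should work the same way once pointed at the endpoint reported by nemotron-3-nano status.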
To explore the available engines and verify which optimization is being used on your hardware, you can run nemotron-3-nano list-engines.
Another good way to start is by using Open WebUI as described in this guide.
For more information, visit the documentation.
Feedback
We are looking for feedback on performance and compatibility across different hardware setups. Please share your experience here or via GitHub Discussions. If you encounter any bugs, feel free to report them to help us improve the experience for the community!