CUDA and NVIDIA drivers
This page explains NVIDIA drivers and CUDA on Spheron GPU instances, how they interact with AI/ML frameworks, and how to choose the right version for your workload.
What are NVIDIA drivers?
NVIDIA drivers are software components that let the operating system communicate with the GPU hardware. Without a compatible driver, the GPU cannot be used for any compute workload.
On Spheron instances, NVIDIA drivers come pre-installed on all GPU images. You do not need to install them manually.
Key points:

- Drivers are specific to the GPU architecture (e.g., Hopper for H100, Ampere for A100)
- Each driver version exposes a maximum supported CUDA version
- Driver version and CUDA version are separate but must be compatible
Check the installed driver after connecting:
nvidia-smi

Example output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.4 |
+-----------------------------------------------------------------------------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
|=========================================================================================|
| 0 NVIDIA H100 80GB HBM3 Off | 00000000:00:00.0 Off | 0 |
| N/A 34C P0 72W / 700W | 0MiB / 81920MiB | 0% Default |
+-----------------------------------------------------------------------------------------+

The Driver Version and CUDA Version fields confirm what is installed.
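If you need these values in a script, the banner line can be parsed directly. This is a rough sketch: the exact layout varies between nvidia-smi releases, so treat the regex as an assumption; for the driver version, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` is a more robust option.

```python
import re

def parse_smi_banner(banner: str):
    """Extract the driver version and max supported CUDA version
    from the nvidia-smi header line. Sketch only: the banner layout
    can shift between nvidia-smi releases."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", banner)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", banner)
    return (driver.group(1) if driver else None,
            cuda.group(1) if cuda else None)

banner = "| NVIDIA-SMI 550.127.08  Driver Version: 550.127.08  CUDA Version: 12.4 |"
print(parse_smi_banner(banner))  # ('550.127.08', '12.4')
```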
What is CUDA?
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform that lets software run computations on the GPU. Nearly all AI/ML frameworks depend on it.
CUDA has two components:

| Component | What it is | How to check |
|---|---|---|
| CUDA Runtime | Libraries used by your application | nvidia-smi shows max supported CUDA |
| CUDA Toolkit | Compiler (nvcc) and dev tools | nvcc --version |
# Compiler version (CUDA toolkit)
nvcc --version
# List all CUDA installations
ls /usr/local/ | grep cuda

How CUDA and drivers affect development
The relationship between drivers, CUDA, and your frameworks determines what works:
GPU Hardware
└── NVIDIA Driver (minimum requirement)
    └── CUDA Runtime (must be ≤ driver's max CUDA)
        └── Framework (PyTorch, TensorFlow, JAX...)
            └── Your Code

- A newer driver supports a higher maximum CUDA version but is backward-compatible with older CUDA runtimes
- If your framework requires CUDA 12.4 but the instance only has CUDA 12.0, builds or training runs will fail
- Mismatched versions are the most common source of "CUDA not available" errors
| Framework | Minimum CUDA | Recommended CUDA |
|---|---|---|
| PyTorch 2.3+ | 11.8 | 12.1 to 12.4 |
| TensorFlow 2.16+ | 12.3 | 12.3 to 12.4 |
| JAX (latest) | 12.0 | 12.4+ |
| vLLM 0.4+ | 12.1 | 12.4 |
Always check the framework's official docs for the exact compatibility matrix before selecting a CUDA version.
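The compatibility table can be turned into a quick preflight check before installing a framework. A minimal sketch; the version floors below are transcribed from the table and are illustrative, so confirm them against each framework's official matrix.

```python
# Minimum CUDA versions from the table above (illustrative values --
# always verify against the framework's official compatibility matrix).
FRAMEWORK_MIN_CUDA = {
    "pytorch": (11, 8),
    "tensorflow": (12, 3),
    "jax": (12, 0),
    "vllm": (12, 1),
}

def cuda_ok(framework: str, instance_cuda: str) -> bool:
    """Return True if the instance's CUDA version meets the framework minimum."""
    major, minor = (int(p) for p in instance_cuda.split(".")[:2])
    return (major, minor) >= FRAMEWORK_MIN_CUDA[framework]

print(cuda_ok("tensorflow", "12.4"))  # True
print(cuda_ok("tensorflow", "12.0"))  # False
```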
Available CUDA versions on Spheron
| CUDA version | NVIDIA driver | Notes |
|---|---|---|
| 12.0 | 525+ | Maximum compatibility with older frameworks |
| 12.4 | 550+ | Stable, broadly compatible; good default |
| 12.6 | 560+ | Optimized for RTX 5090, H100, newer GPUs |
| 12.8 Open | 570+ (open-source) | Open-source kernel module, community use |
| 13.0 Open | 575+ (open-source) | Latest features; early adoption and research use |
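The driver column above maps each CUDA version to a minimum driver branch, which makes the check easy to script. A sketch, with the minimum branches copied from the table:

```python
# Minimum driver branch per CUDA version, from the table above.
MIN_DRIVER = {"12.0": 525, "12.4": 550, "12.6": 560, "12.8": 570, "13.0": 575}

def driver_supports(driver_version: str, cuda_version: str) -> bool:
    """Check whether a driver (e.g. '550.127.08') meets the
    minimum branch for the given CUDA version."""
    branch = int(driver_version.split(".")[0])
    return branch >= MIN_DRIVER[cuda_version]

print(driver_supports("550.127.08", "12.4"))  # True
print(driver_supports("550.127.08", "12.6"))  # False
```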
Open-source drivers (12.8 Open, 13.0 Open) use NVIDIA's open-source kernel module instead of the proprietary driver. They are functionally equivalent to the proprietary driver for most AI/ML workloads and are preferred in community and research environments.
Choose a driver version at deployment
When deploying an instance on Spheron, select the CUDA version and driver via the OS image dropdown. They are bundled together.
- Go to app.spheron.ai → Deploy
- Select your GPU
- Open the OS / Environment dropdown
- Choose an image that includes your desired CUDA version:
| Goal | Recommended image |
|---|---|
| Stable AI/ML work | Ubuntu 22.04 + CUDA 12.4 or Ubuntu 24.04 ML PyTorch |
| Latest GPU support (H100, RTX 5090) | Ubuntu 24.04 + CUDA 12.6 |
| Open-source driver preference | Ubuntu 22.04 + CUDA 12.8 Open |
| Research and early adoption | Ubuntu 24.04 + CUDA 13.0 Open |
| Legacy framework compatibility | Ubuntu 20.04 + CUDA 12.0 |
- Deploy. The instance is ready in 30 to 60 seconds with the driver already loaded.
Verify after deployment
Once connected via SSH, confirm the environment is set up correctly:
# Driver version and max supported CUDA
nvidia-smi
# CUDA toolkit version (compiler)
nvcc --version
# Installed CUDA directories
ls /usr/local/ | grep cuda
# Quick Python check (PyTorch)
python3 -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('CUDA version:', torch.version.cuda)"

Troubleshooting
nvidia-smi: command not found
The instance may have launched on a CPU-only node, or the driver failed to load. Redeploy with a GPU image.
CUDA not available in PyTorch/TensorFlow
The framework's CUDA build does not match the installed runtime. Reinstall the framework with the correct CUDA wheel:
# PyTorch example: match cu124 to your CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu124

nvcc: command not found but nvidia-smi works
The CUDA toolkit (compiler) is not installed, only the runtime. Install it:
apt-get install -y cuda-toolkit-12-4

Version mismatch between nvidia-smi and nvcc
This is expected behavior. nvidia-smi shows the driver's maximum supported CUDA, while nvcc shows the toolkit version. Both are valid as long as the toolkit version is less than or equal to the driver's max CUDA.
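That rule can be expressed as a one-line check. A sketch comparing major.minor version tuples:

```python
def versions_compatible(toolkit: str, driver_max: str) -> bool:
    """nvcc's toolkit version must not exceed the driver's max supported CUDA."""
    as_tuple = lambda v: tuple(int(p) for p in v.split(".")[:2])
    return as_tuple(toolkit) <= as_tuple(driver_max)

print(versions_compatible("12.1", "12.4"))  # True: older toolkit is fine
print(versions_compatible("12.6", "12.4"))  # False: toolkit newer than driver supports
```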
What's next
- Ubuntu Environments: Full list of OS images and configurations
- PyTorch Environment: PyTorch and CUDA setup
- TensorFlow Environment: TensorFlow and CUDA setup
- Templates and Images: Pre-built startup scripts