Quick Guides
Where to start
- Training a model? → Start with Distributed Training for multi-GPU, or pick any RTX 4090 Spot instance for single-GPU fine-tuning
- Running inference? → vLLM Server for an OpenAI-compatible API; Ollama for interactive local usage
- Generating images? → FLUX.1 for state-of-the-art text-to-image; ComfyUI for visual workflows
- Running an AI node? → See the AI Nodes section below
Training
Model training guides, from single-GPU fine-tuning to large-scale distributed runs.
Distributed Training (PyTorch DDP)
Multi-GPU PyTorch DDP and DeepSpeed ZeRO-3 on a Voltage Park bare-metal H100 NVLink cluster (up to 8× H100). Covers torchrun, gradient checkpointing, BF16 precision, checkpoint persistence, and GPU monitoring.
Hardware: Voltage Park Cluster (H100 NVLink, up to 8 GPUs)
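A typical launch on a single 8-GPU node follows the pattern below. This is a sketch, not the guide's exact script: `train.py` and its two flags are placeholders for your own DDP training script; `torchrun` itself and `--nproc_per_node` are real.

```shell
# Launch 8 DDP workers on one node. torchrun sets RANK, LOCAL_RANK,
# and WORLD_SIZE in each worker's environment; train.py and its flags
# are placeholders for your own training script.
torchrun \
  --standalone \
  --nproc_per_node=8 \
  train.py \
  --bf16 \
  --gradient-checkpointing
```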
LLM inference
Deploy and serve large language models on Spheron GPU instances.
Inference Frameworks
Choose the right serving stack for your use case.
vLLM Inference Server
OpenAI-compatible inference server using vLLM on H100 or A100. Includes a systemd service so the server persists across reboots, SSH tunnel access, and performance tuning flags.
Hardware: H100 80GB (7B–13B models) · 2× A100 80GB (30B+)
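A rough way to sanity-check the hardware pairings above: FP16/BF16 weights take about 2 bytes per parameter, plus headroom for KV cache and activations. A back-of-the-envelope sketch (the 20% overhead figure is an assumption, not a measured number):

```shell
# Rough VRAM estimate for serving: 2 bytes/param (FP16/BF16 weights)
# plus ~20% headroom for KV cache and activations (assumed figure).
params_b=13          # model size in billions of parameters
bytes_per_param=2    # FP16/BF16
overhead_pct=20      # KV cache + activation headroom (assumption)
vram_gb=$(( params_b * bytes_per_param * (100 + overhead_pct) / 100 ))
echo "~${vram_gb} GB VRAM for a ${params_b}B model"
```

By this estimate a 13B model needs roughly 31 GB, comfortably inside an H100's 80 GB, while a 70B model (~168 GB) needs multiple GPUs.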
Ollama + Open WebUI
Browser-based chat interface backed by Ollama on an RTX 4090. Docker Compose setup with GPU passthrough; pull any model with one command.
Hardware: RTX 4090 (24GB VRAM)
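The Compose setup looks roughly like the following. This is a sketch, not the guide's exact file: the image tags and volume name are assumptions, while the ports and `OLLAMA_BASE_URL` variable match the projects' defaults. GPU passthrough requires the NVIDIA Container Toolkit on the host.

```yaml
# Ollama + Open WebUI with GPU passthrough (sketch; image tags are
# assumptions). Chat UI on host port 3000; Ollama API on 11434.
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "3000:8080"
    depends_on:
      - ollama
volumes:
  ollama:
```

After `docker compose up -d`, pull a model with `docker compose exec ollama ollama pull <model>` and open http://localhost:3000.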
DeepSeek R1 & V3
DeepSeek reasoning models, from 7B distillations up to the full 671B model served in FP8 across multiple GPUs.
Llama 4 Scout & Maverick
Meta's latest multimodal MoE models with long context windows and image understanding.
Llama 3.1 / 3.2 / 3.3
Meta Llama 3 family guides covering 8B through 405B with tensor parallelism.
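Tensor parallelism in vLLM is a single flag. A hedged sketch of serving a 70B checkpoint across 4 GPUs (the model ID is illustrative, and the weights are gated behind Meta's license):

```shell
# Shard a 70B model across 4 GPUs with tensor parallelism.
# --tensor-parallel-size must evenly divide the attention head count.
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 4 \
  --dtype bfloat16
```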
Qwen3 Dense & MoE
Qwen3 text models with thinking mode, 8B through 235B MoE variants.
Mistral & Mixtral
Mistral 7B, Mixtral 8x7B MoE, and Mistral Small 3.1 with function calling.
Gemma 3
Google DeepMind Gemma 3, 4B through 27B, available under the Gemma Terms of Use (commercial use permitted).
Phi-4 & Phi-4 Multimodal
Microsoft Phi-4 SLMs including the multimodal variant with image input.
Multimodal Models
Vision-language models including Qwen3-Omni, InternVL3, LLaVA-Next, Pixtral, and Baidu ERNIE.
Chandra OCR
Specialized OCR model for document processing and text extraction.
Soulx Podcast-1.7B
Compact 1.7B-parameter model optimized for podcast and audio content generation.
Janus CoderV-8B
8B-parameter model for code generation and understanding.
Image generation
Deploy GPU-accelerated image generation models on Spheron instances.
FLUX.1 & FLUX.2
Black Forest Labs text-to-image models. FLUX.1-dev on RTX 4090 (24GB), FLUX.2 on H100 (80GB).
Hardware: RTX 4090 24GB (FLUX.1-dev) · H100 80GB (FLUX.2)
Stable Diffusion 3.5 & SDXL
Stability AI diffusion models. SD 1.5 on 8GB, SDXL on 16GB, SD 3.5 on 24–40GB VRAM.
Hardware: 8–40GB VRAM depending on model variant
ComfyUI
Node-based visual workflow server for image generation. Runs in Docker on port 8188, accessed via SSH tunnel.
Hardware: RTX 4090 24GB (recommended)
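Since port 8188 is typically not exposed publicly, the usual access pattern is an SSH tunnel (the username and host below are placeholders for your instance's credentials):

```shell
# Forward local port 8188 to ComfyUI on the instance, then open
# http://localhost:8188 in your browser. -N: forward only, no shell.
ssh -N -L 8188:localhost:8188 ubuntu@<instance-ip>
```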
AI nodes
Deploy and run specialized AI network nodes.
Gonka AI Node
Deploy Gonka node infrastructure to participate in its AI compute network.
Pluralis Node 0
Set up and run Pluralis Node 0 for distributed AI network participation.
What's next
- Instance Types: Choose the right GPU for your workload
- Cost Optimization: Reduce training and inference costs
- Templates & Images: Copy-ready startup scripts
- API Reference: Automate deployments programmatically