Quick Guides
Where to start
- Training a model? → Start with Distributed Training for multi-GPU, or pick any RTX 4090 Spot instance for single-GPU fine-tuning
- Running inference? → vLLM Server for an OpenAI-compatible API; Ollama for interactive local usage
- Generating images? → FLUX.1 for state-of-the-art text-to-image; ComfyUI for visual workflows
- Running an AI node? → See the AI Nodes section below
Training
Model training guides, from single-GPU fine-tuning to large-scale distributed runs.
Distributed Training (PyTorch DDP)
Multi-GPU PyTorch DDP and DeepSpeed ZeRO-3 on a Voltage Park bare-metal H100 NVLink cluster (up to 8× H100). Covers torchrun, gradient checkpointing, BF16 precision, checkpoint persistence, and GPU monitoring.
Hardware: Voltage Park Cluster (H100 NVLink, up to 8 GPUs)
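A typical launch on a single 8-GPU node follows the pattern below. This is a sketch, not the guide's exact script: `train.py` and its two flags are placeholders for your own DDP training script; `torchrun` itself and `--nproc_per_node` are real.

```shell
# Launch 8 DDP workers on one node. torchrun sets RANK, LOCAL_RANK,
# and WORLD_SIZE in each worker's environment; train.py and its flags
# are placeholders for your own training script.
torchrun \
  --standalone \
  --nproc_per_node=8 \
  train.py \
  --bf16 \
  --gradient-checkpointing
```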
LLM inference
Deploy and serve large language models on Spheron GPU instances.
Inference Frameworks
Choose the right serving stack for your use case.
vLLM Inference Server
OpenAI-compatible inference server using vLLM on H100 or A100. Includes a systemd service so the server persists across reboots, SSH tunnel access, and performance tuning flags.
Hardware: H100 80GB (7B–13B models) · 2× A100 80GB (30B+)
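A rough way to sanity-check the hardware pairings above: FP16/BF16 weights take about 2 bytes per parameter, plus headroom for KV cache and activations. A back-of-the-envelope sketch (the 20% overhead figure is an assumption, not a measured number):

```shell
# Rough VRAM estimate for serving: 2 bytes/param (FP16/BF16 weights)
# plus ~20% headroom for KV cache and activations (assumed figure).
params_b=13          # model size in billions of parameters
bytes_per_param=2    # FP16/BF16
overhead_pct=20      # KV cache + activation headroom (assumption)
vram_gb=$(( params_b * bytes_per_param * (100 + overhead_pct) / 100 ))
echo "~${vram_gb} GB VRAM for a ${params_b}B model"
```

By this estimate a 13B model needs roughly 31 GB, comfortably inside an H100's 80 GB, while a 70B model (~168 GB) needs multiple GPUs.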
Ollama + Open WebUI
Browser-based chat interface backed by Ollama on an RTX 4090. Docker Compose setup with GPU passthrough; pull any model with one command.
Hardware: RTX 4090 (24GB VRAM)
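The Compose setup looks roughly like the following. This is a sketch, not the guide's exact file: the image tags and volume name are assumptions, while the ports and `OLLAMA_BASE_URL` variable match the projects' defaults. GPU passthrough requires the NVIDIA Container Toolkit on the host.

```yaml
# Ollama + Open WebUI with GPU passthrough (sketch; image tags are
# assumptions). Chat UI on host port 3000; Ollama API on 11434.
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "3000:8080"
    depends_on:
      - ollama
volumes:
  ollama:
```

After `docker compose up -d`, pull a model with `docker compose exec ollama ollama pull <model>` and open http://localhost:3000.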
DeepSeek R1 & V3
DeepSeek reasoning models, from 7B distillations up to the full 671B model served in FP8 across multiple GPUs.
Llama 4 Scout & Maverick
Meta's latest multimodal MoE models with long context windows and image understanding.
Llama 3.1 / 3.2 / 3.3
Meta Llama 3 family guides covering 8B through 405B with tensor parallelism.
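Tensor parallelism in vLLM is a single flag. A hedged sketch of serving a 70B checkpoint across 4 GPUs (the model ID is illustrative, and the weights are gated behind Meta's license):

```shell
# Shard a 70B model across 4 GPUs with tensor parallelism.
# --tensor-parallel-size must evenly divide the attention head count.
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 4 \
  --dtype bfloat16
```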
Qwen3 Dense & MoE
Qwen3 text models with thinking mode, 8B through 235B MoE variants.
Mistral & Mixtral
Mistral 7B, Mixtral 8x7B MoE, and Mistral Small 3.1 with function calling.
Gemma 3
Google DeepMind Gemma 3, 4B through 27B, available under the Gemma Terms of Use (commercial use permitted).
Phi-4 & Phi-4 Multimodal
Microsoft Phi-4 SLMs including the multimodal variant with image input.
Multimodal Models
Vision-language models including Qwen3-Omni, InternVL3, LLaVA-Next, Pixtral, and Baidu ERNIE.
Chandra OCR
Specialized OCR model for document processing and text extraction.
Soulx Podcast-1.7B
Compact 1.7B-parameter model optimized for podcast and audio content generation.
Janus CoderV-8B
8B-parameter model for code generation and understanding.
Image generation
Deploy GPU-accelerated image generation models on Spheron instances.
FLUX.1 & FLUX.2
Black Forest Labs text-to-image models. FLUX.1-dev on RTX 4090 (24GB), FLUX.2 on H100 (80GB).
Hardware: RTX 4090 24GB (FLUX.1-dev) · H100 80GB (FLUX.2)
Stable Diffusion 3.5 & SDXL
Stability AI diffusion models. SD 1.5 on 8GB, SDXL on 16GB, SD 3.5 on 24–40GB VRAM.
Hardware: 8–40GB VRAM depending on model variant
ComfyUI
Node-based visual workflow server for image generation. Runs in Docker on port 8188, accessed via SSH tunnel.
Hardware: RTX 4090 24GB (recommended)
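Since port 8188 is typically not exposed publicly, the usual access pattern is an SSH tunnel (the username and host below are placeholders for your instance's credentials):

```shell
# Forward local port 8188 to ComfyUI on the instance, then open
# http://localhost:8188 in your browser. -N: forward only, no shell.
ssh -N -L 8188:localhost:8188 ubuntu@<instance-ip>
```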
AI nodes
Deploy and run specialized AI network nodes.
Gonka AI Node
Deploy Gonka node infrastructure to participate in its AI compute network.
Pluralis Node 0
Set up and run Pluralis Node 0 for distributed AI network participation.
What's next
- Instance Types: Choose the right GPU for your workload
- Cost Optimization: Reduce training and inference costs
- Templates & Images: Copy-ready startup scripts
- API Reference: Automate deployments programmatically