
Cost optimization

This page covers strategies for reducing GPU infrastructure costs on Spheron, including tier selection, instance type trade-offs, reserved GPU savings, and spend monitoring.

GPU tier selection matrix

Choose the right GPU tier based on how much VRAM your workload needs:

| VRAM needed | GPU | Type | Approx. $/hr | Best for |
| --- | --- | --- | --- | --- |
| Less than 16 GB | RTX 4090 (24 GB) | Dedicated/Spot | ~$0.25-0.55 | Dev, inference, fine-tuning |
| 40 GB | A100 40 GB | Dedicated/Spot | Variable | Mid-scale training |
| 80 GB | A100 80 GB / H100 | Dedicated | Variable | Large model training |
| 640 GB+ | 8x H100 NVLink | Cluster | ~$15+/hr | Distributed training, K8s |

Check current prices in the dashboard; prices vary by provider and availability.
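To get a feel for how the hourly rate dominates total job cost, a back-of-envelope calculation helps. The rates below are illustrative placeholders drawn from the table above, not live Spheron prices:

```python
# Back-of-envelope job cost: hours x hourly rate.
# Rates are illustrative, not live Spheron prices.
def job_cost(hours, rate_per_hr):
    return round(hours * rate_per_hr, 2)

# A 10-hour fine-tune on a Spot RTX 4090 at ~$0.30/hr...
spot_4090 = job_cost(10, 0.30)      # 3.0
# ...vs an 8x H100 cluster at ~$15/hr for the same wall-clock time.
cluster_h100 = job_cost(10, 15.00)  # 150.0
```

The spread between tiers is large enough that right-sizing VRAM is usually the single biggest cost lever.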

Instance type strategy

Spot (lowest cost)

Spot instances are 30-60% cheaper than Dedicated. The trade-off: the provider can reclaim them at any time.

Use Spot for:
  • Experiments and hyperparameter search
  • Batch training jobs with checkpoint saving enabled
  • Any workload under 4 hours that can tolerate interruption

Handling interruption: Save checkpoints to a persistent volume every N steps. If the instance is reclaimed, resume from the latest checkpoint on a new instance without losing progress.

# Save a checkpoint every 100 steps; write to a temp path and rename,
# so a Spot reclaim mid-write can't corrupt the latest checkpoint
if step % 100 == 0:
    torch.save(state, '/checkpoints/checkpoint_tmp.pt')
    os.replace('/checkpoints/checkpoint_tmp.pt', '/checkpoints/checkpoint_latest.pt')
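On restart, the companion step is to check for the latest checkpoint before training begins. A framework-agnostic sketch of that resume logic (pickle stands in for torch.load so it runs anywhere; the state layout is a hypothetical example to adapt to your loop):

```python
import os
import pickle

CKPT = '/checkpoints/checkpoint_latest.pt'  # path on your persistent volume

def load_or_init(path=CKPT):
    """Resume from the latest checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    return {'step': 0, 'model': None}  # hypothetical state layout

state = load_or_init()
start_step = state['step']  # training loop resumes from here
```

With this pattern, a reclaimed Spot instance costs you at most the work done since the last checkpoint.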

Dedicated (guaranteed)

Dedicated instances cannot be reclaimed. Use them when interruption would be costly:

  • Production inference servers
  • Multi-day training runs
  • Interactive workloads and demos

Cluster (largest scale)

Full physical servers with NVLink interconnects. Use for:

  • Multi-GPU distributed training (PyTorch DDP, DeepSpeed)
  • Kubernetes cluster workloads
  • Workloads requiring maximum GPU-to-GPU bandwidth

Reserved GPUs for long-term work

For multi-week or multi-month projects, Reserved GPUs offer significant savings:

  • Submit requests via dashboard > Reserved GPU
  • Multiple providers compete to offer the lowest price
  • Typical savings: 30-50% vs on-demand hourly rates for 3-12 month commitments
  • Select "Any Location" to maximize provider competition

See Reserved GPUs for the request form.
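The savings compound over a long commitment. A quick illustrative comparison, using a hypothetical $2/hr on-demand rate and a 40% discount (the midpoint of the 30-50% range above, not a quote):

```python
# Illustrative reserved-vs-on-demand comparison for a 3-month project.
# The rate and discount are hypothetical, not Spheron quotes.
HOURS_PER_MONTH = 730

def monthly_cost(rate_per_hr, discount=0.0):
    return round(HOURS_PER_MONTH * rate_per_hr * (1 - discount), 2)

on_demand = monthly_cost(2.00)        # hypothetical $2/hr on-demand A100
reserved = monthly_cost(2.00, 0.40)   # same GPU at a 40% reserved discount
savings_3mo = round(3 * (on_demand - reserved), 2)
```

At these assumed numbers the 3-month reserved commitment saves $1,752 versus paying hourly for the same always-on GPU.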

Team discount program

Teams with active discounts automatically see reduced prices on the dashboard. The discounted price is applied at deployment without any additional steps.

  • Discounts are either volume-based or admin-assigned; the higher of the two is applied automatically

To inquire about discount eligibility for high-volume usage, contact support via Discord or email.

Monitoring burn rate

Check remaining balance

View your current credit balance on the Billing page in the dashboard. The balance updates in real time as instances run.

Track per-instance spend

Open the instance details drawer from the Instances page to see the hourly rate and total cost accumulated for a running deployment.
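Hourly rate and balance together give you a runway estimate: how long your remaining credits last at the current aggregate burn rate. A simple sketch (the figures are hypothetical; read real values from the Billing and Instances pages):

```python
# Estimate how long the remaining credit balance lasts at the current
# burn rate (sum of hourly rates of all running instances).
# Figures are hypothetical; read real values from the dashboard.
def runway_hours(balance, hourly_rates):
    burn = sum(hourly_rates)
    return float('inf') if burn == 0 else round(balance / burn, 1)

# $120 of credit, one Dedicated A100 ($2.00/hr) + one Spot 4090 ($0.40/hr):
hours_left = runway_hours(120.0, [2.00, 0.40])  # 50.0
```

If the runway is shorter than your planned training run, top up credits or switch tiers before starting rather than mid-run.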

Terminate when done

Terminate instances from the dashboard as soon as your workload finishes to stop charges immediately. Navigate to Instances, select the instance, and click Terminate.

Set up balance alerts in User Settings to receive a notification before credits run out.

Practical tips

Use persistent volumes for datasets and model weights. Avoid re-downloading multi-GB datasets on every deployment; mount a volume with data pre-loaded. This saves both time and egress costs.

Prefer Spot for short jobs. Any job under 4 hours that can be checkpointed is a good Spot candidate. Switch to Dedicated for multi-day runs requiring uninterrupted time.

Batch your GPU use. Avoid leaving instances running idle. Terminate immediately when your job finishes, and re-deploy from a checkpoint when you resume work.

Use RTX 4090 for development. The RTX 4090 is the most cost-effective GPU for code iteration, small model experiments, and inference serving at low traffic. Move to A100/H100 only when VRAM or compute requirements demand it.

What's next