
Cost optimization

This page covers strategies for reducing GPU infrastructure costs on Spheron, including tier selection, instance type trade-offs, reserved GPU savings, and spend monitoring.

GPU tier selection matrix

Choose the right GPU tier based on how much VRAM your workload needs:

| VRAM needed | GPU | Type | Approx. $/hr | Best for |
| --- | --- | --- | --- | --- |
| Less than 16 GB | RTX 4090 (24 GB) | Dedicated/Spot | ~$0.25-0.55 | Dev, inference, fine-tuning |
| 40 GB | A100 40 GB | Dedicated/Spot | Variable | Mid-scale training |
| 80 GB | A100 80 GB / H100 | Dedicated | Variable | Large model training |
| 640 GB+ | 8x H100 NVLink | Cluster | ~$15+/hr | Distributed training, K8s |

Check current prices in the dashboard; prices vary by provider and availability.
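To get a feel for how the hourly rate dominates total job cost, a back-of-envelope calculation helps. The rates below are illustrative placeholders drawn from the table above, not live Spheron prices:

```python
# Back-of-envelope job cost: hours x hourly rate.
# Rates are illustrative, not live Spheron prices.
def job_cost(hours, rate_per_hr):
    return round(hours * rate_per_hr, 2)

# A 10-hour fine-tune on a Spot RTX 4090 at ~$0.30/hr...
spot_4090 = job_cost(10, 0.30)      # 3.0
# ...vs an 8x H100 cluster at ~$15/hr for the same wall-clock time.
cluster_h100 = job_cost(10, 15.00)  # 150.0
```

The spread between tiers is large enough that right-sizing VRAM is usually the single biggest cost lever.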

Instance type strategy

Spot (lowest cost)

Spot instances are 30-60% cheaper than Dedicated. The trade-off: the provider can reclaim them at any time.

Use Spot for:
  • Experiments and hyperparameter search
  • Batch training jobs with checkpoint saving enabled
  • Any workload under 4 hours that can tolerate interruption

Handling interruption: Save checkpoints to a persistent volume every N steps. If the instance is reclaimed, resume from the latest checkpoint on a new instance without losing progress.

# Save a checkpoint every 100 steps; write to a temp path and rename,
# so a Spot reclaim mid-write can't corrupt the latest checkpoint
if step % 100 == 0:
    torch.save(state, '/checkpoints/checkpoint_tmp.pt')
    os.replace('/checkpoints/checkpoint_tmp.pt', '/checkpoints/checkpoint_latest.pt')
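On restart, the companion step is to check for the latest checkpoint before training begins. A framework-agnostic sketch of that resume logic (pickle stands in for torch.load so it runs anywhere; the state layout is a hypothetical example to adapt to your loop):

```python
import os
import pickle

CKPT = '/checkpoints/checkpoint_latest.pt'  # path on your persistent volume

def load_or_init(path=CKPT):
    """Resume from the latest checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    return {'step': 0, 'model': None}  # hypothetical state layout

state = load_or_init()
start_step = state['step']  # training loop resumes from here
```

With this pattern, a reclaimed Spot instance costs you at most the work done since the last checkpoint.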

Dedicated (guaranteed)

Dedicated instances cannot be reclaimed. Use them when interruption would be costly:

  • Production inference servers
  • Multi-day training runs
  • Interactive workloads and demos

Cluster (largest scale)

Full physical servers with NVLink interconnects. Use for:

  • Multi-GPU distributed training (PyTorch DDP, DeepSpeed)
  • Kubernetes cluster workloads
  • Workloads requiring maximum GPU-to-GPU bandwidth

Reserved GPUs for long-term work

For multi-week or multi-month projects, Reserved GPUs offer significant savings:

  • Submit requests via dashboard > Reserved GPU
  • Multiple providers compete to offer the lowest price
  • Typical savings: 30-50% vs on-demand hourly rates for 3-12 month commitments
  • Select "Any Location" to maximize provider competition

See Reserved GPUs for the request form.
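The savings compound over a long commitment. A quick illustrative comparison, using a hypothetical $2/hr on-demand rate and a 40% discount (the midpoint of the 30-50% range above, not a quote):

```python
# Illustrative reserved-vs-on-demand comparison for a 3-month project.
# The rate and discount are hypothetical, not Spheron quotes.
HOURS_PER_MONTH = 730

def monthly_cost(rate_per_hr, discount=0.0):
    return round(HOURS_PER_MONTH * rate_per_hr * (1 - discount), 2)

on_demand = monthly_cost(2.00)        # hypothetical $2/hr on-demand A100
reserved = monthly_cost(2.00, 0.40)   # same GPU at a 40% reserved discount
savings_3mo = round(3 * (on_demand - reserved), 2)
```

At these assumed numbers the 3-month reserved commitment saves $1,752 versus paying hourly for the same always-on GPU.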

Team discount program

Teams with active discounts automatically see reduced prices on the dashboard. The discounted price is applied at deployment without any additional steps.

  • Discounts are either volume-based or admin-assigned; the higher of the two is applied automatically

To inquire about discount eligibility for high-volume usage, contact support via Discord or email.

Monitoring burn rate

Check remaining balance

View your current credit balance on the Billing page in the dashboard. The balance updates in real time as instances run.

Track per-instance spend

Open the instance details drawer from the Instances page to see the hourly rate and total cost accumulated for a running deployment.
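Hourly rate and balance together give you a runway estimate: how long your remaining credits last at the current aggregate burn rate. A simple sketch (the figures are hypothetical; read real values from the Billing and Instances pages):

```python
# Estimate how long the remaining credit balance lasts at the current
# burn rate (sum of hourly rates of all running instances).
# Figures are hypothetical; read real values from the dashboard.
def runway_hours(balance, hourly_rates):
    burn = sum(hourly_rates)
    return float('inf') if burn == 0 else round(balance / burn, 1)

# $120 of credit, one Dedicated A100 ($2.00/hr) + one Spot 4090 ($0.40/hr):
hours_left = runway_hours(120.0, [2.00, 0.40])  # 50.0
```

If the runway is shorter than your planned training run, top up credits or switch tiers before starting rather than mid-run.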

Terminate when done

Terminate instances from the dashboard as soon as your workload finishes to stop charges immediately. Navigate to Instances, select the instance, and click Terminate.

Set up balance alerts in User Settings to receive a notification before credits run out.

Practical tips

Use persistent volumes for datasets and model weights. Avoid re-downloading multi-GB datasets on every deployment; mount a volume with data pre-loaded. This saves both time and egress costs.

Prefer Spot for short jobs. Any job under 4 hours that can be checkpointed is a good Spot candidate. Switch to Dedicated for multi-day runs requiring uninterrupted time.

Batch your GPU use. Avoid leaving instances running idle. Terminate immediately when your job finishes, and re-deploy from a checkpoint when you resume work.

Use RTX 4090 for development. The RTX 4090 is the most cost-effective GPU for code iteration, small model experiments, and inference serving at low traffic. Move to A100/H100 only when VRAM or compute requirements demand it.

What's next