---
name: spheron-gpu-api
description: Rent and manage NVIDIA GPU instances through the Spheron AI REST API. Use when the user wants to find GPU offers, deploy or terminate GPU instances, manage SSH keys, persistent volumes, Kubernetes add-ons, teams, or account balance on Spheron (app.spheron.ai). Covers H100, A100, B200, RTX 4090, and other GPUs across SPOT, DEDICATED, and CLUSTER instance types.
license: For use with the Spheron AI GPU platform. Requires a Spheron API key.
---

# Spheron GPU API skill

You help the user rent and manage GPU instances on the Spheron AI platform by calling its REST API. This document is your complete operating manual. Read it before making any call, then follow the decision framework to choose the right endpoint, build correct parameters, and report results clearly.

You are acting on the user's real account. Deploying an instance or creating a volume spends real money. Treat every create, terminate, and delete as a consequential action.

---

## 1. Before you do anything: the API key

Every authenticated call requires a Spheron API key sent as a Bearer token. Without it, you can only list public GPU offers and providers; you cannot deploy, manage, or read account data.

**Get a key one of two ways:**

1. Generate it from the dashboard at `https://app.spheron.ai/settings?tab=api`.
2. If the user does not have dashboard access yet, or needs API access enabled for their account, have them contact **info@spheron.ai** to request access.

**Store the key locally and always read it from there:**

Persist the key once on the local system and retrieve it from that location on every run, rather than asking the user again or pasting it into prompts. The preferred store is the `SPHERON_API_KEY` environment variable. A local file outside any repository works too, for example `~/.spheron/credentials` with permissions set to `600`.

Follow this retrieval order before every authenticated call:

1. Read the `SPHERON_API_KEY` environment variable.
2. If it is not set, read the local credentials file (for example `~/.spheron/credentials`).
3. Only if both are empty, ask the user for the key, then save it to one of the locations above so future runs read it automatically.

```bash
# Save the key once (one option: an environment variable for the current shell)
export SPHERON_API_KEY="<your-api-key>"

# Persist it for future shells (zsh shown; append to ~/.bashrc instead for bash)
echo 'export SPHERON_API_KEY="<your-api-key>"' >> ~/.zshrc

# Read it back on every run instead of re-asking
curl -H "Authorization: Bearer $SPHERON_API_KEY" \
  https://app.spheron.ai/api/providers
```
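
The retrieval order above could be sketched in Python as follows. This is a minimal illustration, not part of the Spheron API: the function name `resolve_api_key` is hypothetical, and it assumes the credentials file holds nothing but the key on its first line, a format this document does not specify.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Example location taken from the text above; adjust to wherever the key is stored.
DEFAULT_CREDENTIALS = Path.home() / ".spheron" / "credentials"


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    credentials_path: Path = DEFAULT_CREDENTIALS,
) -> Optional[str]:
    """Return the stored key, or None if the user has to be asked."""
    # 1. The environment variable wins.
    key = env.get("SPHERON_API_KEY", "").strip()
    if key:
        return key
    # 2. Fall back to the credentials file
    #    (assumption: the key sits alone on the first line).
    try:
        lines = credentials_path.read_text(encoding="utf-8").splitlines()
    except OSError:
        lines = []
    first = lines[0].strip() if lines else ""
    # 3. None tells the caller to ask the user, then save the key.
    return first or None
```

A `None` result is the signal to ask the user once and then save the key to one of the two locations, so the next run finds it without asking.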

**Rules you must follow:**

- Always look for the key in the local store first (the retrieval order above). Do not ask the user for a key you already have stored locally, and never invent or guess a key.
- Treat the key like a password. Never print it back in full, never write it into logs, code comments, shared files, or version control. Reference it as `$SPHERON_API_KEY` or `<your-api-key>` in any example you show.
- Send it only over HTTPS to `https://app.spheron.ai`. Never send it to any other host.
- If the user pastes a key into the chat, save it to the local store, use the stored value from then on, and remind them once that keys belong in the environment variable or credentials file, not in prompts.

**Auth header on every authenticated request:**

```
Authorization: Bearer <your-api-key>
```

A `401` response with `code: UNAUTHORIZED` means the key is missing, malformed, or invalid. Stop and ask the user to check their key or request access from info@spheron.ai.

---

## 2. Base URL

```
https://app.spheron.ai
```

All paths below are relative to this base. All requests and responses are JSON. Send `Content-Type: application/json` on any request that has a body.
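
As an illustration of these conventions, the sketch below builds (but does not send) a request with Python's standard `urllib`. The helper name `build_request` and its parameters are hypothetical; only the base URL, the Bearer header, and the JSON content type come from this document.

```python
import json
import urllib.request
from typing import Any, Mapping, Optional
from urllib.parse import urlencode

BASE_URL = "https://app.spheron.ai"


def build_request(
    method: str,
    path: str,
    api_key: Optional[str] = None,
    query: Optional[Mapping[str, Any]] = None,
    body: Optional[Mapping[str, Any]] = None,
) -> urllib.request.Request:
    """Build a request against the Spheron base URL without sending it."""
    # Only accept relative API paths, so the key can never go to another host.
    if not path.startswith("/api/"):
        raise ValueError("path must be relative to the base URL, e.g. /api/providers")
    url = BASE_URL + path
    if query:
        url += "?" + urlencode(query)
    headers = {}
    if api_key:  # public endpoints can omit the key
        headers["Authorization"] = f"Bearer {api_key}"
    data = None
    if body is not None:
        # Content-Type is sent only when the request carries a body.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

The returned object can be passed to `urllib.request.urlopen`. Keeping construction separate from sending makes it easy to inspect the URL and headers before anything leaves the machine.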

---

## 3. Mental model of the platform

Understand these objects and how they relate before you call anything:

- **Provider**: a compute supplier (for example `spheron-ai`, `voltage-park`, `data-crunch`, `sesterce`, `massed-compute`). The set of providers is dynamic. Always read the live list from `GET /api/providers`.
- **GPU offer**: a concrete, deployable configuration from one provider (a specific GPU model, count, region, OS options, price, and `instanceType`). Each offer has a unique `offerId`. You deploy an offer, not a bare GPU type.
- **Deployment**: a running GPU instance created from an offer. It has a lifecycle (`deploying` to `running` to `terminated`) and accrues hourly cost while running. There is no stop or hibernate: the only way to end billing is to terminate, which permanently deletes the instance and its disk. Resuming work afterward means deploying a fresh instance and setting it up from scratch.
- **SSH key**: how the user logs into the instance. A deployment needs either a saved `sshKeyId` or an inline `ssh_public_key`.
- **Volume**: optional persistent storage you can attach to a deployment. Rules differ sharply by provider.
- **Team**: billing and ownership boundary. Balance lives at the team level. Most users have one default personal team.
- **Balance**: prepaid USD credit on a team. A deployment fails or cannot start if the team has no balance.

The core dependency chain for a deployment:

```
providers  →  gpu-offers  →  (pick one offer)  →  ssh key  →  POST deployment  →  poll status  →  connect  →  terminate
                                   │
                                   └── balance must be sufficient on the chosen team
```

---

## 4. Decision framework: choose what to call

Identify the user's intent, then follow the matching path. Do not call endpoints you do not need.

### Intent: "Deploy a GPU instance"

1. If you do not yet have an API key, get one (Section 1).
2. Call `GET /api/teams` to find the team and its balance, or `GET /api/balance` for live balance. Confirm balance is positive. If zero, tell the user to add credit before deploying.
3. Call `GET /api/gpu-offers` with a `search` term matching the GPU the user wants (for example `search=h100`). Authenticate so discounted prices appear.
4. From the response, pick a specific offer inside `data[].offers[]`. Record its `provider`, `offerId`, `gpuType`, a valid `region` (from the offer's `clusters`/`region`), a valid OS (from `os_options`), and `instanceType`.
5. Ensure an SSH key exists: call `GET /api/ssh-keys`. If none, either add one with `POST /api/ssh-keys`, or pass an inline `ssh_public_key` at deploy time.
6. Show the user a short summary (GPU, region, instance type, hourly rate, estimated cost) and confirm before deploying.
7. Call `POST /api/deployments`. The parameters `provider`, `offerId`, `gpuType`, `region`, `operatingSystem`, and `instanceType` must all match the chosen offer.
8. Poll `GET /api/deployments/{deploymentId}` every 15 to 30 seconds until `status` is `running`, then return the `sshCommand` and `ipAddress`.

### Intent: "Show me available GPUs / prices"

- `GET /api/gpu-offers` with optional `search`, `instanceType`, `sortBy`, paging. Authenticate to show the user their real discounted price, not list price.

### Intent: "What do I have running?"

- `GET /api/deployments?status=active`. To inspect one, `GET /api/deployments/{deploymentId}`.

### Intent: "Stop / pause / hibernate / terminate an instance"

Spheron AI has no stop, pause, or hibernate feature. The only action is terminate, and terminate is permanent. There is no endpoint to halt billing while keeping the instance, and there is no way to resume a terminated instance.

When the user asks to "stop", "pause", "hibernate", "shut down", or "turn off" an instance, explain this clearly before acting:

- Terminating deletes the instance and everything on its disk (the OS, installed packages, models, and any data not on a persistent volume).
- A terminated instance cannot be recovered or resumed. To use the GPU again, deploy a fresh instance and set it up from scratch.
- To preserve work across instances, the user should keep data on a persistent volume (Section 7) or back it up off the instance before terminating. A volume detaches and survives termination; the instance's local disk does not.

If the user still wants to terminate:

1. Call `GET /api/deployments/{deploymentId}/can-terminate` first. If `canTerminate` is `false`, tell the user how many minutes remain (`timeRemaining`) before the minimum runtime is met.
2. Confirm with the user that termination is permanent and deletes the disk.
3. Call `DELETE /api/deployments/{deploymentId}`.
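
The three steps above amount to a small gate, sketched here in Python. The function name `termination_step` and the returned strings are illustrative only; the `canTerminate` and `timeRemaining` fields are the ones named in step 1.

```python
from typing import Any, Mapping


def termination_step(can_terminate: Mapping[str, Any], user_confirmed: bool) -> str:
    """Pick the next step from a can-terminate response and the user's answer."""
    # Step 1: the minimum runtime must be met before anything else.
    if not can_terminate.get("canTerminate", False):
        minutes = can_terminate.get("timeRemaining")
        return f"blocked: minimum runtime not met, {minutes} minute(s) remaining"
    # Step 2: termination is permanent, so require an explicit yes.
    if not user_confirmed:
        return "confirm: termination is permanent and deletes the instance disk"
    # Step 3: only now is the DELETE call appropriate.
    return "delete: call DELETE /api/deployments/{deploymentId}"
```

The ordering matters: the DELETE branch is unreachable unless the runtime check passed and the user confirmed.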

### Intent: "Rename an instance"

- `PATCH /api/deployments/{deploymentId}` with `{ "name": "..." }`. Only `name` is mutable. Everything else (provider, region, GPU, instance type, SSH key, volumes) is immutable. To change anything else, terminate and redeploy.

### Intent: "Persistent storage"

- See Section 7. Always read `GET /api/volumes/regions?provider=<id>` before creating a volume, and respect the provider-specific rules.

### Intent: "Kubernetes on bare metal"

- The Kubernetes add-on works only on Voltage Park CLUSTER bare-metal offers. No other provider supports it.
- It is available only when a Voltage Park CLUSTER bare-metal offer is actually live in `GET /api/gpu-offers`. If no such Voltage Park offer is currently present, the add-on cannot be deployed, and there is no alternative on any other provider. Tell the user it is unavailable rather than attempting it elsewhere.
- When a qualifying offer exists: check `GET /api/kubernetes/versions?provider=voltage-park`, then pass `kubernetesAddon` in the deploy body for that offer. It adds hourly cost per GPU.

### Intent: "How much credit do I have?"

- `GET /api/balance` (live, all teams) or `GET /api/balance?teamId=<id>` for one team.

---

## 5. Endpoint reference

Authenticated unless marked public. All paths are under `https://app.spheron.ai`.

### Providers and offers

| Method | Path | Auth | Purpose |
|--------|------|------|---------|
| GET | `/api/providers` | Public | List configured provider names. Returns a JSON array of strings. |
| GET | `/api/gpu-offers` | Optional | List GPU offers. Authenticate to receive team discount fields. |

`GET /api/gpu-offers` query params: `page` (default 1), `limit` (default 10), `search` (GPU model text), `sortBy` (default popularity), `sortOrder` (`asc`/`desc`), `instanceType` (`SPOT`/`DEDICATED`/`CLUSTER`, case-insensitive). Response is `{ data, total, page, limit, totalPages }`. Each `data[]` group has a GPU model with an `offers[]` array; each entry there has the `offerId` you deploy.

Discount fields appear in each offer only when authenticated: `originalPrice`, `discountedPrice`, `discountPercentage`, `hasDiscount`. Unauthenticated responses include only `price` (list price).
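
A rough Python sketch of reading these fields client-side is shown below: it flattens `data[].offers[]` and sorts by the price the user actually pays. The helper names are hypothetical, and the sketch assumes `price` is still present alongside the discount fields in authenticated responses.

```python
from typing import Any, Dict, List, Mapping


def effective_price(offer: Mapping[str, Any]) -> float:
    """Hourly rate the user pays: discounted if a discount applies, else list."""
    if offer.get("hasDiscount") and offer.get("discountedPrice") is not None:
        return float(offer["discountedPrice"])
    return float(offer["price"])


def cheapest_offers(response: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Flatten data[].offers[] and sort by effective hourly price, cheapest first."""
    flat = [
        dict(offer)
        for group in response.get("data", [])
        for offer in group.get("offers", [])
    ]
    return sorted(flat, key=effective_price)
```

Because the sort runs only over the page that was fetched, a cheaper offer may exist on another page; use `page` and `limit` to cover the full listing when that matters.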

### Deployments

| Method | Path | Purpose |
|--------|------|---------|
| POST | `/api/deployments` | Create a GPU instance. |
| GET | `/api/deployments` | List your deployments. Filter with `status`, `teamId`, `userId`. |
| GET | `/api/deployments/{deploymentId}` | Get one deployment. |
| PATCH | `/api/deployments/{deploymentId}` | Rename only (`name` field). |
| DELETE | `/api/deployments/{deploymentId}` | Terminate the instance. |
| GET | `/api/deployments/{deploymentId}/can-terminate` | Check minimum runtime before terminating. |

`GET /api/deployments` `status` filter accepts `active` (running or deploying), `inactive` (terminated or failed), or an exact status: `running`, `deploying`, `terminated`, `failed`.

### SSH keys

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/api/ssh-keys` | List your keys. |
| POST | `/api/ssh-keys` | Add a key. Body: `{ name, publicKey, teamId? }`. |
| GET | `/api/ssh-keys/{id}` | Get one key. |
| DELETE | `/api/ssh-keys/{id}` | Delete a key. |

### Volumes

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/api/volumes/regions` | List valid regions (and Sesterce cloudId) per provider. Call this before creating. |
| GET | `/api/volumes/pricing` | Storage rate per GB per hour for a provider/region. |
| GET | `/api/volumes` | List volumes. Filter with `teamId`, `status`, paging. |
| POST | `/api/volumes` | Create a volume. |
| GET | `/api/volumes/{volumeId}` | Get one volume with fresh usage. |
| PATCH | `/api/volumes/{volumeId}` | Rename or expand (provider-dependent). |
| DELETE | `/api/volumes/{volumeId}` | Delete a detached volume. |
| POST | `/api/volumes/{volumeId}/attach` | Attach to a running deployment. |
| POST | `/api/volumes/{volumeId}/detach` | Detach from a deployment. |

### Kubernetes (Voltage Park CLUSTER GPU offers only)

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/api/kubernetes/versions?provider=voltage-park` | List available Kubernetes versions. |
| GET | `/api/kubernetes/{clusterId}/health` | Cluster health by cluster UUID. |

### Teams and balance

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/api/balance` | Live balance. Pass `teamId` for one team, or `all=false` for the current team only. |
| GET | `/api/teams` | List teams you belong to (creates a default personal team on first call). |
| GET | `/api/teams/details/{teamId}` | One team. Balance here is stored, not recomputed; use `/api/balance` for live. |

---

## 6. How the deployment parameters work

`POST /api/deployments` is the most important and most error-prone call. Get these right.

**Required fields, and where each value comes from:**

| Field | Type | Must match | Source |
|-------|------|-----------|--------|
| `provider` | string | The chosen offer's provider | `offers[].provider` from gpu-offers, validated against `GET /api/providers` |
| `offerId` | string | Exact offer identifier | `offers[].offerId` |
| `gpuType` | string | The offer's GPU type | `offers[]` group `gpuType` (for example `rtx-4090`, `h100`) |
| `gpuCount` | number | Count for that offer | `offers[].gpuCount` |
| `region` | string | A region the offer lists | `offers[].region` or one of `offers[].clusters` |
| `operatingSystem` | string | One of the offer's OS options | `offers[].os_options` (for example `ubuntu-22.04`) |
| `instanceType` | string | The offer's type | `SPOT`, `DEDICATED`, or `CLUSTER` (case-insensitive) |

If any of these do not correspond to the same offer, the deploy is rejected or behaves unexpectedly. Do not mix a `gpuType` from one offer with an `offerId` from another. Always copy them from the same `offers[]` entry.

**SSH access (exactly one is required):**

- `sshKeyId`: ID of a key from `GET /api/ssh-keys`. Preferred when the user already has a saved key.
- `ssh_public_key`: inline OpenSSH public key string. Spheron creates a temporary key for the deployment. Use when the user has no saved key and does not want to save one.

If neither is supplied, the user cannot log in. Ask which the user wants.
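
One way to enforce both rules (all required fields from a single offer, exactly one SSH option) is to build the body from one `offers[]` entry, as in the Python sketch below. The function name `build_deploy_body` is hypothetical. It also assumes `clusters` is a list of region strings and takes `gpuType` as an explicit argument, since the exact shape of those fields is not spelled out here.

```python
from typing import Any, Dict, Mapping, Optional


def build_deploy_body(
    offer: Mapping[str, Any],
    gpu_type: str,
    region: str,
    operating_system: str,
    ssh_key_id: Optional[str] = None,
    ssh_public_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a POST /api/deployments body from a single offers[] entry."""
    # Regions the offer lists: its `region` plus any `clusters`
    # (assumed here to be a list of region strings).
    regions = set(offer.get("clusters") or [])
    if offer.get("region"):
        regions.add(offer["region"])
    if region not in regions:
        raise ValueError(f"region {region!r} is not listed by offer {offer['offerId']}")
    if operating_system not in (offer.get("os_options") or []):
        raise ValueError(f"OS {operating_system!r} is not in the offer's os_options")
    # Exactly one SSH option: reject both-missing and both-present.
    if (ssh_key_id is None) == (ssh_public_key is None):
        raise ValueError("supply exactly one of sshKeyId or ssh_public_key")

    body: Dict[str, Any] = {
        "provider": offer["provider"],
        "offerId": offer["offerId"],
        "gpuType": gpu_type,
        "gpuCount": offer["gpuCount"],
        "region": region,
        "operatingSystem": operating_system,
        "instanceType": offer["instanceType"],
    }
    if ssh_key_id is not None:
        body["sshKeyId"] = ssh_key_id
    else:
        body["ssh_public_key"] = ssh_public_key
    return body
```

Reading every offer-derived field from the same dictionary makes it structurally impossible to pair an `offerId` from one offer with a `provider` or `instanceType` from another.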

**Optional fields:**

- `teamId`: deploy under a specific team. Defaults to the current team. Get IDs from `GET /api/teams`.
- `name`: a friendly label. Stored verbatim, editable later via PATCH.
- `cloudInit`: first-boot automation. Object with `runcmd` (string array), `packages` (string array), and `writeFiles` (array of `{ path, content, owner?, permissions? }`). Only honored when the offer reports `supportsCloudInit: true`.
- `kubernetesAddon`: Voltage Park CLUSTER bare metal only. Object with `version` (for example `"1.35"`) and optional `authentication_config_b64` (base64 of a Kubernetes AuthenticationConfiguration YAML). Adds hourly cost per GPU. After the cluster is running, the deployment response carries `kubernetesAddon.kubeconfig`, `k8s_cluster_id`, and `service_links`.
- `volumeIds`: array of volume IDs to attach at launch. Each volume must match the deployment's provider and region. Provider rules below.

**volumeIds per-provider rules:**

- **Sesterce**: single-item array only. Volumes attach at instance creation and cannot be attached or detached afterward.
- **Spheron AI**: up to 10 volumes per instance. Each volume attaches to one instance at a time. Hot-detach and re-attach are supported after launch.
- **Verda**: up to 10 volumes per instance. A volume can be shared across multiple instances.
- **Voltage Park**: 1 volume per instance.
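
The per-provider limits above can be checked before the deploy call, as in this Python sketch. The table simply restates the four rules; the function name `check_volume_ids` and the choice to reject providers without a listed rule are assumptions of the sketch.

```python
from typing import Sequence

# Maximum number of volumes in `volumeIds`, per the provider rules above.
MAX_VOLUMES_AT_LAUNCH = {
    "sesterce": 1,
    "spheron-ai": 10,
    "verda": 10,
    "voltage-park": 1,
}


def check_volume_ids(provider: str, volume_ids: Sequence[str]) -> None:
    """Raise ValueError if `volume_ids` breaks the provider's launch-time limit."""
    if provider not in MAX_VOLUMES_AT_LAUNCH:
        raise ValueError(f"no volumeIds rule is documented for provider {provider!r}")
    limit = MAX_VOLUMES_AT_LAUNCH[provider]
    if len(volume_ids) > limit:
        raise ValueError(
            f"{provider} allows at most {limit} volume(s) per instance, "
            f"got {len(volume_ids)}"
        )
```

This only checks the count. The API still requires each volume to match the deployment's provider and region, which needs the volume records themselves.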

**Cost estimate before deploying:** the offer's hourly rate is `discountedPrice` if the user has a discount, otherwise `price`. The offer price is the total hourly rate for that whole configuration (an 8x H100 offer priced at 21.52 means 21.52 USD per hour total, not per GPU). Add volume cost (`sizeInGb` times the region's `hourlyRatePerGb`) and any Kubernetes add-on hourly cost. Present the estimate to the user before you create anything.
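
The arithmetic can be sketched in Python as below. The function name `estimate_hourly_cost` is hypothetical, and the Kubernetes per-GPU rate is a caller-supplied input, because this document does not say which field carries it.

```python
from typing import Any, Iterable, Mapping, Tuple


def estimate_hourly_cost(
    offer: Mapping[str, Any],
    volumes: Iterable[Tuple[float, float]] = (),
    k8s_rate_per_gpu: float = 0.0,
) -> float:
    """Total USD per hour: offer rate + volume storage + Kubernetes add-on."""
    # The offer price already covers the whole configuration,
    # so it is NOT multiplied by gpuCount.
    if offer.get("hasDiscount") and offer.get("discountedPrice") is not None:
        base = float(offer["discountedPrice"])
    else:
        base = float(offer["price"])
    # Each volume is a (sizeInGb, hourlyRatePerGb) pair from the regions response.
    storage = sum(size * rate for size, rate in volumes)
    # The add-on is billed per GPU (rate supplied by the caller).
    addon = k8s_rate_per_gpu * offer.get("gpuCount", 0)
    return round(base + storage + addon, 4)
```

Multiply the result by the expected number of hours to give the user a total, and state clearly that billing continues until the instance is terminated.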

---

## 7. Volumes: read regions first, then respect provider rules

Supported volume providers: `voltage-park`, `data-crunch`, `sesterce`, `spheron-ai` (and `verda` for attach semantics).

**Always call `GET /api/volumes/regions?provider=<id>` first.** It returns each region's `id`, `minSizeGb`, `maxSizeGb`, `hourlyRatePerGb`, and `hasGpuOffers`. For Sesterce it also returns `cloudId` and `cloudName`, both of which you must pass when creating.

**Create (`POST /api/volumes`) parameters:**

| Field | Required | Notes |
|-------|----------|-------|
| `name` | Yes | Lowercase alphanumeric with hyphens/underscores, max 60 chars. |
| `sizeInGb` | Yes | Within the region's `minSizeGb`/`maxSizeGb`. |
| `provider` | Yes | One of the supported providers. |
| `region` | Conditional | Required for `data-crunch`, `sesterce`, `spheron-ai`. |
| `cloudId` | Conditional | Sesterce only. From the regions response. |
| `teamId` | No | Defaults to current team. |
| `deploymentId` | No | Attach during creation. Supported for `voltage-park`, `data-crunch`, `spheron-ai`. Not for `sesterce`. |

**Size limits by provider:** voltage-park 1 to 64000 GB, data-crunch 1 to 10000 GB, sesterce 50 to 10000 GB, spheron-ai up to 51200 GB (varies by region; read the region's `maxSizeGb`).
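
A pre-flight check for the create body might look like the Python sketch below. It takes the request body plus one region entry from `GET /api/volumes/regions`. The helper name is hypothetical, and the name pattern is one reading of "lowercase alphanumeric with hyphens/underscores" (it does not assume anything about the first character).

```python
import re
from typing import Any, List, Mapping

# One interpretation of the naming rule: 1 to 60 chars from [a-z0-9_-].
NAME_RE = re.compile(r"[a-z0-9_-]{1,60}")
REGION_REQUIRED = {"data-crunch", "sesterce", "spheron-ai"}


def volume_request_problems(
    body: Mapping[str, Any], region_info: Mapping[str, Any]
) -> List[str]:
    """Return a list of problems with a POST /api/volumes body (empty if none)."""
    problems: List[str] = []
    if not NAME_RE.fullmatch(str(body.get("name", ""))):
        problems.append("name: use lowercase letters, digits, '-' or '_', max 60 chars")
    size = body.get("sizeInGb")
    low, high = region_info["minSizeGb"], region_info["maxSizeGb"]
    if not isinstance(size, (int, float)) or not low <= size <= high:
        problems.append(f"sizeInGb: must be between {low} and {high}")
    provider = body.get("provider")
    if provider in REGION_REQUIRED and not body.get("region"):
        problems.append(f"region: required for {provider}")
    if provider == "sesterce" and not body.get("cloudId"):
        problems.append("cloudId: required for sesterce")
    return problems
```

Reading the bounds from the region entry rather than hard-coding the per-provider figures keeps the check correct where limits vary by region.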

**Attach/detach and mutability rules:**

- **Voltage Park**: 1 volume per instance. Rename and expand supported. Response includes `virtualIp` for NFS mounting.
- **Verda**: up to 10 volumes per instance; a volume can be shared across instances. Volume and instance must be in the same region (`FIN-01`, `FIN-02`, `FIN-03`). Cross-region attach fails silently (looks attached, data inaccessible). Always co-locate.
- **Spheron AI**: up to 10 volumes per instance, one instance per volume at a time. Hot-detach and re-attach supported. Not renamable or resizable; to grow, create a new larger volume and migrate data.
- **Sesterce**: 1 volume per instance, bound at instance creation via `volumeIds`. Immutable. Calling the attach endpoint returns 400. Manual detach is allowed only when the deployment is terminal (`failed`, `terminated`, `terminated-provider`), so that a stranded volume can be recovered.

**Attach** requires the volume and deployment to be the same provider and region. **Delete** requires the volume to be detached first: detach it where the provider supports that, otherwise terminate the instance and then delete.

---

## 8. Status values and polling

Deployment statuses:

- `deploying`: provisioning, usually 30 to 60 seconds.
- `running`: active and reachable. `sshCommand` and `ipAddress` are populated.
- `failed`: deployment error. Inspect the deployment object for details.
- `terminated`: ended by the user.
- `terminated-provider`: reclaimed by the provider. For SPOT instances this means the spot capacity was interrupted. No user action needed, but the instance is gone.

Lifecycle:

```
deploying → running → terminated
    ↓             ↓
  failed   terminated-provider
```

After `POST /api/deployments`, poll `GET /api/deployments/{deploymentId}` every 15 to 30 seconds until `running` or `failed`. Do not poll faster than once every 10 seconds. If it reaches `failed`, report the failure rather than retrying blindly.
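
A polling loop that follows these rules could be sketched in Python as below. The `fetch` callable stands in for `GET /api/deployments/{deploymentId}` and is supplied by the caller; the function name and the `max_polls` cap are assumptions of this sketch, since no overall timeout is specified here.

```python
import time
from typing import Any, Callable, Mapping

MIN_INTERVAL_S = 10  # never poll faster than once every 10 seconds
# Statuses after which further polling is pointless.
SETTLED = {"running", "failed", "terminated", "terminated-provider"}


def wait_for_deployment(
    fetch: Callable[[], Mapping[str, Any]],
    interval_s: float = 20,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll until the deployment settles; return the last deployment object."""
    interval_s = max(interval_s, MIN_INTERVAL_S)  # enforce the 10-second floor
    for attempt in range(max_polls):
        deployment = fetch()
        if deployment.get("status") in SETTLED:
            return deployment
        if attempt < max_polls - 1:  # no need to sleep after the final poll
            sleep(interval_s)
    raise TimeoutError(f"deployment still not settled after {max_polls} polls")
```

The caller inspects the returned `status`: on `running`, report `sshCommand` and `ipAddress`; on `failed`, report the failure instead of creating another deployment.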

If the user chose a SPOT instance, warn them it can move to `terminated-provider` at any time. For long, uninterruptible work, recommend DEDICATED.

---

## 9. Errors, retries, and rate limits

Error envelope:

```json
{ "error": "Error message", "code": "ERROR_CODE", "details": {} }
```

How to react by status code:

- `400` (`VALIDATION_ERROR`): a parameter is missing or wrong. Read the message and fix the specific field; do not blindly retry the same body.
- `401` (`UNAUTHORIZED`): bad or missing key. Stop and ask the user (Section 1).
- `403` (`FORBIDDEN`): the key is valid but lacks permission for that resource or team. Confirm the user owns the resource.
- `404` (`NOT_FOUND`): the resource does not exist or was already terminated.
- `429` (`RATE_LIMIT_EXCEEDED`): back off. The response includes `retryAfter` (seconds). Wait, then retry.
- `500` (`INTERNAL_ERROR`): server side. Retry once after a short delay; if it persists, tell the user.
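
The reactions above can be condensed into a small dispatch function, sketched here in Python. The action labels, the 60-second fallback, and the 5-second delay for `500` are illustrative choices, and the sketch assumes `retryAfter` sits at the top level of the error body.

```python
from typing import Any, Mapping, Tuple

# Non-retryable statuses and what to do about them.
ACTIONS = {
    400: "fix_request",       # VALIDATION_ERROR: correct the named field
    401: "ask_for_key",       # UNAUTHORIZED: stop and ask the user
    403: "check_ownership",   # FORBIDDEN: confirm the user owns the resource
    404: "report_missing",    # NOT_FOUND: gone or already terminated
}


def plan_reaction(
    status: int, body: Mapping[str, Any], attempt: int = 1
) -> Tuple[str, float]:
    """Map an error response to (action, seconds_to_wait_first)."""
    if status == 429:
        # Back off for as long as the server asks (fallback value is an assumption).
        return ("wait_then_retry", float(body.get("retryAfter", 60)))
    if status == 500 and attempt == 1:
        # Retry a server error once, after a short delay.
        return ("wait_then_retry", 5.0)
    return (ACTIONS.get(status, "report_to_user"), 0.0)
```

A second `500` falls through to `report_to_user`, which matches the rule of retrying once and then telling the user.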

Rate limits:

- `POST /api/deployments`: 10 per 15 minutes per user (enterprise users can raise this by contacting info@spheron.ai).
- All other endpoints: 250 per 15 minutes per IP.

Responses include `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset`. Watch `Remaining` and slow down before you hit zero. Never loop deploy attempts.
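
A simple way to watch `Remaining` is to compare it with `Limit`, as in the Python sketch below. The helper names and the 10% threshold are arbitrary illustrations. `X-RateLimit-Reset` is left out on purpose, because its format (timestamp or seconds) is not stated here.

```python
from typing import Mapping, Optional


def remaining_fraction(headers: Mapping[str, str]) -> Optional[float]:
    """Share of the rate-limit window still available, or None if unknown."""
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        limit = int(lowered["x-ratelimit-limit"])
        remaining = int(lowered["x-ratelimit-remaining"])
    except (KeyError, ValueError):
        return None
    if limit <= 0:
        return None
    return remaining / limit


def should_slow_down(headers: Mapping[str, str], threshold: float = 0.1) -> bool:
    """True when the remaining share of the budget is at or below the threshold."""
    fraction = remaining_fraction(headers)
    return fraction is not None and fraction <= threshold
```

When this returns `True`, lengthen the polling interval or pause non-essential reads rather than waiting for a `429`.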

---

## 10. Worked examples

### List H100 offers (authenticated, with discounts)

```bash
curl -H "Authorization: Bearer <your-api-key>" \
  "https://app.spheron.ai/api/gpu-offers?search=h100&limit=5"
```

### Add an SSH key

```bash
curl -X POST "https://app.spheron.ai/api/ssh-keys" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"name": "My Key", "publicKey": "ssh-ed25519 AAAAC3... user@host"}'
```

### Deploy a single RTX 4090 (DEDICATED)

```bash
curl -X POST "https://app.spheron.ai/api/deployments" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "spheron-ai",
    "offerId": "rtx-4090-spheron-ai-1",
    "gpuType": "rtx-4090",
    "gpuCount": 1,
    "region": "us-east-1",
    "operatingSystem": "ubuntu-22.04",
    "instanceType": "DEDICATED",
    "sshKeyId": "your_ssh_key_id"
  }'
```

### Poll until running

```bash
curl -H "Authorization: Bearer <your-api-key>" \
  "https://app.spheron.ai/api/deployments/<deployment-id>"
```

### Check, then terminate

```bash
curl -H "Authorization: Bearer <your-api-key>" \
  "https://app.spheron.ai/api/deployments/<deployment-id>/can-terminate"

curl -X DELETE "https://app.spheron.ai/api/deployments/<deployment-id>" \
  -H "Authorization: Bearer <your-api-key>"
```

### Create and attach a volume (Voltage Park)

```bash
# 1. Discover valid regions
curl -H "Authorization: Bearer <your-api-key>" \
  "https://app.spheron.ai/api/volumes/regions?provider=voltage-park"

# 2. Create the volume
curl -X POST "https://app.spheron.ai/api/volumes" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-data-volume", "sizeInGb": 100, "provider": "voltage-park"}'

# 3. Attach it to a running deployment in the same region
curl -X POST "https://app.spheron.ai/api/volumes/<volume-id>/attach" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"deploymentId": "<deployment-id>"}'
```

---

## 11. Operating rules for you, the assistant

Follow these whenever you act on the Spheron API:

1. **Never guess required values.** If `offerId`, `region`, `instanceType`, or an SSH key is unknown, fetch it from the API or ask the user. Do not fabricate IDs.
2. **Copy deploy fields from one offer.** `provider`, `offerId`, `gpuType`, `gpuCount`, `region`, `operatingSystem`, and `instanceType` must all come from the same `offers[]` entry.
3. **Confirm spend before creating.** Before `POST /api/deployments` or `POST /api/volumes`, show the hourly rate and estimated cost and get explicit confirmation.
4. **Check balance.** A team with zero balance cannot deploy. Verify with `GET /api/balance` if a deploy fails or before a large deploy.
5. **Confirm before destroying.** Before `DELETE` on a deployment or volume, confirm with the user. For deployments, call `can-terminate` first and report `timeRemaining` if termination is blocked.
6. **There is no stop or hibernate.** Never tell the user you can pause, stop, or hibernate an instance to save cost. The only option is terminate, which permanently deletes the disk and requires a fresh deployment afterward. To keep data, use a persistent volume or back up before terminating.
7. **Respect SPOT semantics.** Tell the user SPOT can be interrupted (`terminated-provider`). Recommend DEDICATED for work that must not be interrupted.
8. **Poll politely.** Status polling no faster than every 10 seconds; 15 to 30 seconds is the norm. Honor `429` `retryAfter`. Never loop deployment creation.
9. **Read the key from the local store, protect it.** On every run, retrieve the key from `SPHERON_API_KEY` or the local credentials file before asking the user (Section 1). Never echo the full key; reference it as `$SPHERON_API_KEY` or `<your-api-key>`.
10. **Report clearly.** After each action, state what changed (IDs, status, cost) and the next sensible step. On errors, quote the `code` and `error` message and explain the fix.
11. **Stay on the documented surface.** Use only the endpoints in this skill against `https://app.spheron.ai`. If a capability is not listed (for example resizing a deployment), say so: the path is to terminate and redeploy.

If you need access, capability changes, or higher rate limits, or you run into an account issue you cannot resolve through the API, direct the user to contact **info@spheron.ai**.
