
Provider integration guide

This guide describes the API contract a compute provider's orchestrator must expose for the Spheron AI Marketplace to list, provision, and manage GPU instances on the provider's infrastructure. It is written for the engineers who build and operate that orchestrator.

By the end of integration, the marketplace can pull your offer catalog, deploy an instance on your platform, give the end user working SSH access, and stop, start, or terminate that instance, all by calling your API.

The machine-readable companion is provider-orchestrator-openapi.yaml (OpenAPI 3.0.3). That file is the authoritative source for endpoint paths, payloads, and schemas. This page explains the intent and conventions behind it.

Integration model

The marketplace is the client. The provider's orchestrator is the server. The integration is entirely poll-based. No webhooks or callbacks are required.

After provisioning, the marketplace polls the instance-state endpoint every 10 to 30 seconds for each active instance and reacts to status transitions. Accurate, promptly updated status reporting is therefore the single most important property of the integration.
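
Seen from the client side, the polling described above might look like the sketch below. It is illustrative only and not the marketplace's actual implementation: `poll_until_settled`, `fetch_status`, and the injected `sleep` are hypothetical names, and the 10-second default is simply the lower bound quoted above.

```python
import time
from typing import Callable, List

# Statuses at which a client can stop waiting on a fresh deployment.
SETTLED = {"ONLINE", "ERROR", "OFFLINE", "DESTROYED"}


def poll_until_settled(
    fetch_status: Callable[[], str],
    interval_s: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until a settled status appears; return each distinct status seen."""
    seen: List[str] = []
    for _ in range(max_polls):
        status = fetch_status()
        if not seen or seen[-1] != status:
            seen.append(status)  # record the transition
        if status in SETTLED:
            break
        sleep(interval_s)
    return seen


# Usage with a stubbed status source instead of a real HTTP call.
responses = iter(["PROVISIONING", "PROVISIONING", "ONLINE"])
transitions = poll_until_settled(lambda: next(responses), sleep=lambda s: None)
print(transitions)
```

Because the client only sees what each poll returns, a status that lags the real state of the instance delays every downstream reaction by at least one polling interval.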

All requests and responses are JSON. The marketplace sends Content-Type: application/json on every request that has a body and expects the same header on responses.

Capability tiers

Implement the Compute tier in full. Add the Storage tier only if your platform offers network volumes.

| Tier | Scope | Required? |
|---|---|---|
| Compute | Offer catalog, real-time availability, instance create / get / list / start / stop / terminate | Required. A fully working compute tier is sufficient for a successful integration. |
| Storage | Volume locations, volume create / get / attach / detach / delete | Optional. Implement only if your platform supports network volumes. Its absence does not block integration. |

Authentication

The preferred mechanism is a long-lived static API key sent on every request as Authorization: Bearer <key>.

# Every request carries the static API key
curl -H "Authorization: Bearer <your-key>" \
  https://api.provider.example.com/v1/configurations

Keys must meet three requirements:

  • Scoped to the marketplace account.
  • Revocable.
  • Valid concurrently: more than one key can be active at a time, so keys can be rotated with zero downtime.

If your platform requires short-lived tokens instead, expose POST /auth/token and include expires_in in the response so the marketplace can refresh proactively.
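
A client could use expires_in to refresh proactively along the lines of the sketch below. The `TokenCache` class, the `request_token` callable (standing in for a POST /auth/token call), the `access_token` field name, the 60-second safety margin, and the reading of expires_in as seconds are all assumptions made for illustration. The OpenAPI file remains the authority on the actual response shape.

```python
import time
from typing import Callable, Dict, Optional


class TokenCache:
    """Caches a short-lived token and refreshes it shortly before expiry."""

    def __init__(
        self,
        request_token: Callable[[], Dict],  # hypothetical wrapper around POST /auth/token
        clock: Callable[[], float] = time.monotonic,
        margin_s: float = 60.0,
    ) -> None:
        self._request_token = request_token
        self._clock = clock
        self._margin_s = margin_s
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            body = self._request_token()
            self._token = body["access_token"]  # assumed field name
            self._expires_at = now + body["expires_in"]  # assumed to be seconds
        return self._token


# Usage with a fake clock and a stubbed token endpoint.
fake_now = [0.0]
calls = []


def fake_request() -> Dict:
    calls.append(1)
    return {"access_token": f"tok-{len(calls)}", "expires_in": 300}


cache = TokenCache(fake_request, clock=lambda: fake_now[0])
first = cache.get()   # no token yet: fetches one
fake_now[0] = 100.0
second = cache.get()  # still fresh: reuses the cached token
fake_now[0] = 250.0
third = cache.get()   # inside the safety margin: refreshes
```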

Offer catalog and availability

Two endpoints serve different purposes, and both are required.

Catalog: GET /configurations

This endpoint is the catalog. It must return every offer the provider knows about, both deployable and out-of-stock, with an available boolean set per offer.

Out-of-stock offers must be returned with available: false, never dropped from the array. The marketplace renders unavailable offers as out-of-stock cards with a notify-me flow. An offer that was never emitted simply vanishes from the marketplace UI.

The example below shows one deployable offer and one out-of-stock offer in the same response:

{
  "configurations": [
    {
      "id": "gpu-8x-h100-sxm",
      "name": "8x H100 SXM",
      "instance_type": "DEDICATED",
      "vcpus": 192,
      "memory_gb": 2048,
      "storage_gb": 8000,
      "gpu_count": 8,
      "gpu_type": "H100-SXM5-80GB",
      "gpu_memory_gb": 80,
      "price_per_hour": 21.52,
      "available": true,
      "regions": ["EU-North 1", "US-Central 1"],
      "os_options": ["ubuntu-22.04-cuda-12.4", "ubuntu-24.04"],
      "supports_cloud_init": true
    },
    {
      "id": "gpu-1x-rtx4090",
      "name": "1x RTX 4090",
      "instance_type": "SPOT",
      "vcpus": 16,
      "memory_gb": 64,
      "storage_gb": 500,
      "gpu_count": 1,
      "gpu_type": "RTX-4090-24GB",
      "gpu_memory_gb": 24,
      "price_per_hour": 0.72,
      "spot_price_per_hour": 0.36,
      "available": false,
      "regions": ["EU-North 1"],
      "os_options": ["ubuntu-22.04-cuda-12.4"],
      "supports_cloud_init": true
    }
  ]
}
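
On the provider side, the key point is that stock levels set the available flag and never filter the list. A minimal sketch, assuming a hypothetical `free_units` inventory lookup keyed by offer id:

```python
from typing import Dict, List


def build_catalog(offers: List[Dict], free_units: Dict[str, int]) -> Dict:
    """Return every known offer, flagging stock instead of filtering on it."""
    configurations = []
    for offer in offers:
        entry = dict(offer)  # copy so the source record is not mutated
        entry["available"] = free_units.get(offer["id"], 0) > 0
        configurations.append(entry)
    return {"configurations": configurations}


# Usage: only one of the two offers currently has free capacity.
known_offers = [
    {"id": "gpu-8x-h100-sxm", "name": "8x H100 SXM"},
    {"id": "gpu-1x-rtx4090", "name": "1x RTX 4090"},
]
catalog = build_catalog(known_offers, {"gpu-8x-h100-sxm": 3})
flags = {c["id"]: c["available"] for c in catalog["configurations"]}
```

Both offers appear in the result; the RTX 4090 offer is present with available set to false.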

Deploy-time gate: GET /availability

This endpoint is the deploy-time gate. It returns only offers deployable right now (available: true), optionally filtered by region, and the marketplace calls it immediately before provisioning. It should reflect real-time inventory as closely as possible.

# Check live inventory in a single region before deploying
curl -H "Authorization: Bearer <your-key>" \
  "https://api.provider.example.com/v1/availability?region=EU-North%201"

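The sketch below covers both sides of the availability call: building the request URL with a percent-encoded region token, and filtering offers down to those deployable in a region. The helper names `availability_url` and `deployable_offers` are hypothetical, and the base URL is the placeholder used in the curl examples.

```python
from typing import Dict, List, Optional
from urllib.parse import quote, urlencode


def availability_url(base_url: str, region: Optional[str] = None) -> str:
    """Build the GET /availability URL, percent-encoding the region token."""
    if region is None:
        return f"{base_url}/availability"
    # quote_via=quote encodes the space as %20 rather than "+".
    query = urlencode({"region": region}, quote_via=quote)
    return f"{base_url}/availability?{query}"


def deployable_offers(configurations: List[Dict], region: Optional[str] = None) -> List[Dict]:
    """Keep only offers that are available now, optionally within one region."""
    return [
        c
        for c in configurations
        if c.get("available") and (region is None or region in c.get("regions", []))
    ]


url = availability_url("https://api.provider.example.com/v1", "EU-North 1")
offers = [
    {"id": "gpu-8x-h100-sxm", "available": True, "regions": ["EU-North 1", "US-Central 1"]},
    {"id": "gpu-1x-rtx4090", "available": False, "regions": ["EU-North 1"]},
]
in_region = [c["id"] for c in deployable_offers(offers, "EU-North 1")]
```
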
Offer fields

Each offer (Configuration) carries:

  • A stable id and name.
  • instance_type: one of SPOT or DEDICATED.
  • Hardware shape: vcpus, memory_gb, storage_gb, gpu_count, gpu_type, gpu_memory_gb.
  • Pricing: price_per_hour and optional spot_price_per_hour, both USD per hour.
  • regions, os_options, and supports_cloud_init.
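
A provider might run a lightweight self-check over each offer before serving it. The sketch below treats every field listed above except spot_price_per_hour as required, plus the available flag from the catalog section; that is an assumption, since the OpenAPI file defines which fields are actually mandatory. `offer_problems` is a hypothetical helper.

```python
from typing import Dict, List

# Assumed-required fields, taken from the list above plus the available flag.
OFFER_FIELDS = (
    "id", "name", "instance_type",
    "vcpus", "memory_gb", "storage_gb", "gpu_count", "gpu_type", "gpu_memory_gb",
    "price_per_hour", "available", "regions", "os_options", "supports_cloud_init",
)


def offer_problems(offer: Dict) -> List[str]:
    """Return human-readable problems found in one offer (empty if none)."""
    problems = [f"missing field: {name}" for name in OFFER_FIELDS if name not in offer]
    if offer.get("instance_type") not in ("SPOT", "DEDICATED"):
        problems.append("instance_type must be SPOT or DEDICATED")
    return problems


good = {
    "id": "gpu-1x-rtx4090", "name": "1x RTX 4090", "instance_type": "SPOT",
    "vcpus": 16, "memory_gb": 64, "storage_gb": 500,
    "gpu_count": 1, "gpu_type": "RTX-4090-24GB", "gpu_memory_gb": 24,
    "price_per_hour": 0.72, "spot_price_per_hour": 0.36, "available": False,
    "regions": ["EU-North 1"], "os_options": ["ubuntu-22.04-cuda-12.4"],
    "supports_cloud_init": True,
}
bad = dict(good, instance_type="RESERVED")
del bad["regions"]
```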

Region tokens

Regions are identified by stable, human-readable tokens (for example "EU-North 1"), used consistently across every endpoint: offer regions, the availability filter, instance region, and volume locations.

Translate internal datacenter codes to canonical tokens at the API boundary. Raw codes must never leak into responses. If the storage tier is implemented, each volume-location id must equal the matching offer region token.
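
One way to keep raw codes from leaking is a single lookup at the API boundary that fails loudly on anything unmapped. The internal codes below (`dc-sto-01`, `dc-dal-02`) are invented for illustration; only the region tokens come from this guide.

```python
from typing import Dict

# Hypothetical internal datacenter codes mapped to canonical region tokens.
REGION_TOKENS: Dict[str, str] = {
    "dc-sto-01": "EU-North 1",
    "dc-dal-02": "US-Central 1",
}


def to_region_token(internal_code: str) -> str:
    """Translate an internal code; refuse to pass an unmapped code through."""
    try:
        return REGION_TOKENS[internal_code]
    except KeyError:
        raise ValueError(f"unmapped datacenter code: {internal_code!r}") from None


token = to_region_token("dc-sto-01")
```

Raising on an unknown code, rather than returning it unchanged, turns a silent leak into an error that surfaces during testing.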

Instance lifecycle

State machine

Instances report one of eight canonical statuses. Map your internal platform states onto these:

                  ┌────────────► ERROR (terminal)

PROVISIONING ────► ONLINE ◄──────────────┐
                  │   │                  │
                  │   ▼ stop             │ start
                  │  STOPPING ─► STOPPED ┘
                  │                  │
                  │                  ▼ (reaped by platform)
                  │               OFFLINE (terminal)
                  ▼ terminate
              DESTROYING ─► DESTROYED (terminal)

The distinction between STOPPED and OFFLINE matters:

  • STOPPED: the resource still exists, is still owned by the deployment, and can be resumed via /start.
  • OFFLINE: the instance can no longer be resumed in place (for example, the platform reaped a shut-off VM) and the deployment is effectively over.

ERROR is reserved for non-recoverable failures and should include a human-readable error field.
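
The mapping from platform states to canonical statuses can be kept in one table. In the sketch below, the eight canonical values come from this guide, while the internal state names on the left (`scheduling`, `shut_off`, and so on) are hypothetical placeholders for whatever your platform uses.

```python
from enum import Enum
from typing import Dict


class Status(str, Enum):
    PROVISIONING = "PROVISIONING"
    ONLINE = "ONLINE"
    STOPPING = "STOPPING"
    STOPPED = "STOPPED"
    OFFLINE = "OFFLINE"
    DESTROYING = "DESTROYING"
    DESTROYED = "DESTROYED"
    ERROR = "ERROR"


# Statuses marked terminal in the diagram above.
TERMINAL = {Status.ERROR, Status.OFFLINE, Status.DESTROYED}

# Hypothetical internal platform states; replace with your own.
INTERNAL_TO_CANONICAL: Dict[str, Status] = {
    "scheduling": Status.PROVISIONING,
    "booting": Status.PROVISIONING,
    "running": Status.ONLINE,
    "shutting_down": Status.STOPPING,
    "shut_off": Status.STOPPED,
    "reaped": Status.OFFLINE,
    "deleting": Status.DESTROYING,
    "deleted": Status.DESTROYED,
    "failed": Status.ERROR,
}


def canonical_status(internal_state: str) -> Status:
    """Map an internal state to a canonical status; KeyError if unmapped."""
    return INTERNAL_TO_CANONICAL[internal_state]
```

Several internal states can collapse onto one canonical status, as `scheduling` and `booting` do here. No internal state should be exposed without a mapping.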

Provisioning flow

A single deployment follows this sequence. Steps 1, 2, and 4 are marketplace calls; steps 3 and 5 are what your orchestrator does and reports:

  1. The marketplace calls GET /availability to confirm live capacity.
  2. The marketplace calls POST /instances with the offer, region, and SSH keys.
  3. Your orchestrator returns a PROVISIONING instance record immediately, without waiting for boot.
  4. The marketplace polls GET /instances/{id} every 10 to 30 seconds.
  5. Your orchestrator transitions the instance to ONLINE once it has a reachable public IP (or port-forwarded SSH endpoint) and the injected SSH keys work.
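
The gate in step 5 can be expressed as a small decision function. This is a sketch under assumptions: `reportable_status` and its three inputs are hypothetical, and how you verify the injected keys (for example, a test login from the orchestrator) is left to your platform.

```python
from typing import Optional


def reportable_status(vm_running: bool, ssh_host: Optional[str], ssh_keys_verified: bool) -> str:
    """Report ONLINE only once the instance is actually usable over SSH."""
    if vm_running and ssh_host and ssh_keys_verified:
        return "ONLINE"
    return "PROVISIONING"


booting = reportable_status(vm_running=True, ssh_host=None, ssh_keys_verified=False)
ready = reportable_status(vm_running=True, ssh_host="203.0.113.10", ssh_keys_verified=True)
```

A VM that has booted but has no reachable endpoint yet stays in PROVISIONING, so the end user is never handed an instance they cannot log in to.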

A create request looks like this:

{
  "configuration_id": "gpu-8x-h100-sxm",
  "region": "EU-North 1",
  "name": "spheron-d3f9a1",
  "ssh_keys": [
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA... user@host"
  ],
  "operating_system_image": "ubuntu-22.04-cuda-12.4",
  "cloud_init": {
    "package_update": true,
    "packages": ["tmux", "htop"],
    "runcmd": ["nvidia-smi"]
  }
}

Return 201 with a PROVISIONING record as soon as the instance exists:

{
  "id": "inst-7f3a91c2",
  "configuration_id": "gpu-8x-h100-sxm",
  "region": "EU-North 1",
  "status": "PROVISIONING",
  "created_at": "2026-06-10T12:00:00Z",
  "updated_at": "2026-06-10T12:00:00Z"
}

A later poll returns the live, reachable instance:

{
  "id": "inst-7f3a91c2",
  "configuration_id": "gpu-8x-h100-sxm",
  "region": "EU-North 1",
  "status": "ONLINE",
  "public_ip": "203.0.113.10",
  "ssh_port": 22,
  "ssh_username": "ubuntu",
  "gpu_count": 8,
  "gpu_type": "H100-SXM5-80GB",
  "image": "ubuntu-22.04-cuda-12.4",
  "price_per_hour": 21.52,
  "created_at": "2026-06-10T12:00:00Z",
  "updated_at": "2026-06-10T12:03:40Z"
}

Expected provisioning times:

  • Virtual machines: reach ONLINE within 10 minutes, with a typical target under 5 minutes.
  • Bare metal: longer windows are acceptable, provided status is reported accurately throughout. Agree on an SLA during onboarding.

SSH keys, images, and cloud-init

The create request carries raw OpenSSH public key material in ssh_keys. The orchestrator injects these so the end user can connect. If your platform requires registered key objects, create temporary keys internally and clean them up at termination.

operating_system_image selects one of the offer's os_options. When an offer reports supports_cloud_init: true, honor the structured cloud_init block (run commands, packages, file writes) on first boot.

If instance names must be unique on your platform, de-duplicate (for example, by suffixing) rather than failing the request.
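
Suffix-based de-duplication can be as simple as the sketch below. The `unique_name` helper and the `-2`, `-3` suffix scheme are illustrative choices, not something the contract prescribes.

```python
from typing import Set


def unique_name(requested: str, taken: Set[str]) -> str:
    """Return the requested name, or the first free numbered variant of it."""
    if requested not in taken:
        return requested
    suffix = 2
    while f"{requested}-{suffix}" in taken:
        suffix += 1
    return f"{requested}-{suffix}"


existing = {"spheron-d3f9a1", "spheron-d3f9a1-2"}
assigned = unique_name("spheron-d3f9a1", existing)
```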

Stop and start

POST /instances/{id}/stop pauses a running instance while preserving its disk. Compute billing should pause while stopped. Document any storage charges during onboarding.

POST /instances/{id}/start resumes the instance in place. If your platform cannot resume stopped instances, return 409 with a descriptive error rather than silently destroying state.
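
A start handler might decide between acceptance and 409 as sketched below. The error codes `invalid_state` and `resume_unsupported`, the 200 success code, and the `handle_start` signature are assumptions for illustration. Rejecting any instance that is not STOPPED is also one reading of the contract, not something this guide states.

```python
from typing import Dict, Tuple


def handle_start(instance: Dict, platform_can_resume: bool) -> Tuple[int, Dict]:
    """Decide the response to POST /instances/{id}/start without touching state."""
    if instance["status"] != "STOPPED":
        return 409, {"error": {
            "code": "invalid_state",  # assumed code
            "message": f"Instance is {instance['status']}, not STOPPED.",
            "details": {},
        }}
    if not platform_can_resume:
        return 409, {"error": {
            "code": "resume_unsupported",  # assumed code
            "message": "This platform cannot resume stopped instances.",
            "details": {},
        }}
    return 200, instance  # the platform then resumes the instance in place


stopped = {"id": "inst-7f3a91c2", "status": "STOPPED"}
ok_code, _ = handle_start(stopped, platform_can_resume=True)
refused_code, refused_body = handle_start(stopped, platform_can_resume=False)
```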

Termination

DELETE /instances/{id} destroys the instance and releases all associated resources, including temporary SSH keys or scripts created during provisioning.

Once destroyed, GET /instances/{id} should return 404. The marketplace interprets 404 on a previously known instance as DESTROYED. Deletion must be idempotent: deleting an already-destroyed instance is success, not an error.
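
The sketch below illustrates idempotent deletion and the resulting 404, using an in-memory dictionary as a stand-in for your instance store. The 204 success code and the `not_found` error code are assumptions, and the intermediate DESTROYING phase is skipped for brevity.

```python
from typing import Dict, Tuple


def delete_instance(instances: Dict[str, Dict], instance_id: str) -> int:
    """Remove the instance if it exists; report success either way."""
    # A real handler would also release temporary SSH keys and scripts here.
    instances.pop(instance_id, None)
    return 204  # assumed success code


def get_instance(instances: Dict[str, Dict], instance_id: str) -> Tuple[int, Dict]:
    """Return the record, or 404 once the instance is gone."""
    if instance_id not in instances:
        return 404, {"error": {
            "code": "not_found",  # assumed code
            "message": f"Instance {instance_id} not found.",
            "details": {},
        }}
    return 200, instances[instance_id]


store = {"inst-7f3a91c2": {"id": "inst-7f3a91c2", "status": "ONLINE"}}
first_delete = delete_instance(store, "inst-7f3a91c2")
second_delete = delete_instance(store, "inst-7f3a91c2")  # already gone: still success
lookup_code, _ = get_instance(store, "inst-7f3a91c2")
```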

Volumes (optional tier)

Implement these endpoints only if your platform supports network volumes:

  • GET /volumes/locations: lists regions where volumes can be created, with ids equal to region tokens.
  • POST /volumes: creates a volume, optionally attaching to instances at create time.
  • POST /volumes/{id}/attach and POST /volumes/{id}/detach: manage attachments.
  • DELETE /volumes/{id}: removes the volume.

Several conventions make volume integrations robust:

  • Attach is idempotent. Attaching to an already-attached instance is success.
  • If attachment requires a stop, attach, then start cycle on your platform, handle it internally and report the intermediate instance states accurately.
  • If detachment is asynchronous, do not report the volume as detached until the attachment record is actually gone.
  • If attachment records can linger, force-detach internally before delete instead of failing.
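
Two of these conventions, idempotent attach and force-detach before delete, can be sketched over a simple in-memory attachment map. The `attachments` structure and both function names are hypothetical.

```python
from typing import Dict, Set


def attach_volume(attachments: Dict[str, Set[str]], volume_id: str, instance_id: str) -> bool:
    """Attach a volume; return False if it was already attached (still a success)."""
    attached = attachments.setdefault(volume_id, set())
    if instance_id in attached:
        return False  # nothing to do, and not an error
    attached.add(instance_id)
    return True


def delete_volume(attachments: Dict[str, Set[str]], volume_id: str) -> None:
    """Delete a volume, dropping any lingering attachment records first."""
    attachments.pop(volume_id, None)


records: Dict[str, Set[str]] = {}
created = attach_volume(records, "vol-1", "inst-7f3a91c2")
repeated = attach_volume(records, "vol-1", "inst-7f3a91c2")
snapshot = {k: set(v) for k, v in records.items()}
delete_volume(records, "vol-1")
```

In an HTTP handler, both the first and the repeated attach would map to a success response; the boolean only distinguishes whether any work was done.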

The create-instance request may carry volume_ids for inline attachment. Honor it if you support it; ignore it otherwise, and the marketplace falls back to post-create attach.

Errors and operational conventions

Errors use a consistent envelope with meaningful HTTP status codes:

{
  "error": {
    "code": "capacity_unavailable",
    "message": "No capacity for gpu-8x-h100-sxm in EU-North 1.",
    "details": {}
  }
}

Map status codes as follows:

| Status | Meaning |
|---|---|
| 400 | Invalid request |
| 401 | Bad credentials |
| 404 | Not found (terminal for instances) |
| 409 | Invalid state, or capacity loss at create time |

Preserve real HTTP status codes. Do not return 200 with an embedded error.
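
A small helper keeps the envelope and the HTTP status together so they cannot drift apart. `error_response` is a hypothetical name; the envelope shape and the example values follow the sample above.

```python
from typing import Dict, Optional, Tuple


def error_response(
    status: int, code: str, message: str, details: Optional[Dict] = None
) -> Tuple[int, Dict]:
    """Pair a real HTTP status code with the standard error envelope."""
    return status, {"error": {"code": code, "message": message, "details": details or {}}}


http_status, payload = error_response(
    409, "capacity_unavailable", "No capacity for gpu-8x-h100-sxm in EU-North 1."
)
```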

Operational expectations:

  • The state endpoint is polled continuously, so keep it fast and cheap.
  • Status transitions should be observable within seconds of occurring.
  • Rate limits, if any, should comfortably accommodate one poll per active instance per 10 seconds, plus catalog refreshes.
  • All prices are USD per hour. Memory and storage are in GB.
  • Timestamps are RFC 3339 UTC.
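
Two of these expectations lend themselves to a quick sketch: formatting timestamps as RFC 3339 UTC, and estimating the polling load a rate limit has to absorb. Both helper names are hypothetical, and the load estimate ignores catalog refreshes.

```python
from datetime import datetime, timedelta, timezone


def rfc3339_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def polls_per_minute(active_instances: int, poll_interval_s: int = 10) -> float:
    """Worst-case state-endpoint requests per minute from instance polling alone."""
    return active_instances * 60 / poll_interval_s


# 14:00 at UTC+2 is 12:00 UTC.
local_time = datetime(2026, 6, 10, 14, 0, 0, tzinfo=timezone(timedelta(hours=2)))
stamp = rfc3339_utc(local_time)
load = polls_per_minute(500)
```

With 500 active instances polled every 10 seconds, the state endpoint should expect about 3,000 requests per minute.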

Integration checklist

A provider is ready for marketplace onboarding when:

  1. All compute-tier endpoints in the OpenAPI spec are implemented and reachable over HTTPS.
  2. GET /configurations returns the full catalog, including out-of-stock offers with available: false.
  3. Region tokens are stable and consistent across all endpoints.
  4. A test instance can be provisioned, reached over SSH with an injected key, stopped, started, and terminated, with every status transition visible via polling.
  5. GET /instances/{id} returns 404 after termination.
  6. Provisioning reaches ONLINE within the expected window: 10 minutes for virtual machines, or the SLA agreed during onboarding for bare metal.
  7. API keys are issued, scoped to the marketplace account, and rotatable.
  8. If applicable, volume endpoints pass the same lifecycle test, including idempotent attach and reliable detach.

What we need from you to start

To begin integration, share the following with the Spheron team:

  • Your orchestrator base URL for staging and production.
  • API credentials for a test account.
  • Your offer catalog with pricing and regions.
  • Expected provisioning times per instance type.
  • Any platform-specific constraints: naming rules, stop/start support, NAT or port-forwarding, and volume semantics.

The Spheron team validates the contract with you against the checklist above. For questions, contact info@spheron.ai.
