
FLUX.1 & FLUX.2

Deploy FLUX.1 and FLUX.2 from Black Forest Labs on Spheron GPU instances. FLUX models deliver state-of-the-art photorealistic text-to-image generation.

Recommended hardware

Model            Min VRAM   Recommended GPU                   Instance Type
FLUX.1-dev       16 GB      RTX 4090 (24 GB)                  Dedicated or Spot
FLUX.1-schnell   16 GB      RTX 4090 (24 GB)                  Dedicated or Spot
FLUX.2-dev       80 GB      H100 80 GB (FP8) or H200 141 GB   Dedicated
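Before installing anything, you can confirm that your instance's GPU matches the table. This uses nvidia-smi, which ships with the NVIDIA driver on GPU instances:

```shell
# List each GPU with its total memory; prints a fallback message if no driver is present
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null \
  || echo "nvidia-smi not found (is the NVIDIA driver installed?)"
```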

Prerequisites

  • A running Spheron GPU instance (see Instance Types)
  • SSH access to the instance
  • A HuggingFace account with access granted to the FLUX model you intend to use: FLUX.1-dev is a gated model under a non-commercial license, so you must accept its terms on the model page before downloading. FLUX.1-schnell is released under the permissive Apache 2.0 license.

Manual setup

Use these steps to set up the server manually after SSH-ing into your instance. This works on any provider regardless of cloud-init support.

Step 1: Connect to your instance

ssh <user>@<ipAddress>

Replace <user> with the username shown in the instance details panel (e.g., ubuntu for Spheron AI instances) and <ipAddress> with your instance's public IP.

Step 2: Install dependencies

sudo apt-get update -y
sudo apt-get install -y python3-pip
pip install diffusers transformers accelerate torch fastapi uvicorn pillow sentencepiece huggingface_hub

Step 2a: Authenticate with HuggingFace

FLUX.1-dev is a gated model and requires a HuggingFace token to download; authenticating now lets the server fetch the weights without prompting.

huggingface-cli login --token <your-hf-token>

Replace <your-hf-token> with a token from huggingface.co/settings/tokens.
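To confirm the token was accepted, ask the CLI which account you are logged in as:

```shell
# Prints your HuggingFace username if the token is valid
huggingface-cli whoami 2>/dev/null || echo "not logged in"
```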

Step 3: Create the server script

cat > /opt/flux_server.py << 'EOF'
import io, base64
from fastapi import FastAPI
from pydantic import BaseModel
import torch
from diffusers import FluxPipeline
 
app = FastAPI()
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")
# If you hit CUDA out-of-memory errors on a 24 GB card, drop .to("cuda") above
# and call pipe.enable_model_cpu_offload() instead (keeps idle submodules on CPU).
 
class GenerateRequest(BaseModel):
    prompt: str
    width: int = 1024
    height: int = 1024
    num_inference_steps: int = 28
    guidance_scale: float = 3.5
 
@app.post("/generate")
def generate(req: GenerateRequest):
    image = pipe(
        req.prompt,
        width=req.width,
        height=req.height,
        num_inference_steps=req.num_inference_steps,
        guidance_scale=req.guidance_scale,
    ).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return {"image_b64": base64.b64encode(buf.getvalue()).decode()}
 
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF

Step 4: Start the server

Run the server in the foreground to verify it works:

python3 /opt/flux_server.py

Press Ctrl+C to stop.
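Once uvicorn reports that it is listening on port 8000, you can smoke-test the endpoint from a second SSH session. A small image at few steps keeps the first request quick; a successful response is JSON containing an image_b64 field (the field name matches the server script above):

```shell
# Smoke-test the /generate endpoint; prints the start of the JSON response
curl -s --max-time 300 -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a red apple on a wooden table", "width": 512, "height": 512, "num_inference_steps": 8}' \
  | head -c 120
```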

Step 5: Run as a background service

To keep the server running after you close your SSH session, create a systemd service. Replace <your-hf-token> with your actual token; huggingface_hub reads the HF_TOKEN environment variable, so the service can download gated weights without an interactive login:

sudo tee /etc/systemd/system/flux.service > /dev/null << 'EOF'
[Unit]
Description=FLUX.1 Image Generation Server
After=network.target
 
[Service]
Type=simple
Environment=HF_TOKEN=<your-hf-token>
ExecStart=/usr/bin/python3 /opt/flux_server.py
Restart=on-failure
RestartSec=10
 
[Install]
WantedBy=multi-user.target
EOF
 
sudo systemctl daemon-reload
sudo systemctl enable flux
sudo systemctl start flux

Accessing the server

SSH tunnel

The server listens on port 8000 on the instance. Forward that port to your local machine so you can call the API as if it were running locally:

ssh -L 8000:localhost:8000 <user>@<ipAddress>
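A quick way to check that the tunnel is up is to test whether anything is listening on local port 8000. This sketch uses only the Python standard library; the host and port match the tunnel command above:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    print("tunnel up" if port_open("127.0.0.1", 8000) else "nothing listening on 8000")
```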

Usage example

Run this from your local machine with the tunnel open (requires the requests and Pillow packages):

import requests
import base64
from PIL import Image
import io
 
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "A photorealistic mountain landscape at sunset, golden hour lighting",
        "width": 1024,
        "height": 1024,
        "num_inference_steps": 28,
        "guidance_scale": 3.5,
    },
)
response.raise_for_status()
 
image_bytes = base64.b64decode(response.json()["image_b64"])
image = Image.open(io.BytesIO(image_bytes))
image.save("output.png")
print("Image saved to output.png")
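FLUX.1-schnell is a timestep-distilled variant tuned for very few sampling steps: the commonly used settings are 4 steps with guidance disabled, versus dev's 28 steps at guidance 3.5. If you load schnell in the server instead of dev, a matching request payload would look like this (field names mirror the server's GenerateRequest model):

```python
# Request payloads matching the server's GenerateRequest fields.
# FLUX.1-dev: full-quality sampling; FLUX.1-schnell: distilled for ~4 steps, no guidance.
dev_payload = {
    "prompt": "A photorealistic mountain landscape at sunset",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 28,
    "guidance_scale": 3.5,
}

schnell_payload = {
    **dev_payload,
    "num_inference_steps": 4,   # distilled model converges in a handful of steps
    "guidance_scale": 0.0,      # schnell is trained without classifier-free guidance
}
```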

Check server logs

journalctl -u flux -f

Cloud-init startup script (optional)

If your provider supports cloud-init, you can paste this into the Startup Script field when deploying to automate the setup above. It installs the required Python packages and starts a FastAPI server exposing a /generate endpoint on port 8000. Replace <your-hf-token> in the service unit before deploying; without a valid token the gated FLUX.1-dev weights cannot be downloaded.

#cloud-config
write_files:
  - path: /opt/flux_server.py
    content: |
      import io, base64
      from fastapi import FastAPI
      from pydantic import BaseModel
      import torch
      from diffusers import FluxPipeline
 
      app = FastAPI()
      pipe = FluxPipeline.from_pretrained(
          "black-forest-labs/FLUX.1-dev",
          torch_dtype=torch.bfloat16,
      ).to("cuda")
      # If you hit CUDA out-of-memory errors on a 24 GB card, drop .to("cuda") above
      # and call pipe.enable_model_cpu_offload() instead (keeps idle submodules on CPU).
 
      class GenerateRequest(BaseModel):
          prompt: str
          width: int = 1024
          height: int = 1024
          num_inference_steps: int = 28
          guidance_scale: float = 3.5
 
      @app.post("/generate")
      def generate(req: GenerateRequest):
          image = pipe(
              req.prompt,
              width=req.width,
              height=req.height,
              num_inference_steps=req.num_inference_steps,
              guidance_scale=req.guidance_scale,
          ).images[0]
          buf = io.BytesIO()
          image.save(buf, format="PNG")
          return {"image_b64": base64.b64encode(buf.getvalue()).decode()}
 
      if __name__ == "__main__":
          import uvicorn
          uvicorn.run(app, host="0.0.0.0", port=8000)
  - path: /etc/systemd/system/flux.service
    content: |
      [Unit]
      Description=FLUX.1 Image Generation Server
      After=network.target
 
      [Service]
      Type=simple
      Environment=HF_TOKEN=<your-hf-token>
      ExecStart=/usr/bin/python3 /opt/flux_server.py
      Restart=on-failure
      RestartSec=10
 
      [Install]
      WantedBy=multi-user.target
runcmd:
  - apt-get update -y
  - apt-get install -y python3-pip
  - pip install diffusers transformers accelerate torch fastapi uvicorn pillow sentencepiece huggingface_hub
  - systemctl daemon-reload
  - systemctl enable flux
  - systemctl start flux

What's next