
JanusCoderV-8B

Deploy JanusCoderV-8B on a Spheron GPU instance. JanusCoderV-8B is an 8B multimodal model that generates code from visual inputs including charts, screenshots, and UI mockups. It converts images into HTML, CSS, React components, and data visualization code.

Key capabilities

  • Visual-to-code translation: converts charts and screenshots to HTML and code
  • Layout bug fixing from screenshot inputs
  • Animation reconstruction using Manim
  • 32K token context support
  • Multimodal understanding across text, images, and code
Benchmarks:
  • ChartMimic: 74.20 (beats Qwen2.5VL-7B, InternVL3.5-8B)
  • WebCode2M: 18.28 (best open-weight structural correctness)
  • InteractScience: 33.32 (visual metrics leader)

Requirements

Hardware:
  • GPU: RTX 4090, A100, or H100 (16 GB VRAM minimum, 24 GB recommended)
  • RAM: 16 GB (32 GB for large context workloads)
  • Storage: 20 GB (SSD recommended)
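The 16 GB VRAM minimum follows from the parameter count. A back-of-envelope sketch, assuming roughly 8 billion parameters at 2 bytes each in BF16/FP16 (activations and the KV cache add several more GB on top, which is why 24 GB is recommended):

```python
PARAMS = 8e9  # approximate parameter count of an 8B model

def weight_vram_gib(params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return params * bytes_per_param / 1024**3

# Weights only; runtime overhead comes on top of these figures.
print(f"BF16/FP16 (2 B/param): ~{weight_vram_gib(PARAMS, 2):.1f} GiB")
print(f"INT8      (1 B/param): ~{weight_vram_gib(PARAMS, 1):.1f} GiB")
```

This is also why the --bits8 flag below roughly halves VRAM usage: 8-bit weights need one byte per parameter instead of two.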
Software:
  • Ubuntu 22.04 LTS
  • CUDA 12.1 or later
  • Python 3.11

Manual setup

Use these steps to set up the server manually after SSH-ing into your instance. This works on any provider regardless of cloud-init support.

Step 1: Provision a Spheron instance

  1. Sign up at app.spheron.ai
  2. Add credits (card/crypto)
  3. Click Deploy → RTX 4090 → Region → Ubuntu 22.04 → add your SSH key → Deploy

See Getting Started or SSH Connection for details.

Step 2: Connect to your instance

ssh <user>@<ipAddress>

Replace <user> with the username shown in the instance details panel (e.g., root or ubuntu) and <ipAddress> with your instance's public IP.

Step 3: Set up the environment

sudo apt update && sudo apt install -y software-properties-common curl ca-certificates
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update

Step 4: Install Python 3.11

sudo apt install -y python3.11 python3.11-venv python3.11-dev
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel

Step 5: Create virtual environment

python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate

Step 6: Install PyTorch (CUDA)

pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio

Step 7: Install dependencies

pip install -U "transformers>=4.57.0" accelerate huggingface-hub safetensors pillow requests
pip install -U bitsandbytes
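Before writing the runner, a quick stdlib-only sanity check that the key packages resolved correctly inside the virtual environment (note the import names differ from the pip names, e.g. PIL for pillow):

```python
import importlib.util

def module_available(name: str) -> bool:
    """True if the module can be located without fully importing it."""
    return importlib.util.find_spec(name) is not None

# Import names for the packages installed in the previous steps
for mod in ("torch", "transformers", "accelerate", "PIL", "requests"):
    print(f"{mod}: {'ok' if module_available(mod) else 'MISSING'}")
```

Any MISSING line means the corresponding pip install ran outside the activated environment; re-activate and reinstall.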

Create the runner script

Create run_januscoder.py using nano, vim, or an SSH-capable editor:

#!/usr/bin/env python3
 
# JanusCoderV-8B runner (InternVL head)
# Uses AutoModelForImageTextToText + AutoProcessor and supports URL/local images.
 
import argparse
import io
import sys
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText  # <-- key class
 
MODEL_NAME = "internlm/JanusCoderV-8B"
 
def load_image_from_url(url: str) -> Image.Image:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return Image.open(io.BytesIO(response.content)).convert("RGB")
 
def load_image_local(path: str) -> Image.Image:
    return Image.open(path).convert("RGB")
 
def main():
    parser = argparse.ArgumentParser()
    source_group = parser.add_mutually_exclusive_group(required=True)
    source_group.add_argument("--image-url", type=str, help="URL of the image to process")
    source_group.add_argument("--image-path", type=str, help="Local path to the image file")
    parser.add_argument("--task", type=str, default="Please describe the image explicitly.", help="Task description for the model")
    parser.add_argument("--max-new-tokens", type=int, default=1024, help="Maximum number of new tokens to generate")
    parser.add_argument("--bits8", action="store_true", help="Load model in 8-bit mode (requires bitsandbytes)")
    parser.add_argument("--no-bf16", action="store_true", help="Force FP16 inputs instead of BF16")
    args = parser.parse_args()
 
    use_bf16 = (not args.no_bf16) and torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    input_dtype = torch.bfloat16 if use_bf16 else torch.float16
 
    print(f"torch={torch.__version__} | cuda={torch.cuda.is_available()} | bf16_ok={use_bf16} | dtype={input_dtype}")
 
    print("Loading processor …")
    processor = AutoProcessor.from_pretrained(MODEL_NAME, trust_remote_code=True)
 
    print("Loading model …")
    load_kwargs = dict(device_map="auto", trust_remote_code=True)
    if args.bits8:
        # Passing load_in_8bit directly to from_pretrained is deprecated;
        # wrap it in a BitsAndBytesConfig instead.
        from transformers import BitsAndBytesConfig
        load_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
    else:
        load_kwargs["torch_dtype"] = input_dtype
    model = AutoModelForImageTextToText.from_pretrained(MODEL_NAME, **load_kwargs).eval()
 
    # Build messages with either URL or PIL image
    content = []
    if args.image_url:
        content.append({"type": "image", "url": args.image_url})
    else:
        pil_image = load_image_local(args.image_path)
        content.append({"type": "image", "image": pil_image})
    content.append({"type": "text", "text": args.task})
    messages = [{"role": "user", "content": content}]
 
    print("Tokenizing …")
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
 
    # Move input tensors to model device/dtype
    device = next(iter(model.parameters())).device
    for key, value in list(inputs.items()):
        if torch.is_floating_point(value):
            inputs[key] = value.to(device, dtype=input_dtype)
        else:
            inputs[key] = value.to(device)
 
    print("Generating …")
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=args.max_new_tokens, do_sample=False, use_cache=True)
    prompt_length = inputs["input_ids"].shape[1]
    generated_text = processor.decode(output_ids[0, prompt_length:], skip_special_tokens=True)
 
    print("\n" + "=" * 80 + "\nOUTPUT:\n" + "=" * 80)
    print(generated_text)
 
if __name__ == "__main__":
    main()

Usage examples

Basic image description

python run_januscoder.py \
  --image-url https://c7.alamy.com/comp/BHKEPY/woman-running-with-two-rottweilers-canis-lupus-familiaris-in-garden-BHKEPY.jpg

Generate HTML/CSS from a mockup

python run_januscoder.py \
  --image-url https://example.com/mockup.jpg \
  --task "Generate responsive HTML+CSS from this mockup."

Process a local image

python run_januscoder.py \
  --image-path /path/to/image.jpg \
  --task "Convert this UI mockup into React components."

Convert a chart to code

python run_januscoder.py \
  --image-url https://example.com/chart.png \
  --task "Generate matplotlib code to recreate this chart."

Fix layout bugs

python run_januscoder.py \
  --image-path screenshot.png \
  --task "Identify layout issues and provide corrected CSS."

Configuration

Arguments:
  • --image-url or --image-path: Input source (URL or local file path)
  • --task: Task description (default: "Please describe the image explicitly.")
  • --max-new-tokens: Maximum output length (default: 1024)
  • --bits8: Enable 8-bit quantization to reduce VRAM usage
  • --no-bf16: Force FP16 for GPU compatibility

8-bit quantization for low VRAM:

python run_januscoder.py --image-url <image-url> --bits8

Long output for complex tasks:

python run_januscoder.py --image-url <image-url> --max-new-tokens 4096

FP16 mode for GPU compatibility:

python run_januscoder.py --image-url <image-url> --no-bf16

Performance optimization

  • Memory: Use --bits8, lower --max-new-tokens, and batch smaller workloads.
  • Speed: Use BF16 on A100 and H100 GPUs; ensure CUDA is properly configured and caching is enabled.
  • Quality: Use high-resolution images, write detailed task prompts, and increase --max-new-tokens for complex outputs.

Use cases

  • Web development: Convert mockups to HTML/CSS, generate responsive layouts, fix layout bugs, create React and Vue components.
  • Data visualization: Convert charts to matplotlib or plotly code and generate interactive dashboards.
  • Animation: Rebuild animations using Manim, generate SVG, or create CSS animations.
  • Documentation: Generate code explanations, visual docs, and GUI documentation from screenshots.

Troubleshooting

Issue: Out-of-memory (OOM) error

Use 8-bit quantization, reduce output length, or switch to FP16:

# Use 8-bit quantization
python run_januscoder.py --image-url <image-url> --bits8
 
# Reduce output length
python run_januscoder.py --image-url <image-url> --max-new-tokens 512
 
# Use FP16
python run_januscoder.py --image-url <image-url> --no-bf16

Issue: Model download failures

Set a custom cache directory with sufficient storage:

export HF_HOME=/path/to/large/storage  # also covers the model cache; TRANSFORMERS_CACHE is deprecated
python run_januscoder.py --image-url <image-url>

Issue: CUDA errors

Verify CUDA availability and reinstall PyTorch if needed:

python -c "import torch; print(torch.cuda.is_available())"
nvidia-smi
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio

Issue: Image loading errors

Download the image locally or fix file permissions:

wget https://example.com/image.jpg
python run_januscoder.py --image-path image.jpg --task "Your task"
chmod 644 /path/to/image.jpg

Best practices

  • Prompts: Be specific. Include the target format (HTML/CSS, Python) and framework (React, Vue).
  • Images: Use high-resolution, well-lit, and cropped images in standard formats (JPEG or PNG).
  • Output: Save generated code to files, review before use, iterate on prompts, and track what works.
  • Resources: Monitor GPU usage with nvidia-smi, close unused processes, use quantization, and batch tasks where possible.
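For the "save generated code to files" step: model replies typically wrap code in Markdown fences, so a small stdlib helper can pull the code out of the raw output before saving. A sketch (the regex and function name are illustrative, not part of the runner script):

```python
import re

# Fence marker built programmatically so it doesn't clash with this listing itself
FENCE = chr(96) * 3  # ``` (three backticks)
FENCE_RE = re.compile(FENCE + r"(\w+)?\n(.*?)" + FENCE, re.DOTALL)

def extract_code_blocks(text: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for each fenced block in the model output."""
    return [(lang or "text", code.strip()) for lang, code in FENCE_RE.findall(text)]

sample = f"Here is the page:\n{FENCE}html\n<h1>Hello</h1>\n{FENCE}\nDone."
print(extract_code_blocks(sample))  # [('html', '<h1>Hello</h1>')]
```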

Integration example

Python subprocess wrapper:

import subprocess
 
def generate_code_from_image(image_path, task):
    cmd = ["python", "run_januscoder.py", "--image-path", image_path,
           "--task", task, "--max-new-tokens", "2048"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    # stdout still contains the runner's status banners; keep only the text
    # after the OUTPUT marker it prints
    return result.stdout.split("OUTPUT:", 1)[-1].lstrip("=\n")
 
code = generate_code_from_image("mockup.png", "Generate React components")

Flask API wrapper:

from flask import Flask, request, jsonify
import subprocess
 
app = Flask(__name__)
 
@app.route('/generate', methods=['POST'])
def generate():
    data = request.json
    cmd = ["python", "run_januscoder.py", "--image-url", data['image_url'], "--task", data.get('task', 'Describe')]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return jsonify({"code": result.stdout})
 
app.run(port=5000)
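A minimal stdlib client for the wrapper above, assuming the Flask app is reachable at localhost:5000 (the helper names here are illustrative):

```python
import json
import urllib.request

def build_payload(image_url: str, task: str = "Describe") -> bytes:
    """JSON body matching what the /generate endpoint expects."""
    return json.dumps({"image_url": image_url, "task": task}).encode()

def request_generation(image_url: str, task: str = "Describe") -> str:
    # Assumes the Flask wrapper is running locally on port 5000
    req = urllib.request.Request(
        "http://localhost:5000/generate",
        data=build_payload(image_url, task),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["code"]
```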

Performance on Spheron

Task                         RTX 4090 time   A100 time   VRAM usage
Simple description           5 s             3 s         12 GB
HTML generation              10 s            6 s         14 GB
Complex output (2K tokens)   20 s            12 s        16 GB
Full output (4K tokens)      40 s            24 s        18 GB
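The timing table implies a rough decode throughput, which helps when budgeting --max-new-tokens. A sketch using the 2K- and 4K-token rows, assuming "2K" ≈ 2048 tokens:

```python
# (tokens, rtx4090_seconds, a100_seconds) taken from the timing table above
ROWS = {
    "Complex output": (2048, 20, 12),
    "Full output": (4096, 40, 24),
}

def tokens_per_second(tokens: int, seconds: float) -> float:
    return tokens / seconds

for name, (toks, t4090, ta100) in ROWS.items():
    print(f"{name}: RTX 4090 ~{tokens_per_second(toks, t4090):.0f} tok/s, "
          f"A100 ~{tokens_per_second(toks, ta100):.0f} tok/s")
```

Both rows work out to roughly 100 tok/s on the RTX 4090 and 170 tok/s on the A100, so generation time scales close to linearly with output length.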

Supported output formats

  • Web: HTML/CSS, JavaScript, React, Vue, Tailwind, Bootstrap
  • Data visualization: Python (matplotlib, plotly), JavaScript (D3, Chart.js), R (ggplot2)
  • Animation: Manim, CSS, JavaScript, SVG
  • Other: SVG, LaTeX, Processing, Three.js

Cloud-init startup script (optional)

If your provider supports cloud-init, you can paste this into the Startup Script field when deploying to automate the environment setup. After the instance is ready, SSH in, activate the virtual environment, and create the run_januscoder.py script following the Create Runner Script section above.

#cloud-config
runcmd:
  - apt update && apt install -y software-properties-common curl ca-certificates
  - add-apt-repository -y ppa:deadsnakes/ppa
  - apt update
  - apt install -y python3.11 python3.11-venv python3.11-dev
  - python3.11 -m ensurepip --upgrade
  - python3.11 -m pip install --upgrade pip setuptools wheel
  - python3.11 -m venv /opt/januscoder
  - /opt/januscoder/bin/pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
  - /opt/januscoder/bin/pip install -U "transformers>=4.57.0" accelerate huggingface-hub safetensors pillow requests bitsandbytes

After cloud-init completes, SSH in, activate the environment, and create the runner script:

source /opt/januscoder/bin/activate
# Create run_januscoder.py as shown in the "Create the runner script" section above
python run_januscoder.py --image-url https://example.com/image.jpg

What's next