JanusCoderV-8B
Deploy JanusCoderV-8B on a Spheron GPU instance. JanusCoderV-8B is an 8B multimodal model that generates code from visual inputs including charts, screenshots, and UI mockups. It converts images into HTML, CSS, React components, and data visualization code.
Key capabilities
- Visual-to-code translation: converts charts and screenshots to HTML and code
- Layout bug fixing from screenshot inputs
- Animation reconstruction using Manim
- 32K token context support
- Multimodal understanding across text, images, and code
Benchmark results:
- ChartMimic: 74.20 (beats Qwen2.5-VL-7B and InternVL3.5-8B)
- WebCode2M: 18.28 (best open-weight structural correctness)
- InteractScience: 33.32 (visual metrics leader)
Requirements
Hardware:
- GPU: RTX 4090, A100, or H100 (16 GB VRAM minimum, 24 GB recommended)
- RAM: 16 GB (32 GB for large-context workloads)
- Storage: 20 GB (SSD recommended)
Software:
- Ubuntu 22.04 LTS
- CUDA 12.1 or later
- Python 3.11
Manual setup
Use these steps to set up the server manually after SSH-ing into your instance. This works on any provider regardless of cloud-init support.
Step 1: Provision a Spheron instance
- Sign up at app.spheron.ai
- Add credits (card/crypto)
- Click Deploy → RTX 4090 → Region → Ubuntu 22.04 → add your SSH key → Deploy
See Getting Started or SSH Connection for details.
Step 2: Connect to your instance
ssh <user>@<ipAddress>
Replace <user> with the username shown in the instance details panel (e.g., root or ubuntu) and <ipAddress> with your instance's public IP.
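Once connected, it is worth confirming the instance matches the requirements before installing anything. A minimal check (the fallback echo just flags a missing NVIDIA driver rather than aborting):

```shell
# GPU model, VRAM, and driver version
command -v nvidia-smi >/dev/null \
  && nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv \
  || echo "nvidia-smi not found - check that the NVIDIA driver is installed"
# OS release and free disk space (the model download needs roughly 20 GB)
head -n 2 /etc/os-release
df -h /
```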
Step 3: Set up the environment
sudo apt update && sudo apt install -y software-properties-common curl ca-certificates
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 4: Install Python 3.11
sudo apt install -y python3.11 python3.11-venv python3.11-dev
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel
Step 5: Create virtual environment
python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate
Step 6: Install PyTorch (CUDA)
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
Step 7: Install dependencies
pip install -U "transformers>=4.57.0" accelerate huggingface-hub safetensors pillow requests
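A quick check inside the venv confirms the core packages resolved before you continue (bitsandbytes, installed next, is only needed if you plan to use --bits8):

```shell
python3 - <<'EOF'
# Report which of the core packages are importable in this environment
import importlib.util
for mod in ("torch", "transformers", "accelerate", "PIL"):
    found = importlib.util.find_spec(mod) is not None
    print(f"{mod}: {'OK' if found else 'MISSING'}")
EOF
```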
pip install -U bitsandbytes
Create the runner script
Create run_januscoder.py using nano, vim, or an SSH-capable editor:
#!/usr/bin/env python3
# JanusCoderV-8B runner (InternVL head)
# Uses AutoModelForImageTextToText + AutoProcessor and supports URL/local images.
import argparse
import io

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText  # <-- key class

MODEL_NAME = "internlm/JanusCoderV-8B"


def load_image_from_url(url: str) -> Image.Image:
    # Manual URL loader; the chat template below can also fetch URLs directly.
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return Image.open(io.BytesIO(response.content)).convert("RGB")


def load_image_local(path: str) -> Image.Image:
    return Image.open(path).convert("RGB")


def main():
    parser = argparse.ArgumentParser()
    source_group = parser.add_mutually_exclusive_group(required=True)
    source_group.add_argument("--image-url", type=str, help="URL of the image to process")
    source_group.add_argument("--image-path", type=str, help="Local path to the image file")
    parser.add_argument("--task", type=str, default="Please describe the image explicitly.", help="Task description for the model")
    parser.add_argument("--max-new-tokens", type=int, default=1024, help="Maximum number of new tokens to generate")
    parser.add_argument("--bits8", action="store_true", help="Load model in 8-bit mode (requires bitsandbytes)")
    parser.add_argument("--no-bf16", action="store_true", help="Force FP16 inputs instead of BF16")
    args = parser.parse_args()

    use_bf16 = (not args.no_bf16) and torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    input_dtype = torch.bfloat16 if use_bf16 else torch.float16
    print(f"torch={torch.__version__} | cuda={torch.cuda.is_available()} | bf16_ok={use_bf16} | dtype={input_dtype}")

    print("Loading processor …")
    processor = AutoProcessor.from_pretrained(MODEL_NAME, trust_remote_code=True)

    print("Loading model …")
    load_kwargs = dict(device_map="auto", trust_remote_code=True)
    if args.bits8:
        load_kwargs["load_in_8bit"] = True
    else:
        load_kwargs["torch_dtype"] = input_dtype  # Use torch_dtype for consistency
    model = AutoModelForImageTextToText.from_pretrained(MODEL_NAME, **load_kwargs).eval()

    # Build messages with either URL or PIL image
    content = []
    if args.image_url:
        content.append({"type": "image", "url": args.image_url})
    else:
        pil_image = load_image_local(args.image_path)
        content.append({"type": "image", "image": pil_image})
    content.append({"type": "text", "text": args.task})
    messages = [{"role": "user", "content": content}]

    print("Tokenizing …")
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )

    # Move input tensors to model device; cast floating tensors to the chosen dtype
    device = next(iter(model.parameters())).device
    for key, value in list(inputs.items()):
        if torch.is_floating_point(value):
            inputs[key] = value.to(device, dtype=input_dtype)
        else:
            inputs[key] = value.to(device)

    print("Generating …")
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=args.max_new_tokens, do_sample=False, use_cache=True)

    prompt_length = inputs["input_ids"].shape[1]
    generated_text = processor.decode(output_ids[0, prompt_length:], skip_special_tokens=True)
    print("\n" + "=" * 80 + "\nOUTPUT:\n" + "=" * 80)
    print(generated_text)


if __name__ == "__main__":
    main()
Usage examples
Basic image description
python run_januscoder.py \
--image-url https://c7.alamy.com/comp/BHKEPY/woman-running-with-two-rottweilers-canis-lupus-familiaris-in-garden-BHKEPY.jpg
Generate HTML/CSS from a mockup
python run_januscoder.py \
--image-url https://example.com/mockup.jpg \
--task "Generate responsive HTML+CSS from this mockup."
Process a local image
python run_januscoder.py \
--image-path /path/to/image.jpg \
--task "Convert this UI mockup into React components."
Convert a chart to code
python run_januscoder.py \
--image-url https://example.com/chart.png \
--task "Generate matplotlib code to recreate this chart."
Fix layout bugs
python run_januscoder.py \
--image-path screenshot.png \
--task "Identify layout issues and provide corrected CSS."
Configuration
Arguments:
- --image-url or --image-path: Input source (URL or local file path)
- --task: Task description (default: "Please describe the image explicitly.")
- --max-new-tokens: Maximum output length (default: 1024)
- --bits8: Enable 8-bit quantization to reduce VRAM usage
- --no-bf16: Force FP16 for GPU compatibility
8-bit quantization for low VRAM:
python run_januscoder.py --image-url <image-url> --bits8
Long output for complex tasks:
python run_januscoder.py --image-url <image-url> --max-new-tokens 4096
FP16 mode for GPU compatibility:
python run_januscoder.py --image-url <image-url> --no-bf16
Performance optimization
- Memory: Use --bits8, lower --max-new-tokens, and batch smaller workloads.
- Speed: Use BF16 on A100 and H100 GPUs; ensure CUDA is properly configured and caching is enabled.
- Quality: Use high-resolution images, write detailed task prompts, and increase --max-new-tokens for complex outputs.
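To see where you sit against the VRAM limits, snapshot GPU memory while a generation runs in another terminal (wrap the query in `watch -n 1` for a live view); the fallback echo covers machines without the driver:

```shell
# One-off VRAM and utilization snapshot (run during generation)
command -v nvidia-smi >/dev/null \
  && nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv \
  || echo "nvidia-smi not found"
```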
Use cases
- Web development: Convert mockups to HTML/CSS, generate responsive layouts, fix layout bugs, create React and Vue components.
- Data visualization: Convert charts to matplotlib or plotly code and generate interactive dashboards.
- Animation: Rebuild animations using Manim, generate SVG, or create CSS animations.
- Documentation: Generate code explanations, visual docs, and GUI documentation from screenshots.
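The web-development and visualization tasks above batch naturally. For example, a directory of mockups can be converted in one pass (the mockups/ directory, output names, and task text are illustrative):

```shell
# Convert every PNG mockup in mockups/ into a React component file
mkdir -p generated
for img in mockups/*.png; do
  [ -e "$img" ] || continue  # nothing matched the glob; skip
  python run_januscoder.py \
    --image-path "$img" \
    --task "Convert this UI mockup into React components." \
    > "generated/$(basename "${img%.png}").jsx"
done
```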
Troubleshooting
Issue: Out-of-memory (OOM) error
Use 8-bit quantization, reduce output length, or switch to FP16:
# Use 8-bit quantization
python run_januscoder.py --image-url <image-url> --bits8
# Reduce output length
python run_januscoder.py --image-url <image-url> --max-new-tokens 512
# Use FP16
python run_januscoder.py --image-url <image-url> --no-bf16
Issue: Model download failures
Set a custom cache directory with sufficient storage:
export HF_HOME=/path/to/large/storage
# TRANSFORMERS_CACHE is deprecated in recent transformers releases; HF_HOME covers it
python run_januscoder.py --image-url <image-url>
Issue: CUDA errors
Verify CUDA availability and reinstall PyTorch if needed:
python -c "import torch; print(torch.cuda.is_available())"
nvidia-smi
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
Issue: Image loading errors
Download the image locally or fix file permissions:
wget https://example.com/image.jpg
python run_januscoder.py --image-path image.jpg --task "Your task"
chmod 644 /path/to/image.jpg
Best practices
- Prompts: Be specific. Include the target format (HTML/CSS, Python) and framework (React, Vue).
- Images: Use high-resolution, well-lit, and cropped images in standard formats (JPEG or PNG).
- Output: Save generated code to files, review before use, iterate on prompts, and track what works.
- Resources: Monitor GPU usage with nvidia-smi, close unused processes, use quantization, and batch tasks where possible.
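Per the output tip above, `tee` keeps a copy of the generated code on disk while still printing it to the terminal (filenames are illustrative; note the runner also prints its progress lines before the OUTPUT marker, so trim the log before reusing the code):

```shell
# Keep a copy of the run on disk while watching it stream in the terminal
python run_januscoder.py \
  --image-path mockup.png \
  --task "Generate responsive HTML+CSS from this mockup." \
  | tee mockup_run.log
```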
Integration example
Python subprocess wrapper:
import subprocess

def generate_code_from_image(image_path, task):
    cmd = ["python", "run_januscoder.py", "--image-path", image_path, "--task", task, "--max-new-tokens", "2048"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

code = generate_code_from_image("mockup.png", "Generate React components")
Flask API wrapper:
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json
    cmd = ["python", "run_januscoder.py", "--image-url", data['image_url'], "--task", data.get('task', 'Describe')]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return jsonify({"code": result.stdout})

app.run(port=5000)
Performance on Spheron
| Task | RTX 4090 time | A100 time | VRAM usage |
|---|---|---|---|
| Simple description | 5 s | 3 s | 12 GB |
| HTML generation | 10 s | 6 s | 14 GB |
| Complex output (2K tokens) | 20 s | 12 s | 16 GB |
| Full output (4K tokens) | 40 s | 24 s | 18 GB |
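The timings above can be reproduced with a simple wall-clock wrapper (the `|| true` keeps the elapsed line even if the run aborts; the first run also pays the one-time model download and load, so time a second run for comparable numbers):

```shell
# Wall-clock timing for a single generation
start=$(date +%s)
python run_januscoder.py \
  --image-url https://example.com/chart.png \
  --task "Generate matplotlib code to recreate this chart." \
  --max-new-tokens 2048 || true
echo "elapsed: $(( $(date +%s) - start ))s"
```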
Supported output formats
- Web: HTML/CSS, JavaScript, React, Vue, Tailwind, Bootstrap
- Data visualization: Python (matplotlib, plotly), JavaScript (D3, Chart.js), R (ggplot2)
- Animation: Manim, CSS, JavaScript, SVG
- Other: SVG, LaTeX, Processing, Three.js
Cloud-init startup script (optional)
If your provider supports cloud-init, you can paste this into the Startup Script field when deploying to automate the environment setup. After the instance is ready, SSH in, activate the virtual environment, and create run_januscoder.py following the "Create the runner script" section above.
#cloud-config
runcmd:
- apt update && apt install -y software-properties-common curl ca-certificates
- add-apt-repository -y ppa:deadsnakes/ppa
- apt update
- apt install -y python3.11 python3.11-venv python3.11-dev
- python3.11 -m ensurepip --upgrade
- python3.11 -m pip install --upgrade pip setuptools wheel
- python3.11 -m venv /opt/januscoder
- /opt/januscoder/bin/pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
- /opt/januscoder/bin/pip install -U "transformers>=4.57.0" accelerate huggingface-hub safetensors pillow requests bitsandbytes
After cloud-init completes, SSH in, activate the environment, and create the runner script:
source /opt/januscoder/bin/activate
# Create run_januscoder.py as shown in the "Create the runner script" section above
python run_januscoder.py --image-url https://example.com/image.jpg
What's next
- Specialized Models: Compare JanusCoderV with Chandra OCR and SoulX Podcast
- Instance Types: Select the right GPU for multimodal code generation
- Getting Started: Create a Spheron account and deploy your first instance
- SSH Connection: Connect to your instance after deployment