Chandra OCR
Deploy Chandra OCR on a Spheron GPU instance. Chandra OCR converts images and PDFs into structured Markdown, HTML, or JSON while preserving document layout, hierarchy, and visual elements. It achieves 83.1% accuracy on the olmOCR benchmark, outperforming GPT-4o, Mistral OCR, and DeepSeek OCR.
Key capabilities
- Multi-format output (Markdown, HTML, JSON)
- Handwriting recognition
- Form reconstruction (including checkboxes)
- Complex layouts (tables, math equations)
- Visual element extraction (images, diagrams, captions)
- 40+ languages
Chandra OCR supports two inference modes:
- Local: HuggingFace transformers for privacy-sensitive and edge deployments
- Remote: vLLM server for scalable production and high-throughput pipelines
Benchmark accuracy on olmOCR (83.1% overall):
| Category | Accuracy |
|---|---|
| Headers/Footers | 90.8% |
| Long Tiny Text | 92.3% |
| Tables | 88.0% |
| ArXiv | 82.2% |
Accuracy vs. competitors: +13.2 pp vs. GPT-4o, +19.3 pp vs. Gemini Flash 2, +4 pp vs. dots.ocr
Deployment tiers
| Tier | GPU | Performance | Use Case |
|---|---|---|---|
| Dev/Test | CPU | 0.1-0.3 img/s | PoC, batch processing |
| Cost-Optimized | RTX 3060/4060 Ti (4-bit) | 0.4-0.8 img/s | Moderate volumes |
| High-Performance | RTX 3090/4090, L40S (BF16/FP16) | 1.5-3.0 img/s | High daily volumes |
| Enterprise | A100/H100 (FlashAttention2) | 3.0-5.0 img/s | Mission-critical pipelines |
| Distributed | 2x A100/H100 (tensor-parallel) | 5.0-8.0 img/s | Real-time OCR services |
The model weights are available on HuggingFace.
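As a rough planning aid, the throughput figures in the tiers table translate into daily capacity like this (the 80% utilization factor is an illustrative assumption for batching gaps and I/O overhead, not a measured value):

```python
def daily_capacity(images_per_second: float, utilization: float = 0.8) -> int:
    """Estimate how many images a tier can process per day.

    utilization discounts sustained throughput for idle time,
    batching gaps, and I/O overhead.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return int(images_per_second * utilization * 24 * 60 * 60)

# A High-Performance tier GPU at ~2.0 img/s sustained:
# daily_capacity(2.0) -> 138240 images/day at 80% utilization
```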
Manual setup
Use these steps to set up the server manually after SSH-ing into your instance. This works on any provider regardless of cloud-init support.
Step 1: Provision a Spheron instance
- Sign up at app.spheron.ai
- Add credits (card/crypto)
- Click Deploy → Select GPU (see Deployment Tiers above) → Region → Ubuntu 22.04 → add your SSH key → Deploy
See Getting Started or SSH Connection for details.
Step 2: Connect to your instance
ssh <user>@<ipAddress>
Replace <user> with the username shown in the instance details panel (e.g., root or ubuntu) and <ipAddress> with your instance's public IP.
Step 3: Update system packages
sudo apt update && sudo apt install -y software-properties-common curl ca-certificates
Step 4: Add Python PPA repository
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 5: Install Python 3.11
sudo apt-get -o Acquire::Retries=3 install -y python3.11 python3.11-venv python3.11-dev
Step 6: Set up pip
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel
Step 7: Create virtual environment
python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate
Step 8: Install PyTorch (CUDA 12.1)
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
Step 9: Install Chandra OCR dependencies
pip install chandra-ocr vllm transformers accelerate pillow bitsandbytes
Usage
Launch web interface
chandra_app
Access at: http://localhost:8501
Features:
- Upload PDFs or images
- Visualize OCR results
- Export as Markdown, HTML, or JSON
Programmatic usage
from chandra_ocr import ChandraOCR
# Initialize the model
ocr = ChandraOCR()
# Process a document
result = ocr.process("path/to/document.pdf", output_format="markdown")
# Print the result
print(result)
Batch processing
import os
from chandra_ocr import ChandraOCR

ocr = ChandraOCR()
input_dir = "path/to/documents"
output_dir = "path/to/output"

for filename in os.listdir(input_dir):
    if filename.endswith((".pdf", ".png", ".jpg")):
        input_path = os.path.join(input_dir, filename)
        result = ocr.process(input_path, output_format="markdown")
        # Write one .md file per input, keeping the original base name
        output_path = os.path.join(output_dir, f"{os.path.splitext(filename)[0]}.md")
        with open(output_path, "w") as f:
            f.write(result)
Advanced configuration
vLLM server (high throughput)
# Install vLLM if not already installed
pip install vllm
# Start the vLLM server
python -m vllm.entrypoints.openai.api_server \
--model datalab-to/chandra \
--dtype bfloat16 \
    --max-model-len 4096
Custom parameters
from chandra_ocr import ChandraOCR
ocr = ChandraOCR(
    max_tokens=2048,
    temperature=0.7,
    batch_size=4,
    use_flash_attention=True
)
Performance optimization
Memory
- Use 4-bit or 8-bit quantization to reduce VRAM requirements.
- Reduce batch size when you encounter out-of-memory errors.
- Enable gradient checkpointing for large documents.
Speed
- Enable FlashAttention2 on A100 and H100 GPUs.
- Use vLLM for concurrent multi-request processing.
- Use distributed inference for high-volume workloads.
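When the vLLM server from Advanced configuration is running, requests go through its OpenAI-compatible HTTP API on port 8000 (vLLM's default). A minimal client sketch; the chat-message layout and prompt text below are assumptions, so check the model card on HuggingFace for the canonical prompt template:

```python
import base64
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # vLLM's default port

def build_ocr_request(image_b64: str,
                      prompt: str = "Convert this page to Markdown.") -> dict:
    """Build an OpenAI-style chat request carrying one page as a data URL."""
    return {
        "model": "datalab-to/chandra",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        "max_tokens": 2048,
    }

def ocr_via_vllm(image_path: str) -> str:
    """Send one page image to the server and return the model's text output."""
    with open(image_path, "rb") as f:
        payload = build_ocr_request(base64.b64encode(f.read()).decode())
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```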
Accuracy
- Use BF16 or FP16 precision (rather than 4-bit/8-bit quantization) for the best accuracy.
- Process images at 2560 px or higher resolution.
- Apply multi-pass processing for critical documents.
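Multi-pass processing can be as simple as running the model several times and majority-voting on the output; since the default sampling temperature is nonzero, passes can disagree on hard pages. A library-agnostic sketch, where `process` is any zero-argument callable that invokes your OCR (e.g., a lambda around `ocr.process`):

```python
from collections import Counter
from typing import Callable, Tuple

def multi_pass(process: Callable[[], str], passes: int = 3) -> Tuple[str, bool]:
    """Run OCR `passes` times and majority-vote on the result.

    Returns the most common output and a flag indicating whether all
    passes agreed; disagreement is a signal to route the document
    for manual review.
    """
    results = [process() for _ in range(passes)]
    winner, count = Counter(results).most_common(1)[0]
    return winner, count == passes
```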
Troubleshooting
OOM errors
# Solution 1: Reduce batch size
ocr = ChandraOCR(batch_size=1)
# Solution 2: Use quantization
pip install bitsandbytes
ocr = ChandraOCR(quantization="4bit")
# Solution 3: Lower resolution
ocr.process("document.pdf", max_resolution=1920)
Slow processing
# Ensure CUDA is properly configured
python -c "import torch; print(torch.cuda.is_available())"
# Check GPU utilization
nvidia-smi
# Enable vLLM for better throughput
# See Advanced Configuration section above
Installation issues
# If pip install fails, try:
pip install --no-cache-dir chandra-ocr
# Or install dependencies separately:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate
pip install chandra-ocr
Supported formats
Input formats: PNG, JPEG, JPG, TIFF, BMP, WebP, PDF, scanned documents, screenshots
Specialized document types: Academic papers, forms, tables, equations, diagrams, handwritten notes
Output formats:
- Markdown: Preserves structure, hierarchy, and formatting
- HTML: Browser-ready output with semantic markup
- JSON: Includes text, layout, bounding boxes, confidence scores, and metadata
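Downstream code often wants only high-confidence regions from the JSON output. A sketch of such a filter; the field names (`blocks`, `text`, `bbox`, `confidence`) are assumptions about the schema, so adjust them to the keys your chandra-ocr version actually emits:

```python
def high_confidence_blocks(ocr_json: dict, min_conf: float = 0.9) -> list:
    """Keep only blocks whose confidence score meets the threshold."""
    return [b for b in ocr_json.get("blocks", [])
            if b.get("confidence", 0.0) >= min_conf]

# Hypothetical JSON output for a two-block page:
sample = {"blocks": [
    {"text": "Invoice #1042", "bbox": [40, 32, 310, 58], "confidence": 0.97},
    {"text": "sm udge d", "bbox": [40, 400, 180, 420], "confidence": 0.41},
]}
# high_confidence_blocks(sample) keeps only the first block
```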
Best practices
Document quality
- Use good lighting and scan at 300 DPI or higher.
- Avoid skewed or rotated pages and remove background noise before processing.
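A quick way to check whether an existing scan meets the 300 DPI guideline when you know the physical page size:

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Effective scan resolution of an image for a known physical page width."""
    return pixel_width / page_width_inches

# A 2550 px wide scan of a US Letter page (8.5 in) is exactly 300 DPI:
# effective_dpi(2550, 8.5) -> 300.0
```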
Deployment
- Start with the Cost-Optimized tier and scale up as volume increases.
- Monitor GPU usage and adjust batch sizes to stay within VRAM limits.
- Add error handling and retry logic to your pipeline.
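Retry logic for transient failures (GPU memory spikes, network hiccups against a remote vLLM server) can wrap any processing call. A minimal sketch with exponential backoff:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on any exception.

    The last failure is re-raised so callers can log or dead-letter it.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Usage, wrapping any OCR call that may fail transiently:
# result = with_retries(lambda: ocr.process("doc.pdf", output_format="markdown"))
```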
Production
- Use async processing for web applications.
- Use a queue system for high-volume workloads.
- Cache results for frequently processed documents.
- Add logging and monitoring to track throughput and errors.
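Caching by content hash means a re-uploaded document is never OCR'd twice. A sketch keyed on the SHA-256 of the file bytes (the `.md`-file cache layout is an illustrative choice; `process` is any callable mapping a path to OCR text):

```python
import hashlib
from pathlib import Path

def cached_process(path: str, process, cache_dir: str = ".ocr-cache") -> str:
    """Return cached OCR output for a document, computing it once per content hash."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    cache_file = Path(cache_dir) / f"{digest}.md"
    if cache_file.exists():
        return cache_file.read_text()
    result = process(path)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(result)
    return result

# Usage:
# text = cached_process("invoice.pdf",
#                       lambda p: ocr.process(p, output_format="markdown"))
```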
Use cases
- Enterprise: Legacy archives, invoice automation, contract analysis, compliance reporting
- Academic: Research papers, databases, publications, historical documents
- Legal and financial: Contracts, statements, filings, due diligence review
- Healthcare: Medical records, prescriptions, forms, clinical trial documents
Cloud-init startup script (optional)
If your provider supports cloud-init, you can paste this into the Startup Script field when deploying to automate the environment setup. After the instance is ready, SSH in and run . /opt/chandra-ocr/bin/activate && chandra_app to launch the web interface.
#cloud-config
runcmd:
- apt update && apt install -y software-properties-common curl ca-certificates
- add-apt-repository -y ppa:deadsnakes/ppa
- apt update
- apt-get -o Acquire::Retries=3 install -y python3.11 python3.11-venv python3.11-dev
- python3.11 -m ensurepip --upgrade
- python3.11 -m pip install --upgrade pip setuptools wheel
- python3.11 -m venv /opt/chandra-ocr
- /opt/chandra-ocr/bin/pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
- /opt/chandra-ocr/bin/pip install chandra-ocr vllm transformers accelerate pillow bitsandbytes
After cloud-init completes, activate the environment and start the web interface:
. /opt/chandra-ocr/bin/activate
chandra_app
What's next
- Specialized Models: Compare Chandra OCR with SoulX Podcast and Janus CoderV
- vLLM Inference Server: Configure vLLM for high-throughput OCR pipelines
- Instance Types: Select the right GPU tier for your document volume
- Getting Started: Create a Spheron account and deploy your first instance