Chandra OCR

Deploy Chandra OCR on a Spheron GPU instance. Chandra OCR converts images and PDFs into structured Markdown, HTML, or JSON while preserving document layout, hierarchy, and visual elements. It achieves 83.1% accuracy on the olmOCR benchmark, outperforming GPT-4o, Mistral OCR, and DeepSeek OCR.

Key capabilities

  • Multi-format output (Markdown, HTML, JSON)
  • Handwriting recognition
  • Form reconstruction (including checkboxes)
  • Complex layouts (tables, math equations)
  • Visual element extraction (images, diagrams, captions)
  • 40+ languages

Chandra OCR supports two inference modes:

  • Local: HuggingFace transformers for privacy-sensitive and edge deployments
  • Remote: vLLM server for scalable production and high-throughput pipelines

Benchmark accuracy on olmOCR (83.1% overall):

Category         Accuracy
Headers/Footers  90.8%
Long Tiny Text   92.3%
Tables           88.0%
ArXiv            82.2%

Accuracy vs. competitors: +13.2 pp vs. GPT-4o, +19.3 pp vs. Gemini Flash 2, +4 pp vs. dots.ocr

Deployment tiers

Tier              GPU                              Performance    Use Case
Dev/Test          CPU                              0.1-0.3 img/s  PoC, batch processing
Cost-Optimized    RTX 3060/4060 Ti (4-bit)         0.4-0.8 img/s  Moderate volumes
High-Performance  RTX 3090/4090, L40S (BF16/FP16)  1.5-3.0 img/s  High daily volumes
Enterprise        A100/H100 (FlashAttention2)      3.0-5.0 img/s  Mission-critical pipelines
Distributed       2x A100/H100 (tensor-parallel)   5.0-8.0 img/s  Real-time OCR services

The model weights are available on HuggingFace.

Manual setup

Use these steps to set up the server manually after SSH-ing into your instance. This works on any provider regardless of cloud-init support.

Step 1: Provision a Spheron instance

  1. Sign up at app.spheron.ai
  2. Add credits (card/crypto)
  3. Click Deploy → Select GPU (see Deployment Tiers above) → Region → Ubuntu 22.04 → add your SSH key → Deploy

See Getting Started or SSH Connection for details.

Step 2: Connect to your instance

ssh <user>@<ipAddress>

Replace <user> with the username shown in the instance details panel (e.g., root or ubuntu) and <ipAddress> with your instance's public IP.

Step 3: Update system packages

sudo apt update && sudo apt install -y software-properties-common curl ca-certificates

Step 4: Add Python PPA repository

sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update

Step 5: Install Python 3.11

sudo apt-get -o Acquire::Retries=3 install -y python3.11 python3.11-venv python3.11-dev

Step 6: Set up pip

python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel

Step 7: Create virtual environment

python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate

Step 8: Install PyTorch (CUDA 12.1)

pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio

Step 9: Install Chandra OCR dependencies

pip install chandra-ocr vllm transformers accelerate pillow bitsandbytes

Usage

Launch web interface

chandra_app

Access at: http://localhost:8501

Features:
  • Upload PDFs or images
  • Visualize OCR results
  • Export as Markdown, HTML, or JSON
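
When the app runs on a remote Spheron instance, the localhost URL above refers to the instance itself. One common way to reach it from your workstation is an SSH tunnel (assuming the default Streamlit port 8501; substitute your own user and IP):

```shell
# Forward the instance's port 8501 to your local machine, then open
# http://localhost:8501 in your local browser.
ssh -L 8501:localhost:8501 <user>@<ipAddress>
```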

Programmatic usage

from chandra_ocr import ChandraOCR
 
# Initialize the model
ocr = ChandraOCR()
 
# Process a document
result = ocr.process("path/to/document.pdf", output_format="markdown")
 
# Print the result
print(result)

Batch processing

import os
from chandra_ocr import ChandraOCR
 
ocr = ChandraOCR()
input_dir = "path/to/documents"
output_dir = "path/to/output"
os.makedirs(output_dir, exist_ok=True)
 
for filename in os.listdir(input_dir):
    if filename.endswith((".pdf", ".png", ".jpg")):
        input_path = os.path.join(input_dir, filename)
        result = ocr.process(input_path, output_format="markdown")
        
        output_path = os.path.join(output_dir, f"{os.path.splitext(filename)[0]}.md")
        with open(output_path, "w") as f:
            f.write(result)
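
The loop above maps each input file to a `.md` output by stripping the original extension. A small stdlib-only helper (illustrative, using pathlib) makes that mapping explicit and reusable:

```python
from pathlib import Path

def output_path_for(input_path: str, output_dir: str, ext: str = ".md") -> Path:
    # Map e.g. "docs/report.pdf" -> "<output_dir>/report.md", keeping only the stem.
    return Path(output_dir) / (Path(input_path).stem + ext)

print(output_path_for("docs/report.pdf", "out"))  # out/report.md
```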

Advanced configuration

vLLM server (high throughput)

# Install vLLM if not already installed
pip install vllm
 
# Start the vLLM server
python -m vllm.entrypoints.openai.api_server \
    --model datalab-to/chandra \
    --dtype bfloat16 \
    --max-model-len 4096
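
vLLM's API server is OpenAI-compatible (default port 8000), so clients talk to it with standard chat-completion requests. The sketch below builds such a request with the stdlib only; the prompt wording is illustrative, and a real OCR request would also attach the page image in the OpenAI vision message format, which is omitted here:

```python
import json
import urllib.request

def build_request(prompt: str, model: str = "datalab-to/chandra") -> dict:
    # Payload shape follows the OpenAI chat completions API that vLLM exposes.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }

payload = build_request("Convert this page to Markdown.")
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment once the server is running
```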

Custom parameters

from chandra_ocr import ChandraOCR
 
ocr = ChandraOCR(
    max_tokens=2048,
    temperature=0.7,
    batch_size=4,
    use_flash_attention=True
)

Performance optimization

Memory

  • Use 4-bit or 8-bit quantization to reduce VRAM requirements.
  • Reduce batch size when you encounter out-of-memory errors.
  • Enable gradient checkpointing for large documents.

Speed

  • Enable FlashAttention2 on A100 and H100 GPUs.
  • Use vLLM for concurrent multi-request processing.
  • Use distributed inference for high-volume workloads.

Accuracy

  • Use BF16 or FP16 precision (rather than 4-bit/8-bit quantization) for the best accuracy.
  • Process images at 2560 px or higher resolution.
  • Apply multi-pass processing for critical documents.

Troubleshooting

OOM errors

# Solution 1: Reduce batch size
ocr = ChandraOCR(batch_size=1)
 
# Solution 2: Use 4-bit quantization (requires bitsandbytes, installed in Step 9)
ocr = ChandraOCR(quantization="4bit")
 
# Solution 3: Lower the processing resolution
ocr.process("document.pdf", max_resolution=1920)

Slow processing

# Ensure CUDA is properly configured
python -c "import torch; print(torch.cuda.is_available())"
 
# Check GPU utilization
nvidia-smi
 
# Enable vLLM for better throughput
# See Advanced Configuration section above

Installation issues

# If pip install fails, try:
pip install --no-cache-dir chandra-ocr
 
# Or install dependencies separately:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate
pip install chandra-ocr

Supported formats

Input formats: PNG, JPEG, JPG, TIFF, BMP, WebP, PDF, scanned documents, screenshots

Specialized document types: Academic papers, forms, tables, equations, diagrams, handwritten notes

Output formats:
  • Markdown: Preserves structure, hierarchy, and formatting
  • HTML: Browser-ready output with semantic markup
  • JSON: Includes text, layout, bounding boxes, confidence scores, and metadata
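
The JSON output lends itself to post-processing such as confidence filtering. The sketch below uses a hypothetical result shape; the exact schema is an assumption based only on the fields listed above (text, bounding boxes, confidence scores):

```python
import json

# Hypothetical JSON result for illustration; not the documented schema.
raw = """
{"pages": [{"blocks": [
  {"text": "Invoice #123", "bbox": [10, 10, 200, 40], "confidence": 0.97},
  {"text": "Tota1 due", "bbox": [10, 50, 120, 70], "confidence": 0.62}
]}]}
"""
result = json.loads(raw)

# Flag low-confidence blocks for manual review.
needs_review = [
    block["text"]
    for page in result["pages"]
    for block in page["blocks"]
    if block["confidence"] < 0.9
]
print(needs_review)  # ['Tota1 due']
```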

Best practices

Document quality

  • Use good lighting and scan at 300 DPI or higher.
  • Avoid skewed or rotated pages and remove background noise before processing.

Deployment

  • Start with the Cost-Optimized tier and scale up as volume increases.
  • Monitor GPU usage and adjust batch sizes to stay within VRAM limits.
  • Add error handling and retry logic to your pipeline.

Production

  • Use async processing for web applications.
  • Use a queue system for high-volume workloads.
  • Cache results for frequently processed documents.
  • Add logging and monitoring to track throughput and errors.
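
The queueing and retry advice above can be sketched with a bounded worker pool. Here `process_document` is a stub standing in for an `ocr.process(...)` call; swap in the real call and tune `max_workers` to your GPU's batch capacity:

```python
from concurrent.futures import ThreadPoolExecutor

def process_document(path: str) -> str:
    # Stub for illustration; replace with ocr.process(path, output_format="markdown").
    return f"# OCR output for {path}"

def with_retry(path: str, attempts: int = 3) -> str:
    last_err = None
    for _ in range(attempts):
        try:
            return process_document(path)
        except Exception as err:  # retry transient failures (OOM, timeouts)
            last_err = err
    raise RuntimeError(f"failed after {attempts} attempts: {path}") from last_err

docs = ["a.pdf", "b.pdf", "c.pdf"]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(with_retry, docs))
print(len(results))  # 3
```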

Use cases

  • Enterprise: Legacy archives, invoice automation, contract analysis, compliance reporting
  • Academic: Research papers, databases, publications, historical documents
  • Legal and financial: Contracts, statements, filings, due diligence review
  • Healthcare: Medical records, prescriptions, forms, clinical trial documents

Cloud-init startup script (optional)

If your provider supports cloud-init, you can paste this into the Startup Script field when deploying to automate the environment setup. After the instance is ready, SSH in and run . /opt/chandra-ocr/bin/activate && chandra_app to launch the web interface.

#cloud-config
runcmd:
  - apt update && apt install -y software-properties-common curl ca-certificates
  - add-apt-repository -y ppa:deadsnakes/ppa
  - apt update
  - apt-get -o Acquire::Retries=3 install -y python3.11 python3.11-venv python3.11-dev
  - python3.11 -m ensurepip --upgrade
  - python3.11 -m pip install --upgrade pip setuptools wheel
  - python3.11 -m venv /opt/chandra-ocr
  - /opt/chandra-ocr/bin/pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
  - /opt/chandra-ocr/bin/pip install chandra-ocr vllm transformers accelerate pillow bitsandbytes

After cloud-init completes, activate the environment and start the web interface:

. /opt/chandra-ocr/bin/activate
chandra_app

What's next