Chandra OCR
Deploy Chandra OCR on a Spheron GPU instance. Chandra OCR converts images and PDFs into structured Markdown, HTML, or JSON while preserving document layout, hierarchy, and visual elements. It achieves 83.1% accuracy on the olmOCR benchmark, outperforming GPT-4o, Mistral OCR, and DeepSeek OCR.
Key capabilities
- Multi-format output (Markdown, HTML, JSON)
- Handwriting recognition
- Form reconstruction (including checkboxes)
- Complex layouts (tables, math equations)
- Visual element extraction (images, diagrams, captions)
- 40+ languages
Chandra OCR supports two inference modes:
- Local: HuggingFace transformers for privacy-sensitive and edge deployments
- Remote: vLLM server for scalable production and high-throughput pipelines
Benchmark accuracy on olmOCR (83.1% overall):
| Category | Accuracy |
|---|---|
| Headers/Footers | 90.8% |
| Long Tiny Text | 92.3% |
| Tables | 88.0% |
| ArXiv | 82.2% |
Accuracy vs. competitors: +13.2 pp vs. GPT-4o, +19.3 pp vs. Gemini Flash 2, +4 pp vs. dots.ocr
Deployment tiers
| Tier | GPU | Performance | Use Case |
|---|---|---|---|
| Dev/Test | CPU | 0.1-0.3 img/s | PoC, batch processing |
| Cost-Optimized | RTX 3060/4060 Ti (4-bit) | 0.4-0.8 img/s | Moderate volumes |
| High-Performance | RTX 3090/4090, L40S (BF16/FP16) | 1.5-3.0 img/s | High daily volumes |
| Enterprise | A100/H100 (FlashAttention2) | 3.0-5.0 img/s | Mission-critical pipelines |
| Distributed | 2x A100/H100 (tensor-parallel) | 5.0-8.0 img/s | Real-time OCR services |
The model weights are available on HuggingFace.
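As a rough planning aid, the throughput figures in the tiers table translate into daily capacity like this (the 80% utilization factor is an illustrative assumption for batching gaps and I/O overhead, not a measured value):

```python
def daily_capacity(images_per_second: float, utilization: float = 0.8) -> int:
    """Estimate how many images a tier can process per day.

    utilization discounts sustained throughput for idle time,
    batching gaps, and I/O overhead.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return int(images_per_second * utilization * 24 * 60 * 60)

# A High-Performance tier GPU at ~2.0 img/s sustained:
# daily_capacity(2.0) -> 138240 images/day at 80% utilization
```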
Manual setup
Use these steps to set up the server manually after SSH-ing into your instance. This works on any provider regardless of cloud-init support.
Step 1: Provision a Spheron instance
- Sign up at app.spheron.ai
- Add credits (card/crypto)
- Click Deploy → Select GPU (see Deployment Tiers above) → Region → Ubuntu 22.04 → add your SSH key → Deploy
See Getting Started or SSH Connection for details.
Step 2: Connect to your instance
ssh <user>@<ipAddress>
Replace <user> with the username shown in the instance details panel (e.g., root or ubuntu) and <ipAddress> with your instance's public IP.
Step 3: Update system packages
sudo apt update && sudo apt install -y software-properties-common curl ca-certificates
Step 4: Add Python PPA repository
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 5: Install Python 3.11
sudo apt-get -o Acquire::Retries=3 install -y python3.11 python3.11-venv python3.11-dev
Step 6: Set up pip
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel
Step 7: Create virtual environment
python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate
Step 8: Install PyTorch (CUDA 12.1)
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
Step 9: Install Chandra OCR dependencies
pip install chandra-ocr vllm transformers accelerate pillow bitsandbytes
Usage
Launch web interface
chandra_app
Access at: http://localhost:8501
Features:
- Upload PDFs or images
- Visualize OCR results
- Export as Markdown, HTML, or JSON
Programmatic usage
from chandra_ocr import ChandraOCR
# Initialize the model
ocr = ChandraOCR()
# Process a document
result = ocr.process("path/to/document.pdf", output_format="markdown")
# Print the result
print(result)
Batch processing
import os
from chandra_ocr import ChandraOCR

ocr = ChandraOCR()
input_dir = "path/to/documents"
output_dir = "path/to/output"

for filename in os.listdir(input_dir):
    if filename.endswith((".pdf", ".png", ".jpg")):
        input_path = os.path.join(input_dir, filename)
        result = ocr.process(input_path, output_format="markdown")
        # Write one .md file per input, keeping the original base name
        output_path = os.path.join(output_dir, f"{os.path.splitext(filename)[0]}.md")
        with open(output_path, "w") as f:
            f.write(result)
Advanced configuration
vLLM server (high throughput)
# Install vLLM if not already installed
pip install vllm
# Start the vLLM server
python -m vllm.entrypoints.openai.api_server \
--model datalab-to/chandra \
--dtype bfloat16 \
    --max-model-len 4096
Custom parameters
from chandra_ocr import ChandraOCR
ocr = ChandraOCR(
    max_tokens=2048,
    temperature=0.7,
    batch_size=4,
    use_flash_attention=True
)
Performance optimization
Memory
- Use 4-bit or 8-bit quantization to reduce VRAM requirements.
- Reduce batch size when you encounter out-of-memory errors.
- Enable gradient checkpointing for large documents.
Speed
- Enable FlashAttention2 on A100 and H100 GPUs.
- Use vLLM for concurrent multi-request processing.
- Use distributed inference for high-volume workloads.
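When the vLLM server from Advanced configuration is running, requests go through its OpenAI-compatible HTTP API on port 8000 (vLLM's default). A minimal client sketch; the chat-message layout and prompt text below are assumptions, so check the model card on HuggingFace for the canonical prompt template:

```python
import base64
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # vLLM's default port

def build_ocr_request(image_b64: str,
                      prompt: str = "Convert this page to Markdown.") -> dict:
    """Build an OpenAI-style chat request carrying one page as a data URL."""
    return {
        "model": "datalab-to/chandra",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        "max_tokens": 2048,
    }

def ocr_via_vllm(image_path: str) -> str:
    """Send one page image to the server and return the model's text output."""
    with open(image_path, "rb") as f:
        payload = build_ocr_request(base64.b64encode(f.read()).decode())
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```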
Accuracy
- Use BF16 or FP16 precision (rather than 4-bit/8-bit quantization) for the best accuracy.
- Process images at 2560 px or higher resolution.
- Apply multi-pass processing for critical documents.
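Multi-pass processing can be as simple as running the model several times and majority-voting on the output; since the default sampling temperature is nonzero, passes can disagree on hard pages. A library-agnostic sketch, where `process` is any zero-argument callable that invokes your OCR (e.g., a lambda around `ocr.process`):

```python
from collections import Counter
from typing import Callable, Tuple

def multi_pass(process: Callable[[], str], passes: int = 3) -> Tuple[str, bool]:
    """Run OCR `passes` times and majority-vote on the result.

    Returns the most common output and a flag indicating whether all
    passes agreed; disagreement is a signal to route the document
    for manual review.
    """
    results = [process() for _ in range(passes)]
    winner, count = Counter(results).most_common(1)[0]
    return winner, count == passes
```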
Troubleshooting
OOM errors
# Solution 1: Reduce batch size
ocr = ChandraOCR(batch_size=1)
# Solution 2: Use quantization
pip install bitsandbytes
ocr = ChandraOCR(quantization="4bit")
# Solution 3: Lower resolution
ocr.process("document.pdf", max_resolution=1920)
Slow processing
# Ensure CUDA is properly configured
python -c "import torch; print(torch.cuda.is_available())"
# Check GPU utilization
nvidia-smi
# Enable vLLM for better throughput
# See Advanced Configuration section above
Installation issues
# If pip install fails, try:
pip install --no-cache-dir chandra-ocr
# Or install dependencies separately:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate
pip install chandra-ocr
Supported formats
Input formats: PNG, JPEG, JPG, TIFF, BMP, WebP, PDF, scanned documents, screenshots
Specialized document types: Academic papers, forms, tables, equations, diagrams, handwritten notes
Output formats:
- Markdown: Preserves structure, hierarchy, and formatting
- HTML: Browser-ready output with semantic markup
- JSON: Includes text, layout, bounding boxes, confidence scores, and metadata
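Downstream code often wants only high-confidence regions from the JSON output. A sketch of such a filter; the field names (`blocks`, `text`, `bbox`, `confidence`) are assumptions about the schema, so adjust them to the keys your chandra-ocr version actually emits:

```python
def high_confidence_blocks(ocr_json: dict, min_conf: float = 0.9) -> list:
    """Keep only blocks whose confidence score meets the threshold."""
    return [b for b in ocr_json.get("blocks", [])
            if b.get("confidence", 0.0) >= min_conf]

# Hypothetical JSON output for a two-block page:
sample = {"blocks": [
    {"text": "Invoice #1042", "bbox": [40, 32, 310, 58], "confidence": 0.97},
    {"text": "sm udge d", "bbox": [40, 400, 180, 420], "confidence": 0.41},
]}
# high_confidence_blocks(sample) keeps only the first block
```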
Best practices
Document quality
- Use good lighting and scan at 300 DPI or higher.
- Avoid skewed or rotated pages and remove background noise before processing.
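A quick way to check whether an existing scan meets the 300 DPI guideline when you know the physical page size:

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Effective scan resolution of an image for a known physical page width."""
    return pixel_width / page_width_inches

# A 2550 px wide scan of a US Letter page (8.5 in) is exactly 300 DPI:
# effective_dpi(2550, 8.5) -> 300.0
```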
Deployment
- Start with the Cost-Optimized tier and scale up as volume increases.
- Monitor GPU usage and adjust batch sizes to stay within VRAM limits.
- Add error handling and retry logic to your pipeline.
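Retry logic for transient failures (GPU memory spikes, network hiccups against a remote vLLM server) can wrap any processing call. A minimal sketch with exponential backoff:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on any exception.

    The last failure is re-raised so callers can log or dead-letter it.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Usage, wrapping any OCR call that may fail transiently:
# result = with_retries(lambda: ocr.process("doc.pdf", output_format="markdown"))
```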
Production
- Use async processing for web applications.
- Use a queue system for high-volume workloads.
- Cache results for frequently processed documents.
- Add logging and monitoring to track throughput and errors.
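Caching by content hash means a re-uploaded document is never OCR'd twice. A sketch keyed on the SHA-256 of the file bytes (the `.md`-file cache layout is an illustrative choice; `process` is any callable mapping a path to OCR text):

```python
import hashlib
from pathlib import Path

def cached_process(path: str, process, cache_dir: str = ".ocr-cache") -> str:
    """Return cached OCR output for a document, computing it once per content hash."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    cache_file = Path(cache_dir) / f"{digest}.md"
    if cache_file.exists():
        return cache_file.read_text()
    result = process(path)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(result)
    return result

# Usage:
# text = cached_process("invoice.pdf",
#                       lambda p: ocr.process(p, output_format="markdown"))
```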
Use cases
- Enterprise: Legacy archives, invoice automation, contract analysis, compliance reporting
- Academic: Research papers, databases, publications, historical documents
- Legal and financial: Contracts, statements, filings, due diligence review
- Healthcare: Medical records, prescriptions, forms, clinical trial documents
Cloud-init startup script (optional)
If your provider supports cloud-init, you can paste this into the Startup Script field when deploying to automate the environment setup. After the instance is ready, SSH in and run . /opt/chandra-ocr/bin/activate && chandra_app to launch the web interface.
#cloud-config
runcmd:
- apt update && apt install -y software-properties-common curl ca-certificates
- add-apt-repository -y ppa:deadsnakes/ppa
- apt update
- apt-get -o Acquire::Retries=3 install -y python3.11 python3.11-venv python3.11-dev
- python3.11 -m ensurepip --upgrade
- python3.11 -m pip install --upgrade pip setuptools wheel
- python3.11 -m venv /opt/chandra-ocr
- /opt/chandra-ocr/bin/pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
- /opt/chandra-ocr/bin/pip install chandra-ocr vllm transformers accelerate pillow bitsandbytes
After cloud-init completes, activate the environment and start the web interface:
. /opt/chandra-ocr/bin/activate
chandra_app
What's next
- Specialized Models: Compare Chandra OCR with SoulX Podcast and Janus CoderV
- vLLM Inference Server: Configure vLLM for high-throughput OCR pipelines
- Instance Types: Select the right GPU tier for your document volume
- Getting Started: Create a Spheron account and deploy your first instance