A multi-modal AI system for industrial document processing that combines vision-language understanding with structured data extraction.
| Category | Capabilities |
|---|---|
| File Support | PDF, PNG, JPG, JPEG, BMP, WEBP |
| Vision Processing | 1600 px image optimization, PDF-to-image conversion (300 dpi), adaptive resizing |
| AI Capabilities | Text + visual comprehension, template-based extraction, context-aware analysis |
| Performance | GPU batch processing, async I/O, streamed responses |
| Security | Ephemeral storage, input sanitization, CUDA sandboxing |
```mermaid
graph TD
    A[User Upload] --> B{File Type?}
    B -->|PDF| C[PDF→Image Conversion]
    B -->|Image| D[Image Preprocessing]
    C --> E[Vision Encoder]
    D --> E
    E --> F[Pixtral-12B Fusion]
    F --> G[Template Engine]
    G --> H[Structured Output]
```
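The PDF→Image and preprocessing stages can be approximated with PyMuPDF and Pillow. The sketch below is illustrative only, assuming the 300 dpi rendering and 1600 px longest-edge target listed above; the helper functions are hypothetical and not part of the shipped API.

```python
# Illustrative preprocessing sketch (hypothetical helpers, not the shipped API):
# render PDF pages at 300 dpi with PyMuPDF, then cap the longest edge at 1600 px.
import fitz  # PyMuPDF
from PIL import Image

MAX_EDGE = 1600  # adaptive-resize target from the capabilities table

def pdf_to_images(path: str, dpi: int = 300) -> list[Image.Image]:
    """Render each PDF page to a PIL image at the given DPI."""
    images = []
    with fitz.open(path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)
            images.append(Image.frombytes("RGB", (pix.width, pix.height), pix.samples))
    return images

def resize_max_edge(img: Image.Image, max_edge: int = MAX_EDGE) -> Image.Image:
    """Downscale so the longest edge is at most max_edge, preserving aspect ratio."""
    scale = max_edge / max(img.size)
    if scale >= 1.0:
        return img  # never upscale
    new_size = (round(img.width * scale), round(img.height * scale))
    return img.resize(new_size, Image.Resampling.LANCZOS)
```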
System Requirements:
- NVIDIA GPU (≥16GB VRAM)
- CUDA 12.1+
- Ubuntu 22.04 LTS
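To confirm the host meets these requirements before installing, the standard driver-level checks below can help (adjust to your environment):

```bash
# GPU model, total VRAM (should report >= 16 GB), and driver version
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
# Installed CUDA toolkit version (should be 12.1 or newer)
nvcc --version
```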
Step-by-Step:
```bash
# 1. Clone repo
git clone https://github.com/yourorg/etherpdf.git && cd etherpdf

# 2. Create virtual env
python -m venv .env && source .env/bin/activate

# 3. Install core dependencies
pip install -r requirements.txt

# 4. Install CUDA extras
pip install torch==2.1.0+cu121 -f https://download.pytorch.org/whl/torch_stable.html

# 5. Verify installation
python -c "import torch; print(torch.cuda.is_available())"
```
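For a slightly deeper sanity check than step 5, the snippet below reports the GPU that PyTorch actually sees and its total memory (illustrative only):

```python
# Report the CUDA device PyTorch will use and its total VRAM in GiB
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
else:
    print("CUDA not available - check the driver and the torch==2.1.0+cu121 install")
```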
```dockerfile
# Dockerfile
FROM nvidia/cuda:12.1.0-base-ubuntu22.04
RUN apt-get update && apt-get install -y python3.10 python3-pip && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY . /app
RUN pip3 install -r requirements.txt
CMD ["python3", "app.py"]
```
```bash
docker build -t etherpdf .
docker run -p 7860:7860 --gpus all etherpdf
```
```yaml
# README.md header
---
tags: [spaces, document-intelligence]
extra_gpu_tags: true
vram: 16GB
---
```
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etherpdf
spec:
  selector:
    matchLabels:
      app: etherpdf
  template:
    metadata:
      labels:
        app: etherpdf
    spec:
      containers:
        - name: etherpdf
          image: etherpdf:latest
          resources:
            limits:
              nvidia.com/gpu: 1
```
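Assuming a cluster with the NVIDIA device plugin installed, the manifest is applied in the usual way; the label selector below matches the `app: etherpdf` label in the manifest above:

```bash
kubectl apply -f deployment.yaml
kubectl get pods -l app=etherpdf   # wait for the pod to reach Running
```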
| Document Type | Pages | VRAM Usage | Processing Time | Accuracy* |
|---|---|---|---|---|
| Timesheet PDF | 5 | 14.2 GB | 23 s | 98.7% |
| Project BMP | 1 | 12.8 GB | 8 s | 99.1% |
| Scanned JPG | 10 | 15.1 GB | 41 s | 97.3% |

*Accuracy measured against a human-annotated test set (n = 1,500 documents)
Data Flow:

```mermaid
sequenceDiagram
    User->>+Server: Upload (Encrypted HTTPS)
    Server->>GPU: Process (Ephemeral Storage)
    GPU->>Server: Structured Data
    Server->>-User: Response
    Server->>DB: Log Metadata
```
Key Measures:
- File sanitization with `python-magic` MIME validation
- CUDA memory isolation between sessions
- Automatic temp file purging (5 min TTL)
- Dependency scanning via `safety check`
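A minimal sketch of the first and third measures, MIME validation with `python-magic` and TTL-based temp-file purging; the helper names, allow-list, and temp directory are illustrative, not the project's actual module:

```python
# Illustrative upload sanitization: verify the real MIME type and purge stale temp files
import time
from pathlib import Path

import magic  # python-magic

ALLOWED_MIME = {"application/pdf", "image/png", "image/jpeg", "image/bmp", "image/webp"}
TEMP_DIR = Path("/tmp/etherpdf")   # hypothetical location
TTL_SECONDS = 5 * 60               # 5-minute TTL from the list above

def is_allowed(path: str) -> bool:
    """Check the file's actual content type, not its extension."""
    mime = magic.from_file(path, mime=True)
    return mime in ALLOWED_MIME

def purge_expired(temp_dir: Path = TEMP_DIR, ttl: int = TTL_SECONDS) -> None:
    """Delete temp files older than the TTL."""
    now = time.time()
    for f in temp_dir.glob("*"):
        if f.is_file() and now - f.stat().st_mtime > ttl:
            f.unlink(missing_ok=True)
```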
Output Structure:
```json
{
  "personnel": {
    "name": "John Doe",
    "position": "Offshore Engineer",
    "contractor_status": true,
    "noc_id": "XGTH-2345"
  },
  "activities": {
    "onshore_days": 12,
    "offshore_overtime": 8.5
  }
}
```
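As a rough illustration of how the template engine could request this schema and how a response might be validated, consider the sketch below; the prompt text and helper are hypothetical, not the shipped template:

```python
# Illustrative template-based extraction prompt and response validation (hypothetical)
import json

EXTRACTION_TEMPLATE = """Extract the following fields from the document and reply with JSON only:
{"personnel": {"name": str, "position": str, "contractor_status": bool, "noc_id": str},
 "activities": {"onshore_days": int, "offshore_overtime": float}}"""

def parse_response(raw: str) -> dict:
    """Parse and minimally validate the model's JSON reply."""
    data = json.loads(raw)
    for key in ("personnel", "activities"):
        if key not in data:
            raise ValueError(f"missing '{key}' section in model output")
    return data
```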
Common Issues:
- CUDA Out of Memory: `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128`
- PDF Rendering Artifacts: `fitz.TOOLS.set_small_glyph_heights(True)` (call before processing)
- Model Loading Errors: clear the local cache with `huggingface-cli delete-cache`
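On the first item: the allocator setting must be present before CUDA is initialized. When launching from Python it can also be set at the top of the entry point, as in this sketch (not project code):

```python
# Set the allocator config before torch initializes CUDA
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after the env var so the setting takes effect
```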
| Channel | Availability |
|---|---|
| GitHub Issues | 24/7 Critical Bugs |
| [email protected] | Business Hours (GMT+3) |
| Enterprise SLA | Contract Customers |
requirements.txt (Expanded)
```text
# Core
gradio==3.50.0
transformers==4.30.0
torch==2.1.0+cu121
accelerate==0.27.0

# Document Processing
PyMuPDF==1.22.0
Pillow==10.0.0
python-magic==0.4.27

# Security
safety==2.3.5
python-dotenv==1.0.0

# Optimization
nvidia-cublas-cu12==12.1.3.1
nvidia-cudnn-cu12==8.9.4.25

# Utilities
requests==2.31.0
numpy==1.26.0
tqdm==4.66.1
```
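Because of the `+cu121` local version tag on `torch`, installing straight from this file needs the same PyTorch package index used in step 4 (one way to do it; adjust the URL to your mirror):

```bash
pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
```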