Paper is a lightweight Python framework for performing matrix computations on datasets that are too large to fit into main memory. It is designed around the principle of lazy evaluation, which allows it to build a computation plan and apply optimizations, such as operator fusion, before executing any costly I/O operations.
When a matrix computation is I/O-bound because the data does not fit in RAM, Paper's tiling and prefetching strategy can outperform Dask: in the benchmarks below it runs roughly 1.9x to 2.1x faster.
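To make the tiling and prefetching idea concrete, here is a minimal sketch in plain NumPy. It is not Paper's implementation: the helper names `read_tile` and `tiled_column_sums`, the raw row-major file layout, and the single background reader thread are all assumptions chosen for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def read_tile(path, shape, dtype, start, stop):
    """Copy rows [start, stop) of a raw row-major file into memory."""
    mm = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    return np.array(mm[start:stop])  # the copy forces the actual disk read


def tiled_column_sums(path, shape, dtype=np.float32, tile_rows=256):
    """Column sums computed tile by tile, overlapping reads with compute."""
    bounds = [(s, min(s + tile_rows, shape[0]))
              for s in range(0, shape[0], tile_rows)]
    total = np.zeros(shape[1], dtype=np.float64)
    if not bounds:
        return total
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_tile, path, shape, dtype, *bounds[0])
        for i in range(len(bounds)):
            tile = pending.result()  # wait for the tile requested earlier
            if i + 1 < len(bounds):
                # Prefetch the next tile while this one is being reduced.
                pending = pool.submit(read_tile, path, shape, dtype,
                                      *bounds[i + 1])
            total += tile.sum(axis=0, dtype=np.float64)
    return total


# Demo on a small temporary file standing in for a larger-than-RAM matrix.
rng = np.random.default_rng(0)
data = rng.random((1000, 8), dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.bin")
    data.tofile(path)
    sums = tiled_column_sums(path, data.shape, tile_rows=256)
print(np.allclose(sums, data.sum(axis=0, dtype=np.float64)))
```

Only one tile is processed at a time while at most one more is in flight, so peak memory is bounded by roughly two tiles regardless of the file size.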
The architecture is inspired by modern data systems and academic research (e.g., PreVision), with a clear separation between the logical plan, the physical execution backend, and an intelligent optimizer.
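The separation between logical plan, optimizer, and execution backend can be sketched with a toy example. Everything here (`LazyNode`, `leaf`, `optimize`, `execute`, and the `fused_add_scale` operator name) is hypothetical and only illustrates the idea of operator fusion; Paper's real classes and rewrite rules may look quite different.

```python
import numpy as np


class LazyNode:
    """One node of a toy logical plan; building it performs no arithmetic."""

    def __init__(self, op, inputs, scalar=None):
        self.op, self.inputs, self.scalar = op, inputs, scalar

    def __add__(self, other):
        return LazyNode("add", [self, other])

    def __mul__(self, scalar):
        return LazyNode("scale", [self], scalar)


def leaf(values):
    return LazyNode("leaf", [np.asarray(values, dtype=np.float32)])


def optimize(node):
    """Optimizer: rewrite scale(add(x, y), s) into a single fused node."""
    if node.op == "leaf":
        return node
    children = [optimize(child) for child in node.inputs]
    if node.op == "scale" and children[0].op == "add":
        return LazyNode("fused_add_scale", children[0].inputs, node.scalar)
    return LazyNode(node.op, children, node.scalar)


def execute(node, stats, tile_rows=1):
    """Backend: run each operator tile by tile, materializing its output."""
    if node.op == "leaf":
        return node.inputs[0]
    args = [execute(child, stats, tile_rows) for child in node.inputs]
    out = np.empty_like(args[0])
    stats["passes"] += 1  # one full sweep over the data per operator
    for start in range(0, out.shape[0], tile_rows):
        rows = slice(start, start + tile_rows)
        if node.op == "add":
            out[rows] = args[0][rows] + args[1][rows]
        elif node.op == "scale":
            out[rows] = args[0][rows] * node.scalar
        elif node.op == "fused_add_scale":
            out[rows] = (args[0][rows] + args[1][rows]) * node.scalar
        else:
            raise ValueError(f"unknown op: {node.op}")
    return out


a = leaf([[1, 2, 3], [4, 5, 6]])
b = leaf([[7, 8, 9], [10, 11, 12]])
plan = (a + b) * 2  # only builds the plan

unfused_stats, fused_stats = {"passes": 0}, {"passes": 0}
unfused = execute(plan, unfused_stats)
fused = execute(optimize(plan), fused_stats)
print(unfused_stats["passes"], fused_stats["passes"])  # 2 1
print(fused)
```

Both plans produce the same array, but the fused plan sweeps the data once instead of twice and never materializes the intermediate sum. For out-of-core data, each avoided sweep is an avoided round of disk reads and writes.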
Paper now includes a NumPy-compatible API layer that provides a familiar interface for users migrating from NumPy or other array libraries. This makes it easy to leverage Paper's out-of-core capabilities with minimal code changes.
# Import Paper's NumPy-compatible API
from paper import numpy_api as pnp
import numpy as np
# Create arrays (similar to NumPy)
a = pnp.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
b = pnp.array([[7, 8, 9], [10, 11, 12]], dtype=np.float32)
# Perform operations with lazy evaluation
c = (a + b) * 2
# Execute the computation plan
result = c.compute()
print(result.to_numpy())
- Familiar NumPy Interface: Use the same syntax as NumPy for array creation and operations
- Lazy Evaluation: Build computation plans without executing until .compute() is called
- Automatic Optimization: Operator fusion and intelligent caching applied automatically
- Out-of-Core Support: Handle datasets larger than memory seamlessly
- Matrix Operations: Support for addition, scalar multiplication, and matrix multiplication (@)
Array Creation:
- pnp.array(data) - Create array from data
- pnp.zeros(shape) - Create zeros array
- pnp.ones(shape) - Create ones array
- pnp.eye(n) - Create identity matrix
- pnp.random_rand(shape) - Create random array
Operations:
- a + b - Element-wise addition
- a * scalar - Scalar multiplication
- a @ b - Matrix multiplication
- a.T - Transpose
I/O:
- pnp.load(filepath, shape) - Load array from file
- pnp.save(filepath, array) - Save array to file
See examples/numpy_api_example.py for comprehensive examples demonstrating:
- Basic array operations
- Chained operations with lazy evaluation
- Matrix multiplication
- File I/O
- Large array handling (out-of-core)
Run the examples:
python examples/numpy_api_example.py
Your application (ML, Finance, Science)
- sklearn, PyTorch, XGBoost, etc.
↓
Paper: I/O optimization layer
- Intelligent tiling + prefetching
- Lazy evaluation with compute plans
- Optimal buffer management (e.g., Belady-inspired...)
↓
Storage (HDF5, Binary, S3, etc.)
- Paper orchestrates reads; it does not replace the storage layer
Key: Paper is transparent to your application. It replaces I/O, not your business logic.
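Belady-inspired buffer management is feasible here because a lazily built plan exposes the order of tile accesses before anything runs. The sketch below compares Belady's rule (evict the tile whose next use is farthest in the future) with LRU on a cyclic scan. The function names and the access trace are made up for illustration and do not describe Paper's actual policy.

```python
from collections import OrderedDict


def belady_misses(accesses, capacity):
    """Count misses when evicting the tile reused farthest in the future."""
    cache, misses = set(), 0
    for i, tile in enumerate(accesses):
        if tile in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            def next_use(t):
                # Position of the next access to t, or infinity if none.
                try:
                    return accesses.index(t, i + 1)
                except ValueError:
                    return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(tile)
    return misses


def lru_misses(accesses, capacity):
    """Count misses for a least-recently-used buffer of the same size."""
    cache, misses = OrderedDict(), 0
    for tile in accesses:
        if tile in cache:
            cache.move_to_end(tile)
            continue
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # drop the least recently used tile
        cache[tile] = True
    return misses


# Three tiles scanned repeatedly with room for only two in the buffer.
trace = ["A", "B", "C"] * 3
print(belady_misses(trace, capacity=2))  # 6
print(lru_misses(trace, capacity=2))     # 9
```

On this trace LRU misses on every access, because it always evicts the tile that is needed next, while the lookahead policy keeps one useful tile resident on each cycle. Every miss corresponds to a tile read from storage.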
- Reuse the best operations that already exist
  - Provide matrix operations: @, .T, +, -, /, reductions
  - Don't implement ML: no logistic regression, neural networks, or clustering
  - Let sklearn, PyTorch, and XGBoost do their job
- Respect user workflows (don't disrupt)
  - NumPy users: from paper import numpy_api as pnp (same API, better I/O)
  - Dask users: Paper handles matrix ops, Dask handles scheduling
  - sklearn users: Transparent optimization of preprocessing
  - No rewriting of existing code required
- Enable production ML workflows (don't replace)
  - Feature engineering at scale (correlation matrices)
  - Batch prediction on huge datasets
  - Iterative solvers (scientific computing)
  - All of these keep using existing frameworks; Paper only optimizes the I/O
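As an illustration of the feature-engineering use case above, a correlation matrix can be accumulated tile by tile so that only one block of rows is in memory at a time. This is a plain NumPy sketch, not Paper's API: `tiled_corrcoef` is a hypothetical helper, and the raw row-major float32 file is an assumed layout.

```python
import os
import tempfile

import numpy as np


def tiled_corrcoef(path, shape, dtype=np.float32, tile_rows=1024):
    """Feature-by-feature correlation from one streaming pass over the rows."""
    n, d = shape
    mm = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    col_sum = np.zeros(d)
    gram = np.zeros((d, d))
    for start in range(0, n, tile_rows):
        # Convert each tile to float64 so the accumulators stay accurate.
        tile = np.asarray(mm[start:start + tile_rows], dtype=np.float64)
        col_sum += tile.sum(axis=0)
        gram += tile.T @ tile
    mean = col_sum / n
    cov = gram / n - np.outer(mean, mean)  # E[x x^T] - E[x] E[x]^T
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


# Demo: 5000 samples with 6 features, written to a temporary file.
rng = np.random.default_rng(1)
samples = rng.random((5000, 6), dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.bin")
    samples.tofile(path)
    corr = tiled_corrcoef(path, samples.shape, tile_rows=1024)
print(np.allclose(corr, np.corrcoef(samples, rowvar=False)))
```

The one-pass formula is compact but can lose precision when feature means are large relative to their spread; a two-pass or Welford-style update would be the more robust choice in that case.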
# Run all tests
python ./tests/run_tests.py
# Run a specific test
python ./tests/run_tests.py addition
python ./tests/run_tests.py fused
python ./tests/run_tests.py scalar
Paper includes comprehensive benchmarking capabilities to compare performance with Dask on both synthetic and real-world datasets.
Synthetic Data (Default):
# Quick test with small matrices
python benchmarks/benchmark_dask.py --shape 1000 1000
# Standard benchmark (8k x 8k)
python benchmarks/benchmark_dask.py --shape 8192 8192
# Large benchmark (16k x 16k)
python benchmarks/benchmark_dask.py --shape 16384 16384
Real-World Data:
# Generate a realistic gene expression dataset
python -m data_prep.download_dataset --output-dir real_data --size medium
# Run benchmark with real data
python benchmarks/benchmark_dask.py --use-real-data --data-dir real_data
Synthetic Data - 8k x 8k matrix
==================================================
BENCHMARK COMPARISON: paper vs. Dask
==================================================
Metric | Paper (Optimal) | Dask
--------------------------------------------------
Time (s) | 28.79 | 57.00
Peak Memory (MB) | 1382.03 | 1710.95
Avg CPU Util.(%) | 170.74 | 169.30
==================================================
Synthetic Data - 16k x 16k matrix
Multiplication complete.
--- Finished: Paper (Optimal Policy) in 224.7157 seconds ---
--- Running 'Dask' Benchmark ---
--- Starting: Dask ---
/usr/local/lib/python3.12/dist-packages/dask/array/routines.py:452: PerformanceWarning: Increasing number of chunks by factor of 16
out = blockwise(
--- Finished: Dask in 467.5384 seconds ---
==================================================
BENCHMARK COMPARISON: paper vs. Dask
==================================================
Metric | Paper (Optimal) | Dask
--------------------------------------------------
Time (s) | 224.72 | 467.54
Peak Memory (MB) | 3970.48 | 4738.61
Avg CPU Util.(%) | 169.33 | 162.30
==================================================
Real-World Data - Gene Expression (5k x 5k)
On structured real-world data Paper is again about 1.9x faster than Dask, although in this run it uses about 39% more peak memory:
======================================================================
BENCHMARK COMPARISON: Paper vs. Dask
Dataset: Real Gene Expression (5000 x 5000)
======================================================================
Metric | Paper (Optimal) | Dask
----------------------------------------------------------------------
Time (s) | 1.75 | 3.31
Peak Memory (MB) | 361.17 | 259.72
Avg CPU Util.(%) | 372.24 | 396.25
----------------------------------------------------------------------
Paper Speedup | 1.89x
Paper Memory Saving | -39.1%
======================================================================
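The derived rows in the last table can be reproduced from the raw numbers. The formulas below are an assumption inferred from that table (speedup as Dask time over Paper time, memory saving as the relative reduction in peak memory versus Dask); they match the reported 1.89x and -39.1%, and applying them to the synthetic runs gives comparable figures. The names `speedup`, `memory_saving_pct`, and `runs` are illustrative.

```python
def speedup(paper_s, dask_s):
    """How many times faster Paper finished than Dask."""
    return dask_s / paper_s


def memory_saving_pct(paper_mb, dask_mb):
    """Peak-memory reduction relative to Dask; negative means Paper used more."""
    return 100.0 * (dask_mb - paper_mb) / dask_mb


# (paper time s, dask time s, paper peak MB, dask peak MB) from the tables above
runs = {
    "synthetic 8k x 8k": (28.79, 57.00, 1382.03, 1710.95),
    "synthetic 16k x 16k": (224.72, 467.54, 3970.48, 4738.61),
    "gene expression 5k x 5k": (1.75, 3.31, 361.17, 259.72),
}

summary = {}
for name, (paper_s, dask_s, paper_mb, dask_mb) in runs.items():
    summary[name] = (speedup(paper_s, dask_s),
                     memory_saving_pct(paper_mb, dask_mb))
    print(f"{name}: {summary[name][0]:.2f}x faster, "
          f"{summary[name][1]:+.1f}% peak memory saving")
```

By these formulas the synthetic runs come out at about 1.98x and 2.08x with roughly 19% and 16% less peak memory, so the real-data run is the only one of the three where Paper trades memory for speed.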
Paper now includes a complete data preparation pipeline for working with real-world datasets. This enables benchmarking on realistic data that mimics production workloads.
Features:
- Generate realistic gene expression datasets with biological characteristics
- Convert data from common formats (HDF5, NumPy, CSV, TSV) to Paper's binary format
- Validate converted datasets for correctness
- Multiple size presets (small, medium, large, xlarge)
Quick Start:
# Generate a dataset
python -m data_prep.download_dataset --output-dir real_data --size large
# Benchmark with it
python benchmarks/benchmark_dask.py --use-real-data --data-dir real_data
See data_prep/README.md for detailed documentation.

