Skip to content

Plug-and-Play Optimal Page Replacement Algorithm (OPT) for Torch Dataset

License

Notifications You must be signed in to change notification settings

Puiching-Memory/OPT4TorchDataset

Repository files navigation

OPT4TorchDataset

Plug-and-Play Optimal Page Replacement Algorithm (OPT) for Torch Dataset

English | 中文

Experiment Overview A Experiment Overview B

What is OPT?

OPT (Optimal Page Replacement Algorithm) is the theoretically optimal page replacement algorithm, also known as Bélády's algorithm. It requires knowing the future access pattern to decide which data item in the cache should be replaced, thereby achieving the maximum cache hit rate.

  • OPT needs to know the future access sequence
  • Theoretically best cache hit rate
  • Always replaces the data item that will be accessed "furthest in the future"

In deep learning training, data access patterns are usually predictable (determined by the sampler), which allows us to apply the OPT algorithm to deep learning training:

  • Improve data loading speed
  • Increase cache hit rate, more efficient than cache algorithms like LRU, LFU, etc.

flow.png

Multi-process Loading and Distributed Support

The project now fully supports multi-process environments (num_workers > 0):

  1. Shared OPT Cache: The SharedOPTCacheDecorator implements efficient cross-process shared caching. It utilizes shared memory technology to ensure all data loading processes access the same cache pool, maximizing memory utilization and significantly reducing computation overhead for sample generation.
  2. Picklable Caches: All traditional caches (LRU, LFU, FIFO, RR) are now picklable via the CachetoolsDecorator. They can run correctly under the spawn start method (common on Windows) without PicklingError.
  3. Computational Advantage: In scenarios with complex CPU transformations (e.g., high-resolution image augmentation), OPT caching still provides substantial performance gains by eliminating redundant computations, even when parallel prefetching is enabled.

Install

pip install OPT4TorchDataset

Currently, we have not pushed the wheel package to pypi, so this method cannot be used. You can go to our Github Actions page to get the automatically built whl package for manual installation.

Quick Start

Method 1: Using get_opt_cache (Recommended)

from OPT4TorchDataSet import generate_precomputed_file, get_opt_cache
from torch.utils.data import DataLoader

# Step 1: Offline generation of precomputed file (one-time)
generate_precomputed_file(
    dataset_size=10000,
    total_iterations=100000,
    persist_path="precomputed/my_plan.safetensors",
    random_seed=0,
    maxsize=3000
)

# Step 2: Create cache decorator (Intelligent mode: auto-handles single/multi-process)
# num_workers=0 automatically uses high-performance Python version
# num_workers>0 automatically uses Shared Memory C++ version
dataset = MyDataset()
dataset.cache = get_opt_cache(
    num_workers=0,
    precomputed_path="precomputed/my_plan.safetensors",
    maxsize=3000,
    total_iter=100000,           # Required for Python mode
    dataset_size=10000,          # Required for Shared Memory (C++) mode
    item_shape=(3, 224, 224)    # Required for Shared Memory (C++) mode
)

# Step 3: Apply to dataset
dataset.__getitem__ = dataset.cache(dataset.__getitem__)

# Use DataLoader
dataloader = DataLoader(dataset, batch_size=32, num_workers=0)
for batch in dataloader:
    pass

Method 2: CLI

python -m src.OPT4TorchDataSet.cli \
    --dataset-size 10000 \
    --total-iter 100000 \
    --output precomputed/my_experiment.safetensors \
    --seed 0
  • --dataset-size: Required. Dataset size
  • --total-iter: Required. Total number of accesses for precomputation
  • --output: Required. File path to save precomputed results (.safetensors format)
  • --seed: Random seed to ensure reproducible results (optional)
  • --no-replacement: Disable replacement sampling (optional)

Developer Guide

Local Development Usage

If you have not installed the package but are using the source code directly, you can start the CLI in the following way:

# In the project root directory
python -m src.OPT4TorchDataSet.cli \
    --total-iter 100000 \
    --output ./precomputed/imagenet_opt.safetensors \
    --seed 42

Environment Configuration

Compatibility Matrix

OS CUDA Version GPU Model SM Architecture
Ubuntu 24.04 12.8.2 H800 sm90
Windows 11 12.9.1 NVIDIA 4060Ti sm89
Windows 11 13.0.2 NVIDIA 4060Ti sm89

Installation Steps

Create Conda Environment

uv venv --python 3.14
.venv\Scripts\activate.ps1
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
uv pip install -r requirements.txt
uv pip install -U "triton-windows" # Optional for Windows

Select GPU Device

export CUDA_VISIBLE_DEVICES=0

Set Hugging Face Mirror (improve download speed)

$env:HF_ENDPOINT = "https://hf-mirror.com" # on Windows

Build Python wheel package

uv pip install build
uv python -m build

Experiment

All experiment results are saved as JSON files in the results/ subdirectory of each experiment.

Model FIFO Time(s) LFU Time(s) LRU Time(s) OPT Time(s) RR Time(s) none Time(s) warmUp Time(s)
convnextv2_base 179.1515 167.4298 149.3036 136.9226 163.4252 195.6518 159.9803
davit_base 118.902 118.8557 711.6681 82.9771 132.2939 120.6464 137.6575
mobilenetv3_small_100 97.6476 95.3437 82.9322 47.3384 69.0704 103.9736 112.7548
mobilenetv5_base 168.7884 166.6462 171.9088 150.2256 172.5975 189.045 230.02
resnet50 65.3436 71.7038 69.5199 42.7169 69.1713 84.1985 73.032
swin_base_patch4_window7_224 126.8982 137.4744 155.4214 85.9536 114.0388 125.5925 140.2842
swinv2_cr_base_224 126.8298 156.9438 181.8895 104.4104 182.2423 191.7603 158.8843
vit_base_patch16_224 115.6438 98.5015 105.8998 67.547 116.7467 127.2664 108.5507
vit_base_patch16_dinov3 94.6338 97.7933 124.6119 96.1758 91.3743 160.4503 135.4348
Batch Size: 16 | Num Workers: 0 | AMP Enabled: TRUE | Epochs: 5 | Cache Size Ratio: 0.3

About

Plug-and-Play Optimal Page Replacement Algorithm (OPT) for Torch Dataset

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published