Pylomin (PYtorch LOw-Memory INference) is a deep learning optimization library for low-memory inference in PyTorch.
The scale of deep learning models has grown exponentially in recent years, which has greatly increased the difficulty of product deployment.
*(Figure: growth of deep learning model sizes over time; image source: Microsoft Research Blog)*
The goal of this library is to enable low-cost deployment of deep learning models:
- Extremely low memory requirement
  - For example, we can reduce the peak memory requirement for inference of a BERT-like model (with 1.6 GiB of parameters) to 46 MiB.
- Minimize memory requirements while maintaining model throughput
  - Eliminate the time spent waiting for parameters to load by prefetching (under development)
  <!-- TODO: add a number here after development -->
Peak memory is the maximum amount of memory needed to store model parameters and hidden states at any point during model inference.
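For reference, peak memory can be checked empirically. Below is a minimal sketch (not part of pylomin) that uses the process's peak resident set size as a proxy; note that `resource.getrusage` reports it in KiB on Linux.

```python
# Minimal sketch (not part of pylomin): approximate peak memory during
# inference via the process's peak RSS (Linux reports ru_maxrss in KiB).
import resource

import torch

model = torch.nn.Linear(4096, 4096)  # stand-in for a real model
with torch.inference_mode():
    model(torch.randn(1, 4096))

peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f'peak memory: {peak_kib / 1024:.1f} MiB')
```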
Installation:

```bash
pylomin$ python3 -m pip install -e .
```
Load model parameters only when needed and release them immediately after use.
```python
model = pylomin.lazy_loading(model)
```
Or provide a list of `target_classes` or `target_modules` to be converted to lazy-loading mode. In addition, when using `target_classes`, you can also provide a list of modules to be skipped.
```python
# Use target_classes
model = pylomin.lazy_loading(model, target_classes=[nn.Linear, nn.Embedding],
                             skip_modules=[model.embeddings.word_embeddings])

# Use target_modules
target_modules = [module for module in model.modules() if some_condition]
model = pylomin.lazy_loading(model, target_modules=target_modules)
```
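To illustrate the idea (a hypothetical sketch, not pylomin's actual implementation), lazy loading can be built from PyTorch forward hooks: a pre-hook attaches a module's weights from disk just before it runs, and a post-hook drops them right after. The `make_lazy` helper and `weight_path` below are illustrative names.

```python
import torch
import torch.nn as nn

def make_lazy(module: nn.Module, weight_path: str) -> nn.Module:
    """Load `module`'s parameters before each forward, free them after."""

    def load(mod, inputs):
        state = torch.load(weight_path)
        for name, param in mod.named_parameters():
            param.data = state[name]      # attach real storage just before use

    def release(mod, inputs, output):
        for param in mod.parameters():
            param.data = torch.empty(0)   # drop the storage right after use

    module.register_forward_pre_hook(load)
    module.register_forward_hook(release)
    return module

# Usage: persist the weights once, then keep only empty placeholders in RAM.
layer = nn.Linear(4096, 4096)
torch.save(layer.state_dict(), 'layer.pt')
make_lazy(layer, 'layer.pt')
for p in layer.parameters():
    p.data = torch.empty(0)               # start with parameters released
out = layer(torch.randn(1, 4096))         # loaded, used, and freed transparently
```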
Attempts to split a `torch.nn.Embedding` layer into multiple chunks, each having `num_embeddings` equal to `chunk_size`, except possibly the last one.
```python
model = pylomin.chunked_embedding(model,
                                  target_module_name='embeddings.word_embeddings',
                                  chunk_size=2048)
```
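Conceptually, chunking replaces one large embedding table with several small ones and routes each token id to the chunk that owns it, so that (combined with lazy loading) only the chunks a batch actually references need to be in memory. A hypothetical sketch of this idea (not pylomin's actual code; `ChunkedEmbedding` is an illustrative name):

```python
import torch
import torch.nn as nn

class ChunkedEmbedding(nn.Module):
    """Split one nn.Embedding into chunks of `chunk_size` rows each."""

    def __init__(self, embedding: nn.Embedding, chunk_size: int):
        super().__init__()
        self.chunk_size = chunk_size
        self.embedding_dim = embedding.embedding_dim
        self.chunks = nn.ModuleList()
        for start in range(0, embedding.num_embeddings, chunk_size):
            rows = embedding.weight.data[start:start + chunk_size]
            chunk = nn.Embedding(rows.size(0), self.embedding_dim)
            chunk.weight.data = rows      # the last chunk may be smaller
            self.chunks.append(chunk)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Which chunk owns each token id.
        owner = torch.div(input_ids, self.chunk_size, rounding_mode='floor')
        out = torch.empty(*input_ids.shape, self.embedding_dim)
        for i, chunk in enumerate(self.chunks):
            mask = owner == i
            if mask.any():                # look up only within the owning chunk
                out[mask] = chunk(input_ids[mask] - i * self.chunk_size)
        return out
```

For example, `ChunkedEmbedding(nn.Embedding(30522, 768), chunk_size=2048)` behaves like the original table while exposing per-chunk submodules that a lazy-loading wrapper can load independently.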
See `examples/`.