
FMS Model Optimizer


Introduction

FMS Model Optimizer is a framework for developing reduced-precision neural network models. It supports quantization techniques such as quantization-aware training (QAT) and post-training quantization (PTQ), along with several other optimization techniques, on popular deep learning workloads.

Highlights

  • Python API to enable model quantization: with the addition of a few lines of code, module-level and/or function-level operations are replaced with their quantized counterparts (see the sketch after this list).
  • Robust: verified for INT 8- and 4-bit quantization on important vision, speech, NLP, object-detection, and LLM workloads.
  • Flexible: options to analyze the network with PyTorch Dynamo and to apply best practices during quantization, such as clip_val initialization, layer-level precision settings, and optimizer param group settings.
  • State-of-the-art INT and FP quantization techniques for weights and activations, such as SmoothQuant, SAWB+, and PACT+.
  • Supports key compute-intensive operations such as Conv2d, Linear, LSTM, MM, and BMM.
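
To give a feel for the API, here is a minimal sketch that quantizes a toy model. The import path fms_mo, the names qconfig_init and qmodel_prep, and the config keys are assumptions patterned after the project's examples; consult the docs for the exact API.

import torch
from fms_mo import qconfig_init, qmodel_prep  # assumed import path

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)
example_input = torch.randn(1, 128)  # example data used to trace the model

qcfg = qconfig_init()   # assumed helper: default quantization config
qcfg["nbits_w"] = 8     # assumed key: weight precision in bits
qcfg["nbits_a"] = 8     # assumed key: activation precision in bits

# Swaps supported modules/functions (Linear, Conv2d, ...) for quantized
# counterparts in place.
qmodel_prep(model, example_input, qcfg)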

Supported Models

               GPTQ   FP8   PTQ   QAT
Granite         ✅     ✅    ✅    🔲
Llama           ✅     ✅    ✅    🔲
Mixtral         ✅     ✅    ✅    🔲
BERT/Roberta    ✅     ✅    ✅    ✅

Note: Direct QAT on LLMs is not recommended

Getting Started

Requirements

  1. 🐧 Linux system with Nvidia GPU (V100/A100/H100)

  2. Python 3.9 to Python 3.11

    📋 Python 3.12 is currently not supported due to a PyTorch Dynamo constraint

  3. CUDA >=12

Optional packages, depending on the optimization functionality required:

  • GPTQ, a popular compression method for LLMs.
  • INT8 deployment for the QAT and PTQ examples, which requires:
    • Nvidia GPU with compute capability >= 8.0 (A100 family or higher)
    • Ninja
    • A clone of the CUTLASS repository
    • PyTorch 2.3.1 (newer versions cause issues with the custom CUDA kernel used in these examples)
  • FP8, a reduced-precision format similar to INT8.
  • Compute graph plotting (mostly for troubleshooting purposes).

Note

PyTorch version should be < 2.4 if you would like to experiment with deployment using the external INT8 kernel.
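
A quick sanity check that your environment meets these requirements can be done in a few lines of plain PyTorch (no FMS Model Optimizer APIs involved):

import sys
import torch

assert (3, 9) <= sys.version_info[:2] <= (3, 11), "Python 3.9-3.11 required"
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
print("PyTorch:", torch.__version__)  # keep < 2.4 for the external INT8 kernel
print("CUDA:", torch.version.cuda)    # should be >= 12
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")  # >= 8.0 for the INT8 examples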

Installation

We recommend using a Python virtual environment with Python 3.9+. Here is how to set one up using Python venv:

python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate

Tip

If you use pyenv, Conda Miniforge or other such tools for Python version management, create the virtual environment with that tool instead of venv. Otherwise, you may have issues with installed packages not being found as they are linked to your Python version management tool and not venv.

There are two ways to install FMS Model Optimizer:

From Release

To install from a release (PyPI package):

python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
pip install fms-model-optimizer

From Source

To install from source (GitHub repository):

python3 -m venv fms_mo_venv
source fms_mo_venv/bin/activate
git clone https://github.com/foundation-model-stack/fms-model-optimizer
cd fms-model-optimizer
pip install -e .
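
Whichever method you use, you can verify the installation by importing the package. A minimal check, assuming the package imports as fms_mo (verify against the docs):

import fms_mo  # import name is an assumption

# Confirm which installation was picked up.
print("fms-mo loaded from:", fms_mo.__file__)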

Try It Out!

To help you get up and running as quickly as possible with the FMS Model Optimizer framework, check out the following resources, which demonstrate how to use the framework with different quantization techniques:

  • Jupyter notebook tutorials (the recommended starting point):
    • Quantization tutorial:
      • Visualizes a random Gaussian tensor step by step through the quantization process
      • Builds a quantizer and a quantized convolution module based on that process
  • Python script examples
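
If you would like a feel for what the quantization tutorial covers before opening it, the sketch below performs a generic per-tensor affine quantize/dequantize round trip on a random Gaussian tensor in plain PyTorch. It illustrates the concept only; it is not the tutorial's actual code.

import torch

def quantize_dequantize(x: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    qmin, qmax = 0, 2**n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)  # map value range to int range
    zero_point = qmin - torch.round(x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale              # dequantize back to float

x = torch.randn(1000)                            # random Gaussian tensor
x_hat = quantize_dequantize(x, n_bits=4)
print("mean abs error:", (x - x_hat).abs().mean().item())

Lowering n_bits makes the staircase effect of quantization, and hence the reconstruction error, more pronounced.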

Docs

Dive into the design document to get a better understanding of the framework motivation and concepts.

Contributing

Check out our contributing guide to learn how to contribute.