docs: add modelopt_recipes README and PTQ recipe/scheme guide by cjluo-nv · Pull Request #1662 · NVIDIA/Model-Optimizer

cjluo-nv · 2026-06-09T23:34:09Z

What does this PR do?

Type of change: documentation

Adds two docs under modelopt_recipes/ (no code or behavior changes):

- README.md: catalog of the recipe library. It covers the library's purpose
  (a recipe is the single, version-controlled source of truth for how a model
  is optimized), the directory layout (general/, huggingface/, models/,
  configs/), how to load/select recipes (load_recipe, --recipe), and a
  high-level map of the general PTQ combos, speculative-decoding, and
  distillation recipes.
- recipe.md: a focused guide to the PTQ schemes. It covers the body scopes of
  the general recipes under general/ptq/ (full-model FP8/NVFP4, scoped
  experts-only / mlp-only / omlp-only, weight-only), KV-cache modes
  (kv_fp8_cast / kv_nvfp4_cast / kv_fp8), calibration variants
  (max / mse / gptq / layerwise), low- vs high-concurrency deployment
  guidance, and the model-specific recipes under huggingface/ and models/,
  each compared to its general baseline.

Usage

# Documentation only. The recipes themselves load as before, e.g.:
from modelopt.recipe import load_recipe
cfg = load_recipe("general/ptq/nvfp4_experts_only-kv_fp8_cast")

Testing

pre-commit run --files modelopt_recipes/README.md modelopt_recipes/recipe.md
passes (markdownlint, modelopt recipe validation, license/format hooks).

Before your PR is "Ready for review"

Is this change backward compatible?: N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: N/A
Did you update Changelog?: N/A
Did you get Claude approval on this PR?: ❌

Additional Information

Documentation for the modelopt_recipes/ library; content verified against the
recipe YAMLs and the modelopt.recipe / config-loader source.

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation

Added comprehensive documentation for ModelOpt recipes, detailing YAML-based optimization workflows for PTQ quantization, speculative-decoding training, and diffusion distillation.
Added PTQ recipe selection guide with guidance on variant selection, model-specific configurations, and best practices for optimization.

coderabbitai · 2026-06-09T23:34:21Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d5117115-dc44-45fb-9a86-b5500c05d941

📥 Commits

Reviewing files that changed from the base of the PR and between e88f14f and f334cf8.

📒 Files selected for processing (2)

modelopt_recipes/README.md
modelopt_recipes/ptq.md

✅ Files skipped from review due to trivial changes (2)

modelopt_recipes/ptq.md
modelopt_recipes/README.md

📝 Walkthrough

Walkthrough

This PR adds two documentation files describing ModelOpt's recipe catalog system: a top-level README explaining the recipe structure, directory layout, and composition model, and a detailed PTQ recipe guide covering naming conventions, scheme taxonomy, and selection guidance.

Changes

Recipe Documentation Catalog

| Layer / File(s) | Summary |
|-----------------|---------|
| Catalog overview and system structure `modelopt_recipes/README.md` | Introduces recipes as version-controlled YAML workflows, explains `$import` composition and directory layout (`general/`, `huggingface/`, `models/`, `configs/`), documents the PTQ filename-axis model (weight scope, KV-cache, calibration), describes shipped families, shared building blocks, HF-specific conventions, checkpoint-mirroring examples, and contributor guidance. |
| PTQ recipes, schemes, and selection guidance `modelopt_recipes/ptq.md` | Details PTQ recipe naming, enumerates shipped recipes with a combo table, defines model-body schemes (full-model, scoped, weight-only), KV-cache schemes, and calibration variants (`max`, `mse`, `gptq`, `layerwise`), provides step-by-step recipe selection guidance, and documents model-specific deviations with architecture and algorithm examples. |

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding two documentation files (README and PTQ guide) to the modelopt_recipes directory. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | PR adds documentation and supporting Python files with no security anti-patterns: no unsafe torch.load, numpy.load, eval, exec, hardcoded trust_remote_code, or nosec comments detected. |


meenchen

Bot review — DM the bot to share feedback.

Docs-only PR adding modelopt_recipes/README.md (catalog) and recipe.md (PTQ scheme guide). Spot-checked claims against the repo:

The "All 18 general/ptq/ recipes" table matches the 18 YAMLs in modelopt_recipes/general/ptq/.
general/speculative_decoding/{eagle3,dflash}.yaml and general/distillation/dmd2_qwen_image.yaml exist as described.
huggingface/ model_type folders (gemma, mpt, nemotron_vl, phi4mm, qwen3_5, qwen3_5_moe, step3p5) and the models/Nemotron-3-Super-120B-A12B/{super-nvfp4,super-nvfp4-max-calib}.yaml files exist.
API examples (from modelopt.recipe import load_recipe, from modelopt.torch.fastgen import load_dmd_config) resolve to real exported symbols.

No code/behavior changes, no licensing impact. Content is internally consistent and accurate.
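
The symbol spot check described above can be reproduced with a few lines of Python. This is a minimal sketch, not the procedure the bot actually ran: `symbol_resolves` is a hypothetical helper, the two module paths are the ones quoted in the review, and they only resolve in an environment where modelopt is installed.

```python
import importlib


def symbol_resolves(module_name: str, symbol: str) -> bool:
    """Return True if `symbol` is an attribute of the importable module `module_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError as well (it subclasses ImportError).
        return False
    return hasattr(module, symbol)


# Import claims quoted in the review comment above.
CLAIMED_SYMBOLS = [
    ("modelopt.recipe", "load_recipe"),
    ("modelopt.torch.fastgen", "load_dmd_config"),
]

if __name__ == "__main__":
    for module_name, symbol in CLAIMED_SYMBOLS:
        status = "ok" if symbol_resolves(module_name, symbol) else "MISSING"
        print(f"{module_name}.{symbol}: {status}")
```

Without modelopt installed, both entries report MISSING rather than raising, so the check is safe to run in a docs-only environment.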

shengliangxu · 2026-06-09T23:50:40Z

+
+| Numeric × Scope | KV-cache variants shipped | Calibration variants |
+|-----------------|---------------------------|----------------------|
+| **FP8, whole model** (`fp8_default`) | `kv_fp8`, `kv_fp8_cast` | max |


For this list and the lists below: I intentionally did not put such lists in any README doc, because they can be found by just browsing the folder and they add a maintenance burden whenever we update the recipe repo. I think we can show examples instead of enumerating all of them in the doc here.

Maybe we can just add a pointer here for people/agents to look at.

Yeah, we should just describe the structure here; users and agents can list the contents by browsing the folders.
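
Browsing the folder, as suggested in this thread, is easy to script. The sketch below is illustrative only: `list_recipes` is a hypothetical helper (not part of modelopt), and it assumes the recipes are plain `.yaml` files directly under a family directory such as `general/ptq/`.

```python
from pathlib import Path


def list_recipes(recipes_root: Path, family: str = "general/ptq") -> list[str]:
    """List recipe names under one family, relative to the root and without the .yaml suffix."""
    family_dir = recipes_root / family
    return sorted(
        path.relative_to(recipes_root).with_suffix("").as_posix()
        for path in family_dir.glob("*.yaml")  # non-YAML files such as README.md are skipped
    )
```

Calling `list_recipes(Path("modelopt_recipes"))` from a repository checkout would return names in the same shape as the `load_recipe("general/ptq/nvfp4_experts_only-kv_fp8_cast")` call in the PR description, assuming that directory layout.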

shengliangxu · 2026-06-09T23:52:14Z

@@ -0,0 +1,297 @@
+# PTQ Recipes & Schemes


Much of this content is already in the README in each subfolder. We can simplify the docs here to high-level descriptions and add pointers to the subfolder READMEs.

I would like a centralized doc to be our recipe guide for ptq. Maybe I can just move this to the ptq folder.

codecov · 2026-06-09T23:52:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.59%. Comparing base (d3acf45) to head (f334cf8).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1662   +/-   ##
=======================================
  Coverage   56.59%   56.59%           
=======================================
  Files         507      507           
  Lines       55794    55794           
=======================================
  Hits        31579    31579           
  Misses      24215    24215

Flag	Coverage Δ
unit	`54.41% <ø> (ø)`


meenchen · 2026-06-09T23:58:11Z

+
+Most PTQ recipes are **not** one-off configs — they are a mix-and-match of four
+independent axes, which is why `general/ptq/` looks like a combinatorial matrix.
+The file name encodes the choices: `<weight-scope>-<kv-mode>[-<algorithm>].yaml`.


nit, looks like it should be: <formats-scope>-<kv-mode>[-<algorithm>].yaml

cc @juhi10071998 I think this is a good format to follow for the incoming autoquant recipes as well
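
To make the naming scheme concrete, here is a minimal sketch that splits a recipe name into the `<formats-scope>-<kv-mode>[-<algorithm>]` parts discussed above. `PtqRecipeName` and `parse_ptq_recipe_name` are hypothetical names, not modelopt APIs. The sketch assumes every KV segment starts with `kv_` (as in `kv_fp8`, `kv_fp8_cast`, `kv_nvfp4_cast`) and that no segment contains a hyphen of its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PtqRecipeName:
    formats_scope: str                # e.g. "nvfp4_experts_only"
    kv_mode: str                      # e.g. "kv_fp8_cast"
    algorithm: Optional[str] = None   # None when the name has no algorithm suffix


def parse_ptq_recipe_name(name: str) -> PtqRecipeName:
    """Parse '<formats-scope>-<kv-mode>[-<algorithm>]', ignoring any directory and '.yaml'."""
    stem = name.rsplit("/", 1)[-1].removesuffix(".yaml")
    parts = stem.split("-")
    if len(parts) not in (2, 3) or not parts[1].startswith("kv_"):
        raise ValueError(f"not a general PTQ recipe name: {name!r}")
    return PtqRecipeName(*parts)
```

For example, `parse_ptq_recipe_name("general/ptq/nvfp4_experts_only-kv_fp8_cast")` gives `formats_scope="nvfp4_experts_only"`, `kv_mode="kv_fp8_cast"`, and `algorithm=None`. Names that do not follow the scheme, such as the model-specific `super-nvfp4-max-calib`, raise `ValueError` instead of being guessed at.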

meenchen · 2026-06-09T23:59:05Z

+| **Numeric format** | The precision of the quantized tensors | `fp8` (per-tensor E4M3, W8A8), `nvfp4` (E2M1 block W4A4 w/ FP8 scales), `int4`/`int8`, and `mxfp4`/`mxfp6`/`mxfp8`/`mxint8` (available as building blocks/presets) |
+| **Scope** | *Which* layers get quantized | `default` (whole model), `mlp_only` (MLP/MoE blocks), `experts_only` (MoE routed experts), `omlp_only` (MLP/MoE + attention output proj), `weight_only` (weights only, W4A16 — activations stay BF16) |
+| **KV-cache mode** | How (or whether) the attention KV cache is quantized | `kv_fp8` (calibrated), `kv_fp8_cast` (FP8 with constant amax — **no KV calibration**), `kv_nvfp4_cast`, or `kv_fp16`/`none` (KV left unquantized) |
+| **Calibration algorithm** | How scales are searched during calibration | `max` (default, fast), `mse` (often with `fp8_scale_sweep`), `gptq` (layerwise), and the AWQ/SmoothQuant/SVDQuant families (mostly via presets) |


It is possible to have stacked algorithms, like mse + gptq

meenchen · 2026-06-10T00:03:46Z

+
+| Numeric × Scope | KV-cache variants shipped | Calibration variants |
+|-----------------|---------------------------|----------------------|
+| **FP8, whole model** (`fp8_default`) | `kv_fp8`, `kv_fp8_cast` | max |


Maybe we can just add a pointer here for people/agents to look at.

meenchen · 2026-06-10T00:16:39Z

@@ -0,0 +1,297 @@
+# PTQ Recipes & Schemes


Should we add an experimental section talking about AutoQuant?

We can update the autoquant stuff when we have actual autoquant recipes after @juhi10071998 's work gets in.

Add a top-level README.md cataloging the recipe families (general, huggingface, models, configs) and how to load/select recipes, plus a recipe.md that walks through the general PTQ schemes (body scopes, KV-cache modes, calibration variants) and the model-specific recipes under huggingface/ and models/, comparing each to its general baseline with guidance on choosing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>

cjluo-nv requested a review from a team as a code owner June 9, 2026 23:34

cjluo-nv requested a review from shengliangxu June 9, 2026 23:34

cjluo-nv force-pushed the docs/modelopt-recipes-guide branch from 4e3b413 to 2a08c6b Compare June 9, 2026 23:37

coderabbitai Bot approved these changes Jun 9, 2026

cjluo-nv force-pushed the docs/modelopt-recipes-guide branch 2 times, most recently from f4ad44b to e88f14f Compare June 9, 2026 23:42

cjluo-nv requested review from Edwardf0t1, Fridah-nv, meenchen and sugunav14 June 9, 2026 23:44

meenchen approved these changes Jun 9, 2026

shengliangxu reviewed Jun 9, 2026

meenchen reviewed Jun 10, 2026

cjluo-nv force-pushed the docs/modelopt-recipes-guide branch from e88f14f to f334cf8 Compare June 10, 2026 04:26
