
docs: add modelopt_recipes README and PTQ recipe/scheme guide#1662

Open
cjluo-nv wants to merge 1 commit into main from docs/modelopt-recipes-guide

@cjluo-nv cjluo-nv commented Jun 9, 2026

Collaborator

What does this PR do?

Type of change: documentation

Adds two docs under modelopt_recipes/ (no code or behavior changes):

  • README.md — catalog of the recipe library: its purpose (a recipe is the
    single, version-controlled source of truth for how a model is optimized), the
    directory layout (general/, huggingface/, models/, configs/), how to
    load/select recipes (load_recipe, --recipe), and a high-level map of the
    general PTQ combos, speculative-decoding, and distillation recipes.
  • recipe.md — a focused guide to the PTQ schemes: the general general/ptq/
    body scopes (full-model FP8/NVFP4, scoped experts-only / mlp-only / omlp-only,
    weight-only), KV-cache modes (kv_fp8_cast / kv_nvfp4_cast / kv_fp8),
    calibration variants (max / mse / gptq / layerwise), low- vs high-concurrency
    deployment guidance, and the model-specific recipes under huggingface/ and
    models/ — each compared to its general baseline.

Usage

# Documentation only. The recipes themselves load as before, e.g.:
from modelopt.recipe import load_recipe
cfg = load_recipe("general/ptq/nvfp4_experts_only-kv_fp8_cast")

Testing

pre-commit run --files modelopt_recipes/README.md modelopt_recipes/recipe.md
passes (markdownlint, modelopt recipe validation, license/format hooks).

Before your PR is "Ready for review"

  • Is this change backward compatible?: N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: N/A
  • Did you get Claude approval on this PR?: ❌

Additional Information

Documentation for the modelopt_recipes/ library; content verified against the
recipe YAMLs and the modelopt.recipe / config-loader source.

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation

  • Added comprehensive documentation for ModelOpt recipes, detailing YAML-based optimization workflows for PTQ quantization, speculative-decoding training, and diffusion distillation.
  • Added PTQ recipe selection guide with guidance on variant selection, model-specific configurations, and best practices for optimization.

@cjluo-nv cjluo-nv requested a review from a team as a code owner June 9, 2026 23:34
@cjluo-nv cjluo-nv requested a review from shengliangxu June 9, 2026 23:34
coderabbitai Bot commented Jun 9, 2026
Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d5117115-dc44-45fb-9a86-b5500c05d941

📥 Commits

Reviewing files that changed from the base of the PR and between e88f14f and f334cf8.

📒 Files selected for processing (2)
  • modelopt_recipes/README.md
  • modelopt_recipes/ptq.md
✅ Files skipped from review due to trivial changes (2)
  • modelopt_recipes/ptq.md
  • modelopt_recipes/README.md

📝 Walkthrough

This PR adds two documentation files describing ModelOpt's recipe catalog system: a top-level README explaining the recipe structure, directory layout, and composition model, and a detailed PTQ recipe guide covering naming conventions, scheme taxonomy, and selection guidance.

Changes

Recipe Documentation Catalog

| Layer / File(s) | Summary |
|-----------------|---------|
| Catalog overview and system structure (`modelopt_recipes/README.md`) | Introduces recipes as version-controlled YAML workflows, explains `$import` composition and directory layout (`general/`, `huggingface/`, `models/`, `configs/`), documents the PTQ filename-axis model (weight scope, KV-cache, calibration), describes shipped families, shared building blocks, HF-specific conventions, checkpoint-mirroring examples, and contributor guidance. |
| PTQ recipes, schemes, and selection guidance (`modelopt_recipes/ptq.md`) | Details PTQ recipe naming, enumerates shipped recipes with a combo table, defines model-body schemes (full-model, scoped, weight-only), KV-cache schemes, and calibration variants (max, mse, gptq, layerwise), provides step-by-step recipe selection guidance, and documents model-specific deviations with architecture and algorithm examples. |

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding two documentation files (README and PTQ guide) to the modelopt_recipes directory. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | PR adds documentation and supporting Python files with no security anti-patterns: no unsafe torch.load, numpy.load, eval, exec, hardcoded trust_remote_code, or nosec comments detected. |


@cjluo-nv cjluo-nv force-pushed the docs/modelopt-recipes-guide branch from 4e3b413 to 2a08c6b Compare June 9, 2026 23:37
@cjluo-nv cjluo-nv force-pushed the docs/modelopt-recipes-guide branch 2 times, most recently from f4ad44b to e88f14f Compare June 9, 2026 23:42

@meenchen meenchen left a comment

Contributor

Bot review — DM the bot to share feedback.

Docs-only PR adding modelopt_recipes/README.md (catalog) and recipe.md (PTQ scheme guide). Spot-checked claims against the repo:

  • The "All 18 general/ptq/ recipes" table matches the 18 YAMLs in modelopt_recipes/general/ptq/.
  • general/speculative_decoding/{eagle3,dflash}.yaml and general/distillation/dmd2_qwen_image.yaml exist as described.
  • huggingface/ model_type folders (gemma, mpt, nemotron_vl, phi4mm, qwen3_5, qwen3_5_moe, step3p5) and the models/Nemotron-3-Super-120B-A12B/{super-nvfp4,super-nvfp4-max-calib}.yaml files exist.
  • API examples (from modelopt.recipe import load_recipe, from modelopt.torch.fastgen import load_dmd_config) resolve to real exported symbols.

No code/behavior changes, no licensing impact. Content is internally consistent and accurate.


| Numeric × Scope | KV-cache variants shipped | Calibration variants |
|-----------------|---------------------------|----------------------|
| **FP8, whole model** (`fp8_default`) | `kv_fp8`, `kv_fp8_cast` | max |

Collaborator

For this list and the lists below, I intentionally do not put the list in any README doc, because the list can be identified by just browsing the folder, and it adds a burden whenever we update the recipe repo. I think we can just show examples here instead of enumerating all of them.

Contributor

Maybe we can just add a pointer here for people/agents to look at.

Collaborator

Yeah, we should just describe the structure here; users and agents can list the contents by browsing the folders.
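
The point made in this thread, that the catalog can be recovered by browsing the folder rather than maintained by hand, can be sketched in a few lines of Python. This is a minimal sketch and not part of ModelOpt: `list_recipes` is a hypothetical helper, and the `modelopt_recipes/general/ptq` path in the usage comment is assumed from the layout described in this PR.

```python
from __future__ import annotations

from pathlib import Path


def list_recipes(root: str | Path) -> list[str]:
    """Return recipe names (path relative to root, without ".yaml").

    Hypothetical helper for illustration only; not a ModelOpt API.
    """
    root = Path(root)
    return sorted(
        p.relative_to(root).with_suffix("").as_posix()
        for p in root.rglob("*.yaml")  # recurse into subfolders as well
    )


# Usage (assumed checkout-relative path; adjust to your local clone):
#   for name in list_recipes("modelopt_recipes/general/ptq"):
#       print(name)
```

Because the listing is derived from the directory itself, it cannot drift out of date when recipes are added or removed, which is the maintenance concern raised above.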

Comment thread modelopt_recipes/ptq.md
@@ -0,0 +1,297 @@
# PTQ Recipes & Schemes

@shengliangxu shengliangxu Jun 9, 2026

Collaborator

Much of this content is already in the README docs in each subfolder. We can simplify the docs here to high-level descriptions and add pointers to the subfolder READMEs.

Collaborator Author

I would like a centralized doc to be our recipe guide for ptq. Maybe I can just move this to the ptq folder.

codecov Bot commented Jun 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.59%. Comparing base (d3acf45) to head (f334cf8).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1662   +/-   ##
=======================================
  Coverage   56.59%   56.59%           
=======================================
  Files         507      507           
  Lines       55794    55794           
=======================================
  Hits        31579    31579           
  Misses      24215    24215           
| Flag | Coverage Δ |
|------|------------|
| unit | 54.41% <ø> (ø) |



Most PTQ recipes are **not** one-off configs — they are a mix-and-match of four
independent axes, which is why `general/ptq/` looks like a combinatorial matrix.
The file name encodes the choices: `<weight-scope>-<kv-mode>[-<algorithm>].yaml`.
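
To make the naming pattern quoted above concrete, here is a small sketch that splits a recipe name into its filename fields. `RecipeName` and `parse_recipe_name` are hypothetical names used only for illustration, not ModelOpt APIs. The split rule (fields separated by `-`, with the second field starting with `kv_`) is an assumption inferred from names such as `nvfp4_experts_only-kv_fp8_cast`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RecipeName:
    body: str              # first field, e.g. "nvfp4_experts_only"
    kv_mode: str           # second field, e.g. "kv_fp8_cast"
    algorithm: str | None  # optional third field; None when the name omits it


def parse_recipe_name(name: str) -> RecipeName:
    """Split a recipe name into its fields (illustrative, not a ModelOpt API)."""
    # Drop any directory prefix and an optional ".yaml" suffix.
    stem = name.rsplit("/", 1)[-1].removesuffix(".yaml")
    # Split into at most three fields; anything after the second "-" is
    # kept whole as the algorithm field.
    parts = stem.split("-", 2)
    if len(parts) < 2 or not parts[1].startswith("kv_"):
        raise ValueError(f"unrecognized recipe name: {name!r}")
    return RecipeName(parts[0], parts[1], parts[2] if len(parts) == 3 else None)
```

With these assumptions, `parse_recipe_name("general/ptq/nvfp4_experts_only-kv_fp8_cast")` returns `RecipeName(body='nvfp4_experts_only', kv_mode='kv_fp8_cast', algorithm=None)`. Names that do not follow the pattern, such as the model-specific `super-nvfp4`, raise `ValueError` rather than being guessed at.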

Contributor

nit, looks like it should be: <formats-scope>-<kv-mode>[-<algorithm>].yaml

Contributor

cc @juhi10071998 I think this is a good format to follow for the incoming autoquant recipes as well

| **Numeric format** | The precision of the quantized tensors | `fp8` (per-tensor E4M3, W8A8), `nvfp4` (E2M1 block W4A4 w/ FP8 scales), `int4`/`int8`, and `mxfp4`/`mxfp6`/`mxfp8`/`mxint8` (available as building blocks/presets) |
| **Scope** | *Which* layers get quantized | `default` (whole model), `mlp_only` (MLP/MoE blocks), `experts_only` (MoE routed experts), `omlp_only` (MLP/MoE + attention output proj), `weight_only` (weights only, W4A16 — activations stay BF16) |
| **KV-cache mode** | How (or whether) the attention KV cache is quantized | `kv_fp8` (calibrated), `kv_fp8_cast` (FP8 with constant amax — **no KV calibration**), `kv_nvfp4_cast`, or `kv_fp16`/`none` (KV left unquantized) |
| **Calibration algorithm** | How scales are searched during calibration | `max` (default, fast), `mse` (often with `fp8_scale_sweep`), `gptq` (layerwise), and the AWQ/SmoothQuant/SVDQuant families (mostly via presets) |
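
The way independent axes multiply into a matrix of file names can be sketched as follows. This is an illustration under stated assumptions, not ModelOpt code: `compose_name` is a hypothetical helper, the `<format>_<scope>` joining is inferred from names such as `fp8_default` and `nvfp4_experts_only`, and the axis values are a subset picked from the table rows above. The generated names are candidates only; many of them may not exist as shipped YAMLs.

```python
from __future__ import annotations

from itertools import product

# Subsets of the axis values from the table above (illustrative only).
FORMATS = ("fp8", "nvfp4")
SCOPES = ("default", "mlp_only", "experts_only", "omlp_only")
KV_MODES = ("kv_fp8", "kv_fp8_cast", "kv_nvfp4_cast")


def compose_name(
    fmt: str, scope: str, kv_mode: str, algorithm: str | None = None
) -> str:
    """Build a candidate recipe name from axis choices (hypothetical helper)."""
    name = f"{fmt}_{scope}-{kv_mode}"
    # The algorithm suffix is optional in the quoted naming pattern.
    return f"{name}-{algorithm}" if algorithm else name


# Every format x scope x KV-mode combination: 2 * 4 * 3 candidates.
candidates = [compose_name(f, s, k) for f, s, k in product(FORMATS, SCOPES, KV_MODES)]
```

Checking a candidate against the actual directory contents, rather than against a hand-maintained list, is the safer way to find out which combinations really ship.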

Contributor

It is possible to have stacked algorithms, like mse + gptq



Comment thread modelopt_recipes/ptq.md
@@ -0,0 +1,297 @@
# PTQ Recipes & Schemes

Contributor

Should we add an experimental section talking about AutoQuant?

Collaborator

We can update the AutoQuant material once we have actual AutoQuant recipes, after @juhi10071998's work gets in.

Add a top-level README.md cataloging the recipe families (general, huggingface, models, configs) and how to load/select recipes, plus a recipe.md that walks through the general PTQ schemes (body scopes, KV-cache modes, calibration variants) and the model-specific recipes under huggingface/ and models/, comparing each to its general baseline with guidance on choosing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv force-pushed the docs/modelopt-recipes-guide branch from e88f14f to f334cf8 Compare June 10, 2026 04:26

3 participants