docs: add modelopt_recipes README and PTQ recipe/scheme guide #1662
cjluo-nv wants to merge 1 commit into
Walkthrough
This PR adds two documentation files describing ModelOpt's recipe catalog system: a top-level README explaining the recipe structure, directory layout, and composition model, and a detailed PTQ recipe guide covering naming conventions, scheme taxonomy, and selection guidance.
Changes: Recipe Documentation Catalog
4e3b413 to 2a08c6b
f4ad44b to e88f14f
meenchen left a comment
Bot review — DM the bot to share feedback.
Docs-only PR adding `modelopt_recipes/README.md` (catalog) and `recipe.md` (PTQ scheme guide). Spot-checked claims against the repo:
- The "All 18 `general/ptq/` recipes" table matches the 18 YAMLs in `modelopt_recipes/general/ptq/`.
- `general/speculative_decoding/{eagle3,dflash}.yaml` and `general/distillation/dmd2_qwen_image.yaml` exist as described.
- `huggingface/` model_type folders (`gemma`, `mpt`, `nemotron_vl`, `phi4mm`, `qwen3_5`, `qwen3_5_moe`, `step3p5`) and the `models/Nemotron-3-Super-120B-A12B/{super-nvfp4,super-nvfp4-max-calib}.yaml` files exist.
- API examples (`from modelopt.recipe import load_recipe`, `from modelopt.torch.fastgen import load_dmd_config`) resolve to real exported symbols.
No code/behavior changes, no licensing impact. Content is internally consistent and accurate.
| Numeric × Scope | KV-cache variants shipped | Calibration variants |
|-----------------|---------------------------|----------------------|
| **FP8, whole model** (`fp8_default`) | `kv_fp8`, `kv_fp8_cast` | max |
For this list and the lists below: I intentionally do not put such lists in any README because they can be found by just browsing the folder, and they add a maintenance burden whenever we update the recipe repo. I think we can just show examples here instead of enumerating all of them.
Maybe we can just add a pointer here for people/agents to look at.
yeah, we should just describe the structure here, listing the contents can be done by user/agent browsing the structure.
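A minimal sketch of the "browse the folder" approach discussed in this thread: rather than enumerating recipes in a README, a user or agent can list the YAML files in a recipe folder directly. The helper name `list_recipes` is hypothetical and is not a ModelOpt API; only the `modelopt_recipes/general/ptq/` layout comes from the PR.

```python
from pathlib import Path


def list_recipes(recipe_dir: str) -> list[str]:
    """Return the sorted recipe names (file stems) of the YAML files in a folder.

    Hypothetical helper for illustration only; it is not part of ModelOpt.
    """
    folder = Path(recipe_dir)
    if not folder.is_dir():
        raise NotADirectoryError(f"not a recipe folder: {recipe_dir}")
    # Only top-level *.yaml files count as recipes; subfolders and other
    # file types (for example a README) are ignored.
    return sorted(path.stem for path in folder.glob("*.yaml"))
```

Called from a repository checkout, `list_recipes("modelopt_recipes/general/ptq")` would return whatever recipes are currently shipped, so the result stays current without any doc update.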
@@ -0,0 +1,297 @@
# PTQ Recipes & Schemes
Much of this content is already in the README in each subfolder. We can simplify the docs here to just high-level descriptions and add pointers to the subfolder READMEs.
I would like a centralized doc to be our recipe guide for ptq. Maybe I can just move this to the ptq folder.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1662 +/- ##
=======================================
Coverage 56.59% 56.59%
=======================================
Files 507 507
Lines 55794 55794
=======================================
Hits 31579 31579
Misses 24215 24215
Most PTQ recipes are **not** one-off configs; they are a mix-and-match of four
independent axes, which is why `general/ptq/` looks like a combinatorial matrix.
The file name encodes the choices: `<weight-scope>-<kv-mode>[-<algorithm>].yaml`.
nit, looks like it should be: `<formats-scope>-<kv-mode>[-<algorithm>].yaml`
cc @juhi10071998 I think this is a good format to follow for the incoming autoquant recipes as well
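To make the naming convention in this thread concrete, here is a minimal sketch that splits a recipe file name into its segments. `RecipeName` and `parse_recipe_name` are hypothetical names, not ModelOpt APIs. The sketch assumes segments are separated by `-` and never contain `-` themselves, and it treats any extra trailing segments as additional algorithms, which is an assumption of this example rather than something the PR specifies. The first field is called `formats_scope` after the wording suggested in the nit above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipeName:
    """Hypothetical container for the parts of a PTQ recipe file name."""

    formats_scope: str           # e.g. "fp8_default"
    kv_mode: str                 # e.g. "kv_fp8_cast"
    algorithms: tuple[str, ...]  # empty when the optional suffix is absent


def parse_recipe_name(filename: str) -> RecipeName:
    """Split `<formats-scope>-<kv-mode>[-<algorithm>].yaml` into its parts.

    Assumption: "-" only ever separates segments. Treating extra trailing
    segments as stacked algorithms is also an assumption of this sketch.
    """
    if not filename.endswith(".yaml"):
        raise ValueError(f"expected a .yaml file name, got {filename!r}")
    stem = filename[: -len(".yaml")]
    parts = stem.split("-")
    # At least the formats-scope and kv-mode segments must be present and non-empty.
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"{filename!r} does not match the naming pattern")
    return RecipeName(parts[0], parts[1], tuple(parts[2:]))
```

For example, `parse_recipe_name("fp8_default-kv_fp8_cast.yaml")` yields `formats_scope="fp8_default"`, `kv_mode="kv_fp8_cast"`, and an empty `algorithms` tuple. That file name is constructed from the pattern and the identifiers quoted in the diff; it has not been checked against the repository.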
| **Numeric format** | The precision of the quantized tensors | `fp8` (per-tensor E4M3, W8A8), `nvfp4` (E2M1 block W4A4 w/ FP8 scales), `int4`/`int8`, and `mxfp4`/`mxfp6`/`mxfp8`/`mxint8` (available as building blocks/presets) |
| **Scope** | *Which* layers get quantized | `default` (whole model), `mlp_only` (MLP/MoE blocks), `experts_only` (MoE routed experts), `omlp_only` (MLP/MoE + attention output proj), `weight_only` (weights only, W4A16; activations stay BF16) |
| **KV-cache mode** | How (or whether) the attention KV cache is quantized | `kv_fp8` (calibrated), `kv_fp8_cast` (FP8 with constant amax, **no KV calibration**), `kv_nvfp4_cast`, or `kv_fp16`/`none` (KV left unquantized) |
| **Calibration algorithm** | How scales are searched during calibration | `max` (default, fast), `mse` (often with `fp8_scale_sweep`), `gptq` (layerwise), and the AWQ/SmoothQuant/SVDQuant families (mostly via presets) |
It is possible to have stacked algorithms, like mse + gptq
@@ -0,0 +1,297 @@
# PTQ Recipes & Schemes
Should we add an experimental section talking about AutoQuant?
We can update the autoquant stuff when we have actual autoquant recipes after @juhi10071998's work gets in.
Add a top-level README.md cataloging the recipe families (general, huggingface, models, configs) and how to load/select recipes, plus a recipe.md that walks through the general PTQ schemes (body scopes, KV-cache modes, calibration variants) and the model-specific recipes under huggingface/ and models/, comparing each to its general baseline with guidance on choosing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
e88f14f to f334cf8
What does this PR do?
Type of change: documentation
Adds two docs under `modelopt_recipes/` (no code or behavior changes):
- `README.md`: a catalog of the recipe library, covering its purpose (a recipe is the single, version-controlled source of truth for how a model is optimized), the directory layout (`general/`, `huggingface/`, `models/`, `configs/`), how to load/select recipes (`load_recipe`, `--recipe`), and a high-level map of the general PTQ combos, speculative-decoding, and distillation recipes.
- `recipe.md`: a focused guide to the PTQ schemes, covering the general `general/ptq/` body scopes (full-model FP8/NVFP4, scoped experts-only / mlp-only / omlp-only, weight-only), KV-cache modes (`kv_fp8_cast` / `kv_nvfp4_cast` / `kv_fp8`), calibration variants (max / mse / gptq / layerwise), low- vs high-concurrency deployment guidance, and the model-specific recipes under `huggingface/` and `models/`, each compared to its general baseline.
Usage
Testing
`pre-commit run --files modelopt_recipes/README.md modelopt_recipes/recipe.md` passes (markdownlint, modelopt recipe validation, license/format hooks).
Before your PR is "Ready for review"
`CONTRIBUTING.md`: N/A
Additional Information
Documentation for the `modelopt_recipes/` library; content verified against the recipe YAMLs and the `modelopt.recipe` / config-loader source.
🤖 Generated with Claude Code
Summary by CodeRabbit
Documentation