Conversation
Signed-off-by: Kai Xu <kaix@nvidia.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main    #1011      +/-   ##
==========================================
- Coverage   72.12%   70.09%    -2.04%
==========================================
  Files         209      221       +12
  Lines       23628    25459     +1831
==========================================
+ Hits        17042    17845      +803
- Misses       6586     7614     +1028
Signed-off-by: Meng Xin <mxin@nvidia.com>
Added a separate PTQ skill; it needs further tuning. Claude Opus can follow the skill, but Sonnet needs more guidance.
Signed-off-by: Kai Xu <kaix@nvidia.com>
Force-pushed from 18eb9c2 to 6968ad6
Signed-off-by: Meng Xin <mxin@nvidia.com>
Force-pushed from bd2d3da to 4f61bad
Copy the nel-assistant skill as a local evaluation skill so we can extend it to support optimized-model evaluation requirements. Update the modelopt orchestrator to reference the evaluation skill. Signed-off-by: Kai Xu <kaix@nvidia.com>
Force-pushed from 4f61bad to 28928a1
Add deployment skill (vLLM, SGLang, TRT-LLM serving) and update the modelopt orchestrator to support three pipelines:
- PTQ only
- PTQ + Deploy (serve as API endpoint)
- PTQ + Evaluate (accuracy benchmark)
Signed-off-by: Kai Xu <kaix@nvidia.com>
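The three pipeline modes in the commit message above can be sketched as a simple stage lookup; the stage names and the `stages_for` helper are illustrative, not the PR's actual orchestrator code:

```python
# Hypothetical sketch of the three pipeline modes; names are illustrative.
PIPELINES = {
    "ptq": ["ptq"],
    "ptq+deploy": ["ptq", "deploy"],      # serve as an API endpoint
    "ptq+evaluate": ["ptq", "evaluate"],  # accuracy benchmark
}

def stages_for(mode: str) -> list[str]:
    """Return the ordered stages for a requested pipeline mode."""
    try:
        return PIPELINES[mode]
    except KeyError:
        raise ValueError(f"unknown pipeline mode: {mode!r}") from None
```

All three modes share the PTQ stage first, so deploy and evaluate always run against an already-optimized checkpoint.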
Force-pushed from 3a320f6 to 5c46798
What does this PR do?
Type of change: ?
Adds a Claude Code skill suite for interactive model optimization with ModelOpt. The skill guides users through an end-to-end workflow: optimize the model with ModelOpt APIs, deploy it on vLLM and benchmark speed, evaluate accuracy with NeMo Evaluator (nel), and iterate on optimization recipes until accuracy/performance targets are met. It also includes a Pareto sweep mode that runs multiple quantization formats in parallel and computes the optimal accuracy-vs-throughput frontier.
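The Pareto sweep described above keeps only the formats that are not dominated on both metrics. A minimal sketch of that frontier computation, with hypothetical data and an illustrative `pareto_frontier` helper (not the PR's code):

```python
# Keep points not dominated by any other point: (acc, tput) is dominated
# if some other point is >= in both metrics and strictly > in at least one.
def pareto_frontier(points):
    frontier = []
    for name, acc, tput in points:
        dominated = any(
            (a >= acc and t >= tput) and (a > acc or t > tput)
            for _, a, t in points
        )
        if not dominated:
            frontier.append((name, acc, tput))
    return frontier

# Hypothetical sweep results: (format, accuracy, tokens/sec).
results = [
    ("fp8", 0.748, 4100.0),
    ("nvfp4", 0.741, 5200.0),
    ("int8_sq", 0.735, 3900.0),  # dominated by fp8 on both metrics
]
```

Here `pareto_frontier(results)` would keep `fp8` (best accuracy) and `nvfp4` (best throughput) and drop `int8_sq`, which loses to `fp8` on both axes.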
Usage
Invoke the skill in Claude Code:
/ptq
State which model you want to quantize and the quantization spec, e.g. "nvfp4 mlp only".
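A free-form spec like the one above could be parsed into a format plus an optional layer filter. This is a hypothetical sketch; the format names and the "X only" filter convention are assumptions, not the skill's actual parser:

```python
# Hypothetical parser for specs like "nvfp4 mlp only"; names are illustrative.
KNOWN_FORMATS = {"nvfp4", "fp8", "int8_sq", "int4_awq"}

def parse_spec(spec: str):
    """Split a spec like 'nvfp4 mlp only' into (format, layer_filter)."""
    tokens = spec.lower().split()
    if not tokens or tokens[0] not in KNOWN_FORMATS:
        raise ValueError(f"unknown quantization format in spec: {spec!r}")
    fmt = tokens[0]
    # 'X only' restricts quantization to matching submodules, e.g. the MLP.
    layer_filter = tokens[1] if len(tokens) >= 3 and tokens[-1] == "only" else None
    return fmt, layer_filter
```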
Testing
Before your PR is "Ready for review"
- Make sure you read and follow the Contributor guidelines and your commits are signed (git commit -s -S).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
- CONTRIBUTING.md: ✅ / ❌ / N/A

Additional Information