Add agent skills for modelopt #1011

Draft
kaix-nv wants to merge 8 commits into main from kaix/modelopt_agent

@kaix-nv kaix-nv commented Mar 9, 2026

What does this PR do?

Type of change: ?

Adds a Claude Code skill suite for interactive model optimization with ModelOpt. The skill guides users through an end-to-end workflow: optimize the model with ModelOpt APIs, deploy it on vLLM and benchmark speed, evaluate accuracy with NeMo Evaluator (nel), and iterate on optimization recipes until the accuracy and performance targets are met. It also includes a Pareto sweep mode that runs multiple formats in parallel and computes the accuracy vs. throughput Pareto frontier.
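To make the Pareto sweep idea concrete, here is a minimal sketch of how a frontier over (accuracy, throughput) pairs could be computed. It is not taken from the PR: the `Result` type, the `pareto_frontier` function, and the recipe labels are hypothetical, and the numbers are placeholders rather than measurements.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str          # hypothetical recipe label
    accuracy: float    # higher is better
    throughput: float  # higher is better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep only results that no other result beats on both metrics."""
    # Visit the fastest results first; ties are broken by higher accuracy.
    ordered = sorted(results, key=lambda r: (-r.throughput, -r.accuracy))
    frontier: list[Result] = []
    best_accuracy = float("-inf")
    for r in ordered:
        # A result survives only if it is more accurate than every
        # result that is at least as fast (exact duplicates are dropped).
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Placeholder values for illustration only, not benchmark results.
demo = [
    Result("bf16", 0.800, 100.0),
    Result("fp8", 0.790, 160.0),
    Result("nvfp4", 0.770, 220.0),
    Result("nvfp4_mlp_only", 0.785, 150.0),
]
frontier_names = [r.name for r in pareto_frontier(demo)]
```

With these placeholder values, `nvfp4_mlp_only` is dropped because `fp8` is both faster and more accurate. The remaining three form the frontier, ordered from fastest to most accurate.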

Usage

Invoke the skill in Claude Code:

/ptq

Then tell it which model you want to quantize and which quantization spec to use, e.g. nvfp4 mlp only.

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Signed-off-by: Kai Xu <kaix@nvidia.com>

copy-pr-bot bot commented Mar 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@kaix-nv kaix-nv requested a review from mxinO March 9, 2026 23:30

coderabbitai bot commented Mar 9, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f267fb00-1101-4c4e-b6d7-4d55e0005023


codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.09%. Comparing base (a4fde49) to head (28928a1).
⚠️ Report is 31 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1011      +/-   ##
==========================================
- Coverage   72.12%   70.09%   -2.04%     
==========================================
  Files         209      221      +12     
  Lines       23628    25459    +1831     
==========================================
+ Hits        17042    17845     +803     
- Misses       6586     7614    +1028     
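The figures in the diff above are related by simple arithmetic: hits plus misses equals lines, and coverage is hits divided by lines. The short check below is a sketch of that relationship using only the numbers from the report. The helper name `coverage_percent` is made up for illustration, and Codecov's own rounding may differ in the last displayed digit.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage of coverable lines."""
    return 100.0 * hits / lines


# Values copied from the coverage diff (base = main, head = #1011).
base = {"lines": 23628, "hits": 17042, "misses": 6586}
head = {"lines": 25459, "hits": 17845, "misses": 7614}

# Every coverable line is either hit or missed.
for side in (base, head):
    assert side["hits"] + side["misses"] == side["lines"]

base_cov = coverage_percent(base["hits"], base["lines"])
head_cov = coverage_percent(head["hits"], head["lines"])

# Deltas shown in the rightmost column of the diff.
delta = {key: head[key] - base[key] for key in base}

print(f"base {base_cov:.2f}%  head {head_cov:.2f}%  change {head_cov - base_cov:+.2f}")
print(delta)
```

The drop comes from the new lines: 1831 lines were added but only 803 of them are hit, so the added code is covered at a lower rate than the existing code base.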

mxinO added 2 commits March 9, 2026 19:38
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>

mxinO commented Mar 10, 2026

Added a separate ptq skill; it needs further tuning. Claude Opus can follow the skill, but Sonnet needs more guidance.

kaix-nv added 2 commits March 10, 2026 15:59
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
@kaix-nv kaix-nv force-pushed the kaix/modelopt_agent branch from 18eb9c2 to 6968ad6 on March 11, 2026 00:47
Signed-off-by: Meng Xin <mxin@nvidia.com>

Edwardf0t1 commented Mar 12, 2026

@kaix-nv @mxinO This is a great starting point to use agent skills for modelopt workflows 👍 We should test it with various models and optimization recipes to polish the skills.

@kaix-nv kaix-nv force-pushed the kaix/modelopt_agent branch from bd2d3da to 4f61bad on March 12, 2026 23:13
Copy nel-assistant skill as local evaluation skill so we can extend it
to support optimized model evaluation requirements. Update modelopt
orchestrator to reference the evaluation skill.

Signed-off-by: Kai Xu <kaix@nvidia.com>
@kaix-nv kaix-nv force-pushed the kaix/modelopt_agent branch from 4f61bad to 28928a1 on March 12, 2026 23:17
Add deployment skill (vLLM, SGLang, TRT-LLM serving) and update
modelopt orchestrator to support three pipelines:
- PTQ only
- PTQ + Deploy (serve as API endpoint)
- PTQ + Evaluate (accuracy benchmark)

Signed-off-by: Kai Xu <kaix@nvidia.com>
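As a rough illustration of the three pipelines named in the commit message above, the sketch below maps each pipeline to an ordered list of stages. This is an assumption about how such a dispatch could look, not the orchestrator's actual implementation: the `PIPELINES` table, the stage names, and `stages_for` are hypothetical, and the PR does not say whether evaluation serves the model internally.

```python
# Hypothetical pipeline names and stages, mirroring the commit message only.
PIPELINES: dict[str, list[str]] = {
    "ptq": ["ptq"],
    "ptq+deploy": ["ptq", "deploy"],      # serve as API endpoint
    "ptq+evaluate": ["ptq", "evaluate"],  # accuracy benchmark
}


def stages_for(pipeline: str) -> list[str]:
    """Return the ordered stages for a pipeline, rejecting unknown names."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(PIPELINES[pipeline])
    except KeyError:
        raise ValueError(
            f"unknown pipeline {pipeline!r}; expected one of {sorted(PIPELINES)}"
        ) from None


for name in PIPELINES:
    print(name, "->", " then ".join(stages_for(name)))
```

Every pipeline starts with PTQ, so a table like this keeps the shared first stage explicit while letting later stages vary.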
@kaix-nv kaix-nv force-pushed the kaix/modelopt_agent branch from 3a320f6 to 5c46798 on March 13, 2026 02:03

3 participants