Skip to content

[https://nvbugs/6335137][fix] Fix VSWA/Hybrid model crash with MAX_UTILIZATION#15441

Open
VALLIS-NERIA wants to merge 1 commit into
NVIDIA:mainfrom
VALLIS-NERIA:6335137
Open

[https://nvbugs/6335137][fix] Fix VSWA/Hybrid model crash with MAX_UTILIZATION#15441
VALLIS-NERIA wants to merge 1 commit into
NVIDIA:mainfrom
VALLIS-NERIA:6335137

Conversation

@VALLIS-NERIA

@VALLIS-NERIA VALLIS-NERIA commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Signed-off-by: Xiwen Yu 13230610+VALLIS-NERIA@users.noreply.github.com

Summary by CodeRabbit

  • Refactor
    • Optimized batch scheduling efficiency by conditionally initializing capacity scheduler components based on attention configuration settings.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…ILIZATION

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

In MaxUtilizationScheduler::operator(), the call to prefillWithChunkedContextsAlreadyExecuting that populates newlyContributedContextBlocks and newlyContributedCrossContextBlocks is now wrapped in an if (skippingIsRelevant) guard. When variable-window attention is active (skippingIsRelevant is false), both sets remain empty.

Changes

Conditional prefill block population in MaxUtilizationScheduler

Layer / File(s) Summary
Guard prefill walk on skippingIsRelevant
cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp
newlyContributedContextBlocks and newlyContributedCrossContextBlocks are declared unconditionally, but the prefillWithChunkedContextsAlreadyExecuting call that fills them is now executed only when skippingIsRelevant is true; both sets stay default-initialized otherwise.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is incomplete. The Description, Test Coverage sections are empty, and the PR Checklist is not verified. Only template boilerplate and a sign-off are present without substantive content. Complete the Description section explaining the issue and solution, add Test Coverage details, and verify/check off appropriate PR Checklist items before merging.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly follows the required format with NVBugs ticket, fix type, and concise summary of the crash fix for VSWA/Hybrid models with MAX_UTILIZATION.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp (1)

388-392: 💤 Low value

Consider adding an inline comment explaining the crash prevention.

While the variable name skippingIsRelevant is self-documenting and line 377 provides context, an inline comment here would help future maintainers understand that this guard prevents a crash (not just an optimization). The underlying issue is that analyzePrefixReuse (called within prefillWithChunkedContextsAlreadyExecuting at lines 55 and 64) asserts on variable-window managers.

Example comment
+    // Don't populate block sets when variable-window is active — analyzePrefixReuse would assert.
+    // See line 294 for similar guard in the prefix-reuse summary computation path.
     if (skippingIsRelevant)
     {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp` around lines 388 - 392,
Add an inline comment above or within the if (skippingIsRelevant) block at the
prefillWithChunkedContextsAlreadyExecuting call to explain that this guard
prevents a crash rather than just being an optimization. The comment should
clarify that this check is necessary because
prefillWithChunkedContextsAlreadyExecuting calls analyzePrefixReuse internally,
which asserts on variable-window managers, so calling it when skippingIsRelevant
is false would cause a crash.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp`:
- Around line 388-392: Add an inline comment above or within the if
(skippingIsRelevant) block at the prefillWithChunkedContextsAlreadyExecuting
call to explain that this guard prevents a crash rather than just being an
optimization. The comment should clarify that this check is necessary because
prefillWithChunkedContextsAlreadyExecuting calls analyzePrefixReuse internally,
which asserts on variable-window managers, so calling it when skippingIsRelevant
is false would cause a crash.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 14f2fcf5-1cd2-4c80-b2a6-70ef4185c1a8

📥 Commits

Reviewing files that changed from the base of the PR and between 29b228e and 37d2f0b.

📒 Files selected for processing (1)
  • cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp

@VALLIS-NERIA

Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #54788 [ run ] triggered by Bot. Commit: 37d2f0b Link to invocation

@tensorrt-cicd

Copy link
Copy Markdown
Collaborator

PR_Github #54788 [ run ] completed with state SUCCESS. Commit: 37d2f0b
/LLM/main/L0_MergeRequest_PR pipeline #43804 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants