[https://nvbugs/6335137][fix] Fix VSWA/Hybrid model crash with MAX_UTILIZATION by VALLIS-NERIA · Pull Request #15441 · NVIDIA/TensorRT-LLM

VALLIS-NERIA · 2026-06-17T03:35:34Z

Signed-off-by: Xiwen Yu 13230610+VALLIS-NERIA@users.noreply.github.com

Summary by CodeRabbit

Refactor
- Optimized batch scheduling efficiency by conditionally initializing capacity scheduler components based on attention configuration settings.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…ILIZATION Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

coderabbitai · 2026-06-17T03:39:24Z

📝 Walkthrough

Walkthrough

In MaxUtilizationScheduler::operator(), the call to prefillWithChunkedContextsAlreadyExecuting that populates newlyContributedContextBlocks and newlyContributedCrossContextBlocks is now wrapped in an if (skippingIsRelevant) guard. When variable-window attention is active (skippingIsRelevant is false), both sets remain empty.

Changes

Conditional prefill block population in MaxUtilizationScheduler

Layer / File(s)	Summary
Guard prefill walk on `skippingIsRelevant` `cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp`	`newlyContributedContextBlocks` and `newlyContributedCrossContextBlocks` are declared unconditionally, but the `prefillWithChunkedContextsAlreadyExecuting` call that fills them is now executed only when `skippingIsRelevant` is true; both sets stay default-initialized otherwise.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description is incomplete. The Description, Test Coverage sections are empty, and the PR Checklist is not verified. Only template boilerplate and a sign-off are present without substantive content.	Complete the Description section explaining the issue and solution, add Test Coverage details, and verify/check off appropriate PR Checklist items before merging.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly follows the required format with NVBugs ticket, fix type, and concise summary of the crash fix for VSWA/Hybrid models with MAX_UTILIZATION.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp (1)
388-392: 💤 Low value

Consider adding an inline comment explaining the crash prevention.

While the variable name skippingIsRelevant is self-documenting and line 377 provides context, an inline comment here would help future maintainers understand that this guard prevents a crash (not just an optimization). The underlying issue is that analyzePrefixReuse (called within prefillWithChunkedContextsAlreadyExecuting at lines 55 and 64) asserts on variable-window managers.
Example comment
+    // Don't populate block sets when variable-window is active — analyzePrefixReuse would assert.
+    // See line 294 for similar guard in the prefix-reuse summary computation path.
     if (skippingIsRelevant)
     {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp` around lines 388 - 392,
Add an inline comment above or within the if (skippingIsRelevant) block at the
prefillWithChunkedContextsAlreadyExecuting call to explain that this guard
prevents a crash rather than just being an optimization. The comment should
clarify that this check is necessary because
prefillWithChunkedContextsAlreadyExecuting calls analyzePrefixReuse internally,
which asserts on variable-window managers, so calling it when skippingIsRelevant
is false would cause a crash.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp`:
- Around line 388-392: Add an inline comment above or within the if
(skippingIsRelevant) block at the prefillWithChunkedContextsAlreadyExecuting
call to explain that this guard prevents a crash rather than just being an
optimization. The comment should clarify that this check is necessary because
prefillWithChunkedContextsAlreadyExecuting calls analyzePrefixReuse internally,
which asserts on variable-window managers, so calling it when skippingIsRelevant
is false would cause a crash.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 14f2fcf5-1cd2-4c80-b2a6-70ef4185c1a8

📥 Commits

Reviewing files that changed from the base of the PR and between 29b228e and 37d2f0b.

📒 Files selected for processing (1)

cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp

VALLIS-NERIA · 2026-06-17T07:13:44Z

/bot run

tensorrt-cicd · 2026-06-17T07:19:32Z

PR_Github #54788 [ run ] triggered by Bot. Commit: 37d2f0b Link to invocation

tensorrt-cicd · 2026-06-17T10:46:15Z

PR_Github #54788 [ run ] completed with state SUCCESS. Commit: 37d2f0b
/LLM/main/L0_MergeRequest_PR pipeline #43804 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

[https://nvbugs/6335137][fix] Fix VSWA/Hybrid model crash with MAX_UT…

37d2f0b

…ILIZATION Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>

github-actions Bot assigned VALLIS-NERIA Jun 17, 2026

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[https://nvbugs/6335137][fix] Fix VSWA/Hybrid model crash with MAX_UTILIZATION#15441

[https://nvbugs/6335137][fix] Fix VSWA/Hybrid model crash with MAX_UTILIZATION#15441
VALLIS-NERIA wants to merge 1 commit into
NVIDIA:mainfrom
VALLIS-NERIA:6335137

VALLIS-NERIA commented Jun 17, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 17, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

VALLIS-NERIA commented Jun 17, 2026

Uh oh!

tensorrt-cicd commented Jun 17, 2026

Uh oh!

tensorrt-cicd commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

VALLIS-NERIA commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

coderabbitai Bot commented Jun 17, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

VALLIS-NERIA commented Jun 17, 2026

Uh oh!

tensorrt-cicd commented Jun 17, 2026

Uh oh!

tensorrt-cicd commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

VALLIS-NERIA commented Jun 17, 2026 •

edited

Loading