[None][test] add GLM nvfp4 stress test by xinhe-nv · Pull Request #15437 · NVIDIA/TensorRT-LLM

xinhe-nv · 2026-06-17T02:12:12Z

Summary by CodeRabbit

Tests
- Added a new stress-test scenario for the GLM-5-NVFP4 model with a disaggregated server configuration.
- Expanded the test suite with a configuration that uses tensor and expert parallelism (TP4/EP4, with pipeline parallelism fixed at PP1) to cover distributed deployment scenarios.
- https://prod.blsm.nvidia.com/swqa-tensorrt-qa-test/view/TRT-LLM-Function-Pipelines/job/DEBUG_LLM_FUNCTION_CLUSTER_TEST/1626/testReport/B200.disaggregated/test_disaggregated/

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions)
- If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
- Any new dependencies have been scanned for license and vulnerabilities
- CODEOWNERS updated if ownership changes
- Documentation updated as needed
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.

Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: GitLab CI Bot <gitlab-ci@nvidia.com>

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

coderabbitai · 2026-06-17T02:14:51Z

📝 Walkthrough

Adds a new disaggregated stress-test scenario for GLM-5-NVFP4. A new YAML configuration file defines context and generation servers with TP4/EP4/DP parallelism, FP8 KV cache, DEEPGEMM MoE backend, and CUDA graph settings. The test parametrization and QA test list are updated to register and enable the new scenario.

Changes

GLM-5-NVFP4 Disaggregated Stress Test

| Layer / File(s) | Summary |
|---|---|
| Disaggregated server YAML config `tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp4ep4_gentp4ep4_glm5_nvfp4_dp_tllm.yaml` | New YAML defining `pytorch` backend servers with TP4/PP1/EP4, `enable_attention_dp: true`, FP8 KV cache (`enable_block_reuse: false`, `free_gpu_memory_fraction: 0.8`), `DEEPGEMM` MoE backend, DEFAULT cache transceiver with `max_tokens_in_buffer: 16384`, CUDA graph with padding for generation servers, and `null` CUDA graph config for context servers. |
| Test parametrization and QA list registration `tests/integration/defs/disaggregated/test_disaggregated.py`, `tests/integration/test_lists/qa/llm_function_stress.txt` | Maps `glm5_nvfp4_tp4_ep4_dp_stress` to the new YAML in the config-selection dict, adds a `TestConfig` entry with request count, accuracy threshold, cancellation settings, and `skip_less_device(4)` / `skip_pre_blackwell` marks, and registers the test case in the QA stress list. |
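The settings listed in the table can be pictured as a plain data structure. The sketch below is not the file from this PR; it is a minimal Python reconstruction under stated assumptions. Only `enable_attention_dp`, `enable_block_reuse`, `free_gpu_memory_fraction`, and `max_tokens_in_buffer` appear verbatim in the summary. Every other key name (`tensor_parallel_size`, `moe_config`, `cuda_graph_config`, and so on), the helpers `build_server_config` and `build_disagg_config`, and the `CONFIG_FILES` dict are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical stand-in for the config-selection dict in test_disaggregated.py:
# it maps the test name from the walkthrough to the new YAML file name.
CONFIG_FILES = {
    "glm5_nvfp4_tp4_ep4_dp_stress":
        "disagg_config_ctxtp4ep4_gentp4ep4_glm5_nvfp4_dp_tllm.yaml",
}


def build_server_config(role: str) -> dict:
    """Return the settings the walkthrough lists for one server role.

    Key names not quoted in the walkthrough are assumptions.
    """
    if role not in ("context", "generation"):
        raise ValueError(f"unknown role: {role!r}")
    return {
        "tensor_parallel_size": 4,       # TP4
        "pipeline_parallel_size": 1,     # PP1
        "moe_expert_parallel_size": 4,   # EP4
        "enable_attention_dp": True,
        "kv_cache_config": {
            "dtype": "fp8",
            "enable_block_reuse": False,
            "free_gpu_memory_fraction": 0.8,
        },
        # Value as described in the walkthrough; a later commit message in
        # this PR says both server configs switch from DEEPGEMM to TRTLLM.
        "moe_config": {"backend": "DEEPGEMM"},
        "cache_transceiver_config": {
            "backend": "DEFAULT",
            "max_tokens_in_buffer": 16384,
        },
        # Generation servers use CUDA graphs with padding; context servers
        # leave the CUDA graph config null.
        "cuda_graph_config": (
            {"enable_padding": True} if role == "generation" else None
        ),
    }


def build_disagg_config() -> dict:
    """Assemble the full (hypothetical) disaggregated config."""
    return {
        "backend": "pytorch",
        "context_servers": build_server_config("context"),
        "generation_servers": build_server_config("generation"),
    }
```

Calling `print(json.dumps(build_disagg_config(), indent=2))` shows the two server sections side by side, which makes the one intended difference (the CUDA graph setting) easy to spot.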

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#14278: Extends the same test_disaggregated.py parametrization and adds a similarly structured YAML config and QA list entry for a different model (Qwen3 32B FP8).

Suggested reviewers

tburt-nv
pcastonguay

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is empty, containing only the template structure without any actual content in the Description or Test Coverage sections. | Add a meaningful description explaining what stress test was added, why (e.g., testing GLM-5-NVFP4 with specific parallelism settings), and which tests provide coverage for these changes. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title '[None][test] add GLM nvfp4 stress test' is concise and directly related to the main change: adding a stress test for GLM with NVFP4 quantization across configuration, test code, and test list files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


DeepGemmFusedMoE only supports FP8 block scales (DeepSeek-style); NVFP4 quantization requires TRTLLMGenFusedMoE. Switch both context and generation server configs from DEEPGEMM to TRTLLM.

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
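The constraint in this commit message can be expressed as a small pre-flight check on a config. The sketch below is a hypothetical helper, not TensorRT-LLM code: the function name `check_moe_backend`, the table `SUPPORTED_QUANT`, and the quantization labels `fp8_block_scales` and `nvfp4` are assumptions made for illustration. The table encodes only the two facts stated in the commit message and is not a complete support matrix.

```python
# Encodes only what the commit message states; the real backends may accept
# other quantization schemes that are not listed here.
SUPPORTED_QUANT = {
    "DEEPGEMM": {"fp8_block_scales"},  # DeepGemmFusedMoE: FP8 block scales only
    "TRTLLM": {"nvfp4"},               # NVFP4 requires TRTLLMGenFusedMoE
}


def check_moe_backend(backend: str, quant: str) -> None:
    """Raise ValueError if the (hypothetical) table rules out this pairing."""
    supported = SUPPORTED_QUANT.get(backend)
    if supported is None:
        raise ValueError(f"unknown MoE backend: {backend!r}")
    if quant not in supported:
        raise ValueError(
            f"MoE backend {backend!r} is not listed as supporting {quant!r}; "
            f"listed schemes: {sorted(supported)}"
        )


def compatible_backends(quant: str) -> list:
    """Return the backends the table lists for a quantization scheme."""
    return sorted(b for b, schemes in SUPPORTED_QUANT.items() if quant in schemes)
```

Under these assumptions, `check_moe_backend("DEEPGEMM", "nvfp4")` raises, while `check_moe_backend("TRTLLM", "nvfp4")` passes, which matches the switch the commit makes in both server configs.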

xinhe-nv · 2026-06-17T11:59:00Z

/bot run --skip-test

tensorrt-cicd · 2026-06-17T12:05:42Z

PR_Github #117 [ run ] triggered by Bot. Commit: 5dec4e5 Link to invocation

tensorrt-cicd · 2026-06-17T12:05:44Z

PR_Github #54845 [ run ] triggered by Bot. Commit: 5dec4e5 Link to invocation

tensorrt-cicd · 2026-06-17T12:41:56Z

PR_Github #117 [ run ] completed with state FAILURE. Commit: 5dec4e5
/LLM/PipelineMonitor/L0_MergeRequest_PR pipeline #91 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR
- If you cannot view the failures, ask the CI triggerer to share details
- Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

tensorrt-cicd · 2026-06-17T13:45:39Z

PR_Github #54845 [ run ] completed with state SUCCESS. Commit: 5dec4e5
/LLM/main/L0_MergeRequest_PR pipeline #43857 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

add GLM nvfp4 stress test

49f74c8

Signed-off-by: GitLab CI Bot <gitlab-ci@nvidia.com>

xinhe-nv requested review from a team as code owners June 17, 2026 02:12

github-actions Bot assigned xinhe-nv Jun 17, 2026

Merge branch 'main' into stress-test

1f2be12

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

xinhe-nv added 3 commits June 17, 2026 14:41

Merge branch 'main' into stress-test

5ef462e

Merge branch 'main' into stress-test

5dec4e5

xinhe-nv wants to merge 5 commits into NVIDIA:main from xinhe-nv:stress-test
