
[None][test] add GLM nvfp4 stress test #15437

Open

xinhe-nv wants to merge 5 commits into NVIDIA:main from xinhe-nv:stress-test


xinhe-nv (Collaborator) commented Jun 17, 2026

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: GitLab CI Bot <gitlab-ci@nvidia.com>
xinhe-nv requested review from a team as code owners June 17, 2026 02:12
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
coderabbitai (Bot, Contributor) commented Jun 17, 2026

📝 Walkthrough

Adds a new disaggregated stress-test scenario for GLM-5-NVFP4. A new YAML configuration file defines context and generation servers with TP4/EP4/DP parallelism, FP8 KV cache, DEEPGEMM MoE backend, and CUDA graph settings. The test parametrization and QA test list are updated to register and enable the new scenario.
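For reviewers who want to sanity-check the settings above, here is a minimal Python sketch that models each server's configuration as a plain dict. It is not the PR's YAML file, which this page does not show: key names such as `tensor_parallel_size`, `moe_expert_parallel_size`, `kv_cache_config`, `cache_transceiver_config`, and `cuda_graph_config` are assumptions modeled on TensorRT-LLM's LLM API argument names. The MoE backend is set to TRTLLM because a later commit in this PR switches both servers away from DEEPGEMM.

```python
import copy

# Hypothetical reconstruction of the per-server settings summarized above.
# Key names are assumed to follow TensorRT-LLM LLM API arguments; they are
# not copied from the actual YAML file in this PR.
BASE_SERVER = {
    "backend": "pytorch",
    "tensor_parallel_size": 4,
    "pipeline_parallel_size": 1,
    "moe_expert_parallel_size": 4,
    "enable_attention_dp": True,
    "kv_cache_config": {
        "dtype": "fp8",
        "enable_block_reuse": False,
        "free_gpu_memory_fraction": 0.8,
    },
    # The walkthrough lists DEEPGEMM; a later commit switches to TRTLLM.
    "moe_config": {"backend": "TRTLLM"},
    "cache_transceiver_config": {
        "backend": "DEFAULT",
        "max_tokens_in_buffer": 16384,
    },
}


def make_server_config(role: str) -> dict:
    """Return a config dict for a 'context' or 'generation' server."""
    if role not in ("context", "generation"):
        raise ValueError(f"unknown role: {role}")
    # Deep copy so nested dicts are not shared between the two roles.
    cfg = copy.deepcopy(BASE_SERVER)
    # Context servers use a null CUDA graph config;
    # generation servers enable CUDA graphs with batch-size padding.
    if role == "context":
        cfg["cuda_graph_config"] = None
    else:
        cfg["cuda_graph_config"] = {"enable_padding": True}
    return cfg


ctx = make_server_config("context")
gen = make_server_config("generation")
print(ctx["cuda_graph_config"], gen["cuda_graph_config"])
```

Building both roles from one base dict keeps the shared settings identical, so the one intended difference (the CUDA graph setting) lives in a single place.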

Changes

GLM-5-NVFP4 Disaggregated Stress Test

  • Disaggregated server YAML config (tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp4ep4_gentp4ep4_glm5_nvfp4_dp_tllm.yaml): new YAML defining PyTorch-backend servers with TP4/PP1/EP4, enable_attention_dp: true, FP8 KV cache (enable_block_reuse: false, free_gpu_memory_fraction: 0.8), DEEPGEMM MoE backend, DEFAULT cache transceiver with max_tokens_in_buffer: 16384, CUDA graphs with padding for generation servers, and a null CUDA graph config for context servers.

  • Test parametrization and QA list registration (tests/integration/defs/disaggregated/test_disaggregated.py, tests/integration/test_lists/qa/llm_function_stress.txt): maps glm5_nvfp4_tp4_ep4_dp_stress to the new YAML in the config-selection dict, adds a TestConfig entry with request count, accuracy threshold, cancellation settings, and skip_less_device(4) / skip_pre_blackwell marks, and registers the test case in the QA stress list.
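The registration step can be sketched as a name-to-file lookup plus a gate that mimics the two skip marks. This is a hypothetical illustration, not the code in test_disaggregated.py: `CONFIG_FILES`, `StressGate`, and `config_path` are invented names, and the SM 100 cutoff assumes that skip_pre_blackwell skips GPUs below Blackwell's data-center compute capability (10.0). The request count, accuracy threshold, and cancellation settings are left out because their values are not shown here.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Test name -> YAML filename, as listed in the walkthrough (hypothetical dict name).
CONFIG_FILES = {
    "glm5_nvfp4_tp4_ep4_dp_stress":
        "disagg_config_ctxtp4ep4_gentp4ep4_glm5_nvfp4_dp_tllm.yaml",
}

CONFIG_DIR = "tests/integration/defs/disaggregated/test_configs"


@dataclass(frozen=True)
class StressGate:
    """Hypothetical stand-in for skip_less_device(4) and skip_pre_blackwell."""

    min_devices: int = 4
    min_sm: int = 100  # assumed Blackwell cutoff (compute capability 10.0)

    def skip_reason(self, num_devices: int, sm: int) -> Optional[str]:
        """Return why the test would be skipped, or None if it can run."""
        if num_devices < self.min_devices:
            return f"needs >= {self.min_devices} GPUs, found {num_devices}"
        if sm < self.min_sm:
            return f"needs Blackwell (SM >= {self.min_sm}), found SM {sm}"
        return None


def config_path(test_name: str) -> str:
    """Resolve a registered test name to its YAML config path."""
    return os.path.join(CONFIG_DIR, CONFIG_FILES[test_name])


gate = StressGate()
print(config_path("glm5_nvfp4_tp4_ep4_dp_stress"))
print(gate.skip_reason(num_devices=8, sm=90))  # Hopper-class GPU is skipped
```

Keeping the gate separate from the lookup mirrors how pytest marks sit beside the parametrized config: the config says what to run, the marks say where it can run.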

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Possibly related PRs

  • NVIDIA/TensorRT-LLM#14278: Extends the same test_disaggregated.py parametrization and adds a similarly structured YAML config and QA list entry for a different model (Qwen3 32B FP8).

Suggested reviewers

  • tburt-nv
  • pcastonguay
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Description check: ⚠️ Warning. The PR description is empty; it contains only the template structure, with no content in the Description or Test Coverage sections. Resolution: add a meaningful description explaining what stress test was added, why (e.g., testing GLM-5-NVFP4 with specific parallelism settings), and which tests cover these changes.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title '[None][test] add GLM nvfp4 stress test' is concise and directly related to the main change: adding a stress test for GLM with NVFP4 quantization across configuration, test code, and test list files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


xinhe-nv added 3 commits June 17, 2026 14:41
DeepGemmFusedMoE only supports FP8 block scales (DeepSeek-style);
NVFP4 quantization requires TRTLLMGenFusedMoE. Switch both context
and generation server configs from DEEPGEMM to TRTLLM.

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
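The constraint in this commit message can be expressed as a small validation rule. The sketch below is hypothetical: `validate_moe_backend` is not a TensorRT-LLM function, the quantization strings "FP8_BLOCK_SCALES" and "NVFP4" are assumed labels, and the check encodes only the two facts stated in the commit (DEEPGEMM accepts only FP8 block scales; NVFP4 requires TRTLLM), so it says nothing about other backend and quantization combinations.

```python
def validate_moe_backend(backend: str, quant_algo: str) -> None:
    """Raise ValueError if the MoE backend cannot serve the quantization.

    Hypothetical check; encodes only the rules stated in the commit message.
    """
    # DeepGemmFusedMoE: FP8 block scales (DeepSeek-style) only.
    if backend == "DEEPGEMM" and quant_algo != "FP8_BLOCK_SCALES":
        raise ValueError(
            f"DEEPGEMM supports only FP8_BLOCK_SCALES, got {quant_algo}"
        )
    # NVFP4 needs TRTLLMGenFusedMoE, selected via the TRTLLM backend.
    if quant_algo == "NVFP4" and backend != "TRTLLM":
        raise ValueError(f"NVFP4 requires the TRTLLM MoE backend, got {backend}")


# Both servers after the fix (hypothetical layout of the config).
servers = {
    "context": {"moe_config": {"backend": "TRTLLM"}},
    "generation": {"moe_config": {"backend": "TRTLLM"}},
}
for role, cfg in servers.items():
    validate_moe_backend(cfg["moe_config"]["backend"], "NVFP4")
    print(f"{role}: ok")
```

Run against the pre-fix configs (backend "DEEPGEMM"), the same loop would raise on the context server, which is the mismatch this commit resolves.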
xinhe-nv (Collaborator, Author):

/bot run --skip-test

tensorrt-cicd (Collaborator):

PR_Github #117 [ run ] triggered by Bot. Commit: 5dec4e5 Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #54845 [ run ] triggered by Bot. Commit: 5dec4e5 Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #117 [ run ] completed with state FAILURE. Commit: 5dec4e5
/LLM/PipelineMonitor/L0_MergeRequest_PR pipeline #91 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

tensorrt-cicd (Collaborator):

PR_Github #54845 [ run ] completed with state SUCCESS. Commit: 5dec4e5
/LLM/main/L0_MergeRequest_PR pipeline #43857 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation


Labels

None yet

Projects

None yet


2 participants