Draft
Changes from all commits (208 commits)
1cc4949
[Infra] - Add waive list for pytest when using slurm (#6130)
EmmaQiaoCh Jul 17, 2025
44c70c8
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disag…
chuangz0 Jul 17, 2025
21efb50
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler (#6…
syuoni Jul 17, 2025
de60ae4
chores: unwaive a few tests for v1.0 (#6107)
hchings Jul 17, 2025
9b45499
test: update max_beam_width to 1 due to torchsampler changes. (#6101)
nv-guomingz Jul 17, 2025
a718486
fix: Fix DeepSeek R1 CI (#6129)
yizhang-nv Jul 17, 2025
9518e14
test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout (#6115)
StanleySun639 Jul 17, 2025
58d22a7
[TRTLLM-6352][feat] Migrate EAGLE3 and draft/target speculation to Dr…
ziyixiong-nv Jul 17, 2025
5bff317
feat: nanobind bindings (#5961)
Linda-Stadter Jul 17, 2025
d71c6fe
[fix] Update jenkins container images (#6094)
ixlmar Jul 17, 2025
10dbf4f
[fix] Remove duplicated KVCache transmission check (#6022)
Tabrizian Jul 17, 2025
8480c12
[fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105)
2ez4bz Jul 17, 2025
161490f
[fix] Fixes KV Cache overrides in trtllm-bench (#6103)
FrankD412 Jul 17, 2025
2c90203
Refactor KVCacheManager: Simplify token availability calculation and …
qixiang-99 Jul 17, 2025
ae28b3a
feat: Add support for benchmarking individual gemms in MOE benchmark …
djns99 Jul 17, 2025
b75e53a
Revert "feat: nanobind bindings (#5961)" (#6160)
Tabrizian Jul 18, 2025
0155e7a
[TRTLLM-6368] Update deepep dispatch API (#6037)
yifeizhang-c Jul 18, 2025
200ea9e
fix TMA error with GEMM+AR on TP=2 (#6075)
xavier-nvidia Jul 18, 2025
992b273
[https://nvbugs/5387375] fix(scaffolding): fix scaffolding aime test …
dc3671 Jul 18, 2025
812243b
feat: add support for Modelopt fp8_pb_wo quantization scheme (#6106)
achartier Jul 18, 2025
c0e4165
fix single_disagg_test (#6166)
chuangz0 Jul 18, 2025
f321692
[TRTLLM-5179] - Update bot help messages (#5277)
yiqingy0 Jul 18, 2025
519a211
[None][infra] Update the allow list of CI trigger (#6168)
niukuo Jul 18, 2025
a95f31e
chore: add more log in FmhaDispatcher (#6170)
QiJune Jul 18, 2025
77acb4f
[Infra] - Waive failed tests in post-merge (#6176)
EmmaQiaoCh Jul 18, 2025
ec2b953
refactor: Enhanced handling of decoder requests and logits within the…
Funatiq Jul 18, 2025
44040ed
update broken link of PyTorchModelEngine in arch_overview (#6171)
leslie-fang25 Jul 18, 2025
9522cde
fix: NVBug 5385576 py_batch_idx issue (#6153)
hchings Jul 18, 2025
8454640
infra: fix single-GPU stage failed will not raise error (#6165)
ZhanruiSunCh Jul 18, 2025
fd6ce7f
[ci] Speedup beam search unit tests with fixtures for LLM (#5843)
stnie Jul 18, 2025
07e8813
feat: Remove padding in attention DP. (#6064)
bobboli Jul 18, 2025
2c6fa14
[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests (…
bo-nv Jul 18, 2025
22d4a8c
enh: Add script to map tests <-> jenkins stages & vice-versa (#5177)
venkywonka Jul 18, 2025
28858c8
feat(eagle3):support qwen3 dense model (#5879)
xq25478 Jul 18, 2025
6d7874a
[nvbugs/5369799] fix: Update disaggregation handling in sampler (#5762)
stnie Jul 18, 2025
d475c97
[nvbugs/5354884][fix] Update beam search workspace estimation to new …
stnie Jul 18, 2025
d9a3530
[nvbug/5393888][nvbug/5393042] Always use `py_seq_slot` (#6147)
netanel-haber Jul 18, 2025
0388ff9
[https://nvbugs/5393961][fix] record kv-cache size in MLACacheFormatt…
bo-nv Jul 18, 2025
fc8b29c
[Issue 5927][fix] Avoid memory calls during broadcast for single GPU …
johncalesp Jul 18, 2025
152e2df
[Disaggregated] Add retry knobs and handling (#5808)
arekay Jul 18, 2025
82d3587
[refactor] Unify name of NGram speculative decoding (#5937)
wili-65535 Jul 19, 2025
66030ef
[TRTLLM-6452][feat]: Two-model engine KV cache reuse support (#6133)
ziyixiong-nv Jul 19, 2025
69e9f6d
[fix]: Skip prompt length checking for generation only requests (#6146)
LinPoly Jul 19, 2025
118307c
DeepEP LL support variable hidden size and tokens num (#6141)
yilin-void Jul 20, 2025
2e14c8f
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065)
byshiue Jul 20, 2025
943fd41
fix: Ensure mlx5 library is installed for deep_ep and remove deprecat…
MartinMarciniszyn Jul 20, 2025
98428f3
[TRTLLM-5826][feat] Support pytorch LoRA adapter eviction (#5616)
amitz-nv Jul 20, 2025
5300a99
W4A8 GEMM (#6005)
danielafrimi Jul 20, 2025
a433eba
enh: Lift expectation of single image per sample in Gemma3 VLM (#6195)
brb-nv Jul 21, 2025
6a3c9f8
test: add phi-4 multimodal and bielik-11b-v2.2 models for perf test (…
ruodil Jul 21, 2025
ca9bc57
fix: Flush stale `PlanParams` with custom attention mask (#6163)
brb-nv Jul 21, 2025
b4c7e8c
doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b…
nv-guomingz Jul 21, 2025
88076ee
[fix] Fix can_use_alltoall in fused_moe_wide_ep.py (#6173)
jinyangyuan-nvidia Jul 21, 2025
e8c068b
[TRTLLM-5863][feat] Support Weight-Only-Quantization in PyTorch Workf…
Yuening-wa Jul 21, 2025
b46fd41
test: [CI] remove closed bugs (#6201)
xinhe-nv Jul 21, 2025
3efad2e
feat: nanobind bindings (#6185)
Linda-Stadter Jul 21, 2025
3cbc23f
infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (B…
ZhanruiSunCh Jul 21, 2025
aea91b2
doc: add Deprecation Policy section (#5784)
QiJune Jul 21, 2025
3e0fb60
[TRTLLM-4279] feat: Multistream initial support for torch compile flo…
liji-nv Jul 21, 2025
e41507a
[Infra] - Waive failed cases on recent post-merge (#6212)
EmmaQiaoCh Jul 21, 2025
9832bef
[BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve …
LinPoly Jul 21, 2025
f9b0a91
test: Enable GB200 torch compile multi gpu tests (#6145)
yizhang-nv Jul 21, 2025
d7f0b0a
[fix] Correct the returned value of has_spec_drafter (#6178)
ziyixiong-nv Jul 21, 2025
9645814
[chore] Clean up quickstart_advanced.py (#6021)
mikeiovine Jul 21, 2025
4a0951f
[Chore] Replace MODEL_CACHE_DIR with LLM_MODELS_ROOT and unwaive trit…
SimengLiu-nv Jul 21, 2025
7381f1d
[TRTLLM-5059][feat] Add KV cache reuse support for multimodal models …
chang-l Jul 21, 2025
ee45e0c
feat: Refactor the fetching request logic (#5786)
Shunkangz Jul 22, 2025
eb5cb5b
tests: add timeout_manager to tensorrt flow test cases (#5942)
crazydemo Jul 22, 2025
fddb7f1
feat: moe prepare support topk % 4 != 0 (#5742)
WeiHaocheng Jul 22, 2025
37d0b68
[fix] Fix flaky mistral E2E test (#6230)
2ez4bz Jul 22, 2025
db77d83
bug: [https://nvbugs/5368507] Fix test_generate_with_seed. (#6206)
bobboli Jul 22, 2025
537757e
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction…
bobboli Jul 10, 2025
f4f2176
chore: Port leftover 0.20 (#5907)
amirkl94 Jul 10, 2025
f194b65
fix [nvbug/5351244]: address remote mpi session submit (#5664)
Superjomn Jul 10, 2025
9d26b78
fix: [5328141] increase tolerance for test_fp8_block_scale_gemm (#5849)
nekorobov Jul 10, 2025
c669410
fix: fix index out of bounds error in spec decoding (#5954)
lfr-0531 Jul 14, 2025
eb7d0f8
[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918)
yizhang-nv Jul 14, 2025
34dd071
[TRTLLM-6495] doc: add disclaimer for 3rd party software installation…
nv-guomingz Jul 15, 2025
a03c680
add release notes for 0.21 release (#6049)
QiJune Jul 16, 2025
310bdd9
fix: Fix triton backend build [nvbug 5396469] (#6098)
pcastonguay Jul 16, 2025
24ce6b9
[Doc][Qwen3] update qwen3 into support-matrix (#6161)
byshiue Jul 18, 2025
48ddc3d
[fix]: Revert commit 388b491 (#6143)
LinPoly Jul 18, 2025
b85ab13
doc: add supported data modality and types on multimodal serve (#5988)
yechank-nvidia Jul 22, 2025
3e18ee5
chore: bump version to 1.0.0rc5 (#6252)
yiqingy0 Jul 22, 2025
3e1a0fb
[TRTLLM-6537][infra] extend multi-gpu tests related file list (#6139)
reasonsolo Jul 22, 2025
04f2d4b
test: update test list for RTX6KD (#6213)
StanleySun639 Jul 22, 2025
6007373
fix: bindings unit tests for nanobind (#6221)
Linda-Stadter Jul 22, 2025
ff99639
Add register_fake for finegrained_mixed_dtype_gemm torch_op (#6255)
danielafrimi Jul 22, 2025
b7c8a67
[Issue 6193] Fix gemma3vl weight loader (#6233)
johncalesp Jul 22, 2025
ab7434a
[feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#…
2ez4bz Jul 22, 2025
ef4878d
set NVIDIA_IMEX_CHANNELS for dlcluster slurm job only (#6234)
yuanjingx87 Jul 22, 2025
5234502
[nvbug/5361223] doc: Update Llama4 deployment guide: update config & …
raayandhar Jul 22, 2025
41fb8aa
[AutoDeploy] merge feat/ad-2025-07-07 (#6196)
lucaslie Jul 22, 2025
bc2fb29
[nvbugs/5401261][fix] Fix Triton backend disaggregated serving suppor…
Tabrizian Jul 22, 2025
8ecdeee
[refactor] Simplification of Speculative decoding configs - Part 2 (#…
wili-65535 Jul 23, 2025
f08286c
doc: Refactor documents and examples of disaggregated serving and wid…
kaiyux Jul 23, 2025
9538c8d
Add basic Nemo Ckpt Lora Loading in pytorch flow (#6019)
venkywonka Jul 23, 2025
2193ad3
[https://nvbugs/5387771] fix deadlocks due to insufficient numSemapho…
PerkzZheng Jul 23, 2025
5636c67
fix: nvbug_5398806 (#6239)
hchings Jul 23, 2025
83c3ed1
chore: set default device to cpu on Multimodal models (#5994)
yechank-nvidia Jul 23, 2025
a8253b9
chore: remove duplicate should_stop_processing check (#6242)
QiJune Jul 23, 2025
fca13b8
hopper-style context MLA (#5713)
zhou-yuxin Jul 23, 2025
ed62a06
[nvbug/5322354] fix PD + MTP + overlap scheduler accuracy issue (#6136)
yweng0828 Jul 23, 2025
2b0fa24
test: [CI] Add failed cases into waives.txt (#6289)
xinhe-nv Jul 23, 2025
2486eb7
[TRTLLM-6651][feat] Enable Overlap scheduler + Beam Search in TRTLL…
stnie Jul 23, 2025
cb737a5
[Infra] - Skip failed cases (#6299)
EmmaQiaoCh Jul 23, 2025
cf4f4e8
[AutoDeploy] disable flaky MoE nvfp4 test (#6302)
lucaslie Jul 23, 2025
19696a6
[feat] Update .coderabbit.yaml with review settings and code guidelin…
venkywonka Jul 23, 2025
7740bfa
Waive tests (#6312)
Tabrizian Jul 24, 2025
82d03ca
[Infra] - Increase unittest execution time since some test exceeds 16…
EmmaQiaoCh Jul 24, 2025
5fceaa6
Revert "tests: add timeout_manager to tensorrt flow test cases (#5942…
Tabrizian Jul 24, 2025
31d3eff
doc: fix invalid links related with llm api example (#6317)
nv-guomingz Jul 24, 2025
428e340
chore: remove unused variables in pyexecutor (#6280)
QiJune Jul 24, 2025
a63a1ac
[TRTLLM-6444] Add some UCX trouble shooting docs and print UCX relate…
reasonsolo Jul 24, 2025
14d94a3
feat: Add non UB AR + Residual + Norm + Quant fusion (#6320)
liji-nv Jul 24, 2025
0ffcf9a
Update fmhaRunner.cpp to fix guardwords scan error (#6327)
zhou-yuxin Jul 24, 2025
f290108
tests: only get timeout value from pytest marker (#6287)
crazydemo Jul 24, 2025
0cc1f8c
[Infra] - Waive failed tests in post-merge (#6331)
EmmaQiaoCh Jul 24, 2025
7b6aadc
[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 w…
byshiue Jul 24, 2025
62298bc
perf: customize cublasLt algo for Llama 3.3 70B TP4 (#6315)
zhenhuaw-me Jul 24, 2025
706f421
[Fix] the bug in the trtllm-gen heuristic for MLA kernels. (#6284)
PerkzZheng Jul 24, 2025
ff72ca9
Improve TransferAgentTest.SyncMessage (#6250)
bo-nv Jul 24, 2025
0df758e
[TRTLLM-6650][feat] Enhance beam search support with CUDA graph integ…
stnie Jul 24, 2025
f8f5ba6
[fix] Update to remove popping of KV cache and other args. (#6310)
FrankD412 Jul 24, 2025
375f74e
[fix][nvbugs/5399355] Fix Lamport buffer clear issue for MNNVL TwoSho…
timlee0212 Jul 25, 2025
9a99e6d
fix: integration tests with nanobind (#6326)
Linda-Stadter Jul 25, 2025
0f2f11f
[TRTLLM-6453][feat] Support chunked prefill on spec decode 2 model (#…
mikeiovine Jul 25, 2025
2dcfa90
test: skip llama3.3 70b test on cg4 (#6293)
xinhe-nv Jul 25, 2025
d974198
[TRTLLM-5312] - Add bot run rules for triton tests (#4988)
yiqingy0 Jul 25, 2025
6268a60
tests: add test_chunked_prefill for llama4 (#5549)
xinhe-nv Jul 25, 2025
e07fff4
[https://nvbugs/5340941] - fix: Correct custom ops used by Qwen3 Moe …
liji-nv Jul 25, 2025
470544c
test: [CI] Add failed cases into waives.txt (#6333)
xinhe-nv Jul 25, 2025
a0aecf0
[feat]: support logit_bias (#5354)
xq25478 Jul 25, 2025
3805976
fix: Fixing kv_cache_events unit tests [nvbug 5362412] (#6265)
pcastonguay Jul 25, 2025
b8d4cb8
feat: Support JSON Schema in OpenAI-Compatible API (#6321)
nv-guomingz Jul 25, 2025
7bff341
[doc] Add NGram tech blog (#6311)
SimengLiu-nv Jul 25, 2025
1e5e71a
Mtp optimizations round1 (#5689)
ameynaik-hub Jul 25, 2025
c35c78f
[fix][nvbugs/5390810] Improve the check for disaggregated serving tes…
Tabrizian Jul 25, 2025
08d5712
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when at…
moraxu Jul 25, 2025
54f6828
fix precompiled multi_query_token kernel not having is_fp8_out hash k…
jhaotingc Jul 26, 2025
96d004d
doc: fix invalid link in llama 4 example documentation (#6340)
lianakoleva Jul 26, 2025
d853811
[https://nvbugs/5402719][fix]: Add cuda graph dummy requests to the s…
ziyixiong-nv Jul 27, 2025
908f49a
[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch (#6359)
Superjomn Jul 28, 2025
2dd3186
fix: remove cudaStreamSynchronize when using relaxed acceptance (#5262)
yweng0828 Jul 28, 2025
93a0fd0
[TRTLLM-6445] feat: Enable AllReduce-associated fusion patterns in Ll…
hyukn Jul 28, 2025
f172fac
DeepEP LL dispatch FP4 (#6296)
yilin-void Jul 28, 2025
dc75779
[nvbugs/5401156][fix] Avoid import all models when import trtllm._com…
chang-l Jul 28, 2025
97f7e12
[fix] Fix perf regression caused by MoE autotuner when using DeepEPLo…
jinyangyuan-nvidia Jul 28, 2025
c9b8b61
Add Acceptance Rate calculation to benchmark_serving (#6240)
zerollzeng Jul 28, 2025
b3ca159
[Infra] - waive failed cases and fix a typo (#6384)
EmmaQiaoCh Jul 28, 2025
2945817
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name (#…
crazydemo Jul 28, 2025
45d441e
[TRTLLM-5061] chore: add status tags to LLM API reference (#5707)
Superjomn Jul 28, 2025
413a83f
fix: compatibility with CUDA < 12.9 on `__CUDA_ARCH_SPECIFIC__` macro…
tongyuantongyu Jul 28, 2025
4efc649
chore: add _prepare_and_schedule_batch function in PyExecutor (#6365)
QiJune Jul 28, 2025
971be1f
test: waive failed cases (#6394)
xinhe-nv Jul 28, 2025
03632a6
test: organize perf cases and add missing perflab cases in qa test li…
ruodil Jul 28, 2025
4904473
chore: delete useless gitkeep files. (#6400)
nv-guomingz Jul 28, 2025
60e4d3a
[test] Add accuracy regression test for Mistral3.1 (#6322)
2ez4bz Jul 28, 2025
cdca541
[test] Unwaive mistral3.1 small E2E test (#6352)
2ez4bz Jul 28, 2025
608ed89
[None][infra]Update slurm config keys (#6370)
yuanjingx87 Jul 28, 2025
bca1415
[infra] Add an auto-labeling github action to TRTLLM (#6373)
poweiw Jul 28, 2025
738ab61
[nvbugs/5404000] fix: waive request_perf_metrics_draft test on pre-Ho…
achartier Jul 28, 2025
2573bb7
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy…
moraxu Jul 28, 2025
2d21bca
[infra] Remove auto_apply_labels option from .coderabbit.yaml reviews…
venkywonka Jul 28, 2025
ee3cbb0
[fix] Add trust_remote_code option to prepare_dataset. (#6338)
FrankD412 Jul 28, 2025
64ba483
infra: [TRTLLM-6499] Split L0_Test into two pipeline by single GPU an…
ZhanruiSunCh Jul 29, 2025
e58afa5
doc: Add README for wide EP (#6356)
kaiyux Jul 29, 2025
d2a04ab
[fix] Fixes to parameter usage and low latency configuration. (#6343)
FrankD412 Jul 29, 2025
e11255e
test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and lla…
ruodil Jul 29, 2025
13e24ab
chore: remove unused code in PyExecutor (#6351)
QiJune Jul 29, 2025
0eee2e2
[5385981] fix: Update the usage of VisionAttention init API. (#6413)
hyukn Jul 29, 2025
4fbb344
test: [CI] Add failed cases into waives.txt (#6423)
xinhe-nv Jul 29, 2025
f1086e7
test: [CI] remove closed bugs (#6381)
xinhe-nv Jul 29, 2025
7231134
doc: remove backend parameter for trtllm-bench when backend is set to…
nv-guomingz Jul 29, 2025
c3729db
infra: [TRTLLM-5873] Use build stage wheels to speed up docker releas…
ZhanruiSunCh Jul 29, 2025
7efe3cb
[fix] Add detokenization-based stop word logic to LLM API (#5948)
moraxu Jul 29, 2025
1a69309
chore: remove unused kv_cache_dtype in api reference (#6444)
Superjomn Jul 29, 2025
ad662dd
chore: disallow arbitrary in llm_args.Configs (#6367)
Superjomn Jul 29, 2025
1a8e28d
[FIX] fix bugs caused by None attention_bias during Qwen3 model conve…
Fan-Yunfan Jul 29, 2025
d6eb8e2
fix: support mixture of text & multimodal prompts (#6345)
yechank-nvidia Jul 30, 2025
ab40369
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in…
venkywonka Jul 30, 2025
5b420ad
Rename layer to comply with deepseek (#6393)
peaceh-nv Jul 30, 2025
c00d676
test: [CI] Add failed cases into waives.txt (#6457)
xinhe-nv Jul 30, 2025
c9ed1ab
[TRTLLM-6549] chore: record delay introduced by disaggregated serving…
zhengd-nv Jul 30, 2025
a427f5b
[fix] Fix wide EP when using DeepEP with online EPLB (#6429)
jinyangyuan-nvidia Jul 30, 2025
d6eed1b
[fix] Switch placement of image placeholder for mistral 3.1 (#6435)
2ez4bz Jul 30, 2025
1f39a11
chore: clean code of PyExecutor (#6445)
QiJune Jul 30, 2025
9171f88
Remove deprecated lora args from BaseLlmArgs, using peft_cache_config…
amitz-nv Jul 29, 2025
07cde29
Enabled use of LoraConfig in TRT_python flow, added tests of expected…
amitz-nv Jul 29, 2025
eabe716
Improve comments in tests
amitz-nv Jul 29, 2025
d1a896f
Correct mistake in PeftCacheConfig.num_device_module_layer description
amitz-nv Jul 29, 2025
e90872a
Add validation of unsupported field in peft cache manager
amitz-nv Jul 29, 2025
7e4e37c
Fix docstring line length
amitz-nv Jul 29, 2025
004eaf9
Fix validate_peft_cache_config
amitz-nv Jul 29, 2025
1afafa7
Fix validate_peft_cache_config formatting
amitz-nv Jul 29, 2025
c486af2
Fix lora_prefetch_dir description and 'unsupported warning' message, …
amitz-nv Jul 29, 2025
138c4b1
Fix tests to configure lora cache size by number of adapters for test…
amitz-nv Jul 29, 2025
e26ca0a
Fix tests to API update - use LoraConfig instead of base LLM args for…
amitz-nv Jul 29, 2025
ef99dd2
Fix tests to explicitly configure lora_config's max_loras and max_cpu…
amitz-nv Jul 29, 2025
797715e
Define default values in PeftCacheConfig model class for device_cache…
amitz-nv Jul 29, 2025
53b4233
Add default value to description
amitz-nv Jul 29, 2025
0d51a80
Fix PeftCacheConfig.create_from_pybind after changing python fields t…
amitz-nv Jul 29, 2025
e0fcbeb
Fix examples/llm-api/llm_multilora.py - use one LoraConfig
amitz-nv Jul 29, 2025
61a994b
Fix examples/llm-api/llm_multilora.py to not use BuildConfig that's i…
amitz-nv Jul 29, 2025
191a0ed
Changed create_from_pybind method to be a more generic classmethod in…
amitz-nv Jul 29, 2025
8cca194
Minor docstring fix
amitz-nv Jul 29, 2025
391d0f9
Fix rename
amitz-nv Jul 29, 2025
bce06ad
Fix test_ptp_quickstart_multimodal_phi4mm - for stability set lora ca…
amitz-nv Jul 29, 2025
2 changes: 1 addition & 1 deletion .clangd
@@ -29,7 +29,7 @@ CompileFlags:
# Tweak the clangd parse settings for all files
CompileFlags:
Compiler: clang++
CompilationDatabase: .
CompilationDatabase: cpp/build
Add:
# report all errors
- "-ferror-limit=0"
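The `.clangd` change above repoints `CompilationDatabase` from the repository root to `cpp/build`, which assumes the C++ build emits `compile_commands.json` there (e.g. a CMake configure with `-DCMAKE_EXPORT_COMPILE_COMMANDS=ON`). A minimal sketch of that expectation — the helper name is hypothetical, not part of the repo:

```python
from pathlib import Path
from typing import Optional

def compilation_db(root: str = ".") -> Optional[Path]:
    # Mirrors the new `CompilationDatabase: cpp/build` setting: clangd will
    # read compile_commands.json from cpp/build relative to the project root,
    # so flags are only picked up when that file actually exists.
    db = Path(root, "cpp", "build", "compile_commands.json")
    return db if db.is_file() else None
```

If the database is missing, clangd silently falls back to parsing files without project-specific flags, which is why the path must match where the build actually writes it.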
19 changes: 18 additions & 1 deletion .coderabbit.yaml
@@ -14,9 +14,26 @@
# limitations under the License.

# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
# https://docs.coderabbit.ai/getting-started/configure-coderabbit/
# In PR, comment "@coderabbitai configuration" to get the full config including defaults
language: "en-US"
reviews:
profile: chill
auto_title_placeholder: '@coderabbitai title'
auto_title_instructions: 'Should follow the format: "[fix/feat/doc/infra/...] \<summary of this PR\>". Keep it concise.'
commit_status: false
collapse_walkthrough: true
assess_linked_issues: true
related_issues: true
related_prs: true
suggested_labels: true
suggested_reviewers: true
auto_assign_reviewers: true
poem: false
auto_review:
drafts: true
base_branches: ["main", "release/.+"]
commit_status: false
knowledge_base:
code_guidelines:
enabled: true
filePatterns: ["**/CODING_GUIDELINES.md"]
21 changes: 16 additions & 5 deletions .github/pull_request_template.md
@@ -38,29 +38,40 @@ See details below for each supported subcommand.

<details>

`run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]`
`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.

`--reuse-test (optional)pipeline-id ` *(OPTIONAL)* : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

`--disable-reuse-test ` *(OPTIONAL)* : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

`--disable-fail-fast ` *(OPTIONAL)* : Disable fail fast on build/tests/infra failures.

`--skip-test ` *(OPTIONAL)* : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does **NOT** update GitHub check status.

`--stage-list "A10-1, xxx"` *(OPTIONAL)* : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does **NOT** update GitHub check status.
`--stage-list "A10-PyTorch-1, xxx"` *(OPTIONAL)* : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does **NOT** update GitHub check status.

`--gpu-type "A30, H100_PCIe"` *(OPTIONAL)* : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does **NOT** update GitHub check status.

`--test-backend "pytorch, cpp"` *(OPTIONAL)* : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does **NOT** update GitHub pipeline status.

`--only-multi-gpu-test ` *(OPTIONAL)* : Only run the multi-GPU tests. Note: Does **NOT** update GitHub check status.

`--disable-multi-gpu-test ` *(OPTIONAL)* : Disable the multi-GPU tests. Note: Does **NOT** update GitHub check status.

`--add-multi-gpu-test ` *(OPTIONAL)* : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.
`--add-multi-gpu-test ` *(OPTIONAL)* : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

`--post-merge ` *(OPTIONAL)* : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

`--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"` *(OPTIONAL)* : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
`--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` *(OPTIONAL)* : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

`--detailed-log ` *(OPTIONAL)* : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

`--debug ` *(OPTIONAL)* : **Experimental feature**. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does **NOT** update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md`.
For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md`
and the `scripts/test_to_stage_mapping.py` helper.
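The `run` subcommand's options above are plain space-separated comment text, with some flags taking a quoted value (`--stage-list`, `--gpu-type`, `--test-backend`, `--extra-stage`) and `--reuse-test` taking an optional pipeline-id. A hypothetical sketch of parsing such a comment — illustrative only, not the bot's actual parser:

```python
import shlex

# Flags documented above that consume a value; all other flags are booleans.
FLAGS_WITH_VALUES = {
    "--reuse-test", "--stage-list", "--gpu-type", "--test-backend", "--extra-stage",
}

def parse_bot_run(comment: str) -> dict:
    tokens = shlex.split(comment)  # strips the quotes around "A10-PyTorch-1, xxx"
    if tokens[:2] != ["/bot", "run"]:
        raise ValueError("not a /bot run command")
    opts, i = {}, 2
    while i < len(tokens):
        flag = tokens[i]
        # --reuse-test's pipeline-id is optional, so only consume a value when
        # the next token is not itself a flag.
        if flag in FLAGS_WITH_VALUES and i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
            opts[flag] = tokens[i + 1]
            i += 2
        else:
            opts[flag] = True
            i += 1
    return opts
```

For example, `parse_bot_run('/bot run --disable-fail-fast --stage-list "A10-PyTorch-1, xxx"')` yields the fail-fast flag as a boolean and the stage list as a single comma-separated string.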

### kill

2 changes: 1 addition & 1 deletion .github/workflows/blossom-ci.yml
@@ -40,7 +40,7 @@ jobs:
startsWith(github.event.comment.body, '/bot skip --comment') ||
startsWith(github.event.comment.body, '/bot reuse-pipeline') ||
startsWith(github.event.comment.body, '/bot kill')) && contains(
fromJson('["byshiue","chuangz0","funatiq","hypdeb","jdemouth-nvidia","joyang-nv","lowsfer","Tabrizian","yweng0828","Shixiaowei02","MartinMarciniszyn","schetlur-nv","dcampora","pcastonguay","Naveassaf","lfr-0531","nekorobov","PerkzZheng","kaiyux","nv-guomingz","LinPoly","thorjohnsen","jiahanc","latency1024","tburt-nv","zeroepoch","chzblych","niukuo","ZhanruiSunCh","EmmaQiaoCh","yiqingy0","achartier","suyoggupta","amukkara","mk-nvidia","QiJune","lucaslie","davidmlw","hlu1","nvzhou","syuoni","NVGaryJi","symphonylyh","hello-11","zongfeijing","Jackch-NV","jinyangyuan-nvidia","LarryXFly","crazydemo","jaedeok-nvidia","wm2012011492","rosenrodt","zhuoyao1012","xinhe-nv","Yuening-wa","Shunkangz","zhengd-nv","yibinl-nvidia","StanleySun639","KingsleyLiu-NV","kxdc","yingcanw","BestJuly","ChristinaZ","bobboli","xueweilnvidia","kunlunl","cherichy","lucifer1004","Autumn1998","litaotju","peaceh-nv","liji-nv","SimengLiu-nv","yuxianq","yechank-nvidia","vallis-neria","DylanChen-NV","Tracin","zhhuang-nv","ISEEKYAN","xupinjie","tongyuantongyu","laikhtewari","zhuolingwang","dominicshanshan","jershi425","shifangx","StudyingShao","Superjomn","dongjiyingdjy","guangyunh-nv","wili-65535","tiffany940107","DanBlanaru","mikeiovine","djns99","ruodil","xiaoweiw-nv","xuwchen","bashimao","yizhang-nv","hyukn","nvpohanh","yuki-666","juney-nvidia","barry-delaney","Kefeng-Duan","MinaHuai","yilin-void","jhaotingc","jmydurant","katec846","CarstyYou","Njuapp","Jie-Fang","nvbrantz","inocsin","ruoqianguo","chenfeiz0326","ming-wei","eopXD","longlee0622","dongfengy","georgeliu95","evezhier","rakib-hasan","shangz-ai","JyChang012","wangsiping1997","yuanjings-nvda","tomeras91","roikoren755","amirkl94","shaharmor98","danielafrimi","amitz-nv","hijkzzz","rzilberstein-nvidia","dc3671","hchings","yuhengxnv","dongxuy04","qiaoxj07","omera-nv","DomBrown","brb-nv","FrankD412","yuhsuan-t","Fridah-nv","a-mccarthy","HuiGao-NV","alexmsettle","meenchen","sugunav14","cjluo-nv","kyleliang-nv","chang-l","WeiHaocheng","qixiang-99",
"BatshevaBlack","ebarilanM","xmchen1987","lingjiew","heyuhhh","netanel-haber","jiefangz-nv","wyw1267","yunruis","sklevtsov-nvidia","jgangani","pamelap-nvidia","ixlmar","GalSha","Dido0o0","rabiel","nvzhihanj","milesial","fzmu727","zackyoray","RoeyAzran1992","viraatc","v-shobhit","yuanjingx87","uchihatmtkinu","nvrohanv","vegaluisjose","qsang-nv","ChunhuanLin","timlee0212","venkywonka","zbpatel","tijyojwad","shyeh25","zihaok","nv-yilinf","ttyio","farazkh80","yuantailing","JennyLiu-nv","moraxu","IzzyPutterman","nvchenghaoz","nvxuanyuc","poweiw","stnie","zhanga5","nzmora-nvidia","greg-kwasniewski1","linda-stadter","Tom-Zheng","vanshilshah97","ixlmar","MatthiasKohl","Wanli-Jiang", "arekay", "davidclark-nv", "2ez4bz", "tcherckez-nvidia", "MrGeva", "galagam", "limin2021", "dhansen-nvidia","talorabr","kanghui0204","wu6u3tw","hvagadia","xavier-nvidia","raayandhar","dbari","nvjullin","elvischenv","zhenhuaw-me","weireweire","yifeizhang-c","jiaganc","ziyixiong-nv","FelixXidddd","JunyiXu-nv","bo-nv","zerollzeng","RayenTian","ameynaik-hub"]'),
fromJson('["byshiue","chuangz0","funatiq","hypdeb","jdemouth-nvidia","joyang-nv","lowsfer","Tabrizian","yweng0828","Shixiaowei02","MartinMarciniszyn","schetlur-nv","dcampora","pcastonguay","Naveassaf","lfr-0531","nekorobov","PerkzZheng","kaiyux","nv-guomingz","LinPoly","thorjohnsen","jiahanc","latency1024","tburt-nv","zeroepoch","chzblych","niukuo","ZhanruiSunCh","EmmaQiaoCh","yiqingy0","achartier","suyoggupta","amukkara","mk-nvidia","QiJune","lucaslie","davidmlw","hlu1","nvzhou","syuoni","NVGaryJi","symphonylyh","hello-11","zongfeijing","Jackch-NV","jinyangyuan-nvidia","LarryXFly","crazydemo","jaedeok-nvidia","wm2012011492","rosenrodt","zhuoyao1012","xinhe-nv","Yuening-wa","Shunkangz","zhengd-nv","yibinl-nvidia","StanleySun639","KingsleyLiu-NV","kxdc","yingcanw","BestJuly","ChristinaZ","bobboli","xueweilnvidia","kunlunl","cherichy","lucifer1004","Autumn1998","litaotju","peaceh-nv","liji-nv","SimengLiu-nv","yuxianq","yechank-nvidia","vallis-neria","DylanChen-NV","Tracin","zhhuang-nv","ISEEKYAN","xupinjie","tongyuantongyu","laikhtewari","zhuolingwang","dominicshanshan","jershi425","shifangx","StudyingShao","Superjomn","dongjiyingdjy","guangyunh-nv","wili-65535","tiffany940107","DanBlanaru","mikeiovine","djns99","ruodil","xiaoweiw-nv","xuwchen","bashimao","yizhang-nv","hyukn","nvpohanh","yuki-666","juney-nvidia","barry-delaney","Kefeng-Duan","MinaHuai","yilin-void","jhaotingc","jmydurant","katec846","CarstyYou","Njuapp","Jie-Fang","nvbrantz","inocsin","ruoqianguo","chenfeiz0326","ming-wei","eopXD","longlee0622","dongfengy","georgeliu95","evezhier","rakib-hasan","shangz-ai","JyChang012","wangsiping1997","yuanjings-nvda","tomeras91","roikoren755","amirkl94","shaharmor98","danielafrimi","amitz-nv","hijkzzz","rzilberstein-nvidia","dc3671","hchings","yuhengxnv","dongxuy04","qiaoxj07","omera-nv","DomBrown","brb-nv","FrankD412","yuhsuan-t","Fridah-nv","a-mccarthy","HuiGao-NV","alexmsettle","meenchen","sugunav14","cjluo-nv","kyleliang-nv","chang-l","WeiHaocheng","qixiang-99",
"BatshevaBlack","ebarilanM","xmchen1987","lingjiew","heyuhhh","netanel-haber","jiefangz-nv","wyw1267","yunruis","sklevtsov-nvidia","jgangani","pamelap-nvidia","ixlmar","GalSha","Dido0o0","rabiel","nvzhihanj","milesial","fzmu727","zackyoray","RoeyAzran1992","viraatc","v-shobhit","yuanjingx87","uchihatmtkinu","nvrohanv","vegaluisjose","qsang-nv","ChunhuanLin","timlee0212","venkywonka","zbpatel","tijyojwad","shyeh25","zihaok","nv-yilinf","ttyio","farazkh80","yuantailing","JennyLiu-nv","moraxu","IzzyPutterman","nvchenghaoz","nvxuanyuc","poweiw","stnie","zhanga5","nzmora-nvidia","greg-kwasniewski1","linda-stadter","Tom-Zheng","vanshilshah97","ixlmar","MatthiasKohl","Wanli-Jiang", "arekay", "davidclark-nv", "2ez4bz", "tcherckez-nvidia", "MrGeva", "galagam", "limin2021", "dhansen-nvidia","talorabr","kanghui0204","wu6u3tw","hvagadia","xavier-nvidia","raayandhar","dbari","nvjullin","elvischenv","zhenhuaw-me","weireweire","yifeizhang-c","jiaganc","ziyixiong-nv","FelixXidddd","JunyiXu-nv","bo-nv","zerollzeng","RayenTian","ameynaik-hub","raymochen","shuyixiong","johncalesp","leslie-fang25","reasonsolo","zhou-yuxin","vadiklyutiy","yali-arch","NVShreyas","h-guo18","pengbowang-nv"]'),
github.actor)
steps:
- name: Check if comment is issued by authorized person
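The workflow condition above fires only when the comment body starts with a recognized `/bot` subcommand *and* the commenting user appears in the JSON allowlist, via `contains(fromJson('[...]'), github.actor)`; this PR's change simply appends new usernames to that array. A minimal Python sketch of the same gate, with a deliberately truncated stand-in for the allowlist:

```python
import json

# Truncated stand-in for the workflow's allowlist; the real list is the long
# fromJson('[...]') array in blossom-ci.yml.
AUTHORIZED = set(json.loads('["byshiue", "chuangz0", "Tabrizian", "niukuo"]'))

# The startsWith(...) prefixes checked in the workflow condition.
BOT_PREFIXES = ("/bot run", "/bot skip --comment", "/bot reuse-pipeline", "/bot kill")

def should_trigger(comment_body: str, actor: str) -> bool:
    # comment must start with a known subcommand AND the actor must be allowlisted
    return comment_body.startswith(BOT_PREFIXES) and actor in AUTHORIZED
```

Both checks must pass; an authorized user's unrelated comment, or a `/bot` command from an unlisted user, leaves the CI untriggered.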
13 changes: 9 additions & 4 deletions .github/workflows/bot-command.yml
@@ -46,17 +46,22 @@ jobs:
"Run `/bot [-h|--help]` to print this help message.\n\n" +
"See details below for each supported subcommand.\n\n" +
"<details>\n\n" +
"`run [--disable-fail-fast --skip-test --stage-list \"A10-1, xxx\" --gpu-type \"A30, H100_PCIe\" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage \"H100_PCIe-[Post-Merge]-1, xxx\"]`\n\n" +
"`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list \"A10-PyTorch-1, xxx\" --gpu-type \"A30, H100_PCIe\" --test-backend \"pytorch, cpp\" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage \"H100_PCIe-TensorRT-Post-Merge-1, xxx\" --detailed-log --debug(experimental)]`\n\n" +
"Launch build/test pipelines. All previously running jobs will be killed.\n\n" +
"`--reuse-test (optional)pipeline-id ` *(OPTIONAL)* : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.\n\n" +
"`--disable-reuse-test ` *(OPTIONAL)* : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.\n\n" +
"`--disable-fail-fast ` *(OPTIONAL)* : Disable fail fast on build/tests/infra failures.\n\n" +
"`--skip-test ` *(OPTIONAL)* : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does **NOT** update GitHub check status.\n\n" +
"`--stage-list \"A10-1, xxx\"` *(OPTIONAL)* : Only run the specified test stages. Examples: \"A10-1, xxx\". Note: Does **NOT** update GitHub check status.\n\n" +
"`--stage-list \"A10-PyTorch-1, xxx\"` *(OPTIONAL)* : Only run the specified test stages. Examples: \"A10-PyTorch-1, xxx\". Note: Does **NOT** update GitHub check status.\n\n" +
"`--gpu-type \"A30, H100_PCIe\"` *(OPTIONAL)* : Only run the test stages on the specified GPU types. Examples: \"A30, H100_PCIe\". Note: Does **NOT** update GitHub check status.\n\n" +
"`--test-backend \"pytorch, cpp\"` *(OPTIONAL)* : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: \"pytorch, cpp\" (does not run test stages with the tensorrt or triton backend). Note: Does **NOT** update GitHub pipeline status.\n\n" +
"`--only-multi-gpu-test ` *(OPTIONAL)* : Only run the multi-GPU tests. Note: Does **NOT** update GitHub check status.\n\n" +
"`--disable-multi-gpu-test ` *(OPTIONAL)* : Disable the multi-GPU tests. Note: Does **NOT** update GitHub check status.\n\n" +
"`--add-multi-gpu-test ` *(OPTIONAL)* : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.\n\n" +
"`--add-multi-gpu-test ` *(OPTIONAL)* : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.\n\n" +
"`--post-merge ` *(OPTIONAL)* : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.\n\n" +
"`--extra-stage \"H100_PCIe-[Post-Merge]-1, xxx\"` *(OPTIONAL)* : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage \"H100_PCIe-[Post-Merge]-1, xxx\".\n\n" +
"`--extra-stage \"H100_PCIe-TensorRT-Post-Merge-1, xxx\"` *(OPTIONAL)* : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage \"H100_PCIe-TensorRT-Post-Merge-1, xxx\".\n\n" +
"`--detailed-log ` *(OPTIONAL)* : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.\n\n" +
"`--debug ` *(OPTIONAL)* : **Experimental feature**. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does **NOT** update GitHub check status.\n\n" +
"### kill\n\n" +
"`kill `\n\n" +
"Kill all running builds associated with the pull request.\n\n" +
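To make the flag semantics above concrete, here is a minimal, hypothetical sketch (not the actual Jenkins bot parser) of how a `/bot run` comment with these flags could be tokenized. The function name and the returned dictionary shape are illustrative assumptions, not part of the real bot.

```python
import shlex

def parse_bot_command(comment):
    """Hypothetical sketch: tokenize a '/bot run ...' comment.

    Value-taking options (--stage-list, --gpu-type) consume the next
    token, which may be a quoted comma-separated list; everything else
    is treated as a boolean flag. The real bot's parsing may differ.
    """
    tokens = shlex.split(comment)
    if tokens[:2] != ["/bot", "run"]:
        raise ValueError("only the 'run' subcommand is handled in this sketch")
    opts = {"stage_list": [], "gpu_type": [], "flags": set()}
    i = 2
    while i < len(tokens):
        tok = tokens[i]
        if tok == "--stage-list":
            opts["stage_list"] = [s.strip() for s in tokens[i + 1].split(",")]
            i += 2
        elif tok == "--gpu-type":
            opts["gpu_type"] = [s.strip() for s in tokens[i + 1].split(",")]
            i += 2
        else:
            opts["flags"].add(tok)
            i += 1
    return opts
```

For example, `/bot run --disable-fail-fast --stage-list "A10-PyTorch-1, A10-PyTorch-2"` would yield two stages and one boolean flag.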
47 changes: 47 additions & 0 deletions .github/workflows/label_issue.yml
@@ -0,0 +1,47 @@
name: Label New Issues

on:
issues:
types: [opened]

permissions:
issues: write
contents: read

jobs:
label-issue:
runs-on: ubuntu-latest
steps:
- name: Checkout private action repository
uses: actions/checkout@v4
with:
repository: poweiw/goggles_action
path: ./.github/actions/goggles_action # local path to store the action
token: ${{ secrets.GOGGLES_ACTION_REPO_TOKEN }} # token to access poweiw/goggles_action
ref: v1.2.1

- name: AI Label Issue
uses: ./.github/actions/goggles_action/actions/llm_label
with:
ACTION_TOKEN: ${{ secrets.GITHUB_TOKEN }}
LLM_MODEL_NAME: ${{ secrets.GOGGLES_LLM_MODEL_NAME }}
LLM_TOKEN_SERVER_URL: ${{ secrets.GOGGLES_LLM_TOKEN_SERVER_URL }}
LLM_TOKEN_CLIENT_ID: ${{ secrets.GOGGLES_LLM_TOKEN_CLIENT_ID }}
LLM_TOKEN_CLIENT_SECRET: ${{ secrets.GOGGLES_LLM_TOKEN_CLIENT_SECRET }}
LLM_GENERATE_URL: ${{ secrets.GOGGLES_LLM_GENERATE_URL }}
LLM_TOKEN_SCOPE: ${{ secrets.GOGGLES_LLM_TOKEN_SCOPE }}
REPO_OWNER: ${{ github.repository_owner }}
REPO_NAME: ${{ github.event.repository.name }}
ISSUE_NUMBER: ${{ github.event.issue.number }}
ISSUE_TITLE: ${{ github.event.issue.title }}
ISSUE_BODY: ${{ github.event.issue.body }}
GITHUB_API_URL: ${{ github.api_url }}
ACTIONS_STEP_VERBOSE: false
EXCLUDED_LABELS: "bug,Community want to contribute,Community Engagement,duplicate,help wanted,Investigating,need more info,question,roadmap,stale,waiting for feedback,wontfix"
LLM_SYSTEM_PROMPT: |
You are an expert GitHub issue labeler. Your task is to analyze the provided issue title, issue body, and a list of available labels with their descriptions.
Based on this information, select the single most appropriate label from the list that best captures the primary issue or request.
Prefer selecting only one label that represents the main topic or problem. Only suggest multiple labels if the issue genuinely spans multiple distinct areas that are equally important.
Respond with ONLY the chosen label name (e.g., 'bug', 'feature-request') or comma-separated names if multiple are truly needed.
If no labels seem appropriate, respond with 'NONE'.
Do not add any other text, explanation, or markdown formatting.
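The system prompt above defines a strict response contract: a single label name, a comma-separated list, or the literal `NONE`. A minimal sketch of how a consumer could parse such a reply might look like the following; the function name and the exclusion handling are assumptions for illustration, not the `goggles_action` implementation.

```python
def parse_label_response(reply, excluded=()):
    """Parse the labeler LLM's raw reply into a list of label names.

    Follows the response contract in the system prompt: 'NONE' (or an
    empty reply) means no labels; otherwise the reply is a single label
    or a comma-separated list. Labels in `excluded` are dropped, mirroring
    the EXCLUDED_LABELS input above.
    """
    cleaned = reply.strip().strip("`'\" ")
    if not cleaned or cleaned.upper() == "NONE":
        return []
    labels = [label.strip() for label in cleaned.split(",") if label.strip()]
    return [label for label in labels if label not in excluded]
```

For instance, a reply of `bug, feature-request` with `bug` excluded would keep only `feature-request`.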