Commit 1469922

Update changelog for 2.5.0 (#14890)
* beep boop: Update changelog
* Remove 2.4.0 cherry-picks
* Add speech highlights
* Update changelog
* Update the changelog

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Charlie Truong <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent cbfca94 commit 1469922

File tree

1 file changed: +219 −0 lines changed


CHANGELOG.md

Lines changed: 219 additions & 0 deletions
@@ -1,6 +1,225 @@
# Changelog

<!-- Next changelog -->

## NVIDIA Neural Modules 2.5.0

### Highlights

- Collections:
  - LLM
    - Nano v2 12B and 9B
  - Speech
    - New SpeechLM2 collection
    - Streaming Sortformer model
    - Deprecated Confidence Ensemble models
    - parakeet-tdt-0.6b-v3 and canary-1b-v2 models
    - Added chunked inference support with `.transcribe()` for Canary-based models
    - Enabled prediction of timestamps with streaming ASR
    - Improved ASR models' invariance to padding/batch size
    - Qwen prompt format support and SALM generation fixes
    - High-level SALM `model.generate` API closely resembling HF models
    - SALM model initialization with time/memory optimizations
    - SpeechLM2: fixed excessive padding; support for on-the-fly resampling in SALM
- Automodel and Export-Deploy functionality now live in their own standalone repositories and are deprecated in NeMo 2
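The chunked-inference highlight above refers to splitting long audio into overlapping windows before transcription. As a minimal, self-contained sketch of that windowing idea only (the function name and the chunk/overlap values are illustrative, not the NeMo `.transcribe()` API):

```python
def chunk_signal(samples, chunk_len, overlap):
    """Split `samples` into overlapping windows (assumes chunk_len > overlap)."""
    step = chunk_len - overlap
    chunks = []
    for start in range(0, max(len(samples) - overlap, 1), step):
        chunks.append(samples[start:start + chunk_len])
    return chunks

# Ten samples, windows of 4 with overlap 2 -> adjacent windows share 2 samples.
print(chunk_signal(list(range(10)), chunk_len=4, overlap=2))
```

In the real feature, each window is transcribed independently and the overlapping text is merged, which keeps memory bounded for arbitrarily long recordings.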
### Detailed Changelogs:
#### ASR

<details><summary>Changelog</summary>

- Modernize logger interface by @emmanuel-ferdman :: PR: #13783
- Higher-level API for SALM.generate by @pzelasko :: PR: #14034
- add/refactor docs for asr lm customization by @lilithgrigoryan :: PR: #14088
- Improve NEST GPU Utilization 1/N by @MahmoudAshraf97 :: PR: #14086
- Improve ASR models' invariance to padding/batch size by @pzelasko :: PR: #13827
- Clean up transducer decoding initialization by @artbataev :: PR: #14112
- Improve NEST GPU Utilization 2/N by @MahmoudAshraf97 :: PR: #14089
- GPU-accelerated Phrase-Boosting (GPU-PB) for AED decoding by @andrusenkoau :: PR: #14108
- Fix decoding with ngpu-lm when training (#13994) by @hoangtran9122 :: PR: #13995
- fix eval_beamsearch_ngram_ctc script by @lilithgrigoryan :: PR: #14238
- fix wrong typing for ctc-ws context graph by @andrusenkoau :: PR: #14262
- fix frame vad by @stevehuang52 :: PR: #14337
- Improve NEST GPU Utilization 3/N by @MahmoudAshraf97 :: PR: #14234
- remove confidence ensemble models by @lilithgrigoryan :: PR: #14343
- Fix ASR decoding issues with CUDA graphs in training by @artbataev :: PR: #14184
- Streaming Sortformer release PR01: uploading bugfixes, refactored variables and yaml file name changes by @tango4j :: PR: #14416
- Streaming Sortformer release PR02: unit tests for streaming models and modules by @tango4j :: PR: #14417
- GPU-accelerated Phrase-Boosting (GPU-PB) for CTC, RNN-T, and TDT decoding by @andrusenkoau :: PR: #14277
- Fix subsampling chunking test by @monica-sekoyan :: PR: #14452
- Canary2 with NFA by @monica-sekoyan :: PR: #14121
- Initial Chunking by @nune-tadevosyan :: PR: #14321
- Chunking fix by @nune-tadevosyan :: PR: #14482
- Tutorial and doc update by @nune-tadevosyan :: PR: #14484
- Streaming Sortformer release PR03: NeMo documentations and tutorial notebook by @tango4j :: PR: #14388
- Add wget_from_nemo by @nune-tadevosyan :: PR: #14623
- Downgrade "datasets" library version in ASR tutorial to ensure compatibility with HF Datasets used by @KunalDhawan :: PR: #14685
- Canary tutorial fix by @nune-tadevosyan :: PR: #14708
- Force activations and weights cast to FP32 Jasper Encoder Squeeze-Excite by @erastorgueva-nv :: PR: #14715

</details>
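Several entries above (e.g. "Higher-level API for SALM.generate", PR #14034) concern an HF-style `generate` interface. As a hedged illustration of the interface shape only, here is a toy greedy-decoding loop; `step_logits`, the token ids, and all parameters are hypothetical stand-ins, not NeMo or SALM code:

```python
# Toy greedy-decoding loop with an HF-style shape: prompt ids in,
# generated ids out, stopping at EOS or a new-token budget.
# `step_logits` is a hypothetical stand-in for a model forward pass.
def generate(step_logits, prompt_ids, eos_id, max_new_tokens):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = step_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

# Toy "model": always votes for (last id + 1) modulo a 4-token vocab.
def step_logits(ids):
    want = (ids[-1] + 1) % 4
    return [1.0 if tok == want else 0.0 for tok in range(4)]

print(generate(step_logits, [0], eos_id=3, max_new_tokens=8))
```

The appeal of this shape, which the highlight attributes to SALM, is that callers familiar with Hugging Face's `model.generate(...)` can use the speech LM the same way.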
#### TTS

<details><summary>Changelog</summary>

- Improve ASR models' invariance to padding/batch size by @pzelasko :: PR: #13827
- remove nlp modules by @dimapihtar :: PR: #14127
- Temporarily Remove Encoder PP Support by @yaoyu-33 :: PR: #14167
- Remove T5-TTS by @blisc :: PR: #14252

</details>
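The padding/batch-size invariance entry above (PR #13827) addresses a common pitfall: pooled features change when a batch pads utterances to different lengths. A generic sketch of the standard remedy, length-masked pooling (this illustrates the technique, not the PR's actual code):

```python
# Length-masked pooling: average only over valid frames, so the result
# is identical regardless of how much padding the batch adds.
def masked_mean(frames, valid_len):
    """Mean over the first `valid_len` frames; trailing padding is ignored."""
    valid = frames[:valid_len]
    return sum(valid) / len(valid)

a = masked_mean([1.0, 2.0, 3.0, 0.0, 0.0], 3)  # padded to length 5
b = masked_mean([1.0, 2.0, 3.0], 3)            # no padding
assert a == b == 2.0
```

A naive mean over all frames would give 1.2 in the padded case and 2.0 in the unpadded one, i.e. the model's output would depend on batch composition.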
#### NLP / NMT

<details><summary>Changelog</summary>

- add extra params for MegatronDataSampler by @dimapihtar :: PR: #13956
- Modernize logger interface by @emmanuel-ferdman :: PR: #13783
- remove dialogue collection by @dimapihtar :: PR: #14087
- remove QA collection by @dimapihtar :: PR: #14092
- remove text nlp collection by @dimapihtar :: PR: #14110
- remove nlp modules by @dimapihtar :: PR: #14127
- remove rag collection by @dimapihtar :: PR: #14157
- remove nmt collection by @dimapihtar :: PR: #14191
- Fix importerror in transformer_lm_model after nlp module removals by @chtruong814 :: PR: #14199
- fix QA comments NVBug by @huvunvidia :: PR: #14196
- Temporarily Remove Encoder PP Support by @yaoyu-33 :: PR: #14167
- remove mixins collections by @dimapihtar :: PR: #14281
- feat: print expert groups on megatron init by @clumsy :: PR: #13874
- [speechlm2] [lhotse] sharegpt data and testloader by @huckiyang :: PR: #14294
- Add notebook for LoRA on GPT-OSS-20B by @shashank3959 :: PR: #14439
- Sketch dist-ckpt content versioning by @mikolajblaz :: PR: #13839
- Change to enable full iteration CUDA graph for LLMs by @vasunvidia :: PR: #14077

</details>
#### Text Normalization / Inverse Text Normalization

<details><summary>Changelog</summary>

- Check lightning and core imports in install test by @chtruong814 :: PR: #14403

</details>
#### Export

<details><summary>Changelog</summary>

- ci: Set L2_NeMo_2_Export_Deploy_Query_In_Framework to be optional by @chtruong814 :: PR: #13946
- Remove old export doc by @oyilmaz-nvidia :: PR: #14292
- Llama4 Export: Remove outdated MLP weight transform by @suiyoubi :: PR: #14297
- Update mllama hf import/export for transformers 4.53 by @meatybobby :: PR: #14327

</details>
#### Bugfixes

<details><summary>Changelog</summary>

- Bugfix for Hyena to the get_t function which comes up when doing longer context inference by @jstjohn :: PR: #14256
- fix skipped cuHyena kernel while training by @farhadrgh :: PR: #14365
- Remove flaky Evo2 dataset performance test by @jstjohn :: PR: #14371
- Use module prefix in restore_modelopt_state by @jenchen13 :: PR: #14384

</details>
#### Uncategorized:

<details><summary>Changelog</summary>

- Version bump to `2.5.0rc0.dev0` by @github-actions[bot] :: PR: #13944
- [Llama4] Enable tp comm overlap for llama4 by @gdengk :: PR: #13940
- Fix for Squad Dataset Download by @rhmukundan :: PR: #13893
- add nmh HF conversion by @JRD971000 :: PR: #13941
- Speechlm2 SALM improvements by @pzelasko :: PR: #13829
- fix dataset issue by @dimapihtar :: PR: #13953
- Editing MMLU to pull from the correct repo by @ruchaa-apte :: PR: #13991
- move classes to module to use __target__ feature (#14023) by @nithinraok :: PR: #14031
- Add Nemotron-H prompt format, fix cut-to-conversation custom attr propagation by @pzelasko :: PR: #13963
- Bump release_library template to v0.40.0 by @chtruong814 :: PR: #14046
- [automodel] add support for layer-freezing by @akoumpa :: PR: #14000
- [Qwen3] Recipe config bug fix by @gdengk :: PR: #14084
- Add TE import guard in qwen2vl vision module by @chtruong814 :: PR: #14091
- Update bitsandbytes dependency to v0.46.0 by @pramodk :: PR: #14050
- Update FSDP2 docstring by @BoxiangW :: PR: #14105
- Interface to enable fsdp-double-buffer without enabling NCCL-UB by @youngeunkwon0405 :: PR: #14076
- SpeechLM2 SALM: load ckpt faster, with less GPU memory by @pzelasko :: PR: #14113
- Add object_storage_cache_path to PreTrainingDataModule by @shunjiad :: PR: #14103
- Update changelog for `r2.3.0` by @github-actions[bot] :: PR: #14160
- Fix FLUX test with correct env var by @suiyoubi :: PR: #14149
- add mmap_bin_files param by @dimapihtar :: PR: #14122
- Add option to suppress import checks in `Dockerfile.speech` by @artbataev :: PR: #14185
- Safely import optional python packages by @roclark :: PR: #13936
- Set flux test as optional by @chtruong814 :: PR: #14190
- Revert "Safely import optional python packages (#13936)" by @chtruong814 :: PR: #14197
- Fix "Safely import optional python packages (#13936)" by @chtruong814 :: PR: #14198
- Add fix for evo2 generate/inference by @jwilber :: PR: #14027
- Fixing file path suffix by @gautham-kollu :: PR: #14179
- Update AVLM finetune example for vanilla fine-tuning by @huvunvidia :: PR: #14232
- [finetune] Add dataset_kwargs to prepare packed sequence data by @jiajunly :: PR: #14169
- Allow exception in hf ckpt load attempt before fallback to standard l… by @trvachov :: PR: #14214
- Load master weights from checkpoint by @kunlunl :: PR: #14072
- Add deploy lora adapter portion by @ruchaa-apte :: PR: #14255
- fix speechlm lhotse loading nemo_tarred by @stevehuang52 :: PR: #14314
- Update changelog for `r2.4.0` by @github-actions[bot] :: PR: #14334
- Flaky test timing out: @pytest.mark.pleasefixme by @pablo-garay :: PR: #14351
- Support dump perf recipe diff from base recipe by @guyueh1 :: PR: #14206
- Bugfix degenerate bases evo2 dataset by @jstjohn :: PR: #14359
- Hyena support for flash decode API by @jstjohn :: PR: #14315
- Fix Gemma2/3 & Llava (Next) & Llama4 conversion issue with latest transformers by @suiyoubi :: PR: #14367
- fix: reduce the excessive test time of test_msdd_diar_inference by @tango4j :: PR: #14366
- SpeechLM2: S2S->S2T data reader, excessive padding fixes by @pzelasko :: PR: #14124
- chore: Release 2.5.0rc0 by @ko3n1g :: PR: #14389
- Add pyxis flag for container writable. by @sudostock :: PR: #14395
- [MoE] Partial Cudagraph support for MoE by @gdengk :: PR: #14362
- Revert "[MoE] Partial Cudagraph support for MoE (#14362)" by @chtruong814 :: PR: #14402
- Update AVLM recipes for NeMo-CI runs by @huvunvidia :: PR: #14397
- Remove nemo1 multimodal and vision by @yaoyu-33 :: PR: #14095
- Fix LazyNeMoIterator supervision for multi-channel cuts by @anteju :: PR: #14409
- Bump Mcore to 7f7439f by @chtruong814 :: PR: #14373
- Use cuhyena rearrange when available. by @moradza :: PR: #14383
- Fix model training/eval state after PTL validation loop by @paul-gibbons :: PR: #14152
- Add deprecation notice to eval code by @athitten :: PR: #14316
- Streaming Sortformer release PR04: Adding functional tests for streaming sortformer by @tango4j :: PR: #14435
- QWEN2.5-VL 7B Performance Recipe by @tomlifu :: PR: #14401
- Discount FLOPs in dot-product att by @erhoo82 :: PR: #14424
- Bump to pytorch 25.06 and newer TE commit by @chtruong814 :: PR: #14423
- Enable precision aware optimizer for dsv3 by @guyueh1 :: PR: #14444
- Make VBoost activation conditional by @bdubauski :: PR: #14458
- cuHyena FFTConv support for Hyena Long Implicit (LI) Layer by @farhadrgh :: PR: #14396
- Alit/nano v2 by @JRD971000 :: PR: #14464
- Fix reuse_grad_buf_for_mxfp8_param_ag for mxfp8 by @guyueh1 :: PR: #14445
- Fix loss mask for chat datasets by @cuichenx :: PR: #14369
- Rename to subquadratic_ops by @farhadrgh :: PR: #14486
- Allows using other signals (than SIGTERM) with PreemptionPlugin by @zachmoshe :: PR: #14248
- Qwen2.5-VL 32B Performance Recipe by @tomlifu :: PR: #14485
- Alit/nanov2 12b by @JRD971000 :: PR: #14483
- Freeze tags in `r2.5.0` by @github-actions[bot] :: PR: #14513
- deprecate t0 by @dimapihtar :: PR: #14599
- Cherry pick `Use hugginface_hub for downloading the FLUX checkpoint (14638)` into `r2.5.0` by @chtruong814 :: PR: #14640
- Cherry pick `Fix function calling notebook (14643)` into `r2.5.0` by @chtruong814 :: PR: #14650
- Cherry pick `remove service launch scripts (14647)` into `r2.5.0` by @chtruong814 :: PR: #14648
- Cherry pick `Delete tutorials/llm/llama/biomedical-qa directory (14653)` into `r2.5.0` by @chtruong814 :: PR: #14654
- Cherry pick `Remove PEFT scheme condition from recipe (14661)` into `r2.5.0` by @chtruong814 :: PR: #14662
- Cherry pick `fixing kernel restarting when transcribing (14665)` into `r2.5.0` by @chtruong814 :: PR: #14672
- Delete nemo 1 notebooks by @cuichenx :: PR: #14675
- Cherry pick `Fixing Sortformer training tutorial notebook (14680)` into `r2.5.0` by @chtruong814 :: PR: #14681
- Cherry-pick `Update get_tensor_shapes function whose signature was refactored` (14594) into `r2.5.0` by @chtruong814 :: PR: #14678
- Cherry pick `Skip trt-llm and vllm install in install test (14663)` into `r2.5.0` by @chtruong814 :: PR: #14697
- Cherry pick `Fix for "EncDecRNNTBPEModel transcribe() failed with TypeError" (14698)` into `r2.5.0` by @chtruong814 :: PR: #14709
- Cherry pick `Fix broken link in Reasoning-SFT.ipynb (14716)` into `r2.5.0` by @chtruong814 :: PR: #14717
- cherry-pick add load-in-4bit param (14636) into r2.5.0 by @dimapihtar :: PR: #14719
- Cherry pick `Fix deepseek export dtype (14307)` into `r2.5.0` by @chtruong814 :: PR: #14682
- Cherry pick `remove env var (14739)` into `r2.5.0` by @chtruong814 :: PR: #14746
- Cherry-pick 'Bump modelopt to 0.35.0 and remove `safe_import("modelopt")` in llm collection (#14656)' into 'r2.5.0' by @chtruong814 :: PR: #14771
- Cherry pick `Update prune-distill notebooks to Qwen3 + simplify + mmlu eval (14785)` into `r2.5.0` by @chtruong814 :: PR: #14789
- Cherry pick `Remove export-deploy, automodel, and eval tutorials (14790)` into `r2.5.0` by @chtruong814 :: PR: #14792
- Cherry pick `ci: Automodel deprecation warning (14787)` into `r2.5.0` by @chtruong814 :: PR: #14791

</details>

## NVIDIA Neural Modules 2.4.1

### Detailed Changelogs:
