# Changelog

<!-- Next changelog -->
## NVIDIA Neural Modules 2.5.0

### Highlights

- Collections:
  - LLM
    - Nano v2 12B and 9B
  - Speech
    - New SpeechLM2 collection
    - Streaming Sortformer model
    - Deprecate Confidence Ensemble models
    - parakeet-tdt-0.6b-v3 and canary-1b-v2 models
    - Added chunk inference support with `.transcribe()` for Canary-based models
    - Enable prediction of timestamps with streaming ASR
    - Improve ASR models' invariance to padding/batch size
    - Qwen prompt format support, SALM generation fixes
    - High-level SALM `model.generate` API closely resembling HF models
    - SALM model initialization with time/memory optimization
    - SpeechLM2: fixed excessive padding, support on-the-fly resampling for SALM

- Automodel and Export-Deploy functionality are now available in their own repositories and are deprecated in NeMo 2

### Detailed Changelogs:

#### ASR

<details><summary>Changelog</summary>

- Modernize logger interface by @emmanuel-ferdman :: PR: #13783
- Higher-level API for SALM.generate by @pzelasko :: PR: #14034
- add/refactor docs for asr lm customization by @lilithgrigoryan :: PR: #14088
- Improve NEST GPU Utilization 1/N by @MahmoudAshraf97 :: PR: #14086
- Improve ASR models' invariance to padding/batch size by @pzelasko :: PR: #13827
- Clean up transducer decoding initialization by @artbataev :: PR: #14112
- Improve NEST GPU Utilization 2/N by @MahmoudAshraf97 :: PR: #14089
- GPU-accelerated Phrase-Boosting (GPU-PB) for AED decoding by @andrusenkoau :: PR: #14108
- Fix decoding with ngpu-lm when training (#13994) by @hoangtran9122 :: PR: #13995
- fix eval_beamsearch_ngram_ctc script by @lilithgrigoryan :: PR: #14238
- fix wrong typing for ctc-ws context graph by @andrusenkoau :: PR: #14262
- fix frame vad by @stevehuang52 :: PR: #14337
- Improve NEST GPU Utilization 3/N by @MahmoudAshraf97 :: PR: #14234
- remove confidence ensemble models by @lilithgrigoryan :: PR: #14343
- Fix ASR decoding issues with CUDA graphs in training by @artbataev :: PR: #14184
- Streaming Sortformer release PR01: uploading bugfixes, refactored variables and yaml file name changes by @tango4j :: PR: #14416
- Streaming Sortformer release PR02: unit tests for streaming models and modules by @tango4j :: PR: #14417
- GPU-accelerated Phrase-Boosting (GPU-PB) for CTC, RNN-T, and TDT decoding by @andrusenkoau :: PR: #14277
- Fix subsampling chunking test by @monica-sekoyan :: PR: #14452
- Canary2 with NFA by @monica-sekoyan :: PR: #14121
- Initial Chunking by @nune-tadevosyan :: PR: #14321
- Chunking fix by @nune-tadevosyan :: PR: #14482
- Tutorial and doc update by @nune-tadevosyan :: PR: #14484
- Streaming Sortformer release PR03: NeMo documentations and tutorial notebook by @tango4j :: PR: #14388
- Add wget_from_nemo by @nune-tadevosyan :: PR: #14623
- Downgrade "datasets" library version in ASR tutorial to ensure compatibility with HF Datasets used by @KunalDhawan :: PR: #14685
- Canary tutorial fix by @nune-tadevosyan :: PR: #14708
- Force activations and weights cast to FP32 Jasper Encoder Squeeze-Excite by @erastorgueva-nv :: PR: #14715

</details>

#### TTS

<details><summary>Changelog</summary>

- Improve ASR models' invariance to padding/batch size by @pzelasko :: PR: #13827
- remove nlp modules by @dimapihtar :: PR: #14127
- Temporarily Remove Encoder PP Support by @yaoyu-33 :: PR: #14167
- Remove T5-TTS by @blisc :: PR: #14252

</details>

#### NLP / NMT

<details><summary>Changelog</summary>

- add extra params for MegatronDataSampler by @dimapihtar :: PR: #13956
- Modernize logger interface by @emmanuel-ferdman :: PR: #13783
- remove dialogue collection by @dimapihtar :: PR: #14087
- remove QA collection by @dimapihtar :: PR: #14092
- remove text nlp collection by @dimapihtar :: PR: #14110
- remove nlp modules by @dimapihtar :: PR: #14127
- remove rag collection by @dimapihtar :: PR: #14157
- remove nmt collection by @dimapihtar :: PR: #14191
- Fix importerror in transformer_lm_model after nlp module removals by @chtruong814 :: PR: #14199
- fix QA comments NVBug by @huvunvidia :: PR: #14196
- Temporarily Remove Encoder PP Support by @yaoyu-33 :: PR: #14167
- remove mixins collections by @dimapihtar :: PR: #14281
- feat: print expert groups on megatron init by @clumsy :: PR: #13874
- [speechlm2] [lhotse] sharegpt data and testloader by @huckiyang :: PR: #14294
- Add notebook for LoRA on GPT-OSS-20B by @shashank3959 :: PR: #14439
- Sketch dist-ckpt content versioning by @mikolajblaz :: PR: #13839
- Change to enable full iteration CUDA graph for LLMs by @vasunvidia :: PR: #14077

</details>

#### Text Normalization / Inverse Text Normalization

<details><summary>Changelog</summary>

- Check lightning and core imports in install test by @chtruong814 :: PR: #14403

</details>

#### Export

<details><summary>Changelog</summary>

- ci: Set L2_NeMo_2_Export_Deploy_Query_In_Framework to be optional by @chtruong814 :: PR: #13946
- Remove old export doc by @oyilmaz-nvidia :: PR: #14292
- Llama4 Export: Remove outdated MLP weight transform by @suiyoubi :: PR: #14297
- Update mllama hf import/export for transformers 4.53 by @meatybobby :: PR: #14327

</details>

#### Bugfixes

<details><summary>Changelog</summary>

- Bugfix for Hyena to the get_t function which comes up when doing longer context inference by @jstjohn :: PR: #14256
- fix skipped cuHyena kernel while training by @farhadrgh :: PR: #14365
- Remove flaky Evo2 dataset performance test by @jstjohn :: PR: #14371
- Use module prefix in restore_modelopt_state by @jenchen13 :: PR: #14384

</details>

#### Uncategorized:

<details><summary>Changelog</summary>

- Version bump to `2.5.0rc0.dev0` by @github-actions[bot] :: PR: #13944
- [Llama4] Enable tp comm overlap for llama4 by @gdengk :: PR: #13940
- Fix for Squad Dataset Download by @rhmukundan :: PR: #13893
- add nmh HF conversion by @JRD971000 :: PR: #13941
- Speechlm2 SALM improvements by @pzelasko :: PR: #13829
- fix dataset issue by @dimapihtar :: PR: #13953
- Editing MMLU to pull from the correct repo by @ruchaa-apte :: PR: #13991
- move classes to module to use `__target__` feature (#14023) by @nithinraok :: PR: #14031
- Add Nemotron-H prompt format, fix cut-to-conversation custom attr propagation by @pzelasko :: PR: #13963
- Bump release_library template to v0.40.0 by @chtruong814 :: PR: #14046
- [automodel] add support for layer-freezing by @akoumpa :: PR: #14000
- [Qwen3] Recipe config bug fix by @gdengk :: PR: #14084
- Add TE import guard in qwen2vl vision module by @chtruong814 :: PR: #14091
- Update bitsandbytes dependency to v0.46.0 by @pramodk :: PR: #14050
- Update FSDP2 docstring by @BoxiangW :: PR: #14105
- Interface to enable fsdp-double-buffer without enabling NCCL-UB by @youngeunkwon0405 :: PR: #14076
- SpeechLM2 SALM: load ckpt faster, with less GPU memory by @pzelasko :: PR: #14113
- Add object_storage_cache_path to PreTrainingDataModule by @shunjiad :: PR: #14103
- Update changelog for `r2.3.0` by @github-actions[bot] :: PR: #14160
- Fix FLUX test with correct env var by @suiyoubi :: PR: #14149
- add mmap_bin_files param by @dimapihtar :: PR: #14122
- Add option to suppress import checks in `Dockerfile.speech` by @artbataev :: PR: #14185
- Safely import optional python packages by @roclark :: PR: #13936
- Set flux test as optional by @chtruong814 :: PR: #14190
- Revert "Safely import optional python packages (#13936)" by @chtruong814 :: PR: #14197
- Fix "Safely import optional python packages (#13936)" by @chtruong814 :: PR: #14198
- Add fix for evo2 generate/inference by @jwilber :: PR: #14027
- Fixing file path suffix by @gautham-kollu :: PR: #14179
- Update AVLM finetune example for vanilla fine-tuning by @huvunvidia :: PR: #14232
- [finetune] Add dataset_kwargs to prepare packed sequence data by @jiajunly :: PR: #14169
- Allow exception in hf ckpt load attempt before fallback to standard l… by @trvachov :: PR: #14214
- Load master weights from checkpoint by @kunlunl :: PR: #14072
- Add deploy lora adapter portion by @ruchaa-apte :: PR: #14255
- fix speechlm lhotse loading nemo_tarred by @stevehuang52 :: PR: #14314
- Update changelog for `r2.4.0` by @github-actions[bot] :: PR: #14334
- Flaky test timing out: @pytest.mark.pleasefixme by @pablo-garay :: PR: #14351
- Support dump perf recipe diff from base recipe by @guyueh1 :: PR: #14206
- Bugfix degenerate bases evo2 dataset by @jstjohn :: PR: #14359
- Hyena support for flash decode API by @jstjohn :: PR: #14315
- Fix Gemma2/3 & Llava (Next) & Llama4 conversion issue with latest transformers by @suiyoubi :: PR: #14367
- fix: reduce the excessive test time of test_msdd_diar_inference by @tango4j :: PR: #14366
- SpeechLM2: S2S->S2T data reader, excessive padding fixes by @pzelasko :: PR: #14124
- chore: Release 2.5.0rc0 by @ko3n1g :: PR: #14389
- Add pyxis flag for container writable. by @sudostock :: PR: #14395
- [MoE] Partial Cudagraph support for MoE by @gdengk :: PR: #14362
- Revert "[MoE] Partial Cudagraph support for MoE (#14362)" by @chtruong814 :: PR: #14402
- Update AVLM recipes for NeMo-CI runs by @huvunvidia :: PR: #14397
- Remove nemo1 multimodal and vision by @yaoyu-33 :: PR: #14095
- Fix LazyNeMoIterator supervision for multi-channel cuts by @anteju :: PR: #14409
- Bump Mcore to 7f7439f by @chtruong814 :: PR: #14373
- Use cuhyena rearrange when available. by @moradza :: PR: #14383
- Fix model training/eval state after PTL validation loop by @paul-gibbons :: PR: #14152
- Add deprecation notice to eval code by @athitten :: PR: #14316
- Streaming Sortformer release PR04: Adding functional tests for streaming sortformer by @tango4j :: PR: #14435
- QWEN2.5-VL 7B Performance Recipe by @tomlifu :: PR: #14401
- Discount FLOPs in dot-product att by @erhoo82 :: PR: #14424
- Bump to pytorch 25.06 and newer TE commit by @chtruong814 :: PR: #14423
- Enable precision aware optimizer for dsv3 by @guyueh1 :: PR: #14444
- Make VBoost activation conditional by @bdubauski :: PR: #14458
- cuHyena FFTConv support for Hyena Long Implicit (LI) Layer by @farhadrgh :: PR: #14396
- Alit/nano v2 by @JRD971000 :: PR: #14464
- Fix reuse_grad_buf_for_mxfp8_param_ag for mxfp8 by @guyueh1 :: PR: #14445
- Fix loss mask for chat datasets by @cuichenx :: PR: #14369
- Rename to subquadratic_ops by @farhadrgh :: PR: #14486
- Allows using other signals (than SIGTERM) with PreemptionPlugin by @zachmoshe :: PR: #14248
- Qwen2.5-VL 32B Performance Recipe by @tomlifu :: PR: #14485
- Alit/nanov2 12b by @JRD971000 :: PR: #14483
- Freeze tags in in `r2.5.0` by @github-actions[bot] :: PR: #14513
- deprecate t0 by @dimapihtar :: PR: #14599
- Cherry pick `Use hugginface_hub for downloading the FLUX checkpoint (14638)` into `r2.5.0` by @chtruong814 :: PR: #14640
- Cherry pick `Fix function calling notebook (14643)` into `r2.5.0` by @chtruong814 :: PR: #14650
- Cherry pick `remove service launch scripts (14647)` into `r2.5.0` by @chtruong814 :: PR: #14648
- Cherry pick `Delete tutorials/llm/llama/biomedical-qa directory (14653)` into `r2.5.0` by @chtruong814 :: PR: #14654
- Cherry pick `Remove PEFT scheme condition from recipe (14661)` into `r2.5.0` by @chtruong814 :: PR: #14662
- Cherry pick `fixing kernel restarting when transcribing (14665)` into `r2.5.0` by @chtruong814 :: PR: #14672
- Delete nemo 1 notebooks by @cuichenx :: PR: #14675
- Cherry pick `Fixing Sortformer training tutorial notebook (14680)` into `r2.5.0` by @chtruong814 :: PR: #14681
- Cherry-pick `Update get_tensor_shapes function whose signature was refactored` (14594) into `r2.5.0` by @chtruong814 :: PR: #14678
- Cherry pick `Skip trt-llm and vllm install in install test (14663)` into `r2.5.0` by @chtruong814 :: PR: #14697
- Cherry pick `Fix for "EncDecRNNTBPEModel transcribe() failed with TypeError" (14698)` into `r2.5.0` by @chtruong814 :: PR: #14709
- Cherry pick `Fix broken link in Reasoning-SFT.ipynb (14716)` into `r2.5.0` by @chtruong814 :: PR: #14717
- cherry-pick add load-in-4bit param (14636) into r2.5.0 by @dimapihtar :: PR: #14719
- Cherry pick `Fix deepseek export dtype (14307)` into `r2.5.0` by @chtruong814 :: PR: #14682
- Cherry pick `remove env var (14739)` into `r2.5.0` by @chtruong814 :: PR: #14746
- Cherry-pick 'Bump modelopt to 0.35.0 and remove `safe_import("modelopt")` in llm collection (#14656)' into 'r2.5.0' by @chtruong814 :: PR: #14771
- Cherry pick `Update prune-distill notebooks to Qwen3 + simplify + mmlu eval (14785)` into `r2.5.0` by @chtruong814 :: PR: #14789
- Cherry pick `Remove export-deploy, automodel, and eval tutorials (14790)` into `r2.5.0` by @chtruong814 :: PR: #14792
- Cherry pick `ci: Automodel deprecation warning (14787)` into `r2.5.0` by @chtruong814 :: PR: #14791

</details>

## NVIDIA Neural Modules 2.4.1

### Detailed Changelogs: