v0.6.3.post2+rocm
github-actions released this 01 Nov 23:03 · 92 commits to main since this release
What's Changed
- fp8 moe configs. Mixtral-8x(7B,22B) TP=1,2,4,8 by @divakar-amd in #250
- Sccache removal from Dockerfile.rocm by @omirosh in #253
- Update Dockerfile.rocm by @shajrawi in #254
- Using the correct type hints by @gshtras in #256
- Revert "Update Dockerfile.rocm" by @gshtras in #257
- Creating ROCm whl upon release by @gshtras in #259
Full Changelog: v0.6.3.post1+rocm...v0.6.3.post2+rocm
What's Changed
- Miscellaneous cosmetic changes by @mawong-amd in #166
- V5.5 upstream merge rc by @gshtras in #167
- fnuz support for fbgemm fp8 by @gshtras in #169
- Fixing mypy after a rushed merge by @gshtras in #171
- [fix] moe padding for reading correct tuned config by @divakar-amd in #172
- Upstream merge 24/9/9 by @gshtras in #174
- Restoring deleted .buildkite/test-template.j2 by @Alexei-V-Ivanov-AMD in #177
- Support commandr on ROCm by @shajrawi in #180
- Correct type hint by @gshtras in #173
- update custom PA kernel with support for fp8 kv cache dtype by @sanyalington in #87
- Support Grok-1 by @kkHuang-amd in #181
- Adding MLPerf optimization to 0.6.0 by @charlifu in #182
- 6.2 dockerfile by @gshtras in #176
- [Grok1] fix the name of input scale factor for autofp8 run by @kkHuang-amd in #183
- [Grok-1] fix the run-time error "Can't pickle <class 'transformers_mo… by @kkHuang-amd in #184
- Upstream merge 24/09/16 by @gshtras in #187
- Perf improvement: remove redundant torch slice; Match decode PA partition size to csrc by @sanyalington in #188
- refactor dbrx experts to use FusedMoe layer by @divakar-amd in #186
- Disable moe padding by default and enable fp8 padding by default. by @charlifu in #190
- Enabling Splitting HW by Buildkite Agents by @Alexei-V-Ivanov-AMD in #191
- Revert "remove redundant slice; match decode PA partition size with csrc (#188)" by @gshtras in #194
- [Grok-1] 1. upload moe configuration file for moe kernel optimization… by @kkHuang-amd in #193
- Removing the original text in reminder_comment.yml by @Alexei-V-Ivanov-AMD in #195
- Fix PA custom and PA v2 tests and partition sizes by @mawong-amd in #196
- Adding P3L measurement to the benchmarks collection tools. by @Alexei-V-Ivanov-AMD in #197
- Swapping the order of sampling operations in the conditional selector. by @Alexei-V-Ivanov-AMD in #199
- remove redundant slice when chunked prefill feature is disabled by @sanyalington in #201
- Fixing P3L incompatibility with cython. by @Alexei-V-Ivanov-AMD in #200
- Bias and more metadata in gradlib and tuned gemm by @gshtras in #202
- Upstream merge 24 9 23 by @gshtras in #203
- Gating n=0 case from skinny gemm by @gshtras in #204
- Revert "[Kernel] changing fused moe kernel chunk size default to 32k (vllm-project#7995)" by @gshtras in #207
- re-enable avoid torch slice fix when chunked prefill is disabled by @sanyalington in #209
- add block_manager_v2.py into setup_cython by @sanyalington in #210
- extend moe padding to DUMMY weights by @divakar-amd in #211
- [Int4-AWQ] Fix AWQ Marlin check for ROCm by @hegemanjw4amd in #206
- RPD Profiling by @dllehr-amd in #208
- Cythonize vllm build by @maleksan85 in #214
- Fix Dockerfile.rocm by @gshtras in #215
- fix dbrx weight loader by @divakar-amd in #212
- Upstream merge 24 09 27 0.6.2 by @gshtras in #213
- Make rpdtracer import only when required by @Rohan138 in #216
- Improve profiling setup and documentation, sync benchmarks with main by @AdrianAbeyta in #218
- Installing the requirements before invoking setup.py since it now imports setuptools_scm by @gshtras in #221
- llama3.2 + cross attn test by @maleksan85 in #220
- Optimize CAR for ROCm by @iotamudelta in #225
- Custom PA perf improvements by @sanyalington in #222
- Upstream merge 24 10 08 by @gshtras in #226
- customPA write fp8 small ctx fix; enable customPA write fp8 by default by @sanyalington in #227
- added timeout for vllm build in rocm by @maleksan85 in #230
- Add fp8 for dbrx by @charlifu in #231
- Update Buildkite env variable by @dhonnappa-amd in #232
- cuda graph + num-scheduler-steps bug fix by @seungrokj in #236
- [Model] [BUG] Fix code path logic to load mllama model by @tjtanaa in #234
- prefix-enabled FA perf issue by @seungrokj in #239
- Custom PA Partition size 256 to improve performance by @sanyalington in #238
- [Build/CI] Minor changes to fix internal CI process. by @Alexei-V-Ivanov-AMD in #235
- [BUGFIX] Restored handling of ROCM FA output as before adaptation of llama3.2 by @maleksan85 in #241
- Upstream merge 24 10 21 by @gshtras in #240
- Using the correct datatype on prefix prefill for fp8 kv cache by @gshtras in #242
- Update CMakeLists.txt by @gshtras in #244
- update block_manager usage in setup_cython by @saienduri in #243
- [Bugfix][Kernel][Misc] Basic support for SmoothQuant, symmetric case by @rasmith in #237
- Add fp8 support for llama model family on Navi4x by @qli88 in #245
- Custom all reduce fix mi250 by @omirosh in #247
- Upstream merge 24 10 28 by @gshtras in #248
- fp8 moe configs. Mixtral-8x(7B,22B) TP=1,2,4,8 by @divakar-amd in #250
- Sccache removal from Dockerfile.rocm by @omirosh in #253
- Update Dockerfile.rocm by @shajrawi in #254
- Using the correct type hints by @gshtras in #256
- Revert "Update Dockerfile.rocm" by @gshtras in #257
- Creating ROCm whl upon release by @gshtras in #259
New Contributors
- @kkHuang-amd made their first contribution in #181
- @Rohan138 made their first contribution in #216
- @AdrianAbeyta made their first contribution in #218
- @dhonnappa-amd made their first contribution in #232
- @seungrokj made their first contribution in #236
- @tjtanaa made their first contribution in #234
- @saienduri made their first contribution in #243
- @qli88 made their first contribution in #245
- @omirosh made their first contribution in #247
Full Changelog: v0.4.3_rocm...v0.6.3.post2+rocm