Release v0.6.3.post2+rocm · ROCm/vllm

What's Changed

fp8 moe configs. Mixtral-8x(7B,22B) TP=1,2,4,8 by @divakar-amd in #250
Sccache removal from Dockerfile.rocm by @omirosh in #253
Update Dockerfile.rocm by @shajrawi in #254
Using the correct type hints by @gshtras in #256
Revert "Update Dockerfile.rocm" by @gshtras in #257
Creating ROCm whl upon release by @gshtras in #259

Full Changelog: v0.6.3.post1+rocm...v0.6.3.post2+rocm

What's Changed

Miscellaneous cosmetic changes by @mawong-amd in #166
V5.5 upstream merge rc by @gshtras in #167
fnuz support for fbgemm fp8 by @gshtras in #169
Fixing mypy after a rushed merge by @gshtras in #171
[fix] moe padding for reading correct tuned config by @divakar-amd in #172
Upstream merge 24/9/9 by @gshtras in #174
Restoring deleted .buildkite/test-template.j2 by @Alexei-V-Ivanov-AMD in #177
Support commandr on ROCm by @shajrawi in #180
Correct type hint by @gshtras in #173
update custom PA kernel with support for fp8 kv cache dtype by @sanyalington in #87
Support Grok-1 by @kkHuang-amd in #181
Adding MLPerf optimization to 0.6.0 by @charlifu in #182
6.2 dockerfile by @gshtras in #176
[Grok1] fix the name of input scale factor for autofp8 run by @kkHuang-amd in #183
[Grok-1] fix the run-time error "Can't pickle <class 'transformers_mo… by @kkHuang-amd in #184
Upstream merge 24/09/16 by @gshtras in #187
Perf improvement: remove redundant torch slice; Match decode PA partition size to csrc by @sanyalington in #188
refactor dbrx experts to use FusedMoe layer by @divakar-amd in #186
Disable moe padding by default and enable fp8 padding by default. by @charlifu in #190
Enabling Splitting HW by Buildkite Agents by @Alexei-V-Ivanov-AMD in #191
Revert "remove redundant slice; match decode PA partition size with csrc (#188)" by @gshtras in #194
[Grok-1] 1. upload moe configuration file for moe kernel optimization… by @kkHuang-amd in #193
Removing the original text in reminder_comment.yml by @Alexei-V-Ivanov-AMD in #195
Fix PA custom and PA v2 tests and partition sizes by @mawong-amd in #196
Adding P3L measurement to the benchmarks collection tools. by @Alexei-V-Ivanov-AMD in #197
Swapping the order of sampling operations in the conditional selector. by @Alexei-V-Ivanov-AMD in #199
remove redundant slice when chunked prefill feature is disabled by @sanyalington in #201
Fixing P3L incompatibility with cython. by @Alexei-V-Ivanov-AMD in #200
Bias and more metadata in gradlib and tuned gemm by @gshtras in #202
Upstream merge 24 9 23 by @gshtras in #203
Gating n=0 case from skinny gemm by @gshtras in #204
Revert "[Kernel] changing fused moe kernel chunk size default to 32k (vllm-project#7995)" by @gshtras in #207
re-enable avoid torch slice fix when chunked prefill is disabled by @sanyalington in #209
add block_manager_v2.py into setup_cython by @sanyalington in #210
extend moe padding to DUMMY weights by @divakar-amd in #211
[Int4-AWQ] Fix AWQ Marlin check for ROCm by @hegemanjw4amd in #206
RPD Profiling by @dllehr-amd in #208
Cythonize vllm build by @maleksan85 in #214
Fix Dockerfile.rocm by @gshtras in #215
fix dbrx weight loader by @divakar-amd in #212
Upstream merge 24 09 27 0.6.2 by @gshtras in #213
Make rpdtracer import only when required by @Rohan138 in #216
Improve profiling setup and documentation, sync benchmarks with main by @AdrianAbeyta in #218
Installing the requirements before invoking setup.py since it now imports setuptools_scm by @gshtras in #221
llama3.2 + cross attn test by @maleksan85 in #220
Optimize CAR for ROCm by @iotamudelta in #225
Custom PA perf improvements by @sanyalington in #222
Upstream merge 24 10 08 by @gshtras in #226
customPA write fp8 small ctx fix; enable customPA write fp8 by default by @sanyalington in #227
added timeout for vllm build in rocm by @maleksan85 in #230
Add fp8 for dbrx by @charlifu in #231
Update Buildkite env variable by @dhonnappa-amd in #232
cuda graph + num-scheduler-steps bug fix by @seungrokj in #236
[Model] [BUG] Fix code path logic to load mllama model by @tjtanaa in #234
prefix-enabled FA perf issue by @seungrokj in #239
Custom PA Partition size 256 to improve performance by @sanyalington in #238
[Build/CI] Minor changes to fix internal CI process. by @Alexei-V-Ivanov-AMD in #235
[BUGFIX] Restored handling of ROCM FA output as before adaptation of llama3.2 by @maleksan85 in #241
Upstream merge 24 10 21 by @gshtras in #240
Using the correct datatype on prefix prefill for fp8 kv cache by @gshtras in #242
Update CMakeLists.txt by @gshtras in #244
update block_manager usage in setup_cython by @saienduri in #243
[Bugfix][Kernel][Misc] Basic support for SmoothQuant, symmetric case by @rasmith in #237
Add fp8 support for llama model family on Navi4x by @qli88 in #245
Custom all reduce fix mi250 by @omirosh in #247
Upstream merge 24 10 28 by @gshtras in #248
fp8 moe configs. Mixtral-8x(7B,22B) TP=1,2,4,8 by @divakar-amd in #250
Sccache removal from Dockerfile.rocm by @omirosh in #253
Update Dockerfile.rocm by @shajrawi in #254
Using the correct type hints by @gshtras in #256
Revert "Update Dockerfile.rocm" by @gshtras in #257
Creating ROCm whl upon release by @gshtras in #259

New Contributors

@kkHuang-amd made their first contribution in #181
@Rohan138 made their first contribution in #216
@AdrianAbeyta made their first contribution in #218
@dhonnappa-amd made their first contribution in #232
@seungrokj made their first contribution in #236
@tjtanaa made their first contribution in #234
@saienduri made their first contribution in #243
@qli88 made their first contribution in #245
@omirosh made their first contribution in #247

Full Changelog: v0.4.3_rocm...v0.6.3.post2+rocm
