Releases · ROCm/vllm

13 Feb 00:07

github-actions

v0.7.2+rocm

cbbbecb

v0.7.2+rocm Latest

Latest

What's Changed

20250127 docs update by @arakowsk-amd in #392
Faster Custom Paged Attention kernels by @sanyalington in #372
Improved memory profiling by @gshtras in #394
Aiter readme by @gshtras in #400
fix None dict for quark by @hliuca in #402
Upstream merge 25 02 03 by @gshtras in #403
Mbatch p3l by @Alexei-V-Ivanov-AMD in #401
Fix quark fp8 format loading. by @fxmarty-amd in #395
WARP_SIZE in sgl moe kernel by @gshtras in #406
Update README.md 20250205_aiter by @arakowsk-amd in #407
fix rocm get_device name by @divakar-amd in #359
Fixing the output formatting in P3L by @gshtras in #414
Add tuned moe config for qwen1.5_moe_A2.7B by @sky0530 in #398
Update Benchmark Profiling Scripts by @AdrianAbeyta in #417
updating 20250207 image manifiest by @arakowsk-amd in #416
Upstream merge 25 02 10 by @gshtras in #418
Aiter base by @gshtras in #419

New Contributors

@arakowsk-amd made their first contribution in #392
@fxmarty-amd made their first contribution in #395
@sky0530 made their first contribution in #398

Full Changelog: v0.7.0+rocm...v0.7.2+rocm

Contributors

sanyalington, sky0530, and 7 other contributors

Assets 3

28 Jan 19:48

github-actions

v0.7.0+rocm

b2c3b22

v0.7.0+rocm

What's Changed

Deepseek V2 FP8 support by @Concurrensee in #352
Multi-lingual P3L by @Alexei-V-Ivanov-AMD in #356
Upstream merge 25 01 13 by @gshtras in #358
Enable user marker for vllm profiling by @Lzy17 in #357
Deepseek V3 support by @gshtras in #364
Upstream merge 25 01 20 by @gshtras in #368
Using ROCm6.3.1 base docker and building hipblas-common by @gshtras in #366
Update pre-commit.yml by @gshtras in #374
Skip tokenize/detokenize when it is disabled by arg --skip-tokenizer-init by @maleksan85 in #367
FP8 FA fixes by @ilia-cher in #381
Returning the use of the proper stream in allreduce by @gshtras in #382
Pytorch rowwise scaled_mm by @gshtras in #384
Applying scales rename to fp8 config by @gshtras in #387
Dev-docker Documentation Updates by @JArnoldAMD in #378
Support FP8 FA from Quark format by @BowenBao in #388
Upstream merge 25 01 27 by @gshtras in #391

New Contributors

@Lzy17 made their first contribution in #357
@BowenBao made their first contribution in #388

Full Changelog: v0.6.6+rocm...v0.7.0+rocm

Contributors

BowenBao, ilia-cher, and 6 other contributors

Assets 3

08 Jan 22:44

github-actions

v0.6.6+rocm

c040f0e

v0.6.6+rocm

What's Changed

Upstream merge 25 1 6 by @gshtras in #350
Revert nccl changes by @gshtras in #351

Full Changelog: v0.6.4.post1+rocm...v0.6.6+rocm

Contributors

gshtras

Assets 3

08 Jan 15:52

github-actions

v0.6.4.post1+rocm

2053351

v0.6.4.post1+rocm

What's Changed

[BUGFIX] Corrected types for strides in triton FA (#274) by @maleksan85 in #276
[OPT] improve rms_norm kernel by @kkHuang-amd in #258
Cuda compile fix2 by @hliuca in #284
use CK FA for glm-4v on navi3 by @jfactory07 in #281
Disable custom all-reduce on two Navi GPUs by @hyoon1 in #287
Base docker image by @gshtras in #290
Added --output-json parameter in the P3l script. Using arg_utils to support all vllm args by @gshtras in #289
devdocker README from https://github.com/powderluv/vllm-docs by @gshtras in #292
Run clang-format on develop by @gshtras in #296
Fix correctness regression (from PR#258) in Llama-3.2-90B-Vision-Instruct-FP8-KV test by @kkHuang-amd in #294
Upstream merge 24/11/25 and 24/12/2 by @gshtras in #297
Fix type hints for cython by @gshtras in #299
fused_moe configs for MI325X by @JArnoldAMD in #300
enable softcap and gemma2 by @hliuca in #288
[vllm] Add support for FP8 in Triton FA kernel by @ilia-cher in #301
Update test-template.j2 by @dhonnappa-amd in #283
re-tune fp8 mixtral8x22B by @divakar-amd in #304
rm old moe tune file. Add bash script for tuning reference by @divakar-amd in #305
(temp workaround for Triton bug) by @ilia-cher in #306
Always use 64 as the block size of moe_align kernel to avoid lds out of limit by @charlifu in #303
Fix vllm_test_utils install. by @saienduri in #307
Using ROCm6.3 release image as a base by @gshtras in #308
Fix kernel cache miss and add RDNA configs by @hyoon1 in #246
Update README.md by @t-parry in #309
Fix max_seqlens_q/k initialization for Navi GPUs by @hyoon1 in #310
Setting the value for the scpecilative decoding worker class on rocm platform by @gshtras in #313
Upstream merge 24 12 09 by @gshtras in #314
Triton version in the base docker by @gshtras in #315
Navi docker by @gshtras in #316
fix GemmTuner import in gradlib by @Rohan138 in #319
Storing the installed commit hashes and customizations in a file by @gshtras in #320
Option to override PYTORCH_ROCM_ARCH inherited from the base image by @gshtras in #321
Update README.md by @t-parry in #322
Disable auto enabling chunked prefill by @gshtras in #324
Fix logging of the vLLM Config (vllm-project#11143) by @gshtras in #325
Upstream merge 24 12 16 by @gshtras in #330
Fix regression from #246 by @gshtras in #332
Dynamic Scale Factor Calculations for Key/Value Scales With FP8 KV Caching by @micah-wil in #317
Fixed the new condition for fp8 type by @gshtras in #333
Mllama kv scale fix by @gshtras in #335
Using the generic base image created by the vllm-ci pipeline by @gshtras in #336
Properly initializing the new field in the attn metadata by @gshtras in #337
Ingest FP8 attn scales and use them in ROCm FlashAttention by @mawong-amd in #338
Library versions bump by @gshtras in #343
Updated fused_moe configs for MI325X with Triton 3.2 by @JArnoldAMD in #345
deepseek overflow fix by @Concurrensee in #349

New Contributors

@hliuca made their first contribution in #284
@jfactory07 made their first contribution in #281
@hyoon1 made their first contribution in #287
@ilia-cher made their first contribution in #301
@t-parry made their first contribution in #309
@micah-wil made their first contribution in #317
@Concurrensee made their first contribution in #349

Full Changelog: v0.6.4+rocm...v0.6.4.post1+rocm

Contributors

charlifu, jfactory07, and 15 other contributors

Assets 2

19 Nov 20:31

gshtras

v0.6.4+rocm

62334b5

v0.6.4+rocm

What's Changed

Base ROCm 6.2.2 by @gshtras in #260
Upstream merge 24 11 04 by @gshtras in #262
Add gfx1201 to supported ARCH list by @qli88 in #264
[Bugfix] A fix to enable FORCED sampling again. by @Alexei-V-Ivanov-AMD in #265
Eliminated -Wswitch-bool warning and a leftover incorrect import by @gshtras in #266
Navi correctness fix 1 to 300 count by @maleksan85 in #263
Navi 1 to 300 correctness fix follow up by @maleksan85 in #267
Update profiling benchmarks to take in new EngArgs method. by @AdrianAbeyta in #255
Rpd build arg by @gshtras in #269
Build flash attn after torch by @gshtras in #270
Update P3L.py by @gshtras in #271
Upstream merge 24 11 11 by @gshtras in #272
[BUGFIX] Llama3.2 fa crash fix by @maleksan85 in #274
Running linter actions on develop branch by @gshtras in #275
rocm support for moe tuning script by @divakar-amd in #251
mixtral8x22B moe configs mi300 TP=1,2,4,8 by @divakar-amd in #277
Improve the heuristic logic for fp8 weight padding by @charlifu in #279
Gradlib torch extension cmake by @gshtras in #282
Upstream merge 24 11 18 by @gshtras in #286

Full Changelog: v0.6.3.post2+rocm...v0.6.4+rocm

Contributors

charlifu, AdrianAbeyta, and 5 other contributors

Assets 2

01 Nov 23:03

github-actions

v0.6.3.post2+rocm

733f79a

v0.6.3.post2+rocm

What's Changed

fp8 moe configs. Mixtral-8x(7B,22B) TP=1,2,4,8 by @divakar-amd in #250
Sccache removal from Dockerfile.rocm by @omirosh in #253
Update Dockerfile.rocm by @shajrawi in #254
Using the correct type hints by @gshtras in #256
Revert "Update Dockerfile.rocm" by @gshtras in #257
Creating ROCm whl upon release by @gshtras in #259

Full Changelog: v0.6.3.post1+rocm...v0.6.3.post2+rocm

What's Changed

Miscellaneous cosmetic changes by @mawong-amd in #166
V5.5 upstream merge rc by @gshtras in #167
fnuz support for fbgemm fp8 by @gshtras in #169
Fixing mypy after a rushed merge by @gshtras in #171
[fix] moe padding for reading correct tuned config by @divakar-amd in #172
Upstream merge 24/9/9 by @gshtras in #174
Restoring deleted .buildkite/test-template.j2 by @Alexei-V-Ivanov-AMD in #177
Support commandr on ROCm by @shajrawi in #180
Correct type hint by @gshtras in #173
update custom PA kernel with support for fp8 kv cache dtype by @sanyalington in #87
Support Grok-1 by @kkHuang-amd in #181
Adding MLPerf optimization to 0.6.0 by @charlifu in #182
6.2 dockerfile by @gshtras in #176
[Grok1] fix the name of input scale factor for autofp8 run by @kkHuang-amd in #183
[Grok-1] fix the run-time error "Can't pickle <class 'transformers_mo… by @kkHuang-amd in #184
Upstream merge 24/09/16 by @gshtras in #187
Perf improvement: remove redundant torch slice; Match decode PA partition size to csrc by @sanyalington in #188
refactor dbrx experts to use FusedMoe layer by @divakar-amd in #186
Disable moe padding by default and enable fp8 padding by default. by @charlifu in #190
Enabling Splitting HW by Buildkite Agents by @Alexei-V-Ivanov-AMD in #191
Revert "remove redundant slice; match decode PA partition size with csrc (#188)" by @gshtras in #194
[Grok-1] 1. upload moe configuration file for moe kernel optimization… by @kkHuang-amd in #193
Removing the original text in reminder_comment.yml by @Alexei-V-Ivanov-AMD in #195
Fix PA custom and PA v2 tests and partition sizes by @mawong-amd in #196
Adding P3L measurement to the benchmarks collection tools. by @Alexei-V-Ivanov-AMD in #197
Swapping the order of sampling operations in the conditional selector. by @Alexei-V-Ivanov-AMD in #199
remove redundant slice when chunked prefill feature is disabled by @sanyalington in #201
Fixing P3L incompatibility with cython. by @Alexei-V-Ivanov-AMD in #200
Bias and more metadata in gradlib and tuned gemm by @gshtras in #202
Upstream merge 24 9 23 by @gshtras in #203
Gating n=0 case from skinny gemm by @gshtras in #204
Revert "[Kernel] changing fused moe kernel chunk size default to 32k (vllm-project#7995)" by @gshtras in #207
re-enable avoid torch slice fix when chunked prefill is disabled by @sanyalington in #209
add block_manager_v2.py into setup_cython by @sanyalington in #210
extend moe padding to DUMMY weights by @divakar-amd in #211
[Int4-AWQ] Fix AWQ Marlin check for ROCm by @hegemanjw4amd in #206
RPD Profiling by @dllehr-amd in #208
Cythonize vllm build by @maleksan85 in #214
Fix Dockerfile.rocm by @gshtras in #215
fix dbrx weight loader by @divakar-amd in #212
Upstream merge 24 09 27 0.6.2 by @gshtras in #213
Make rpdtracer import only when required by @Rohan138 in #216
Improve profiling setup and documentation, sync benchmarks with main by @AdrianAbeyta in #218
Installing the requirements before invoking setup.py since it now imports setuptools_scm by @gshtras in #221
llama3.2 + cross attn test by @maleksan85 in #220
Optimize CAR for ROCm by @iotamudelta in #225
Custom PA perf improvements by @sanyalington in #222
Upstream merge 24 10 08 by @gshtras in #226
customPA write fp8 small ctx fix; enable customPA write fp8 by default by @sanyalington in #227
added timeout for vllm build in rocm by @maleksan85 in #230
Add fp8 for dbrx by @charlifu in #231
Update Buildkite env variable by @dhonnappa-amd in #232
cuda graph + num-scheduler-steps bug fix by @seungrokj in #236
[Model] [BUG] Fix code path logic to load mllama model by @tjtanaa in #234
prefix-enabled FA perf issue by @seungrokj in #239
Custom PA Partition size 256 to improve performance by @sanyalington in #238
[Build/CI] Minor changes to fix internal CI process. by @Alexei-V-Ivanov-AMD in #235
[BUGFIX] Restored handling of ROCM FA output as before adaptation of llama3.2 by @maleksan85 in #241
Upstream merge 24 10 21 by @gshtras in #240
Using the correct datatype on prefix prefill for fp8 kv cache by @gshtras in #242
Update CMakeLists.txt by @gshtras in #244
update block_manager usage in setup_cython by @saienduri in #243
[Bugfix][Kernel][Misc] Basic support for SmoothQuant, symmetric case by @rasmith in #237
Add fp8 support for llama model family on Navi4x by @qli88 in #245
Custom all reduce fix mi250 by @omirosh in #247
Upstream merge 24 10 28 by @gshtras in #248
fp8 moe configs. Mixtral-8x(7B,22B) TP=1,2,4,8 by @divakar-amd in #250
Sccache removal from Dockerfile.rocm by @omirosh in #253
Update Dockerfile.rocm by @shajrawi in #254
Using the correct type hints by @gshtras in #256
Revert "Update Dockerfile.rocm" by @gshtras in #257
Creating ROCm whl upon release by @gshtras in #259

New Contributors

@kkHuang-amd made their first contribution in #181
@Rohan138 made their first contribution in #216
@AdrianAbeyta made their first contribution in #218
@dhonnappa-amd made their first contribution in #232
@seungrokj made their first contribution in #236
@tjtanaa made their first contribution in #234
@saienduri made their first contribution in #243
@qli88 made their first contribution in #245
@omirosh made their first contribution in #247

Full Changelog: v0.4.3_rocm...v0.6.3.post2+rocm

Contributors

rasmith, charlifu, and 19 other contributors

Assets 6

29 Oct 21:12

github-actions

v0.6.3.post1+rocm

7aa6982

v0.6.3.post1+rocm Pre-release

Pre-release

What's Changed

Upstream merge 24 10 21 by @gshtras in #240
Using the correct datatype on prefix prefill for fp8 kv cache by @gshtras in #242
Update CMakeLists.txt by @gshtras in #244
update block_manager usage in setup_cython by @saienduri in #243
[Bugfix][Kernel][Misc] Basic support for SmoothQuant, symmetric case by @rasmith in #237
Add fp8 support for llama model family on Navi4x by @qli88 in #245
Custom all reduce fix mi250 by @omirosh in #247
Upstream merge 24 10 28 by @gshtras in #248

New Contributors

@saienduri made their first contribution in #243
@qli88 made their first contribution in #245
@omirosh made their first contribution in #247

Full Changelog: v0.6.2.post1+rocm...v0.6.3.post1+rocm

Contributors

rasmith, omirosh, and 3 other contributors

Assets 2

23 Oct 00:14

github-actions

v0.6.2.post1+rocm

69d5e1d

v0.6.2.post1+rocm Pre-release

Pre-release

What's Changed

Make rpdtracer import only when required by @Rohan138 in #216
Improve profiling setup and documentation, sync benchmarks with main by @AdrianAbeyta in #218
Installing the requirements before invoking setup.py since it now imports setuptools_scm by @gshtras in #221
llama3.2 + cross attn test by @maleksan85 in #220
Optimize CAR for ROCm by @iotamudelta in #225
Custom PA perf improvements by @sanyalington in #222
Upstream merge 24 10 08 by @gshtras in #226
customPA write fp8 small ctx fix; enable customPA write fp8 by default by @sanyalington in #227
added timeout for vllm build in rocm by @maleksan85 in #230
Add fp8 for dbrx by @charlifu in #231
Update Buildkite env variable by @dhonnappa-amd in #232
cuda graph + num-scheduler-steps bug fix by @seungrokj in #236
[Model] [BUG] Fix code path logic to load mllama model by @tjtanaa in #234
prefix-enabled FA perf issue by @seungrokj in #239
Custom PA Partition size 256 to improve performance by @sanyalington in #238
[Build/CI] Minor changes to fix internal CI process. by @Alexei-V-Ivanov-AMD in #235
[BUGFIX] Restored handling of ROCM FA output as before adaptation of llama3.2 by @maleksan85 in #241

New Contributors

@Rohan138 made their first contribution in #216
@AdrianAbeyta made their first contribution in #218
@dhonnappa-amd made their first contribution in #232
@seungrokj made their first contribution in #236
@tjtanaa made their first contribution in #234

Full Changelog: v0.6.2+rocm...v0.6.2.post1+rocm

Contributors

charlifu, iotamudelta, and 9 other contributors

Assets 2

02 Oct 17:29

github-actions

v0.6.2+rocm

030374b

v0.6.2+rocm Pre-release

Pre-release

What's Changed

fix dbrx weight loader by @divakar-amd in #212
Upstream merge 24 09 27 0.6.2 by @gshtras in #213

Full Changelog: v0.6.1.post1+rocm...v0.6.2+rocm

Contributors

divakar-amd and gshtras

Assets 2

27 Sep 21:48

github-actions

v0.6.1.post1+rocm

956b831

v0.6.1.post1+rocm Pre-release

Pre-release

What's Changed

Adding P3L measurement to the benchmarks collection tools. by @Alexei-V-Ivanov-AMD in #197
Swapping the order of sampling operations in the conditional selector. by @Alexei-V-Ivanov-AMD in #199
remove redundant slice when chunked prefill feature is disabled by @sanyalington in #201
Fixing P3L incompatibility with cython. by @Alexei-V-Ivanov-AMD in #200
Bias and more metadata in gradlib and tuned gemm by @gshtras in #202
Upstream merge 24 9 23 by @gshtras in #203
Gating n=0 case from skinny gemm by @gshtras in #204
Revert "[Kernel] changing fused moe kernel chunk size default to 32k (vllm-project#7995)" by @gshtras in #207
re-enable avoid torch slice fix when chunked prefill is disabled by @sanyalington in #209
add block_manager_v2.py into setup_cython by @sanyalington in #210
extend moe padding to DUMMY weights by @divakar-amd in #211
[Int4-AWQ] Fix AWQ Marlin check for ROCm by @hegemanjw4amd in #206
RPD Profiling by @dllehr-amd in #208
Cythonize vllm build by @maleksan85 in #214
Fix Dockerfile.rocm by @gshtras in #215

Full Changelog: v0.6.1_rocm...v0.6.1.post1+rocm

Contributors

sanyalington, dllehr-amd, and 5 other contributors

Assets 2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What's Changed

New Contributors

Contributors

What's Changed

New Contributors

Contributors

What's Changed

Contributors

What's Changed

New Contributors

Contributors

What's Changed

Contributors

What's Changed

What's Changed

New Contributors

Contributors

What's Changed

New Contributors

Contributors

What's Changed

New Contributors

Contributors

What's Changed

Contributors

What's Changed

Contributors

Releases: ROCm/vllm

v0.7.2+rocm

What's Changed

New Contributors

Contributors

v0.7.0+rocm

What's Changed

New Contributors

Contributors

v0.6.6+rocm

What's Changed

Contributors

v0.6.4.post1+rocm

What's Changed

New Contributors

Contributors

v0.6.4+rocm

What's Changed

Contributors

v0.6.3.post2+rocm

What's Changed

What's Changed

New Contributors

Contributors

v0.6.3.post1+rocm

What's Changed

New Contributors

Contributors

v0.6.2.post1+rocm

What's Changed

New Contributors

Contributors

v0.6.2+rocm

What's Changed

Contributors

v0.6.1.post1+rocm

What's Changed

Contributors