v0.6.1.post1+rocm
Pre-release
github-actions released this 27 Sep 21:48
What's Changed
- Adding P3L measurement to the benchmarks collection tools. by @Alexei-V-Ivanov-AMD in #197
- Swapping the order of sampling operations in the conditional selector. by @Alexei-V-Ivanov-AMD in #199
- remove redundant slice when chunked prefill feature is disabled by @sanyalington in #201
- Fixing P3L incompatibility with cython. by @Alexei-V-Ivanov-AMD in #200
- Bias and more metadata in gradlib and tuned gemm by @gshtras in #202
- Upstream merge 24 9 23 by @gshtras in #203
- Gating n=0 case from skinny gemm by @gshtras in #204
- Revert "[Kernel] changing fused moe kernel chunk size default to 32k (vllm-project#7995)" by @gshtras in #207
- re-enable avoid torch slice fix when chunked prefill is disabled by @sanyalington in #209
- add block_manager_v2.py into setup_cython by @sanyalington in #210
- extend moe padding to DUMMY weights by @divakar-amd in #211
- [Int4-AWQ] Fix AWQ Marlin check for ROCm by @hegemanjw4amd in #206
- RPD Profiling by @dllehr-amd in #208
- Cythonize vllm build by @maleksan85 in #214
- Fix Dockerfile.rocm by @gshtras in #215
Full Changelog: v0.6.1_rocm...v0.6.1.post1+rocm