v0.4.0
What's Changed
- Feature integration without fp8 by @gshtras in #7
- Layernorm optimizations by @mawong-amd in #8
- Bringing in the latest commits from upstream by @mawong-amd in #9
- Bump Docker to ROCm 6.1, add gradlib for tuned gemm, include RCCL fixes by @mawong-amd in #12
- Add MI300 fused_moe tuned configs by @divakar-amd in #13
- Correctly calculating the same required number of cache blocks for all torchrun processes by @gshtras in #15
- [ROCm] adding a missing triton autotune config by @hongxiayang in #17
- make the vllm setup mode configurable and make install mode as default… by @hongxiayang in #18
- Enable fused topK_softmax kernel for HIP by @divakar-amd in #14
- Fix ambiguous fma call by @cjatin in #16
- RCCL Dockerfile updates by @mawong-amd in #19
- Dockerfile improvements: multistage by @mawong-amd in #20
- Integrate PagedAttention Optimization custom kernel into vLLM by @lcskrishna in #22
- Updates to custom PagedAttention to support context lengths up to 32k by @lcskrishna in #25
- Update max_context_len for custom paged attention by @lcskrishna in #26
- Update RCCL, hipBLASLt, base image in Dockerfile.rocm by @shajrawi in #24
- Adding fp8 GEMM computation by @charlifu in #29
- Fix fp8 model loading by @charlifu in #30
- Update linear.py by @gshtras in #32
- Update base Docker image with PyTorch 2.3 by @charlifu in #35
New Contributors
- @divakar-amd made their first contribution in #13
- @hongxiayang made their first contribution in #17
- @cjatin made their first contribution in #16
- @lcskrishna made their first contribution in #22
- @shajrawi made their first contribution in #24
Full Changelog: v0.3.3...v0.4.0