v0.6.0_rocm
Pre-release
Released by github-actions on 05 Sep 17:10 · 813 commits to main since this release
What's Changed
- Features integration without fp8 by @gshtras in #7
- Layernorm optimizations by @mawong-amd in #8
- Bringing in the latest commits from upstream by @mawong-amd in #9
- Bump Docker to ROCm 6.1, add gradlib for tuned gemm, include RCCL fixes by @mawong-amd in #12
- add mi300 fused_moe tuned configs by @divakar-amd in #13
- Correctly calculating the same value for the required number of cache blocks for all torchrun processes by @gshtras in #15
- [ROCm] adding a missing triton autotune config by @hongxiayang in #17
- make the vllm setup mode configurable and make install mode as defaul… by @hongxiayang in #18
- enable fused topK_softmax kernel for hip by @divakar-amd in #14
- Fix ambiguous fma call by @cjatin in #16
- Rccl dockerfile updates by @mawong-amd in #19
- Dockerfile improvements: multistage by @mawong-amd in #20
- Integrate PagedAttention Optimization custom kernel into vLLM by @lcskrishna in #22
- Updates to custom PagedAttention for supporting context len up to 32k. by @lcskrishna in #25
- Update max_context_len for custom paged attention. by @lcskrishna in #26
- Update RCCL, hipBLASLt, base image in Dockerfile.rocm by @shajrawi in #24
- Adding fp8 gemm computation by @charlifu in #29
- fix the model loading fp8 by @charlifu in #30
- Update linear.py by @gshtras in #32
- Update base docker image with Pytorch 2.3 by @charlifu in #35
- Removed HIP specific matvec logic that is duplicated from tuned_gemm.py and doesn't support bf16 by @gshtras in #23
- Use inp_view for out = F.linear() in TunedGemm by @charlifu in #36
- Fix the symbol not found issue of the new base image by @charlifu in #37
- G42 bias triton fix rocm main by @gshtras in #38
- Update ROCm vLLM to 0.4.3 by @mawong-amd in #40
- Re-applying G42 bias triton fix on 0.4.3 by @gshtras in #41
- Fix RCCL install, linear.py logic, CMake custom extension, update requirement for FP8 compute by @mawong-amd in #42
- Linting main in line with upstream requirements by @mawong-amd in #43
- Include benchmark scripts in container by @mawong-amd in #45
- Adding fp8 to gradlib by @charlifu in #44
- Update fp8_gemm_tuner.py: swap the import order of torch and hipbsolidxgemm by @liligwu in #46
- Supporting quantized weights from Quark by default. by @charlifu in #47
- Update quark quantizer command in fp8 instruction by @charlifu in #49
- Fix LLMM1 kernel by @fxmarty in #28
- Use scaled mm for untuned fp8 gemm by @charlifu in #50
- tuned moe configs v2 by @divakar-amd in #33
- Revert "Tune fused_moe_kernel for TP 1,2,4,8 and bf16 and fp16, updated moe kern…" by @hthangirala in #51
- Revert "Revert "Tune fused_moe_kernel for TP 1,2,4,8 and bf16 and fp16, updated moe kern…"" by @divakar-amd in #53
- fix init files by @divakar-amd in #52
- adds wvSpltK optimization for skinny gemm. by @amd-hhashemi in #54
- Fix 8K decode latency jump issue. by @lcskrishna in #55
- Adding quantization_weights_path for fp8 weights by @charlifu in #57
- Refactor custom gemm heuristics by @gshtras in #56
- wvSpltK fix for 10GB+ output tensors by @amd-hhashemi in #61
- uint64_t instead of unsigned long for clarity by @mawong-amd in #62
- fix for oob LDS fill in wvSpltK slm version by @amd-hhashemi in #63
- [Kernel] Enable custom AR on ROCm by @wenkaidu in #27
- Fix the Runtime Error When Loading kv cache scales by @charlifu in #65
- Fix numpy and XGMI 1-hop detection by @mawong-amd in #67
- Fix XGMI linting by @mawong-amd in #68
- Merging fp8_gemm_tuner.py to gemm_tuner.py by @charlifu in #66
- Workaround for SWDEV-470361 by @gshtras in #69
- [1/2] Fix up ROCm 6.2 tests correctly in main by @mawong-amd in #72
- [2/2] Using xfail instead of skip for ROCm 6.2 tests by @mawong-amd in #70
- Dockerfile updates: base image, preemptive uninstalls; restore ROCm 6.2 metrics test by @mawong-amd in #73
- Return int64 dtype for solidx in tuning results by @charlifu in #74
- [Build/CI] tests for rocm/vllm:main as of 2024-06-28 by @Alexei-V-Ivanov-AMD in #77
- Fix gradlib fp8 output by @charlifu in #76
- Allocate workspace for hipblaslt fp8 gemm. by @charlifu in #78
- Mixtral moe tuning for mi308 by @divakar-amd in #80
- Remove elementwise kernel before each fp8 gemm by @charlifu in #81
- Charlifu/avoid tensor creation before each gemm by @HaiShaw in #82
- TP=1 moe tuning for mixtral-8x7B by @divakar-amd in #84
- Mixtral-8x22B tuning mi308x by @divakar-amd in #85
- moe tuning for larger input lens by @divakar-amd in #86
- Reduce csv writes by @charlifu in #92
- fix the type error due to the misuse of the logging module by @liligwu in #105
- Update Dockerfile.rocm by @shajrawi in #107
- Greg/fast server by @gshtras in #106
- converts wvSpltK reduce to pure dpp for further perf uplift. by @amd-hhashemi in #64
- Revert "Fix 8K decode latency jump issue." by @mawong-amd in #108
- adding a simple model invocation involving fp8 calculation/storage by @Alexei-V-Ivanov-AMD in #109
- Adding bf16 output dtype for fp8 gemm by @charlifu in #111
- Running server and LLM in different processes by @gshtras in #110
- Fixed single GPU issue without setting up mp. Added toggles for server request batching parameters by @gshtras in #114
- Add distributed executor backend to benchmark scripts by @mawong-amd in #118
- Add weight padding for moe by @charlifu in #119
- [BugFix] Fix navi build after many custom for MI kernels added by @maleksan85 in #116
- add empty_cache() after each padding by @charlifu in #120
- [FIX] Gradlib OOM on Navi and sometimes on MI by @maleksan85 in #124
- Save shape when fp8 solution not found by @charlifu in #123
- Fix unit test for moe by adding padding by @charlifu in #128
- Llama3.1 by @gshtras in #129
- chat/completions endpoint by @gshtras in #121
- Optimize custom all reduce by @iotamudelta in #130
- Add BF16 support to custom PA by @sanyalington in #133
- Making the output-match check use the original dtypes, which saves some memory by @maleksan85 in #135
- Make CAR ROCm 6.1 compatible. by @iotamudelta in #137
- Car revert by @gshtras in #140
- Using the correct datatypes for streaming non-chat completions by @gshtras in #134
- Adding UNREACHABLE_CODE macro for non MI300 and MI250 cards by @maleksan85 in #138
- [FIX] gfx90a typo fix by @maleksan85 in #142
- wvsplitk templatized and better tuned for MI300 by @amd-hhashemi in #132
- [Bugfix] Dockerfile.rocm by @zstreet87 in #141
- Update test-template.j2 by @okakarpa in #145
- Adding Triton implementations awq_dequantize and awq_gemm to ROCm by @rasmith in #136
- Adding fp8 padding by @charlifu in #144
- [Int4-AWQ] Torch Int-4 AWQ Dequantization and Configuration Options by @hegemanjw4amd in #146
- buildkit requirement for building docker images by @hongxiayang in #149
- cupy build fix for SWDEV-475036 by @hongxiayang in #147
- fix outdated env for turning off triton flash attention by @hongxiayang in #151
- Nccl env for performance by @hongxiayang in #152
- Render experiments by @okakarpa in #159
- Workaround PyTorch IPC handle issue by @wenkaidu in #161
- rocm6.3 fix for docker build and debug option for gpu code by @maleksan85 in #157
- Miscellaneous cosmetic changes by @mawong-amd in #166
- V5.5 upstream merge rc by @gshtras in #167
- fnuz support for fbgemm fp8 by @gshtras in #169
- Fixing mypy after a rushed merge by @gshtras in #171
New Contributors
- @gshtras made their first contribution in #7
- @hongxiayang made their first contribution in #17
- @cjatin made their first contribution in #16
- @lcskrishna made their first contribution in #22
- @shajrawi made their first contribution in #24
- @liligwu made their first contribution in #46
- @fxmarty made their first contribution in #28
- @hthangirala made their first contribution in #51
- @amd-hhashemi made their first contribution in #54
- @wenkaidu made their first contribution in #27
- @HaiShaw made their first contribution in #82
- @maleksan85 made their first contribution in #116
- @iotamudelta made their first contribution in #130
- @zstreet87 made their first contribution in #141
- @okakarpa made their first contribution in #145
- @rasmith made their first contribution in #136
- @hegemanjw4amd made their first contribution in #146
Full Changelog: v0.6.0...v0.6.0_rocm