Release v0.1.4 July release · ROCm/aiter

MXFP4 enabled for gfx950, including GEMM, MoE, and per1x32 Quant
Multi-GPU tuning enabled for most kinds of GEMMs
FP8 all-reduce
A number of new Triton kernels

What's Changed

[TRITON] Add Triton Topk Kernel by @hubertlu-tw in #458
Find executable in rocm home when not found in PATH by @xli in #549
[TRITON]: Disable int4 moe UT by @rahulbatra85 in #563
add a4w4 asm_moe by @valarLip in #482
Improved detection of setup.py install by @ekuznetsov139 in #534
Disable mha related modules in prebuild by @slippedJim in #567
Fix format error in .clang-format by @poyenc in #568
update pa asm by @amd-ruitang3 in #553
[TRITON]: Reorg mha code and use common fp8 type by @rahulbatra85 in #561
[TRITON]: Gemm refactor by @rahulbatra85 in #558
[Triton]: Add has_attr check in get_config by @rahulbatra85 in #572
[TRITON]: GEMM updates for DS by @rahulbatra85 in #573
update_codegen by @amd-ruitang3 in #581
mi350_pa by @amd-ruitang3 in #579
Change input tensor format to [B,S,H,d] and add batch support for causal by @valechen in #578
update tune config file by @solinzby1 in #569
[TRITON] Add RMSNorm bwd Triton Kernels by @lucas-santos-amd in #576
fix prebuild by @junhaha666 in #592
[TRITON]: Quantization updates(add int8 and use common fp8 dtypes) by @rahulbatra85 in #588
Dispatch combine by @junhaha666 in #571
update args by @amd-ruitang3 in #590
Pa rocm refresh4 by @fsx950223 in #591
[update]: update all-reduce by @TennyWang1223 in #552
Fix compile error in MI350 with ROCm7 by @rocking5566 in #599
new codegen for elementwise by @TennyWang1223 in #585
[fix]: elementwise prebuild slow by @TennyWang1223 in #609
[TRITON]: Fp4gemm m=256 tuning by @Chi-Chu319 in #533
add MI350 support for skinny_gemm by @yanguahe in #602
Fix prebuild 350 by @junhaha666 in #608
[fix]: change ar namespace by @TennyWang1223 in #611
compile flag clean up by @valarLip in #615
DIY_args by @amd-ruitang3 in #596
fix NUM_Q_HEADS - 1 in remap_xcd in _attn_fwd by @juuso-oskari in #612
add ck gemm a4w4 blockscale with splitK support by @ukannika-amd in #603
[TRITON]: pid grid fix by @Chi-Chu319 in #618
Refine ck instance and update a8w8_bpreshuffle_tuned_gemm.csv by @solinzby1 in #621
merge moe from 350 launch by @lalala-sh in #580
Remove seqlen limit on FA fwd kernel by @slippedJim in #622
[Triton] RoPE dev by @k50112113 in #606
[TRITON]: Fix num_warps typo which was causing performance issues by @valechen in #604
Topksoftmax_opt by @junhaha666 in #626
update hip quant for corner case by @valarLip in #633
[TRITON]: use int64 strides by default for MHA by @rahulbatra85 in #634
[TRITON]: Standardize GEMM weight shape to (N, K) and TN memory layout (by default) by @willzhou-amd in #597
[TRITON] Add Softmax Triton Kernel by @lucas-santos-amd in #605
Enable gfx942 FA fwd asm kernels by @slippedJim in #619
Update CK by @poyenc in #635
Fix error message for rocminfo by @Rohan138 in #636
[TRITON]: Moe tuning mi350 by @Chi-Chu319 in #610
Fix test_pa_ragged.py use_alibi=True test cases by @poyenc in #639
Fix FA fwd nan issue by @slippedJim in #646
fix for fp8 e4m3fn by @valarLip in #640
[TRITON]: Kernel benchmarking improvements (for op_benchmarks/triton) by @willzhou-amd in #594
[Triton]: Disable fused+causal for MHA bkwd by @rahulbatra85 in #642
enable parallel tuning on CK kernels by @yzhou103 in #625
Pa fix2 by @fsx950223 in #645
Update dependencies and add backup for unknown hw by @kunaltyagi in #623
Optimize topksoftmax WARPS_PER_TB for higher occupancy and remove redundant precision conversion by @CuiCu-618 in #652

New Contributors

@hubertlu-tw made their first contribution in #458
@xli made their first contribution in #549
@ekuznetsov139 made their first contribution in #534
@valechen made their first contribution in #578
@willzhou-amd made their first contribution in #597
@Rohan138 made their first contribution in #636
@yzhou103 made their first contribution in #625
@kunaltyagi made their first contribution in #623
@CuiCu-618 made their first contribution in #652

Full Changelog: v0.1.3...v0.1.4
