Releases · sgl-project/sgl-kernel-npu · GitHub

08 Sep 03:08

20250908 Latest

What's Changed

Update speculative tree ops by @MichelleWu351 in #64
Add test low latency by @luanyundu in #65
Solve the mlapo operator's precision anomaly in acl graph by @shengzhaotian in #68
normal_dispatch enable quant by @zuje123 in #66
Use Ascend name by @jia-rundong in #62
Lint by @BourneSun0527 in #71
Separate the buffers used by D/C and notify_dispatch to avoid conflicts by @lih827 in #67
feat:add activerank test para by @Yael-X in #75
torch memory saver for npu. by @lbk-sys in #74

New Contributors

@BourneSun0527 made their first contribution in #71
@lih827 made their first contribution in #67
@lbk-sys made their first contribution in #74

Full Changelog: 2025090...2025090

Contributors

shengzhaotian, luanyundu, and 7 other contributors

01 Sep 01:19

20250901

What's Changed

[Feat] Support Buffer::get_dispatch_layout interface by @Yael-X in #15
support shared expert by @zuje123 in #16
add CI workflow for Intranode test and Low-latency test by @jia-rundong in #23
added helloworld ops as example by @xiaomingbao008 in #21
fix deepep cmakelist bug by @RuixuanZhang06 in #24
Pass the SOC_VERSION as a parameter in build.sh by @RuixuanZhang06 in #27
fix SOC_VERSION not found in build.sh by @RuixuanZhang06 in #28
Add cache_assign op and packaging functionality for sgl_kernel_npu by @RuixuanZhang06 in #31
Add build deepep step by @jia-rundong in #32
MTP build tree op and verify tree op by @MichelleWu351 in #35
add dispatch normal kernel for prefill moe stage by @yuankunhan103 in #34
add combine normal kernel for prefill moe stage by @yuankunhan103 in #36
add notify dispatch kernel for prefill stage by @zuje123 in #37
support normal dispatch and combine by @zuje123 in #33
[feat] Add assign cache op by @randgun in #29
Add UT for D_C precision by @jia-rundong in #38
support offset: int64 and fix buffer waste by @RuixuanZhang06 in #41
add clang-format for code style by @xiaomingbao008 in #45
support fused cache assign op by @RuixuanZhang06 in #46
adapter a2 with empty tp by @zuje123 in #50
refined clang-format and disabled unused-variable check by @xiaomingbao008 in #52
Support only build deepep by @jia-rundong in #53
Fusion operator for MLA Preprocess by @shengzhaotian in #51
fix dispatch return param recv_count mismatch by @zuje123 in #49
fix combine memory problem by @zuje123 in #54
alloc_extend for tokens slots alloc by @ranjiewen in #43
Delete temporary performance data by @jia-rundong in #58
[fix] rename assign cache op by @randgun in #48
support cache loc update op by @RuixuanZhang06 in #60
Support build option deepep/kernels by @jia-rundong in #59
fix hostbound before notify dispatch by @zuje123 in #61
Kernel layout by @luanyundu in #57
combine use dedicated memory by @kip-cxj in #63

New Contributors

@jia-rundong made their first contribution in #23
@RuixuanZhang06 made their first contribution in #24
@MichelleWu351 made their first contribution in #35
@yuankunhan103 made their first contribution in #34
@randgun made their first contribution in #29
@shengzhaotian made their first contribution in #51
@ranjiewen made their first contribution in #43
@luanyundu made their first contribution in #57
@kip-cxj made their first contribution in #63

Full Changelog: 2025080...2025090

Contributors

xiaomingbao008, shengzhaotian, and 10 other contributors

01 Sep 01:18

20250802

What's Changed

created code structure that follows the rules of sgl-kernel by @xiaomingbao008 in #1
add deep_ep_ascend by @Yael-X in #2
Merge br_feature/bootstrap_730 into main by @iforgetmyname in #5
remove meaningless print by @zuje123 in #9
add deepep test example by @zuje123 in #10
update deepep readme by @zuje123 in #11
solve C++ ABI incompatibility by @zuje123 in #12
set ABI to adapt to different torch_npu versions by @zuje123 in #13

New Contributors

@iforgetmyname made their first contribution in #5

Full Changelog: https://github.com/sgl-project/sgl-kernel-npu/commits/20250802

Contributors

xiaomingbao008, iforgetmyname, and 2 other contributors
