Releases: sgl-project/sgl-kernel-npu
20250908
What's Changed
- Update speculative tree ops by @MichelleWu351 in #64
- Add test low latency by @luanyundu in #65
- Solve the mlapo operator's precision anomaly in acl graph by @shengzhaotian in #68
- normal_dispatch enable quant by @zuje123 in #66
- Use Ascend name by @jia-rundong in #62
- Lint by @BourneSun0527 in #71
- Separate the buffers used by D/C and notify_dispatch to avoid conflicts by @lih827 in #67
- feat:add activerank test para by @Yael-X in #75
- torch memory saver for npu. by @lbk-sys in #74
New Contributors
- @BourneSun0527 made their first contribution in #71
- @lih827 made their first contribution in #67
- @lbk-sys made their first contribution in #74
Full Changelog: 2025090...2025090
20250901
What's Changed
- [Feat] Support Buffer::get_dispatch_layout interface by @Yael-X in #15
- support shared expert by @zuje123 in #16
- add CI workflow for Intranode test and Low-latency test by @jia-rundong in #23
- added helloworld ops as example by @xiaomingbao008 in #21
- fix deepep cmakelist bug by @RuixuanZhang06 in #24
- Pass the SOC_VERSION as a parameter in build.sh by @RuixuanZhang06 in #27
- fix SOC_VERSION not found in build.sh by @RuixuanZhang06 in #28
- Add cache_assign op and packaging functionality for sgl_kernel_npu by @RuixuanZhang06 in #31
- Add build deepep step by @jia-rundong in #32
- MTP build tree op and verify tree op by @MichelleWu351 in #35
- add dispatch normal kernel for prefill moe stage by @yuankunhan103 in #34
- add combine normal kernel for prefill moe stage by @yuankunhan103 in #36
- add notify dispatch kernel for prefill stage by @zuje123 in #37
- support normal dispatch and combine by @zuje123 in #33
- [feat] Add assign cache op by @randgun in #29
- Add UT for D_C precision by @jia-rundong in #38
- support offset: int64 and fix buffer waste by @RuixuanZhang06 in #41
- add clang-format for code style by @xiaomingbao008 in #45
- support fused cache assign op by @RuixuanZhang06 in #46
- adapter a2 with empty tp by @zuje123 in #50
- refined clang-format and disabled unused-variable check by @xiaomingbao008 in #52
- Support only build deepep by @jia-rundong in #53
- Fusion operator for MLA Preprocess by @shengzhaotian in #51
- fix dispatch return param recv_count mismatch by @zuje123 in #49
- fix combine memory problem by @zuje123 in #54
- alloc_extend for tokens slots alloc by @ranjiewen in #43
- Delete temporary performance data by @jia-rundong in #58
- [fix] rename assign cache op by @randgun in #48
- support cache loc update op by @RuixuanZhang06 in #60
- Support build option deepep/kernels by @jia-rundong in #59
- fix hostbound before notify dispatch by @zuje123 in #61
- Kernel layout by @luanyundu in #57
- combine use dedicated memory by @kip-cxj in #63
New Contributors
- @jia-rundong made their first contribution in #23
- @RuixuanZhang06 made their first contribution in #24
- @MichelleWu351 made their first contribution in #35
- @yuankunhan103 made their first contribution in #34
- @randgun made their first contribution in #29
- @shengzhaotian made their first contribution in #51
- @ranjiewen made their first contribution in #43
- @luanyundu made their first contribution in #57
- @kip-cxj made their first contribution in #63
Full Changelog: 2025080...2025090
20250802
What's Changed
- created code structure which follows the rules of sgl-kernel by @xiaomingbao008 in #1
- add deep_ep_ascend by @Yael-X in #2
- Merge br_feature/bootstrap_730 into main by @iforgetmyname in #5
- remove meaningless print by @zuje123 in #9
- add deepep test example by @zuje123 in #10
- update deepep readme by @zuje123 in #11
- solve C++ ABI incompatibility by @zuje123 in #12
- set ABI to adapt to different torch_npu versions by @zuje123 in #13
New Contributors
- @iforgetmyname made their first contribution in #5
Full Changelog: https://github.com/sgl-project/sgl-kernel-npu/commits/20250802