v0.5.1
What's Changed
- Update dependencies and package organization. by @tgale96 in #52
- Remove errant "*" in README by @tgale96 in #54
- Update Megatron-LM scripts and integration for latest Docker container. by @tgale96 in #55
- Update setup.py to support multiple device capabilities by @simon-mo in #56
- enable arg enabled normalization of routing weights by @vchiley in #58
- More customizable norm for expert weights by @snarayan21 in #60
- Update README.md by @eltociear in #63
- enable custom activation functions by @vchiley in #65
- Skip updating load balancing loss on eval by @sedrick-keh-tri in #69
- Change router weight norm from in-place by @sashaDoubov in #70
- add mem optimized grouped glu by @vchiley in #66
- Add cast to tensor for DTensor inputs for groupedmlp by @eracah in #71
- Dtensor to all paths by @mvpatel2000 in #73
- Refactor dtensor by @mvpatel2000 in #74
- Mem opt glu bkwd by @mvpatel2000 in #72
- Add dmlp registry args by @j316chuck in #75
- Fix default to be sparse by @mvpatel2000 in #76
- Fix moe_normalize_expert_weights when top_k=1 by @152334H in #87
- Updt triton pin by @vchiley in #89
New Contributors
- @simon-mo made their first contribution in #56
- @snarayan21 made their first contribution in #60
- @eltociear made their first contribution in #63
- @sedrick-keh-tri made their first contribution in #69
- @eracah made their first contribution in #71
- @j316chuck made their first contribution in #75
- @152334H made their first contribution in #87
Full Changelog: v0.5.0...v0.5.1