Release v0.5.1 · databricks/megablocks

What's Changed

Update dependencies and package organization. by @tgale96 in #52
Remove errant "*" in README by @tgale96 in #54
Update Megatron-LM scripts and integration for latest Docker container. by @tgale96 in #55
Update setup.py to support multiple device capabilities by @simon-mo in #56
Enable argument-controlled normalization of routing weights by @vchiley in #58
More customizable norm for expert weights by @snarayan21 in #60
Update README.md by @eltociear in #63
Enable custom activation functions by @vchiley in #65
Skip updating load balancing loss on eval by @sedrick-keh-tri in #69
Change router weight norm from in-place by @sashaDoubov in #70
Add memory-optimized grouped GLU by @vchiley in #66
Add cast to tensor for DTensor inputs for groupedmlp by @eracah in #71
DTensor to all paths by @mvpatel2000 in #73
Refactor DTensor by @mvpatel2000 in #74
Memory-optimized GLU backward by @mvpatel2000 in #72
Add dmlp registry args by @j316chuck in #75
Fix default to be sparse by @mvpatel2000 in #76
Fix moe_normalize_expert_weights when top_k=1 by @152334H in #87
Update Triton pin by @vchiley in #89

Full Changelog: v0.5.0...v0.5.1