Releases · databricks/megablocks

31 Aug 14:49

v0.6.1

342c297

v0.6.1 Latest

Latest

What's New

Patch release to remove dependencies specified via github and instead use released versions through pypi (specifically, stanford-stk and grouped-gemm). This allows for releasing megablocks itself via pypi.

What's Changed

Remove direct dependencies, allowing for megablocks pypi release by @snarayan21 in #149

Full Changelog: v0.6.0...v0.6.1

Contributors

snarayan21

Assets 2

30 Aug 18:55

eitanturok

v0.6.0

9243f70

v0.6.0

What's New

1. Torch 2.4 Compatibility (#145)

MegaBlocks now supports Torch 2.4!

2. New CI/CD

MegaBlocks has new Github Actions for better CI/CD! Now on every PR, MegaBlocks will automatically perform code linting and formatting (#131) and run tests on a GPU (#127).

3. Remove Weight Parallelism (#137)

Weight parallelism was not in use and so we removed it.

4. Shared Experts (#109)
Implement shared experts, based on the DeepSeekMoE paper.

Bug Fixes

Better handle incompatible ffn sizes (#108)
Fix AMP for memory optimized options (#111)
Don't save moe lb-loss tensors (#119)

What's Changed

Remove turbo by @dblalock in #96
Update README.md by @dakinggg in #98
Fix for ffn_hidden_size of 128, and better error message for incompatible ffn sizes. by @snarayan21 in #108
Add Shared Expert by @vchiley in #109
Fix AMP for memory optimized options by @mvpatel2000 in #111
bump and pin versions by @vchiley in #112
dont save moe lb-loss tensors if args.moe_loss_weight=0 by @michael-go in #119
bump by @vchiley in #116
Minor changes to batched_load_balancing_loss function by @ShashankMosaicML in #121
Migrate tests to pytest + add GA by @eitanturok in #127
Change Runner in GA by @eitanturok in #129
Clean up setup.py by @eitanturok in #128
only run GA if repo owner is Databricks by @eitanturok in #135
GA to Lint + Format MegaBlocks by @eitanturok in #131
bump ci-testing to v0.1.2 by @eitanturok in #138
remove weight parallelism by @eitanturok in #137
refactor testing by @eitanturok in #140
Type Checking by @eitanturok in #141
Bump torch to <2.4.1 by @eitanturok in #145

New Contributors

@dakinggg made their first contribution in #98
@michael-go made their first contribution in #119
@ShashankMosaicML made their first contribution in #121

Full Changelog: v0.5.1...v0.6.0

Contributors

dblalock, michael-go, and 6 other contributors

Assets 2

11 Jan 22:14

tgale96

v0.5.1

f05609c

v0.5.1

What's Changed

Update dependencies and package organization. by @tgale96 in #52
Remove errant "*" in README by @tgale96 in #54
Update Megatron-LM scripts and integration for latest Docker container. by @tgale96 in #55
Update setup.py to support multiple device capabilities by @simon-mo in #56
enable arg enabled normalization of routing weights by @vchiley in #58
More customizable norm for expert weights by @snarayan21 in #60
Update README.md by @eltociear in #63
enable custom activation functions by @vchiley in #65
Skip updating load balancing loss on eval by @sedrick-keh-tri in #69
Change router weight norm from in-place by @sashaDoubov in #70
add mem optimized grouped glu by @vchiley in #66
Add cast to tensor for DTensor inputs for groupedmlp by @eracah in #71
Dtensor to all paths by @mvpatel2000 in #73
Refactor dtesnor by @mvpatel2000 in #74
Mem opt glu bkwd by @mvpatel2000 in #72
Add dmlp registry args by @j316chuck in #75
Fix default to be sparse by @mvpatel2000 in #76
Fix moe_normalize_expert_weights when top_k=1 by @152334H in #87
Updt triton pin by @vchiley in #89