fix(deepspeed): chunk ZeRO-3 missing-key param gather to avoid OOM by itxsamad1 · Pull Request #46918 · huggingface/transformers

itxsamad1 · 2026-06-26T12:25:41Z

Summary

Fixes an out-of-memory (OOM) error in from_pretrained under DeepSpeed ZeRO-3 when many missing parameters need initialization (e.g. large sparse MoE models such as MiniMax-M3).
Instead of gathering all uninitialized parameters in one GatheredParameters context, gather them in bounded chunks so that peak rank-0 memory is limited by the chunk size rather than by the total size of the missing parameters.
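A minimal sketch of the chunking idea. It assumes chunks are formed greedily against an element budget (`max_numel`, a hypothetical parameter; the PR text only says "bounded chunks", so the real criterion may differ). The helpers `full_numel`, `chunk_params`, and `init_missing_in_chunks`, and the per-parameter `init_fn`, are illustrative names and not the actual transformers code. The sketch reads the full size from `ds_numel`, assuming ZeRO-3 partitioned parameters carry it there, and takes the gather context as a callable so it can run without DeepSpeed installed.

```python
from typing import Callable, ContextManager, Iterable, Iterator, List, Sequence, TypeVar

P = TypeVar("P")


def full_numel(param) -> int:
    """Full (unpartitioned) element count of a parameter.

    Assumption: under ZeRO-3 the local tensor is partitioned and the full
    size is stored in `ds_numel`; otherwise fall back to `numel()`.
    """
    ds_numel = getattr(param, "ds_numel", None)
    return int(ds_numel) if ds_numel is not None else int(param.numel())


def chunk_params(params: Iterable[P], max_numel: int) -> Iterator[List[P]]:
    """Greedily yield lists of params whose combined full numel is <= max_numel.

    A single param larger than max_numel is yielded on its own, so the real
    bound per chunk is max(max_numel, largest single param).
    """
    if max_numel <= 0:
        raise ValueError("max_numel must be positive")
    chunk: List[P] = []
    chunk_numel = 0
    for p in params:
        n = full_numel(p)
        # Close the current chunk if adding this param would exceed the budget.
        if chunk and chunk_numel + n > max_numel:
            yield chunk
            chunk, chunk_numel = [], 0
        chunk.append(p)
        chunk_numel += n
    if chunk:
        yield chunk


def init_missing_in_chunks(
    params: Sequence[P],
    init_fn: Callable[[P], None],
    gather: Callable[[List[P]], ContextManager],
    max_numel: int,
) -> int:
    """Initialize params one chunk at a time inside a gather context.

    `gather` is assumed to be something like
    `lambda ps: deepspeed.zero.GatheredParameters(ps, modifier_rank=0)`;
    each chunk's full tensors are only materialized inside its `with` block.
    Returns the number of chunks processed.
    """
    num_chunks = 0
    for chunk in chunk_params(params, max_numel):
        with gather(chunk):
            for p in chunk:
                init_fn(p)
        num_chunks += 1
    return num_chunks
```

With DeepSpeed available, `gather` would wrap `deepspeed.zero.GatheredParameters` as in the docstring. Because an oversized parameter still forms its own chunk, the peak gathered size is bounded by the larger of the budget and the biggest single parameter, not by the budget alone.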

Context

When loading large sparse MoE models under ZeRO-3, _initialize_missing_keys coalesced every uninitialized parameter into a single GatheredParameters all-gather. For models with packed expert weights, this can re-materialize hundreds of GB on rank 0 at once and run out of memory.
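As a rough model of the difference, assume the memory a gather materializes is just element count times bytes per element (ignoring allocator overhead and the shards each rank already holds), and that each chunk's gathered buffers are freed before the next chunk starts. With missing parameters of full sizes $n_1,\dots,n_k$, $b$ bytes per element, chunks $C_j$, and a hypothetical element budget $N$ per chunk (the PR does not state its exact chunking criterion):

```latex
M_{\text{single}} = b \sum_{i=1}^{k} n_i
\qquad
M_{\text{chunked}} = b \max_{j} \sum_{i \in C_j} n_i \;\le\; b \max\Big(N,\ \max_{i} n_i\Big)
```

The single gather grows with the total size of all missing parameters, while the chunked peak is capped by the budget, except that one parameter larger than $N$ still has to be gathered whole.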

Test plan

- Existing test_init_zero3_missing_params still validates missing-key initialization under ZeRO-3
- Manual repro from #46822 ("ZeRO-3 zero.Init does not partition composite minimax_m3_vl language submodule -> OOM on multi-GPU load") on multi-GPU with DeepSpeed ZeRO-3

Fixes #46822

When initializing missing weights under DeepSpeed ZeRO-3, gathering all uninitialized parameters in a single GatheredParameters context can OOM on large sparse MoE models. Gather parameters in bounded chunks instead so peak rank-0 memory stays bounded by the chunk size. Fixes huggingface#46822

github-actions · 2026-06-26T12:42:24Z

CI Dashboard: View test results in Grafana

itxsamad1 wants to merge 1 commit into huggingface:main from itxsamad1:fix/zero3-chunked-missing-keys-init
