Blog of zenflow binding study #7614

delock · 2025-10-01T15:20:57Z

This PR adds a blog/lab post studying ZenFlow and ZeRO-Offload performance with DeepSpeed CPU core binding.

Copilot

Pull Request Overview

This PR introduces a comprehensive blog/lab study examining ZenFlow technology and its performance improvements with DeepSpeed CPU core binding. The study investigates how CPU core binding affects both ZeRO Offload and ZenFlow performance, documenting specific optimizations and measurement results.

- Documents ZenFlow technology and its relationship with DeepSpeed CPU core binding
- Presents performance testing results comparing different core binding strategies
- Introduces improvements to ZenFlow's core binding mechanism, developed in collaboration with the ZenFlow authors

blogs/zenflow-corebinding/README.md

sfc-gh-truwase · 2025-10-01T17:50:52Z

@delock thanks for the PR. Can you please modify https://github.com/deepspeedai/DeepSpeed/blob/master/docs/index.md in order to update the Latest News section of the home page?

Signed-off-by: Guokai Ma <[email protected]>

This PR adds a blog post for SuperOffload. More specifically, the blog covers the design and motivation behind SuperOffload, comparisons with previous approaches, key experiences and insights, and guidance on enabling and using SuperOffload. See also: [PR#7559](#7559) - SuperOffload implementation. [PR#990](deepspeedai/DeepSpeedExamples#990) - Examples. --------- Signed-off-by: Olatunji Ruwase <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>

This PR improves state management for DeepCompile in the engine. Previously, the system relied only on the config flag indicating whether DeepCompile was enabled. However, DeepCompile is actually activated only when `compile()` is called. This meant that if DeepCompile was enabled in the config but `compile()` was never called, it could lead to invalid internal states (as shown in #7598). Since `enabled == True` should be interpreted as an option that modifies the behavior of `compile()`, this PR introduces clearer state management:

- If `.compile()` is not called, the DeepCompile config has no effect on behavior. A one-time message is shown instead.
- A new state, DeepCompile activated, is introduced. This represents the condition where DeepCompile is both enabled in the config and `.compile()` has been called.

--------- Signed-off-by: Masahiro Tanaka <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>
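The two-condition rule in this commit message can be illustrated with a small toy model. This is a minimal sketch only: `HypotheticalEngineState`, `deepcompile_activated`, and `step` are illustrative names, not DeepSpeed's actual engine attributes.

```python
from dataclasses import dataclass, field


@dataclass
class HypotheticalEngineState:
    """Toy model of the DeepCompile states described in the commit message.

    Hypothetical: these names do not correspond to DeepSpeed's real engine code.
    """
    deepcompile_enabled_in_config: bool
    compile_called: bool = False
    _notice_shown: bool = field(default=False, repr=False)
    messages: list = field(default_factory=list)

    def compile(self) -> None:
        # Stand-in for the engine's compile() call.
        self.compile_called = True

    @property
    def deepcompile_activated(self) -> bool:
        # "Activated" requires BOTH the config flag and an explicit compile() call.
        return self.deepcompile_enabled_in_config and self.compile_called

    def step(self) -> None:
        # Flag set but compile() never called: the config has no effect,
        # and a one-time notice is recorded instead of entering an invalid state.
        if self.deepcompile_enabled_in_config and not self.compile_called and not self._notice_shown:
            self.messages.append("DeepCompile is enabled in config but compile() was not called; ignoring.")
            self._notice_shown = True
```

With this model, enabling DeepCompile in the config alone never changes behavior; only the combination of the flag and `compile()` reaches the activated state.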

PR #6993 replaces the flat IPG buffers with a dict maintaining type-indexed buckets. The member is also renamed from `_ipg_bucket_flat_buffer` to `ipg_buckets`. Update the bucket clearing logic in `init_z3` accordingly. Signed-off-by: Junjie Mao <[email protected]> Signed-off-by: Guokai Ma <[email protected]>
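Because the buffers are now held in a dict of type-indexed buckets rather than one flat buffer, clearing has to visit every bucket. A minimal sketch, assuming a hypothetical `HypotheticalIPGBucket` class and a `clear_ipg_buckets` helper; neither is DeepSpeed's actual implementation of `ipg_buckets` or `init_z3`.

```python
from dataclasses import dataclass, field


@dataclass
class HypotheticalIPGBucket:
    # Illustrative stand-in for one per-dtype gradient bucket (not DeepSpeed's class).
    params: list = field(default_factory=list)
    elements: int = 0

    def clear(self) -> None:
        self.params.clear()
        self.elements = 0


def clear_ipg_buckets(ipg_buckets: dict) -> None:
    # With a dict keyed by dtype there is no single flat buffer to reset,
    # so the clearing logic walks every bucket instead.
    for bucket in ipg_buckets.values():
        bucket.clear()


# Hypothetical usage: dtype names are used as plain string keys here.
ipg_buckets = {
    "fp16": HypotheticalIPGBucket(["p0", "p1"], 8),
    "fp32": HypotheticalIPGBucket(["p2"], 4),
}
clear_ipg_buckets(ipg_buckets)
```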

… when world size expansion. (#7599) When the world size expands from 2 to 4 and the checkpoint is then converted to a universal checkpoint and loaded, a new rank (for example, rank 3) will try to load the model file `zero_pp_rank_3_mp_rank_00_model_states.pt`, but this file was not produced during the previous run. For stage 3, just load the first file, that is, `zero_pp_rank_0_mp_rank_00_model_states`. The existing unit test TestZeROUniversalCheckpointDP::test_dp_world_size_2to4 reproduces this problem. --------- Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>
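The file-selection rule above can be sketched as a tiny helper. This is a hypothetical illustration (`hypothetical_model_states_file` is not a DeepSpeed function); the non-stage-3 branch is an assumption that simply keeps the per-rank naming quoted in the commit message.

```python
import os


def hypothetical_model_states_file(ckpt_dir: str, dp_rank: int, zero_stage: int) -> str:
    """Pick the model-states file a rank loads (illustrative only, not DeepSpeed's API)."""
    # ZeRO stage 3: always read rank 0's file, because a newly added rank
    # (e.g. rank 3 after growing from 2 to 4 ranks) has no file of its own.
    # Other stages: assumed to keep the per-rank file name.
    rank = 0 if zero_stage == 3 else dp_rank
    return os.path.join(ckpt_dir, f"zero_pp_rank_{rank}_mp_rank_00_model_states.pt")
```

For example, rank 3 with ZeRO stage 3 resolves to `zero_pp_rank_0_mp_rank_00_model_states.pt`, which exists even though only two ranks wrote the original checkpoint.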

Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>

Signed-off-by: Guokai Ma <[email protected]>

delock · 2025-10-03T06:34:50Z

Hi @sfc-gh-truwase, the link from the index page has been added, along with other wording fixes. Thanks for the suggestions!

sfc-gh-truwase · 2025-10-03T09:46:34Z

blogs/zenflow-corebinding/README.md

@@ -0,0 +1,199 @@
+# [LAB] Study of ZenFlow and ZeRO offload performance with DeepSpeed CPU core binding


@delock what does the tag [LAB] stand for? Is it necessary?

Hi @sfc-gh-truwase, the intention of the LAB tag is to differentiate this post from blogs that describe new features or novel work. A LAB focuses more on unique use cases of, and perspectives on, DeepSpeed features. I would suggest making reproducibility one of the criteria for a LAB: readers should be able to reproduce LAB results and gain a new understanding of DeepSpeed features. That would be the purpose of a LAB. I'd like to hear your thoughts on a proper tag name.

delock requested a review from Copilot October 1, 2025 15:20

Copilot AI reviewed Oct 1, 2025

delock force-pushed the gma/zenflow_binding_study branch 2 times, most recently from a8a1be5 to 92d4d21 on October 1, 2025 15:26

sfc-gh-truwase reviewed Oct 1, 2025

blogs/zenflow-corebinding/README.md (outdated, resolved)

sfc-gh-truwase reviewed Oct 1, 2025

blogs/zenflow-corebinding/README.md (outdated, resolved)

sfc-gh-truwase reviewed Oct 1, 2025

blogs/zenflow-corebinding/README.md (outdated, resolved)

sfc-gh-truwase reviewed Oct 1, 2025

blogs/zenflow-corebinding/README.md (outdated, resolved)

delock requested review from loadams and tjruwase as code owners October 3, 2025 03:23

delock and others added 19 commits October 3, 2025 11:23

blog for zenflow binding study

41b2edb

Signed-off-by: Guokai Ma <[email protected]>

update blog see format effect

268ab56

Signed-off-by: Guokai Ma <[email protected]>

update

c728ca3

Signed-off-by: Guokai Ma <[email protected]>

continue update format

be5e694

Signed-off-by: Guokai Ma <[email protected]>

formatting update

30ad5b3

Signed-off-by: Guokai Ma <[email protected]>

update table caption

b6eb559

Signed-off-by: Guokai Ma <[email protected]>

make test URL more obvious

53ae97a

Signed-off-by: Guokai Ma <[email protected]>

update format

a779106

Signed-off-by: Guokai Ma <[email protected]>

finetune format

192110d

Signed-off-by: Guokai Ma <[email protected]>

finetune format

0bac650

Signed-off-by: Guokai Ma <[email protected]>

add a column in the table

14865ba

Signed-off-by: Guokai Ma <[email protected]>

finetune format

991c1be

Signed-off-by: Guokai Ma <[email protected]>

fix typo

636ebb9

Signed-off-by: Guokai Ma <[email protected]>

Add venv to .gitignore (#7605)

d68b830

Since `make format` will generate `venv` directory, we should add it to `.gitignore`. Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>

Minor fix in the SuperOffload blog (#7612)

b2db0b8

Polish SuperOffload blog post; minor grammar and style fixes --------- Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>

delock and others added 5 commits October 3, 2025 11:23

json code block tag

801fb56

Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>

json code block tag

5cfa9a0

Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>

Wording adjustment

878d2fc

Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>

Wording adjustment

4717136

Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>

add link to the blog from index page

b21cb82

Signed-off-by: Guokai Ma <[email protected]>

delock force-pushed the gma/zenflow_binding_study branch from 891454d to b21cb82 on October 3, 2025 03:24

delock requested a review from tohtana as a code owner October 3, 2025 03:24

delock added 2 commits October 3, 2025 11:28

Merge branch 'master' into gma/zenflow_binding_study

bf8d76b

remove unused code in logging.py

6d0abbb

Signed-off-by: Guokai Ma <[email protected]>

sfc-gh-truwase reviewed Oct 3, 2025
