Blog of zenflow binding study #7614
base: master
Pull Request Overview
This PR introduces a comprehensive blog/lab study examining ZenFlow technology and its performance improvements with DeepSpeed CPU core binding. The study investigates how CPU core binding affects both ZeRO Offload and ZenFlow performance, documenting specific optimizations and measurement results.
- Documents ZenFlow technology and its relationship with DeepSpeed CPU core binding
- Presents performance testing results comparing different core binding strategies
- Introduces improvements to ZenFlow's core binding mechanism developed in collaboration with ZenFlow authors
Updated from a8a1be5 to 92d4d21.
@delock thanks for the PR. Can you please modify https://github.com/deepspeedai/DeepSpeed/blob/master/docs/index.md to update the Latest News section of the home page?
Signed-off-by: Guokai Ma <[email protected]>
This PR adds a blog post for SuperOffload. More specifically, the blog covers the design and motivation behind SuperOffload, comparisons with previous approaches, key experiences and insights, and guidance on enabling and using SuperOffload. See also: [PR#7559](#7559) - SuperOffload implementation. [PR#990](deepspeedai/DeepSpeedExamples#990) - Examples. --------- Signed-off-by: Olatunji Ruwase <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>
Since `make format` generates a `venv` directory, add it to `.gitignore`. Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>
This PR improves state management for DeepCompile in the engine. Previously, the system relied only on the config flag indicating whether DeepCompile was enabled. However, DeepCompile is actually activated only when `compile()` is called. This meant that if DeepCompile was enabled in the config but `compile()` was never called, it could lead to invalid internal states (as shown in #7598). Since `enabled == True` should be interpreted as an option that modifies the behavior of `compile()`, this PR introduces clearer state management: - If .compile() is not called, the DeepCompile config has no effect on behavior. A one-time message is shown instead. - A new state, DeepCompile activated, is introduced. This represents the condition where DeepCompile is both enabled in the config and .compile() has been called. --------- Signed-off-by: Masahiro Tanaka <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>
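The state rules in this commit message can be modeled as two independent flags: the config flag and whether `compile()` was called. "Activated" is their conjunction. Below is a minimal sketch under that reading. The class, its attributes and the notice text are hypothetical illustrations, not DeepSpeed's engine API.

```python
class DeepCompileStateSketch:
    """Hypothetical model of the states described above; not DeepSpeed's engine code."""

    def __init__(self, deepcompile_enabled: bool):
        # Value of the DeepCompile "enabled" flag taken from the config.
        self.deepcompile_enabled = deepcompile_enabled
        # Becomes True only when compile() is actually called.
        self.compile_called = False
        self._notice_shown = False
        self.messages = []

    def compile(self) -> None:
        self.compile_called = True

    @property
    def deepcompile_activated(self) -> bool:
        # "Activated" means enabled in the config AND compile() has been called.
        return self.deepcompile_enabled and self.compile_called

    def forward_mode(self) -> str:
        # Enabled in the config but compile() never called: behavior is unchanged,
        # and a notice is recorded only once.
        if self.deepcompile_enabled and not self.compile_called and not self._notice_shown:
            self.messages.append(
                "DeepCompile is enabled in the config but compile() was not called; it has no effect."
            )
            self._notice_shown = True
        return "deepcompile" if self.deepcompile_activated else "default"
```

With `deepcompile_enabled=True`, repeated calls to `forward_mode()` return `"default"` and record a single notice until `compile()` is called. From then on they return `"deepcompile"`.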
PR #6993 replaces the flat IPG buffers with a dict maintaining type-indexed buckets. The member is also renamed from `_ipg_bucket_flat_buffer` to `ipg_buckets`. Update the bucket clearing logic in `init_z3` accordingly. Signed-off-by: Junjie Mao <[email protected]> Signed-off-by: Guokai Ma <[email protected]>
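With a dict of type-indexed buckets there is no single flat buffer to reset. Clearing logic therefore has to visit every bucket. Here is a minimal sketch of that idea, assuming buckets are keyed by a dtype name string. `IPGBucketSketch` and `clear_ipg_buckets` are hypothetical stand-ins (real buckets hold tensors), and this is not the actual `init_z3` code.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IPGBucketSketch:
    """Hypothetical stand-in for one IPG bucket; real buckets hold tensors, not lists."""
    params: list = field(default_factory=list)
    elements: int = 0


def clear_ipg_buckets(ipg_buckets: dict) -> None:
    # Reset every per-type bucket instead of a single flat buffer.
    for bucket in ipg_buckets.values():
        bucket.params.clear()
        bucket.elements = 0


# Assumption: buckets are keyed by a dtype name such as "float16".
ipg_buckets = defaultdict(IPGBucketSketch)
ipg_buckets["float16"].params.append("p0")
ipg_buckets["float16"].elements += 4
ipg_buckets["bfloat16"].params.append("p1")
ipg_buckets["bfloat16"].elements += 8
clear_ipg_buckets(ipg_buckets)
```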
Polish SuperOffload blog post; minor grammar and style fixes --------- Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>
… when world size expansion. (#7599) When the world size expands from 2 to 4, the checkpoint is converted to a universal checkpoint and then loaded. A new rank, for example rank 3, then tries to load the model file `zero_pp_rank_3_mp_rank_00_model_states.pt`, but this file was not produced during the previous run. For stage 3, just load the first file, `zero_pp_rank_0_mp_rank_00_model_states`. The existing unit test TestZeROUniversalCheckpointDP::test_dp_world_size_2to4 can verify this problem. --------- Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>
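The stage 3 fallback described in this commit message can be sketched as a filename choice. The helper below is hypothetical, not DeepSpeed's loader.

- It assumes the `.pt` suffix and the two-digit `mp_rank` formatting shown in the file names above.
- For stages other than 3, it only reproduces the per-rank naming from the problem description. It makes no claim about how those stages are handled.

```python
def model_states_filename(dp_rank: int, zero_stage: int, mp_rank: int = 0) -> str:
    """Hypothetical helper: choose the model-states file a rank loads after a
    universal-checkpoint conversion at a larger world size."""
    # Stage 3: every rank loads the first file (rank 0), because files for
    # newly added ranks (e.g. rank 3 after growing from 2 to 4) were never written.
    source_rank = 0 if zero_stage == 3 else dp_rank
    return f"zero_pp_rank_{source_rank}_mp_rank_{mp_rank:02d}_model_states.pt"
```

For example, under ZeRO stage 3, rank 3 resolves to rank 0's file rather than to a file that does not exist.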
Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: Guokai Ma <[email protected]>
Updated from 891454d to b21cb82.
Hi @sfc-gh-truwase, the link from the index page has been added, along with other wording fixes. Thanks for the suggestions!
@@ -0,0 +1,199 @@
# [LAB] Study of ZenFlow and ZeRO offload performance with DeepSpeed CPU core binding
@delock what does the tag [LAB] stand for? Is it necessary?
Hi @sfc-gh-truwase, the LAB tag is intended to differentiate this post from a blog that describes a new feature or novel work. A LAB post focuses more on unique use cases and perspectives on DeepSpeed features. I would suggest making reproducibility one of the criteria for a LAB. Readers should be able to reproduce LAB results and gain a new understanding of DeepSpeed features; that is the purpose of a LAB. I'd like to hear your thoughts on a proper tag name.
This PR adds a blog/lab post studying ZenFlow and ZeRO Offload performance with DeepSpeed CPU core binding.