Enabled configurable auto Tensor Parallelism (TP) for the inference of diverse models #6553

Open

gyou2021 wants to merge 15 commits into master
Conversation

@gyou2021 (Contributor) commented Sep 18, 2024

Auto TP in `auto_tp.py` handles linear-type modules in emerging complex models. Three cases arise:

1) The output of some linear modules in a model must go through an all-reduce operation after running on multiple HPU/GPU cards, but the names of those modules may differ from the ones recognized by `tp_parser()`.
2) The weights of some linear modules in a model CANNOT be split across multiple HPU/GPU cards.
3) The weights of some linear modules in a model should NOT be split across multiple HPU/GPU cards, because the subsequent all-gather operation (gathering results from all cards) would hurt performance.

In case 1) the module's Linear type should be changed to DeepSpeed's LinearAllreduce type; in cases 2) and 3) the module should keep the Linear type. A configurable auto TP is proposed to handle these cases easily: `tp_parser()` adds the linear modules for case 1) (the module name list is stored in the environment variable `DS_ALL_REDUCE_LINEAR_ITEMS`), and `_replace()` adds the linear modules for cases 2) and 3) (the module name list is stored in the environment variable `DS_KEEP_LINEAR_ITEMS`). Both environment variables are configurable, either directly in the environment or through a configuration file.

Take the Mixtral 8x7B model as an example: we add 'w2' to LinearAllreduce and keep 'gate' as Linear ('o_proj' is already a default DeepSpeed LinearAllreduce layer). Add the following to the main code:

```python
import os
os.environ["DS_ALL_REDUCE_LINEAR_ITEMS"] = "{'w2': 'mixtral'}"
os.environ["DS_KEEP_LINEAR_ITEMS"] = "{'gate': 'mixtral'}"
```

[Image: structure of the original Mixtral model]

[Image: structure of the Mixtral model after auto TP]

@delock (Collaborator) commented Sep 19, 2024

Hi @gyou2021, I like the goal of avoiding repetition of the same logic from L296 to L315, but I'm also concerned that models enabled by these lines will not run out of the box with this PR. That may not be friendly to self-helping users who lack access to proper BKC documentation for the various models.

Could allReduceLinearItems have an initial value as a built-in list, and then be prepended with entries from os.environ for runtime configurability? I think if a model enabled via the environment is a public model, its entries should be contributed to the built-in list to provide an out-of-box (OOB) experience, right?
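A rough sketch of what that could look like (illustrative names only, not the PR's actual code): keep a built-in dict of known all-reduce linear modules for public models and merge runtime entries from the environment on top of it:

```python
import ast
import os

# Built-in defaults contributed for public models (the entry here is an example).
BUILTIN_ALL_REDUCE_LINEAR_ITEMS = {"w2": "mixtral"}

def get_all_reduce_linear_items():
    items = dict(BUILTIN_ALL_REDUCE_LINEAR_ITEMS)
    runtime = os.environ.get("DS_ALL_REDUCE_LINEAR_ITEMS", "")
    if runtime:
        # Runtime entries extend or override the built-in defaults.
        items.update(ast.literal_eval(runtime))
    return items
```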

@gyou2021's comment was marked as resolved.

@loadams (Contributor) commented Jan 8, 2025

Hi @delock and @gyou2021 - what more needs to be done to complete this PR? Just a review/approval? Any other changes?

loadams self-assigned this on Jan 13, 2025
@delock (Collaborator) commented Jan 14, 2025

@loadams, let me check with @gyou2021 on the status of this PR.

@gyou2021 (Contributor, Author) commented

> Hi @gyou2021, I like the goal of avoiding repetition of the same logic from L296 to L315, but I'm also concerned that models enabled by these lines will not run out of the box with this PR. That may not be friendly to self-helping users who lack access to proper BKC documentation for the various models.
>
> Could allReduceLinearItems have an initial value as a built-in list, and then be prepended with entries from os.environ for runtime configurability? I think if a model enabled via the environment is a public model, its entries should be contributed to the built-in list to provide an out-of-box (OOB) experience, right?

Sure. I updated the code so that it runs out of the box. Thank you for your comments.

@delock (Collaborator) commented Jan 21, 2025

@loadams, my questions are all resolved and I have no further questions for @gyou2021, thanks!
