Change process group order to optimize inter-node communication #17

Open
cailun01 opened this issue Dec 22, 2024 · 3 comments

@cailun01

To reduce inter-node communication volume, we'd like the ranks of each Pipeline Parallel (PP) process group to be the ones that communicate across nodes, which means making PP the outermost dimension of the grid. Maybe self.grid in ProcessGroupManager should be:

self.grid = torch.arange(self.world_size).view(pp_size, dp_size, cp_size, tp_size)  # PP * DP * CP * TP grid

instead of

# https://github.com/huggingface/picotron/blob/df3ae8a5f0cce213816b6b287b7febc75ab98a53/picotron/process_group_manager.py#L13
self.grid = torch.arange(self.world_size).view(dp_size, pp_size, cp_size, tp_size)  # DP * PP * CP * TP grid
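
A minimal sketch of what the reorder changes, written in plain Python so it runs without torch. It reproduces the row-major layout that torch.arange(world_size).view(...) gives and lists the PP and DP groups that contain rank 0. The parallel sizes, GPUS_PER_NODE, the helper names, and the assumption that ranks are packed onto nodes contiguously (node = rank // GPUS_PER_NODE) are all hypothetical and not part of picotron.

```python
def axis_strides(order, sizes):
    """Row-major strides for a grid whose axes are listed outermost-first."""
    strides, stride = {}, 1
    for axis in reversed(order):
        strides[axis] = stride
        stride *= sizes[axis]
    return strides


def group_of_rank0(axis, order, sizes):
    """Ranks that share a process group with rank 0 along `axis`."""
    step = axis_strides(order, sizes)[axis]
    return [i * step for i in range(sizes[axis])]


def nodes_spanned(ranks, gpus_per_node):
    """Number of distinct nodes touched, assuming contiguous rank placement."""
    return len({rank // gpus_per_node for rank in ranks})


# Hypothetical setup: 16 GPUs on 2 nodes of 8 GPUs each.
SIZES = {"dp": 2, "pp": 2, "cp": 1, "tp": 4}
GPUS_PER_NODE = 8

CURRENT = ("dp", "pp", "cp", "tp")   # layout in process_group_manager.py
PROPOSED = ("pp", "dp", "cp", "tp")  # layout suggested in this issue

for name, order in (("current", CURRENT), ("proposed", PROPOSED)):
    pp = group_of_rank0("pp", order, SIZES)
    dp = group_of_rank0("dp", order, SIZES)
    print(f"{name}: PP group {pp} spans {nodes_spanned(pp, GPUS_PER_NODE)} node(s), "
          f"DP group {dp} spans {nodes_spanned(dp, GPUS_PER_NODE)} node(s)")
```

With these sizes, the current layout keeps the PP group [0, 4] on one node and spreads the DP group [0, 8] over two, while the proposed layout does the opposite.
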
@zzhhjjj
Collaborator

zzhhjjj commented Jan 10, 2025

Is there any particular reason for this order change? I was following the Llama 3.1 paper but didn't actually run any benchmarks.

@cailun01
Author

> Is there any particular reason for this order change? I was following the Llama 3.1 paper but didn't actually run any benchmarks.

Llama 3.1 uses FSDP, but picotron doesn't, so the ordering that suits Llama 3.1 doesn't necessarily carry over.

@Hannibal046

Agreed! Communication volume: DP > PP; communication speed: intra-node > inter-node. So the DP group should stay inside a node and the PP group should be the one that crosses nodes.
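
A rough back-of-envelope sketch of the "DP > PP" volume comparison, under stated assumptions: plain DP (no FSDP) with a ring all-reduce over gradients, and a middle pipeline stage that sends one activation tensor forward and one activation gradient backward per micro-batch. All sizes below are hypothetical, TP and CP sharding are ignored, and the result can flip for other configurations, so treat it as an illustration rather than a benchmark.

```python
def ring_allreduce_bytes_per_rank(payload_bytes, group_size):
    """Bytes each rank sends in a ring all-reduce of `payload_bytes`."""
    if group_size <= 1:
        return 0.0
    return 2 * (group_size - 1) / group_size * payload_bytes


def pp_bytes_per_rank(micro_batch, seq_len, hidden, n_microbatches, bytes_per_elem):
    """Bytes a middle stage sends per step: activations forward plus
    activation gradients backward, once per micro-batch."""
    activation = micro_batch * seq_len * hidden * bytes_per_elem
    return 2 * n_microbatches * activation


# Hypothetical numbers, bf16 everywhere (2 bytes per element).
PARAMS_PER_STAGE = 2_000_000_000
BYTES_PER_ELEM = 2
DP_SIZE = 4

dp_volume = ring_allreduce_bytes_per_rank(PARAMS_PER_STAGE * BYTES_PER_ELEM, DP_SIZE)
pp_volume = pp_bytes_per_rank(micro_batch=1, seq_len=4096, hidden=4096,
                              n_microbatches=32, bytes_per_elem=BYTES_PER_ELEM)

print(f"DP gradient all-reduce: {dp_volume / 1e9:.2f} GB per rank per step")
print(f"PP activations:         {pp_volume / 1e9:.2f} GB per rank per step")
```

For these particular numbers the DP traffic is roughly three times the PP traffic, which is the situation where keeping DP on the faster intra-node links pays off.
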
