Change process group order to optimize inter-node communication #17

Open
@cailun01

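To reduce inter-node communication volume, the ranks within each Pipeline Parallel (PP) process group should be the ones that span nodes, so that the typically lower-volume pipeline traffic crosses nodes while DP, CP, and TP groups stay within a node where possible. Since the grid assigns ranks in row-major order, this means making PP the outermost dimension. Maybe self.grid in ProcessGroupManager should be: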

self.grid = torch.arange(self.world_size).view(pp_size, dp_size, cp_size, tp_size)  # PP * DP * CP * TP grid

instead of

# https://github.com/huggingface/picotron/blob/df3ae8a5f0cce213816b6b287b7febc75ab98a53/picotron/process_group_manager.py#L13
self.grid = torch.arange(self.world_size).view(dp_size, pp_size, cp_size, tp_size)  # DP * PP * CP * TP grid
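
To make the difference concrete, here is a minimal sketch in plain Python (no torch) that mimics the row-major rank layout of `torch.arange(world_size).view(...)` and checks which process groups span more than one node. The helper names (`groups_along`, `crosses_nodes`), the example sizes, and the assumption of 8 GPUs per node with consecutive ranks placed on the same node are all hypothetical and not taken from picotron.

```python
from itertools import product

def groups_along(order, sizes, dim):
    """Return the rank lists of every process group along `dim`.

    `order` lists the grid dimensions from outermost to innermost; ranks are
    assigned in row-major order, like torch.arange(n).view(*shape).
    """
    axis = order.index(dim)
    shape = [sizes[d] for d in order]
    buckets = {}
    # product() varies the last coordinate fastest, i.e. row-major order.
    for rank, coords in enumerate(product(*(range(s) for s in shape))):
        key = coords[:axis] + coords[axis + 1:]  # fix every dim except `dim`
        buckets.setdefault(key, []).append(rank)
    return list(buckets.values())

def crosses_nodes(group, gpus_per_node):
    """True if the ranks in `group` live on more than one node."""
    return len({rank // gpus_per_node for rank in group}) > 1

# Assumed example setup: 2 nodes x 8 GPUs, consecutive ranks share a node.
GPUS_PER_NODE = 8
sizes = {"dp": 2, "pp": 2, "cp": 1, "tp": 4}

current = ("dp", "pp", "cp", "tp")   # DP * PP * CP * TP grid
proposed = ("pp", "dp", "cp", "tp")  # PP * DP * CP * TP grid

for name, order in (("current", current), ("proposed", proposed)):
    for dim in ("pp", "dp"):
        groups = groups_along(order, sizes, dim)
        spans = any(crosses_nodes(g, GPUS_PER_NODE) for g in groups)
        print(f"{name}: {dim} groups {groups[:2]}... cross nodes: {spans}")
```

Under these assumptions, the current layout keeps each PP group on one node (e.g. ranks 0 and 4) and sends DP traffic across nodes (e.g. ranks 0 and 8), while the proposed layout does the opposite. Whether that is a net win depends on the actual cluster topology and the relative traffic of each parallelism dimension.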
