Merge dsv3 trainer part #2487
base: develop
Conversation
Thanks for your contribution!
# if pp_first, the order = ["dp", "pp", "moe_sharding", "sharding", "sep", "ep", "mp"]
# if sharding_first, the order is ["dp", "moe_sharding", "sharding", "pp", "sep", "ep", "mp"]
order.insert(sd_idx, "moe_sharding")
order = order[1:-1] + ["dp", "mp"]
Why was this line deleted? Will deleting it affect the original logic?
Without this change, the following error is raised:
File "/PaddleFormers/paddleformers/trainer/training_args.py", line 1561, in __post_init__
self.add_moe_comm_group()
File "/PaddleFormers/paddleformers/trainer/training_args.py", line 2071, in add_moe_comm_group
sharding_parallel_groups = topo.get_comm_list("sharding")
File "/py3.10/lib/python3.10/site-packages/paddle/distributed/fleet/base/topology.py", line 227, in get_comm_list
assert axis_name in self._parallel_names
AssertionError
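To make the failure mode concrete, here is a minimal sketch. The MiniTopology class is hypothetical and mirrors only the assertion seen in the trace: once an axis name is missing from the order list the topology is built from, any later get_comm_list() call for that axis trips the same assert.

class MiniTopology:
    def __init__(self, parallel_names):
        self._parallel_names = list(parallel_names)

    def get_comm_list(self, axis_name):
        assert axis_name in self._parallel_names  # the line that fails above
        return []  # real code returns the communication group ranks

order = ["dp", "moe_sharding", "sharding", "pp", "sep", "ep", "mp"]
MiniTopology(order).get_comm_list("sharding")  # fine: axis is present
# MiniTopology([a for a in order if a != "sharding"]).get_comm_list("sharding")
# -> AssertionError, the failure add_moe_comm_group() hits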
assert (
    "split_param" in sharding_parallel_config
), "split_param should be set when enable_stage1_allgather_overlap."
use_casual_mask = os.getenv("USE_CASUAL_MASK", "False")
Same as above.
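For reference, a hedged sketch of how this assert is exercised. The CLI string and its parsing here are assumptions for illustration; only the assert itself comes from the diff.

raw = "split_param enable_stage1_allgather_overlap"  # assumed CLI value
sharding_parallel_config = raw.split()

if "enable_stage1_allgather_overlap" in sharding_parallel_config:
    assert (
        "split_param" in sharding_parallel_config
    ), "split_param should be set when enable_stage1_allgather_overlap."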
logger.warning(
    "pdc_download_ckpt can only be set as true inside FT environment. Automatically disable it now."
)
self.pdc_download_ckpt = False
Same as above. This change comes from NLP; our code is on an older version and should stay consistent with the newer version in paddleformers. I suggest not deleting it for now and verifying first.
hidden_states,
self.dispatched_routing_map,
num_out_tokens=sum(self.tokens_per_expert),
token_permuted_indices, prob_permuted_indices = topk_to_permuted_indices(
This file has fairly large changes; they should probably go into a new functional module.
The newly added function names now carry a _fast suffix.
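As an aside for readers, a hypothetical sketch of what the permutation step computes. The real topk_to_permuted_indices in this PR may have a different signature (and also returns prob_permuted_indices); this only shows the token-side idea: given a (num_tokens, num_experts) routing map, list token indices grouped by expert so hidden_states can be gathered expert-by-expert.

import paddle

def topk_to_permuted_indices_sketch(routing_map):
    # Transpose to expert-major so nonzero() enumerates expert 0's tokens
    # first, then expert 1's, and so on.
    expert_major = paddle.transpose(routing_map, [1, 0])
    coords = paddle.nonzero(expert_major)  # rows of [expert_id, token_id]
    return coords[:, 1]                    # permuted token indices

routing_map = paddle.to_tensor([[1, 0], [0, 1], [1, 0]])  # 3 tokens, 2 experts
print(topk_to_permuted_indices_sketch(routing_map))       # [0, 2, 1]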
filenames = [filenames]

# check repo id
if download_hub is None:
This should not be deleted. The download file affects loading for all models, so we cannot make major changes to its existing behavior here. Didn't we say earlier that adapting to the new tokenizer only requires adding a kwarg in run_pretrain.py? Why change this file as well?
Changed so that only the final if/else is added.
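A hedged sketch of what "only the final if/else" means; the default hub name below is an assumption for illustration, not necessarily the repo's actual default.

def resolve_download_hub(download_hub):
    # ...the original repo-id checks above this point stay untouched...
    if download_hub is None:
        return "huggingface"  # assumed default source, illustration only
    return download_hub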
and moe_group == "data"
):
    self.moe_group = dist.fleet.get_hybrid_communicate_group().get_data_parallel_group()
if is_fleet_init and dist.get_world_size() > 1:
Without this change an error occurs, because dist.fleet.get_hybrid_communicate_group().get_data_parallel_world_size() returns 1.
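A minimal sketch of the guard (the helper name is assumed; the calls are the ones in the diff): only treat the data-parallel group as the MoE group when fleet is initialized and there is more than one rank.

import paddle.distributed as dist

def resolve_moe_group(moe_group_name, is_fleet_init):
    # Skip the data-parallel fallback on a single rank, where
    # get_data_parallel_world_size() would be 1.
    if is_fleet_init and dist.get_world_size() > 1 and moe_group_name == "data":
        return dist.fleet.get_hybrid_communicate_group().get_data_parallel_group()
    return None  # caller falls back to a non-distributed default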
logger.info(strategy)

if self.expert_parallel_degree > 1:
    self.add_moe_comm_group()
Deleting it raises the following error:
File "/lib/python3.10/site-packages/paddle/distributed/fleet/meta_parallel/parallel_layers/pp_layers.py", line 79, in build_layer
return self.layer_func(*self.inputs, **{**self.kwargs, **extra_kwargs})
File "/PaddleFormers/paddleformers/transformers/deepseek_v2/modeling.py", line 2275, in __init__
DeepseekV2MoE(
File "/PaddleFormers/paddleformers/transformers/deepseek_v2/modeling.py", line 1018, in __init__
super().__init__(
File "/PaddleFormers/paddleformers/transformers/moe_layer.py", line 225, in __init__
self.moe_group = dist.fleet.get_hybrid_communicate_group().expert_parallel_group
AttributeError: 'HybridCommunicateGroup' object has no attribute 'expert_parallel_group'. Did you mean: 'get_data_parallel_group'?
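The dependency the trace shows, as a runnable stand-in. The class bodies below are assumed; only the guard and the attribute name come from the snippets above: add_moe_comm_group() must run so the hybrid communicate group gains the expert_parallel_group attribute that moe_layer.py later reads.

class HCGSketch:
    pass  # stand-in for HybridCommunicateGroup before add_moe_comm_group runs

class TrainingArgsSketch:
    def __init__(self, expert_parallel_degree, hcg):
        self.expert_parallel_degree = expert_parallel_degree
        self.hcg = hcg
        if self.expert_parallel_degree > 1:
            self.add_moe_comm_group()  # the guard this PR keeps

    def add_moe_comm_group(self):
        # Real code builds NCCL comm groups; here we only set the attribute
        # whose absence produced the AttributeError in the trace.
        self.hcg.expert_parallel_group = object()

hcg = HCGSketch()
TrainingArgsSketch(expert_parallel_degree=2, hcg=hcg)
assert hasattr(hcg, "expert_parallel_group")  # what moe_layer.py reads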
PR types
New features
PR changes
Models
Description