
Conversation

hushenwei2000

PR types

New features

PR changes

Models

Description


paddle-bot bot commented Aug 27, 2025

Thanks for your contribution!

# if pp_first, the order = ["dp", "pp", "moe_sharding", "sharding", "sep", "ep", "mp"]
# if sharding_first, the order is ["dp", "moe_sharding", "sharding", "pp", "sep", "ep", "mp"]
order.insert(sd_idx, "moe_sharding")
order = order[1:-1] + ["dp", "mp"]
Contributor

Why is this being removed? Will removing it affect the original logic?

Author

Without this change it raises the following error:

  File "/PaddleFormers/paddleformers/trainer/training_args.py", line 1561, in __post_init__
    self.add_moe_comm_group()
  File "/PaddleFormers/paddleformers/trainer/training_args.py", line 2071, in add_moe_comm_group
    sharding_parallel_groups = topo.get_comm_list("sharding")
  File "/py3.10/lib/python3.10/site-packages/paddle/distributed/fleet/base/topology.py", line 227, in get_comm_list
    assert axis_name in self._parallel_names
AssertionError
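
For context, here is a minimal standalone sketch of why the assertion fires (not the PR code; it assumes sd_idx is the index of "sharding" and that Paddle's CommunicateTopology can be built directly from axis names and dims):

from paddle.distributed.fleet.base.topology import CommunicateTopology

# pp_first ordering, following the snippet quoted above
order = ["dp", "pp", "sharding", "sep", "ep", "mp"]
sd_idx = order.index("sharding")
order.insert(sd_idx, "moe_sharding")
# -> ["dp", "pp", "moe_sharding", "sharding", "sep", "ep", "mp"]
order = order[1:-1] + ["dp", "mp"]
# -> ["pp", "moe_sharding", "sharding", "sep", "ep", "dp", "mp"]

# A topology built from an axis list that no longer contains "sharding"
# cannot answer get_comm_list("sharding"), which is exactly the assertion
# in the traceback above (axis names and dims here are placeholders).
topo = CommunicateTopology(hybrid_group_names=["dp", "pp", "mp"], dims=[1, 1, 1])
topo.get_comm_list("pp")        # fine, "pp" is a known axis
topo.get_comm_list("sharding")  # AssertionError: axis_name in self._parallel_names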

assert (
    "split_param" in sharding_parallel_config
), "split_param should be set when enable_stage1_allgather_overlap."
use_casual_mask = os.getenv("USE_CASUAL_MASK", "False")
Contributor

Same as above.

logger.warning(
    "pdc_download_ckpt can only be set as true inside FT environment. Automatically disable it now."
)
self.pdc_download_ckpt = False
Contributor

Same as above. This change comes from PaddleNLP; our code is on an older version and should stay consistent with the newer version in PaddleFormers. I suggest not deleting it for now and verifying first.

    hidden_states,
    self.dispatched_routing_map,
    num_out_tokens=sum(self.tokens_per_expert),
)
token_permuted_indices, prob_permuted_indices = topk_to_permuted_indices(
Contributor

The changes to this file are fairly large; they should probably be split out into a new functional module.

Author

@hushenwei2000 hushenwei2000 Aug 27, 2025

The newly added functions have been given a _fast suffix in their names.
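
For readers unfamiliar with the dispatch step quoted above, here is a minimal sketch of the underlying idea, grouping token rows by destination expert (this is not the PR's topk_to_permuted_indices; the helper name and signature are invented for illustration):

import paddle

def permuted_indices_sketch(routing_map, num_out_tokens):
    # routing_map: bool tensor [num_tokens, num_experts], True where a token
    # is routed to that expert. Returns row indices into hidden_states such
    # that tokens destined for the same expert end up contiguous.
    num_tokens = routing_map.shape[0]
    pairs = paddle.nonzero(routing_map)          # [n_selected, 2] -> (token, expert)
    # Sort by expert id first, token id second (a stable ordering without
    # relying on argsort's stable flag).
    keys = pairs[:, 1] * num_tokens + pairs[:, 0]
    order = paddle.argsort(keys)
    return paddle.gather(pairs[:, 0], order)[:num_out_tokens]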

filenames = [filenames]

# check repo id
if download_hub is None:
Contributor

This should not be deleted. The download file affects loading for every model, so we cannot make large changes to its original behavior here. Didn't we agree earlier that adapting to the new tokenizer only requires adding a kwarg in run_pretrain.py? Why change this as well?

Author

Changed so that only the final if/else is added.

    and moe_group == "data"
):
    self.moe_group = dist.fleet.get_hybrid_communicate_group().get_data_parallel_group()
if is_fleet_init and dist.get_world_size() > 1:
Author

Without this change it raises an error, because dist.fleet.get_hybrid_communicate_group().get_data_parallel_world_size() returns 1.
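
In other words, on this topology the data-parallel group collapses to a single rank. A guarded version of the lookup (a sketch under that assumption, not the PR code) would only take the data-parallel fallback when it actually spans more than one rank:

from paddle.distributed import fleet

# assumes fleet.init(...) has already been called with a hybrid strategy
hcg = fleet.get_hybrid_communicate_group()
if hcg.get_data_parallel_world_size() > 1:
    moe_group = hcg.get_data_parallel_group()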

logger.info(strategy)

if self.expert_parallel_degree > 1:
    self.add_moe_comm_group()
Author

Deleting it raises the following error:

 File "/lib/python3.10/site-packages/paddle/distributed/fleet/meta_parallel/parallel_layers/pp_layers.py", line 79, in build_layer
    return self.layer_func(*self.inputs, **{**self.kwargs, **extra_kwargs})
  File "/PaddleFormers/paddleformers/transformers/deepseek_v2/modeling.py", line 2275, in __init__
    DeepseekV2MoE(
  File "/PaddleFormers/paddleformers/transformers/deepseek_v2/modeling.py", line 1018, in __init__
    super().__init__(
  File "/PaddleFormers/paddleformers/transformers/moe_layer.py", line 225, in __init__
    self.moe_group = dist.fleet.get_hybrid_communicate_group().expert_parallel_group
AttributeError: 'HybridCommunicateGroup' object has no attribute 'expert_parallel_group'. Did you mean: 'get_data_parallel_group'?
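
The AttributeError indicates the installed Paddle's HybridCommunicateGroup predates the expert-parallel accessor used here. A defensive lookup (sketch only, not the PR code) would probe for the attribute and fall back explicitly instead of crashing while building the pipeline layers:

from paddle.distributed import fleet

hcg = fleet.get_hybrid_communicate_group()
# Newer Paddle exposes an expert-parallel group on the hybrid communicate
# group; older builds (as in the traceback above) do not.
if hasattr(hcg, "expert_parallel_group"):
    moe_group = hcg.expert_parallel_group
else:
    moe_group = hcg.get_data_parallel_group()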
