
Conversation

@K11OntheBoat
Collaborator

Motivation

Support deploying Qwen3-235B in FD with PD + EP (expert parallelism).

@paddle-bot

paddle-bot bot commented Oct 30, 2025

Thanks for your contribution!

create=False,
)
step_shm_value.value[0] = -1
if not envs.ENABLE_V1_KVCACHE_SCHEDULER:
Collaborator Author

This currently conflicts with the latest code; we can just take the latest code later (it solves the same problem).

shard_id=shard_id,
shard_dim=SHARD_ID_TO_SHARDED_DIM[shard_id],
)
if expert_id - self.expert_id_offset >= 0 and expert_id - self.expert_id_offset < self.num_local_experts:
Collaborator

The outer code already uses `expert_id` directly, yet inside there is still an `if expert_id is None` check.
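The redundancy being flagged can be illustrated with a minimal sketch (names are illustrative, not the actual FastDeploy code): the caller always passes a concrete `expert_id`, so an inner `None` check would be dead code, and the offset comparison alone decides whether this rank owns the expert.

```python
def weight_loader(param, loaded_weight, expert_id, expert_id_offset, num_local_experts):
    # Illustrative only: the caller always supplies a concrete expert_id,
    # so an inner `if expert_id is None` branch would never fire.
    local_id = expert_id - expert_id_offset
    # Only load weights for experts owned by this rank (EP sharding).
    if 0 <= local_id < num_local_experts:
        param[local_id] = loaded_weight
        return True
    return False
```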

Collaborator Author

Done

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ bukejiyu
❌ K11OntheBoat


K11OntheBoat does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

quant_method = getattr(model_sublayer, "quant_method", None)
if not hasattr(quant_method, "process_weights_after_loading"):
return
if param is not None and hasattr(param, "tensor_track") and param.tensor_track is None:
Collaborator

What does this mean? I don't follow.

Collaborator Author

In binhan's code, the tensor_track handling can run more than once; this line is needed to avoid the duplicate processing.
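A minimal sketch of the guard pattern described here (all class and function names are hypothetical stand-ins, not the actual FastDeploy implementation): `tensor_track` is cleared once a parameter has been processed, so a `None` value means "already handled" and the post-load hook is skipped on repeat calls.

```python
class QuantMethod:
    """Hypothetical quant method exposing a post-load hook."""
    def __init__(self):
        self.calls = 0

    def process_weights_after_loading(self, layer):
        self.calls += 1

class Layer:
    def __init__(self):
        self.quant_method = QuantMethod()

class Param:
    def __init__(self):
        # Non-None while loading is still in flight; cleared after processing.
        self.tensor_track = object()

def process_after_loading(model_sublayer, param=None):
    """Run the post-load hook at most once per parameter."""
    quant_method = getattr(model_sublayer, "quant_method", None)
    if not hasattr(quant_method, "process_weights_after_loading"):
        return
    # Guard: tensor_track is set to None once the parameter has been
    # processed, so None means "already handled" -- skip the repeat.
    if param is not None and hasattr(param, "tensor_track") and param.tensor_track is None:
        return
    quant_method.process_weights_after_loading(model_sublayer)
    if param is not None:
        param.tensor_track = None
```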

num_experts = model_config.moe_num_experts[0]
else:
num_experts = model_config.moe_num_experts
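The branch above exists because `moe_num_experts` can arrive either as a plain int or as a list. A hedged sketch of the normalization, assuming (as the snippet suggests) that the list's first element carries the expert count:

```python
def resolve_num_experts(moe_num_experts):
    # Normalize moe_num_experts: some configs store a plain int, others a
    # list/tuple whose first element is the expert count.
    if isinstance(moe_num_experts, (list, tuple)):
        return moe_num_experts[0]
    return moe_num_experts
```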

Collaborator

This change is still a workaround; please don't bake this kind of legacy into the file.

if not hasattr(self, "mla_use_absorb"):
self.mla_use_absorb = False

if hasattr(self, "num_experts") and getattr(self, "moe_num_experts") is None:
Collaborator

This change is best placed together with the original `self.num_experts` assignment; standing alone here it looks out of place.

Collaborator Author

I asked risheng about this earlier: this is a dedicated override function whose whole purpose is reconciling the different naming schemes. Moving it back to the original place would make the code the same as before the refactor, back to the old style.
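The override the author describes can be sketched as a small compatibility shim that maps an alternate config field name onto the canonical one (illustrative names only, not the actual FastDeploy implementation):

```python
class ModelConfig:
    """Illustrative config with a name-compatibility override hook."""
    def __init__(self, **kwargs):
        self.moe_num_experts = None
        for key, value in kwargs.items():
            setattr(self, key, value)
        self.override_name_from_config()

    def override_name_from_config(self):
        # Different checkpoints name the same field differently; map the
        # alternate name onto the canonical one only when it is unset.
        if hasattr(self, "num_experts") and getattr(self, "moe_num_experts") is None:
            self.moe_num_experts = self.num_experts
```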

@gongshaotian (Collaborator) left a comment

LGTM
