[PD Disaggregation] Support Qwen3-MoE with PD + EP inference. #4691
base: develop
Conversation
Thanks for your contribution!
fastdeploy/worker/worker_process.py (Outdated)
```python
    create=False,
)
step_shm_value.value[0] = -1
if not envs.ENABLE_V1_KVCACHE_SCHEDULER:
```
This currently conflicts with the latest code; just take the latest code later on (it solves the same problem).
```python
    shard_id=shard_id,
    shard_dim=SHARD_ID_TO_SHARDED_DIM[shard_id],
)
if expert_id - self.expert_id_offset >= 0 and expert_id - self.expert_id_offset < self.num_local_experts:
```
The outer code already uses expert_id directly, yet inside there is still an `if expert_id is None` check?
Done
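For context, a minimal sketch of the expert-parallel ownership check under discussion. Only `expert_id_offset` and `num_local_experts` come from the diff; the class and method names are hypothetical, not the actual FastDeploy API.

```python
# Illustrative sketch only; names other than expert_id_offset and
# num_local_experts are hypothetical stand-ins.
class MoEWeightLoader:
    def __init__(self, expert_id_offset: int, num_local_experts: int):
        # Under EP, each rank owns the contiguous expert range
        # [expert_id_offset, expert_id_offset + num_local_experts).
        self.expert_id_offset = expert_id_offset
        self.num_local_experts = num_local_experts

    def maybe_load(self, expert_id: int, weight) -> bool:
        # Skip experts owned by other EP ranks. When the caller
        # guarantees expert_id is set, no inner None check is needed.
        local_id = expert_id - self.expert_id_offset
        if 0 <= local_id < self.num_local_experts:
            # ... copy `weight` into local expert slot `local_id` ...
            return True
        return False


# Example: rank 1 of 4 with 16 experts total owns experts 4..7.
loader = MoEWeightLoader(expert_id_offset=4, num_local_experts=4)
assert loader.maybe_load(5, weight=None) is True
assert loader.maybe_load(12, weight=None) is False
```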
K11OntheBoat seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you already signed the CLA but the status is still pending? Let us recheck it.
```python
quant_method = getattr(model_sublayer, "quant_method", None)
if not hasattr(quant_method, "process_weights_after_loading"):
    return
if param is not None and hasattr(param, "tensor_track") and param.tensor_track is None:
```
What does this mean? I don't follow.
In binhan's code, the tensor_track handling logic can run more than once; this line was added to avoid that.
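A minimal sketch of that guard, assuming `tensor_track` behaves as a one-shot marker that is cleared once a parameter has been processed (an assumption; the actual semantics live in the FastDeploy weight-loading code):

```python
# Names mirror the diff; the surrounding function is illustrative only.
def process_weights_after_loading_once(model_sublayer, param) -> None:
    quant_method = getattr(model_sublayer, "quant_method", None)
    if not hasattr(quant_method, "process_weights_after_loading"):
        return
    # Guard: a param whose tensor_track is already None was handled in
    # an earlier pass, so skip it rather than process it a second time.
    if param is not None and hasattr(param, "tensor_track") and param.tensor_track is None:
        return
    quant_method.process_weights_after_loading(model_sublayer)
```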
```python
    num_experts = model_config.moe_num_experts[0]
else:
    num_experts = model_config.moe_num_experts
```
This change is still a workaround; please don't bake this kind of legacy into the file.
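One way to address that concern is to normalize the field once in a helper rather than branching at each call site. A sketch, assuming `moe_num_experts` may arrive as either a per-group list or a plain int (only the field name comes from the diff; the helper is illustrative):

```python
def resolve_num_experts(model_config) -> int:
    # Some configs store a per-group list, others a single int;
    # normalize here so call sites never see the difference.
    n = model_config.moe_num_experts
    return n[0] if isinstance(n, (list, tuple)) else n
```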
```python
if not hasattr(self, "mla_use_absorb"):
    self.mla_use_absorb = False

if hasattr(self, "num_experts") and getattr(self, "moe_num_experts") is None:
```
A change like this is best kept together with the original self.num_experts assignment; standing alone here it looks out of place.
I asked risheng about this earlier: this is deliberately an override function whose whole purpose is compatibility with different naming conventions. Moving it back to its original place would make it the same as before the refactor, back to the old style.
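A rough sketch of that kind of naming-compatibility override, under the assumption that it simply aliases the legacy field onto the refactored one (all names besides `num_experts` and `moe_num_experts` are illustrative):

```python
class ConfigSketch:
    def __init__(self, **kwargs):
        self.moe_num_experts = None
        for key, value in kwargs.items():
            setattr(self, key, value)
        self.override_name_from_config()

    def override_name_from_config(self) -> None:
        # Compatibility shim: map the legacy `num_experts` onto the
        # refactored `moe_num_experts` when the latter is unset, so
        # checkpoints using either naming convention load correctly.
        if hasattr(self, "num_experts") and getattr(self, "moe_num_experts") is None:
            self.moe_num_experts = self.num_experts


cfg = ConfigSketch(num_experts=128)
assert cfg.moe_num_experts == 128
```

Keeping the shim in a single override function also means the aliasing logic has one home instead of being repeated at every assignment site.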
LGTM
Motivation
Enable Qwen3-235B to be deployed in FastDeploy (FD) with PD + EP.