
Development Roadmap (2024 Q4) #1487

Open
11 of 37 tasks
Ying1123 opened this issue Sep 21, 2024 · 21 comments

Comments

@Ying1123
Member

Ying1123 commented Sep 21, 2024

Here is the development roadmap for 2024 Q4. Contributions and feedback are welcome (join the bi-weekly development meeting). The previous 2024 Q3 roadmap can be found in #634.

Performance

Parallelism

Hardware Coverage

Model Coverage

New features

Quantization

@HaiShaw @zhyncs @ispobock

Server API

Observability

Others

@fengyang95

Are there any plans to optimize long context latency?

@Ying1123 Ying1123 changed the title [WIP] Development Roadmap (2024 Q4) Development Roadmap (2024 Q4) Sep 22, 2024
@zhyncs zhyncs pinned this issue Sep 22, 2024
@lumiere-ml

Hi, can I help with the multi-layer radix cache (GPU/CPU/Disk)? I'm really interested in that.

@tanzelin430

> Are there any plans to optimize long context latency?

I am interested in contributing to a P-D disaggregated inference architecture, and I have machines available to support the development. If you have any related development plans, please let me know. Thank you @Ying1123 @zhyncs @fengyang95

@merrymercy
Contributor

@lumiere-ml @tanzelin430 Are you in the slack channel? We can follow up on that.

@zhyncs
Member

zhyncs commented Oct 20, 2024

> @lumiere-ml @tanzelin430 Are you in the slack channel? We can follow up on that.

@lumiere-ml @tanzelin430 Welcome to join our slack channel https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2ngly9muu-t37XiH87qvD~6rVBTkTEHw

@tanzelin430

> @lumiere-ml @tanzelin430 Are you in the slack channel? We can follow up on that.
>
> @lumiere-ml @tanzelin430 Welcome to join our slack channel https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2ngly9muu-t37XiH87qvD~6rVBTkTEHw

Thanks for the invitation, I am in Slack now. Looking forward to collaborating with you.

@lumiere-ml

> @lumiere-ml @tanzelin430 Are you in the slack channel? We can follow up on that.
>
> @lumiere-ml @tanzelin430 Welcome to join our slack channel https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2ngly9muu-t37XiH87qvD~6rVBTkTEHw

Thanks for your invitation!

@Edenzzzz

Edenzzzz commented Nov 11, 2024

> @lumiere-ml @tanzelin430 Are you in the slack channel? We can follow up on that.
>
> @lumiere-ml @tanzelin430 Welcome to join our slack channel https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2ngly9muu-t37XiH87qvD~6rVBTkTEHw
>
> Thanks for your invitation!

@lumiere-ml @zhyncs I'm also very interested; could you share which channel you're using for discussion?
Perhaps we can combine radix-tree prefix matching with P-D disaggregation, similar to Mooncake?

@mfdj2002

If no one is actively working on supporting pipeline parallelism, I'm down to help

@Edenzzzz

@mfdj2002 I think @CalvinXKY has expressed interest on Slack; you can chat with him there.

@merrymercy
Contributor

No one is working on pipeline parallelism. Feel free to contribute one.

@m0g1cian

m0g1cian commented Dec 3, 2024

I recently completed a reward model implementation for RMs trained by LlamaFactory. Everything worked well, but I've noticed a relatively small value difference in the last hidden states between my SGLang implementation and the counterpart in TRL (resulting in a ROC loss of ~0.3%).

Regardless, I think I can help with the task "Support generalized reward API (adding linear layers to any Causal LM to get the reward)"
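For readers unfamiliar with the task quoted above: a "generalized reward API" essentially means pooling the final hidden state of a causal LM and projecting it through a linear value head, as TRL-style reward models do. A minimal NumPy sketch of that projection step, where `last_hidden`, the head weights `w`/`b`, and the EOS position are all placeholder values rather than real SGLang or TRL tensors:

```python
import numpy as np

def reward_from_hidden(last_hidden, w, b, eos_index):
    """Project the hidden state at the final (EOS) token through a
    linear head to obtain a scalar reward."""
    pooled = last_hidden[eos_index]   # (hidden_dim,) hidden state at EOS
    return float(pooled @ w + b)      # scalar reward

# Placeholder tensors standing in for real model outputs and head weights.
rng = np.random.default_rng(0)
hidden_dim, seq_len = 16, 8
last_hidden = rng.standard_normal((seq_len, hidden_dim))
w = rng.standard_normal(hidden_dim)
b = 0.0
r = reward_from_hidden(last_hidden, w, b, eos_index=seq_len - 1)
```

In a real integration, `last_hidden` would come from the causal LM's final transformer layer and `w`/`b` from a trained value-head checkpoint; the small discrepancy reported above would then show up in `last_hidden` itself.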

@kuangdao

kuangdao commented Dec 4, 2024

I am interested in sequence parallelism. I would like to know whether the sequence parallelism will use the method from "Context Parallelism for Scalable Million-Token Inference". Thanks.

@zhaochenyang20
Collaborator

> I recently completed a reward model implementation for RMs trained by LlamaFactory. Everything worked well but I've noticed a relatively small value diff in last hidden states between my SGLang implementation and the counterpart in TRL (resulting a ROC loss of ~0.3%)
>
> Regardless, I think I can help with the task "Support generalized reward API (adding linear layers to any Causal LM to get the reward)"

Amazing! Could you please send an email with your WeChat or other contact info to [email protected]?

We can also discuss this on our Slack. Please find [email protected] on the SGLang Slack!

@m0g1cian

@trh11111

I am also very interested in the scenario of PD disaggregation, and I hope to combine radix tree with PD disaggregation for some experiments. I saw that someone mentioned this in October. May I ask how the current development plan is progressing?

@zhaochenyang20
Collaborator

zhaochenyang20 commented Dec 11, 2024

> I am also very interested in the scenario of PD disaggregation, and I hope to combine radix tree with PD disaggregation for some experiments. I saw that someone mentioned this in October. May I ask how the current development plan is progressing?

@trh11111 Yeah, new members have joined our team to work on this, and PD disaggregation is the first priority on our roadmap for next quarter.

@tanzelin430

> I am also very interested in the scenario of PD disaggregation, and I hope to combine radix tree with PD disaggregation for some experiments. I saw that someone mentioned this in October. May I ask how the current development plan is progressing?

Hi, I have just finished my graduation recruitment season and am working on my ATC paper. I'll be looking into the development soon.

@zhaochenyang20
Collaborator

> I am also very interested in the scenario of PD disaggregation, and I hope to combine radix tree with PD disaggregation for some experiments. I saw that someone mentioned this in October. May I ask how the current development plan is progressing?
>
> Hi, I have just finished my graduation recruitment season and am working on my ATC paper. I'll be looking into the development soon.

@trh11111 If you're interested in this part, you could reach out to us on Slack.

@mpjlu
Contributor

mpjlu commented Dec 18, 2024

> @lumiere-ml @tanzelin430 Are you in the slack channel? We can follow up on that.
>
> @lumiere-ml @tanzelin430 Welcome to join our slack channel https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2ngly9muu-t37XiH87qvD~6rVBTkTEHw

How do I join this Slack channel?

@zhyncs
Member

zhyncs commented Dec 20, 2024

@mpjlu
Contributor

mpjlu commented Dec 22, 2024
