Development Roadmap (2024 Q3) #634

Open
7 of 34 tasks
Ying1123 opened this issue Jul 17, 2024 · 11 comments

@Ying1123
Contributor

Ying1123 commented Jul 17, 2024

Here is the development roadmap for 2024 Q3. Contributions and feedback are welcome.

Server API

Performance

Parallelism

Quantization

Observability

  • Integrate Grafana.

Model Coverage

Hardware Coverage

  • AMD support
  • CPU support
  • Mac support

Language API

LoRA Support

Usage examples

Others

@Ying1123 Ying1123 pinned this issue Jul 17, 2024
@zhyncs
Member

zhyncs commented Jul 17, 2024

Support W8A4 quantization with fp8 activation and int4 weight.

typo: W8A4 -> W4A8

@Ying1123
Contributor Author

Support W8A4 quantization with fp8 activation and int4 weight.

typo: W8A4 -> W4A8

Thanks! Changed.

@LinqingZhong

May I ask if there is an example of using llava-next-interleave with multiple images?

@anatoli26

I guess ROCm support is under Hardware Coverage - AMD support. Any ETA for this?

@usaxena-asapp

Hey @Ying1123 - are you okay with open source contributions from developers outside the core team? Looking to find more places I can contribute and I'm excited about SGLang. Just wondering.

@Ying1123
Contributor Author

Hey @Ying1123 - are you okay with open source contributions from developers outside the core team? Looking to find more places I can contribute and I'm excited about SGLang. Just wondering.

Hi @usaxena-asapp, definitely! There is no strict definition of a "core team," and I'm just a volunteer to coordinate. If you contribute a lot, you are a core member! Let me know if you need any help from people with experience. My suggestion is to start with small issues and PRs and join discussions. If you want to start a big one, you can start with a simple proposal to trigger collaborations from the community.

@Ying1123
Contributor Author

I guess ROCm support is under Hardware Coverage - AMD support. Any ETA for this?

Hi @anatoli26, thanks for the question. We list it in the roadmap, but we might just start with some basic tests. Optimizations will depend on how many people and resources we can get.

@anatoli26

we list it in the roadmap, but we might just start with some basic tests. Optimizations will depend on how many people and resources we can get.

Have you tried talking to AMD about hardware samples (e.g. a pair of W7900s) and software collaboration? They are trying hard to be on par with NVIDIA in the software stack; see the article "AMD is Becoming a Software Company. Here's the Plan." The author of the article has some great connections with the AMD people, so maybe you could write to him (W1zzard, under the title) to ask for contacts at AMD responsible for relations with FOSS projects?

@ghchris2021

IDK if there's any potential interest to broaden the concepts involved in "Hardware Coverage" but in case it may raise some ideas to consider in the future:

You mention CPU support and AMD support, but there are higher-level frameworks that may considerably help with supporting different hardware backends (CPU, GPU), so you don't necessarily have to put as much work into supporting a specific backend. They may make it possible to run on more than one backend for the same effort.
For instance, OpenCL, SYCL, Vulkan compute, maybe OpenACC, and others are fairly portable parallel computing frameworks, and each typically supports at least a couple of CPU and GPU families, if not several.

IIRC OpenCL can run on NVIDIA, AMD, and Intel GPUs, as well as Intel and AMD CPUs and, I think, some ARM CPUs.

IIRC SYCL runs on Intel GPUs, Intel / AMD CPUs, and I believe also NVIDIA GPUs. It may run on AMD GPUs but I'm not so sure about that.

There are still higher-level frameworks / implementations that package some of the tools for such open standards, e.g.

https://github.com/AdaptiveCpp/AdaptiveCpp

targets SYCL but also provides C++ std:: parallelism programming models.

PoCL, Rusticl, and several vendor development packages (Intel, AMD, NVIDIA, ...) provide functionally compatible OpenCL support on particular platforms.

Besides NVIDIA and AMD GPUs, Intel has generations of data center, enterprise, business, and consumer-grade GPUs that are strong in their capabilities, and the same tooling and documentation (SYCL, oneAPI, OpenVINO, DPC++, libraries like oneDNN, etc.) applies across the product line for both GPU families and CPUs.

There exist Vulkan wrappers and higher-level middleware that encapsulate the details of Vulkan compute programming and expose easier-to-use developer interfaces for general parallel compute: math, matrix / vector arithmetic, NN, etc.

IIRC all major GPUs (NVIDIA / AMD / Intel) have Vulkan-compatible runtimes and development options available, as do several ARM SoC GPUs. So, as a middleware layer, Vulkan could help support numerous platforms for a single unit of effort: target Vulkan-based operations for the primary memory / NN / linear algebra calculations that can be accelerated.

So I'm just suggesting reaching for tools that support multiple standards-based platforms, if that eases your work and also broadens / accelerates support for more platforms.

@CSEEduanyu

I noticed that speculative decoding has been implemented in the branch https://github.com/sgl-project/sglang/pull/270/commits. Why was this PR closed? How long will it take to support speculative decoding? Thank you for your reply.

@TimDettmers

This is an awesome project! Thank you for this. @Ying1123 I am interested in using SGLang for multi-LoRA deployments for a project. The alternative is currently vLLM, but I like SGLang better. I am curious about the current state and timeline for supporting S-LoRA-like deployment.


8 participants