Development Roadmap (2024 Q4) #1487

Open
1 of 31 tasks
Ying1123 opened this issue Sep 21, 2024 · 1 comment

Comments

@Ying1123
Member

Ying1123 commented Sep 21, 2024

Here is the development roadmap for 2024 Q4. Contributions and feedback are welcome (Join Bi-weekly Development Meeting). The previous 2024 Q3 roadmap can be found in #634.

Performance

Parallelism

Hardware Coverage

Model Coverage

LoRA support

Quantization

@zhyncs @ispobock

  • torchao quantization
  • TurboMind operators integration
  • More CUTLASS mixed-precision GEMM integration
  • KV cache quantization (more formats + scaling factor)

Server API

Observability

Others

  • Notebook-style interactive tutorials. @zhaochenyang20
  • Compiler mode optimizations for the language (e.g., support sending a full serialized SGL program to the server). @hnyls2002
  • Memory pool refactor to better support mixing different attention layers (e.g., interleaved window attention). @Ying1123
  • Linear layers refactor. Make vLLM an optional dependency. @zhyncs

@fengyang95

Are there any plans to optimize long context latency?

@Ying1123 Ying1123 changed the title [WIP] Development Roadmap (2024 Q4) Development Roadmap (2024 Q4) Sep 22, 2024
@zhyncs zhyncs pinned this issue Sep 22, 2024