Your ultimate guide to resources, papers, and blogs on Large Language Model (LLM) inference techniques!
- Awesome-LLM-Inference
  A curated collection of papers and code on LLM inference, covering topics such as FlashAttention, PagedAttention, and parallelism.
- Awesome LLM Systems Papers
  A curated list of academic papers, articles, tutorials, slides, and projects on Large Language Model systems.
- Awesome-Speculative-Decoding
  Advanced methods for accelerating LLM decoding with speculative techniques (a minimal sketch follows this list).
- COLING 2025 Tutorial: Speculative Decoding for Efficient LLM Inference
  Full slide deck, recording, and Bilibili link.
- Large Language Model Based Long Context Modeling Papers and Blogs
  Papers and blogs on extending LLM context length, efficient Transformers, and retrieval-augmented generation (RAG).
- Awesome MoE LLM Inference System and Algorithm
  A comprehensive list of resources for optimizing inference of MoE-based LLMs and other sparse expert models.
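To make the speculative-decoding entries above more concrete, here is a minimal greedy-verification sketch. The `draft_model` and `target_model` callables are hypothetical stand-ins (anything mapping a token sequence to per-position next-token logits), not an API from any of the listed repositories; production systems verify draft tokens with rejection sampling over full probability distributions rather than exact argmax matching.

```python
# Minimal greedy speculative-decoding sketch.
# Assumption: draft_model and target_model map a token sequence to a list of
# per-position next-token logits (row i predicts token i+1). Both names are
# hypothetical; this is a toy illustration, not a library implementation.
import numpy as np

def greedy_next(logits):
    return int(np.argmax(logits))

def speculative_decode(target_model, draft_model, prompt, k=4, max_new_tokens=32):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) The small draft model proposes k tokens autoregressively (cheap).
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = greedy_next(draft_model(ctx)[-1])
            draft.append(t)
            ctx.append(t)

        # 2) The large target model scores prompt + draft in a single forward
        #    pass, so its cost is amortized over up to k accepted tokens.
        logits = target_model(tokens + draft)

        # 3) Accept the longest draft prefix that the target model agrees with.
        accepted = 0
        for i, t in enumerate(draft):
            if t == greedy_next(logits[len(tokens) + i - 1]):
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])

        # 4) Always emit one token from the target itself: the correction on a
        #    mismatch, or a bonus token if the whole draft was accepted.
        tokens.append(greedy_next(logits[len(tokens) - 1]))
    return tokens
```

Each loop yields between one and k+1 tokens for a single target-model forward pass, which is where the speedup comes from when the draft model's guesses are frequently accepted.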
Efficient management of KV caches for LLM acceleration!
- Awesome-KV-Cache-Management
  Token-level, model-level, and system-level optimizations for the KV cache; a toy decode-step sketch follows this list.
- Awesome-KV-Cache-Compression
  Must-read papers on KV cache compression for memory-efficient LLM inference.
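The resources above all target the same structure: the key/value tensors cached during autoregressive decoding. Below is a toy, numpy-only decode step showing what is cached and why its memory footprint grows with context length. `KVCache` and `decode_step` are illustrative names, and the sliding-window eviction is only a crude stand-in for the token-, model-, and system-level compression strategies those lists catalogue.

```python
# Toy single-head attention decode step with a KV cache (numpy only).
# Assumption: KVCache and decode_step are illustrative names, not any
# library's API. Caching K/V for past tokens lets each decode step project
# only the new token and attend over stored entries; compression techniques
# target the memory this cache consumes, which grows with context length.
import numpy as np

class KVCache:
    def __init__(self, max_tokens=None):
        self.k, self.v = [], []
        self.max_tokens = max_tokens  # simple sliding-window budget

    def append(self, k_t, v_t):
        self.k.append(k_t)
        self.v.append(v_t)
        if self.max_tokens and len(self.k) > self.max_tokens:
            # Crudest form of cache "compression": evict the oldest entries.
            self.k.pop(0)
            self.v.pop(0)

    def tensors(self):
        return np.stack(self.k), np.stack(self.v)

def decode_step(x_t, Wq, Wk, Wv, cache):
    """One decode step: project only the new token, attend over cached K/V."""
    q = x_t @ Wq
    cache.append(x_t @ Wk, x_t @ Wv)
    K, V = cache.tensors()
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

# Usage: feed hidden states one token at a time; the cache holds all past K/V.
d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
cache = KVCache(max_tokens=128)
for _ in range(5):
    out = decode_step(rng.normal(size=d), Wq, Wk, Wv, cache)
```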
Explore insightful blogs and courses on cutting-edge LLM inference techniques!
- A must for beginners: Andrej Karpathy's building-GPT-from-scratch series
- MIT 6.5940: TinyML and Efficient Deep Learning Computing
- UCSD CSE 234: Data Systems for Machine Learning
- CMU Large Language Model Systems Course
- Learning notes for ML systems
- A batch of noteworthy MLSys bloggers
Stay tuned for more updates!