Your ultimate guide to resources, papers, and blogs on Large Language Model (LLM) inference techniques!
- Awesome-LLM-Inference
  A curated collection of papers and code on LLM inference, covering topics such as FlashAttention, PagedAttention, and parallelism.
- Awesome LLM Systems Papers
  A curated list of academic papers, articles, tutorials, slides, and projects on Large Language Model systems.
- Awesome-Speculative-Decoding
  Advanced methods for accelerating LLM decoding with speculative techniques (a minimal sketch follows this list).
- COLING 2025 Tutorial: Speculative Decoding for Efficient LLM Inference
  Full slide deck, recording, and Bilibili link.
- Large Language Model Based Long Context Modeling Papers and Blogs
  Papers and blogs on extending LLM context length, efficient Transformers, and retrieval-augmented generation (RAG).
- Awesome MoE LLM Inference System and Algorithm
  A comprehensive list of resources for optimizing inference of MoE-based LLMs and other sparse expert models.
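To make the speculative-decoding entries above more concrete, here is a minimal greedy-verification sketch. The `draft_model` and `target_model` callables are hypothetical stand-ins (anything mapping a token sequence to per-position next-token logits), not an API from any of the listed repositories; production systems verify draft tokens with rejection sampling over full probability distributions rather than exact argmax matching.

```python
# Minimal greedy speculative-decoding sketch.
# Assumption: draft_model and target_model map a token sequence to a list of
# per-position next-token logits (row i predicts token i+1). Both names are
# hypothetical; this is a toy illustration, not a library implementation.
import numpy as np

def greedy_next(logits):
    return int(np.argmax(logits))

def speculative_decode(target_model, draft_model, prompt, k=4, max_new_tokens=32):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) The small draft model proposes k tokens autoregressively (cheap).
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = greedy_next(draft_model(ctx)[-1])
            draft.append(t)
            ctx.append(t)

        # 2) The large target model scores prompt + draft in a single forward
        #    pass, so its cost is amortized over up to k accepted tokens.
        logits = target_model(tokens + draft)

        # 3) Accept the longest draft prefix that the target model agrees with.
        accepted = 0
        for i, t in enumerate(draft):
            if t == greedy_next(logits[len(tokens) + i - 1]):
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])

        # 4) Always emit one token from the target itself: the correction on a
        #    mismatch, or a bonus token if the whole draft was accepted.
        tokens.append(greedy_next(logits[len(tokens) - 1]))
    return tokens
```

Each loop yields between one and k+1 tokens for a single target-model forward pass, which is where the speedup comes from when the draft model's guesses are frequently accepted.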
Efficient management of KV caches for LLM acceleration!
- Awesome-KV-Cache-Management
  Token-level, model-level, and system-level optimizations for the KV cache; a toy decode-step sketch follows this list.
- Awesome-KV-Cache-Compression
  Must-read papers on KV cache compression for memory-efficient LLM inference.
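The resources above all target the same structure: the key/value tensors cached during autoregressive decoding. Below is a toy, numpy-only decode step showing what is cached and why its memory footprint grows with context length. `KVCache` and `decode_step` are illustrative names, and the sliding-window eviction is only a crude stand-in for the token-, model-, and system-level compression strategies those lists catalogue.

```python
# Toy single-head attention decode step with a KV cache (numpy only).
# Assumption: KVCache and decode_step are illustrative names, not any
# library's API. Caching K/V for past tokens lets each decode step project
# only the new token and attend over stored entries; compression techniques
# target the memory this cache consumes, which grows with context length.
import numpy as np

class KVCache:
    def __init__(self, max_tokens=None):
        self.k, self.v = [], []
        self.max_tokens = max_tokens  # simple sliding-window budget

    def append(self, k_t, v_t):
        self.k.append(k_t)
        self.v.append(v_t)
        if self.max_tokens and len(self.k) > self.max_tokens:
            # Crudest form of cache "compression": evict the oldest entries.
            self.k.pop(0)
            self.v.pop(0)

    def tensors(self):
        return np.stack(self.k), np.stack(self.v)

def decode_step(x_t, Wq, Wk, Wv, cache):
    """One decode step: project only the new token, attend over cached K/V."""
    q = x_t @ Wq
    cache.append(x_t @ Wk, x_t @ Wv)
    K, V = cache.tensors()
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

# Usage: feed hidden states one token at a time; the cache holds all past K/V.
d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
cache = KVCache(max_tokens=128)
for _ in range(5):
    out = decode_step(rng.normal(size=d), Wq, Wk, Wv, cache)
```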
Explore insightful blogs and courses on cutting-edge LLM inference techniques!
- A must for beginners: Andrej Karpathy's building-GPT-from-scratch series
- MIT 6.5940: TinyML and Efficient Deep Learning Computing
- UCSD CSE 234: Data Systems for Machine Learning
- CMU Large Language Model Systems Course
- Learning notes for ML systems
- A batch of noteworthy MLSys bloggers
Stay tuned for more updates!