🎯 #pragma unroll
🤖 LLM/VLM | Diffusion | CUDA | AI Infra
- Statistics Department of JNU
- Guangzhou, China
- (UTC +08:00)
- https://github.com/DefTruth
- https://www.zhihu.com/people/qyjdef
Pinned
- lite.ai.toolkit: 🛠 A lite C++ toolkit of 100+ awesome AI models; supports ORT, MNN, NCNN, TNN, and TensorRT. 🎉🎉
- vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs.
- Awesome-LLM-Inference: 📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, Flash-Attention, Paged-Attention, parallelism, etc. 🎉🎉
- CUDA-Learn-Notes: 📚 200+ Tensor/CUDA Core kernels: ⚡️ flash-attn-mma and ⚡️ hgemm with WMMA, MMA, and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
- statistic-learning-R-note: 📒 Notes on "Statistical Learning Methods" by Li Hang: from theory to implementation in R. A 200-page PDF with detailed step-by-step derivations of the formulas, implemented in R. 🎉🎉
- ffpa-attn-mma: 📚 FFPA (Split-D): yet another faster flash prefill attention with O(1) ⚡️ GPU SRAM complexity for headdim > 256, ~2x↑ 🎉 vs SDPA EA.