I am an undergraduate student in IIIS (Yao Class), Tsinghua University. I am currently interested in efficient algorithms and machine learning systems.
- Tsinghua University, NVIDIA
- Beijing, China
Pinned
- thu-ml/SageAttention: Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models (see the usage sketch after this list).
- SPH_Project: An SPH implementation of fluid simulation, featuring large-scale simulation, rigid-fluid coupling, and high-viscosity fluids.
- thu-nics/MoA: The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".
- mit-han-lab/qserve: QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving.
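For context, SageAttention is intended as a drop-in replacement for standard scaled dot-product attention. The snippet below is a minimal usage sketch only; it assumes the package exposes a `sageattn(q, k, v, ...)` function over FP16/BF16 CUDA tensors, and the exact signature, tensor layout options, and supported dtypes should be checked against the repository's README.

```python
# Minimal sketch of calling SageAttention as a drop-in attention kernel.
# Assumptions (not confirmed by this page): the package is installed as
# `sageattention`, exposes `sageattn(q, k, v, is_causal=...)`, and accepts
# FP16 CUDA tensors shaped (batch, heads, seq_len, head_dim).
import torch
from sageattention import sageattn

batch, heads, seq_len, head_dim = 2, 16, 4096, 128
q = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Output is expected to match the shape of standard scaled dot-product attention.
out = sageattn(q, k, v, is_causal=False)
print(out.shape)  # (2, 16, 4096, 128)
```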