PhD student in TSAIL group, Dept. of Computer Science, Tsinghua University
Pinned
- thu-ml/ReMoE (Public): Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
Python 54
- thu-ml/VCAS (Public): Official code for "Efficient Backpropagation with Variance Controlled Adaptive Sampling" (ICLR 2024)