Replies: 2 comments
-
Can you share your command? Thank you!
-
I discovered an interesting thing: by placing the MoE experts' weights in the CPU's pinned memory during loading, the Triton kernels for FusedMoE can operate directly on the pinned CPU tensors. This allows the DeepSeek-R1 model to run on a GPU with only 40GB of VRAM. I successfully ran the AWQ model on an A800 GPU, although the performance was relatively poor. The model is cognitivecomputations/DeepSeek-R1-AWQ.
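For context, here is a minimal sketch of the idea, not the actual patch: `torch.Tensor.pin_memory()` places a copy of a tensor in page-locked host memory, which CUDA (and therefore Triton) kernels can read over PCIe via zero-copy when unified virtual addressing is available. The `"experts"` substring match and the helper name are hypothetical assumptions about how the expert weights are identified:

```python
import torch

def pin_moe_expert_weights(model: torch.nn.Module) -> None:
    """Keep MoE expert weights in pinned (page-locked) host memory.

    Pinned CPU tensors remain addressable from the GPU (zero-copy over
    PCIe under unified virtual addressing), so a Triton FusedMoE kernel
    can read them without first copying each expert into VRAM. The
    "experts" substring check below is a hypothetical naming convention.
    """
    for name, param in model.named_parameters():
        if "experts" in name:
            # pin_memory() returns a page-locked host copy of the tensor
            param.data = param.data.cpu().pin_memory()
```

Since every expert read then goes over PCIe (tens of GB/s) rather than HBM (TB/s), decode throughput drops sharply, which would explain the relatively poor performance noted above.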