Publish the Llama2 sparsified models #30
Hi,
I was wondering if you plan to make the sparsified Llama2 models publicly available. In particular, I am interested in Llama2-70B with 50% unstructured sparsity.
Thanks!

Comments

Llama2-70b is large, but I think running our code repo on the Llama2 model released on Hugging Face would finish within minutes. Is there a reason for requesting a pruned model from our side?

The main reason is the resources required to actually prune the largest Llama2-70b model... What does that take — a modern GPU with large memory, or a DGX box? Either way, such resources are scarce these days...

Okay, I see. For LLaMA-2-70B we used 5 or 6 (I don't recall the exact number) A6000 GPUs to load the model in fp16. If you only have one GPU with limited memory, there is a workaround: load the model on the CPU in fp16, and move each layer/block to the GPU only while it is being pruned. I think this is what SparseGPT did. I will look into whether we can release the pruned LLaMA-2-70b models; I'm not sure whether there would be licensing issues. Stay tuned.

Thanks a lot; please let me know when/if you are able to release the LLaMA-2-70b models.
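The CPU-offloading workaround mentioned in the thread can be sketched roughly as follows. This is a minimal illustration only, using plain magnitude pruning on a toy stack of linear layers: the function names and toy model are hypothetical, not the repo's actual API, and SparseGPT's real pruning criterion is considerably more involved than magnitude pruning.

```python
# Hedged sketch: keep the full model on CPU, move ONE layer at a time to the
# GPU, prune it there, then move it back — so peak GPU memory is one layer.
import torch
import torch.nn as nn

def magnitude_prune_(weight: torch.Tensor, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude entries in-place (unstructured)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return
    threshold = weight.abs().flatten().kthvalue(k).values
    weight.masked_fill_(weight.abs() <= threshold, 0.0)

def prune_model_with_offloading(model: nn.Module, sparsity: float = 0.5) -> nn.Module:
    """Model stays on CPU; each layer visits the GPU only while being pruned."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for module in model.modules():
        if isinstance(module, nn.Linear):
            module.to(device)            # load just this layer onto the GPU
            with torch.no_grad():
                magnitude_prune_(module.weight, sparsity)
            module.to("cpu")             # return it to CPU to free GPU memory
    return model

# Toy stand-in for a stack of transformer blocks (hypothetical sizes)
model = nn.Sequential(*[nn.Linear(64, 64) for _ in range(4)])
model = prune_model_with_offloading(model, sparsity=0.5)
avg_sparsity = sum((m.weight == 0).float().mean().item() for m in model) / 4
print(f"average sparsity: {avg_sparsity:.2f}")  # → average sparsity: 0.50
```

For the fp16 part of the workaround, one would load the model with half-precision weights (e.g. `model.half()` on CPU) before the loop; the per-layer offloading pattern itself is unchanged.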