Publish the Llama2 sparsified models #30
Hi,
I was wondering if you plan to make the sparsified Llama2 models publicly available. In particular, I am interested in Llama2-70B with 50% unstructured sparsity.
Thanks!

Comments

Llama2-70b is large, but I think running our code repo on the Llama2 model released on Hugging Face would finish within minutes. Is there a reason for requesting a pruned model from our side?

The main reason is the resources required to actually prune the largest Llama2-70b model... What does that take — a modern GPU with large memory, or a DGX box? Either way, such resources are scarce these days...

Okay, I see. For LLaMA-2-70B we used 5 or 6 (I don't recall the exact number) A6000 GPUs to load the model in fp16. If you only have one GPU with limited memory, there is a workaround: load the model on the CPU in fp16, and move each layer/block to the GPU only while it is being pruned. I think this is what SparseGPT did. I will look into whether we can release the pruned LLaMA-2-70b models; I'm not sure whether there would be licensing issues. Stay tuned.

Thanks a lot; please let me know when/if you are able to release the LLaMA-2-70b models.
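The CPU-offloading workaround mentioned in the thread can be sketched roughly as follows. This is a minimal illustration only, using plain magnitude pruning on a toy stack of linear layers: the function names and toy model are hypothetical, not the repo's actual API, and SparseGPT's real pruning criterion is considerably more involved than magnitude pruning.

```python
# Hedged sketch: keep the full model on CPU, move ONE layer at a time to the
# GPU, prune it there, then move it back — so peak GPU memory is one layer.
import torch
import torch.nn as nn

def magnitude_prune_(weight: torch.Tensor, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude entries in-place (unstructured)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return
    threshold = weight.abs().flatten().kthvalue(k).values
    weight.masked_fill_(weight.abs() <= threshold, 0.0)

def prune_model_with_offloading(model: nn.Module, sparsity: float = 0.5) -> nn.Module:
    """Model stays on CPU; each layer visits the GPU only while being pruned."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for module in model.modules():
        if isinstance(module, nn.Linear):
            module.to(device)            # load just this layer onto the GPU
            with torch.no_grad():
                magnitude_prune_(module.weight, sparsity)
            module.to("cpu")             # return it to CPU to free GPU memory
    return model

# Toy stand-in for a stack of transformer blocks (hypothetical sizes)
model = nn.Sequential(*[nn.Linear(64, 64) for _ in range(4)])
model = prune_model_with_offloading(model, sparsity=0.5)
avg_sparsity = sum((m.weight == 0).float().mean().item() for m in model) / 4
print(f"average sparsity: {avg_sparsity:.2f}")  # → average sparsity: 0.50
```

For the fp16 part of the workaround, one would load the model with half-precision weights (e.g. `model.half()` on CPU) before the loop; the per-layer offloading pattern itself is unchanged.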