Replies: 1 comment 1 reply
-
Hi, I am not sure it would be feasible to optimize such a model by splitting it into parts. If the model weights are stored in float16, a 176B-parameter model takes around 352 GB of GPU VRAM, so to fit it on a 6 GB GPU you would need to divide it into roughly 60 parts.
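For reference, the back-of-the-envelope math behind those figures (2 bytes per float16 parameter, a 6 GB card) can be sketched as follows; strictly ceiling the division gives 59 parts, and rounding up to ~60 leaves a little headroom for activations:

```python
import math

params = 176e9        # BLOOM-176B parameter count
bytes_per_param = 2   # float16
gpu_vram_gb = 6       # hypothetical 6 GB consumer GPU

# Total weight memory in GB, ignoring activations and overhead.
total_gb = params * bytes_per_param / 1e9          # 352 GB
num_parts = math.ceil(total_gb / gpu_vram_gb)      # 59, ~60 with headroom

print(f"{total_gb:.0f} GB of weights -> at least {num_parts} parts")
```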
-
Hello,
I was wondering if anyone here knows whether this optimization could be applied to the BLOOM 176B language model: trading extra memory traffic for lower speed, while still executing the model on the GPU in parts. Or is there some detail of that model that would make this infeasible?
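The general idea can be sketched as streaming one layer at a time through the GPU: keep all weights in host RAM, copy a layer to VRAM, run it, then evict it before loading the next. This is a minimal PyTorch toy (the `nn.Linear` stack is a hypothetical stand-in, not BLOOM's actual architecture or loading API):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for a model whose layers together would exceed GPU memory.
layers = [nn.Linear(1024, 1024) for _ in range(8)]

def forward_streamed(x):
    """Run the stack layer by layer, holding only one layer in VRAM."""
    x = x.to(device)
    for layer in layers:
        layer.to(device)       # copy this layer's weights to the GPU
        with torch.no_grad():
            x = layer(x)
        layer.to("cpu")        # evict it before loading the next layer
    return x.cpu()

out = forward_streamed(torch.randn(1, 1024))
```

The cost is exactly the trade-off described above: every forward pass re-copies all the weights over PCIe, so throughput drops sharply, but peak VRAM usage stays at roughly one layer plus activations.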