Replies: 1 comment 1 reply
-
Hi, I am not sure it would be feasible to optimize such a model by splitting it into parts. If the model weights are stored in float16, a 176B-parameter model takes around 352 GB of GPU VRAM, so to fit it on a 6 GB GPU you would need to divide it into roughly 60 parts.
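For reference, the back-of-the-envelope math behind those figures (2 bytes per float16 parameter, a 6 GB card) can be sketched as follows; strictly ceiling the division gives 59 parts, and rounding up to ~60 leaves a little headroom for activations:

```python
import math

params = 176e9        # BLOOM-176B parameter count
bytes_per_param = 2   # float16
gpu_vram_gb = 6       # hypothetical 6 GB consumer GPU

# Total weight memory in GB, ignoring activations and overhead.
total_gb = params * bytes_per_param / 1e9          # 352 GB
num_parts = math.ceil(total_gb / gpu_vram_gb)      # 59, ~60 with headroom

print(f"{total_gb:.0f} GB of weights -> at least {num_parts} parts")
```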
-
Hello,
I was wondering if anyone here knows whether this optimization could be applied to the BLOOM 176B language model: trading extra memory traffic for lower speed, while still executing the model on the GPU in parts. Or is there some detail of that model that would make this infeasible?
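The general idea can be sketched as streaming one layer at a time through the GPU: keep all weights in host RAM, copy a layer to VRAM, run it, then evict it before loading the next. This is a minimal PyTorch toy (the `nn.Linear` stack is a hypothetical stand-in, not BLOOM's actual architecture or loading API):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for a model whose layers together would exceed GPU memory.
layers = [nn.Linear(1024, 1024) for _ in range(8)]

def forward_streamed(x):
    """Run the stack layer by layer, holding only one layer in VRAM."""
    x = x.to(device)
    for layer in layers:
        layer.to(device)       # copy this layer's weights to the GPU
        with torch.no_grad():
            x = layer(x)
        layer.to("cpu")        # evict it before loading the next layer
    return x.cpu()

out = forward_streamed(torch.randn(1, 1024))
```

The cost is exactly the trade-off described above: every forward pass re-copies all the weights over PCIe, so throughput drops sharply, but peak VRAM usage stays at roughly one layer plus activations.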