Batch size increases when using FSDP, rather than reducing memory usage #16855
Luofan-KK asked this question in DDP / multi-GPU / multi-node (Unanswered)
I was told that Fully Sharded Training shards the entire model across all available GPUs, so I assumed that, for a fixed batch size, the memory usage on each GPU would decrease as more GPUs become available.
In practice, however, when I use more GPUs the effective batch size increases instead:
actual_batch_size = num_GPUs * given_batch_size
rather than the model being sharded more finely. I want to tune a huge model, and this will cause OOM. I've searched similar questions and found an example, but how can I determine the batch size?
Following is my code:
Thanks
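(The original script is not reproduced in this excerpt. As a rough sketch of the kind of setup being described, with a placeholder model, dataset, and sizes, something like the following shows the behavior in question: the DataLoader batch size applies per process, so the global batch grows with the device count.)

```python
# Placeholder sketch (NOT the original script), assuming Lightning >= 2.0
# where strategy="fsdp" is available.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import lightning.pytorch as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Tiny stand-in for the "huge model" in the question.
        self.net = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 2))

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3)


dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
# batch_size is per process: with 4 GPUs, each optimizer step consumes
# 4 * 8 samples in total, even though FSDP shards the model weights.
loader = DataLoader(dataset, batch_size=8)

trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="fsdp")
trainer.fit(ToyModel(), loader)
```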
Replies: 2 comments

- Also noticing this. I would be curious why the batch size increases when I'd rather it stay at the value prescribed in the data loader.

- Digging into the docs, I think this can be resolved by setting (in Trainer)
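The exact Trainer option referred to above is not quoted. Whichever flag it was, one common way to keep the global batch size at the intended value is to divide the DataLoader batch size by the number of devices, since Lightning feeds each process its own batch. A minimal sketch, assuming Lightning >= 2.0 (where `strategy="fsdp"` is available) and placeholder data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import lightning.pytorch as pl  # assumes Lightning >= 2.0

NUM_DEVICES = 4
GLOBAL_BATCH = 32                                 # the batch size you actually want per step
per_device_batch = GLOBAL_BATCH // NUM_DEVICES    # each rank loads its own slice

dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
loader = DataLoader(dataset, batch_size=per_device_batch)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=NUM_DEVICES,
    strategy="fsdp",  # shards parameters, gradients, and optimizer state across ranks
)
# trainer.fit(model, loader)  # model: your LightningModule
```

With 4 devices and a global batch of 32, each rank then processes 8 samples per step, so every optimizer step still sees 32 samples in total.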