Did you actually test batch sizes that are not a power of 2? #14
-
Hi, first of all, thanks for sharing these tips and giving explanations. As far as batch size is concerned, you advise choosing a power of 2, and Nvidia also recommends this choice. But I have also seen people training their models with, e.g., a batch size of 40. Do you have evidence or experiment results showing that a batch size that is not a power of 2 has a detrimental effect on training performance? Are there cases where this rule does not apply? Are there no positive effects of using a batch size larger than its nearest lower power of 2, for example choosing 42 instead of 32? Thanks for sharing your feedback!
Replies: 1 comment 1 reply
-
It is fine to use batch sizes that aren't powers of 2. I have no evidence that non-powers-of-two cause major problems, but I can't claim to know the details of what works best for every hardware platform. Personally, I often use batch sizes that are divisible by powers of 2 but aren't themselves powers of 2, such as 1536. Sometimes you might be able to train faster with a batch size of 42 than 32, simply because the reduction in the number of training steps required to reach a particular result outweighs the increase in step time.
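To make the trade-off concrete, here is a minimal back-of-the-envelope sketch with entirely hypothetical numbers (the function name, example counts, and step times are illustrative, not measurements from any real run): total wall-clock time is roughly the number of steps times the time per step, so a slightly slower step can still win if it cuts the step count enough.

```python
def total_training_time(examples_needed: float, batch_size: int, step_time_s: float) -> float:
    """Rough wall-clock estimate: steps needed to process a fixed number of
    examples, multiplied by the measured time per step at that batch size."""
    steps = examples_needed / batch_size
    return steps * step_time_s


# Hypothetical measurements: batch 42 has a slightly slower step than batch 32,
# but needs fewer steps to cover the same number of examples.
print(total_training_time(1_000_000, 32, 0.100))  # ~3125 s
print(total_training_time(1_000_000, 42, 0.115))  # ~2738 s -> faster overall
```

Whether the larger batch actually wins depends on how step time scales on your hardware and how the number of steps needed for a given result changes with batch size, so measuring both is the only reliable way to decide.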