Hi all,
I am currently experimenting with the code you provided. Your plot of memory usage across batch sizes and max_length matches our training setup very well. However, when monitoring memory usage, two things stand out:
1. Memory does not seem to be freed after training.
2. Memory seems to accumulate during validation.
I could not find a solution for 1.
For 2., setting eval_accumulation_steps seems to work; it transfers the model outputs to the CPU.
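For reference, here is a minimal sketch of both points. It assumes training runs through the Hugging Face `Trainer` on PyTorch, which is not stated above. The `output_dir`, batch size, and step count are placeholder values, and `release_cached_gpu_memory` is a hypothetical helper name. Whether PyTorch's caching allocator is really the cause of 1. is only a guess.

```python
import gc

# Keyword arguments intended for transformers.TrainingArguments (assumed setup).
# All values are placeholders, not the ones used in our runs.
training_kwargs = {
    "output_dir": "out",
    "per_device_eval_batch_size": 8,
    # Move accumulated prediction tensors to the CPU every 16 eval steps
    # instead of keeping them on the GPU until evaluation finishes.
    "eval_accumulation_steps": 16,
}


def release_cached_gpu_memory():
    """Collect unreferenced Python objects, then release PyTorch's cached CUDA blocks.

    Returns True if the CUDA cache was emptied, False if PyTorch or CUDA
    is not available.
    """
    gc.collect()
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # Returns unused cached blocks to the driver; tensors that are still
    # referenced are not affected.
    torch.cuda.empty_cache()
    return True
```

With this setup the arguments would be passed as `TrainingArguments(**training_kwargs)`. After training, one would drop the references first (for example `del trainer, model`) and then call `release_cached_gpu_memory()`. Tools such as `nvidia-smi` also count memory that PyTorch has only cached, so memory can look occupied even when no tensors are alive.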
Do you have an idea?
Keep up the great work.
Best wishes,
Frederik