
PT5_LoRA_Finetuning_per_prot.ipynb - memory accumulation during validation #153

Fredditeddy opened this issue Jul 4, 2024 · 1 comment

@Fredditeddy

Hi all,

I am currently experimenting with your provided code. Your plot of memory usage for the different batch sizes and max_length values fits our training setup perfectly. However, when monitoring the memory usage, two things stand out:

  1. Memory does not seem to be freed after training.
  2. Memory seems to accumulate during validation.

I could not find a solution for 1.

For 2., setting eval_accumulation_steps seems to work, since it transfers the model outputs to the CPU.
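For reference, here is a minimal sketch of where that setting goes, assuming the Hugging Face TrainingArguments used in the notebook (the output path and values are placeholders):

```python
from transformers import TrainingArguments

# eval_accumulation_steps moves the accumulated prediction tensors from
# the GPU to the CPU every N evaluation steps instead of keeping them
# all on the GPU until evaluation finishes.
training_args = TrainingArguments(
    output_dir="./results",        # placeholder output path
    per_device_eval_batch_size=4,  # placeholder value
    eval_accumulation_steps=10,    # offload predictions to CPU every 10 steps
)
```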

Do you have an idea?

Keep up the great work.

Best wishes,
Frederik

@Fredditeddy (Author)

Update:

eval_accumulation_steps does not work after all, since it accumulates all tensors in RAM.

What works so far is not returning hidden_states and attentions.
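Roughly what I mean, as a sketch assuming the Hugging Face model config used in the notebook:

```python
# Disable these outputs so the evaluation loop does not gather every
# layer's hidden states and attention maps on top of the logits.
model.config.output_hidden_states = False
model.config.output_attentions = False
```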

However, I do not understand why this is not an issue for the training loop.

I additionally added a callback that runs after each epoch and calls torch.cuda.empty_cache(), which seems to free the memory after the training loop.
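A sketch of that callback, assuming the Hugging Face Trainer/TrainerCallback API used in the notebook (the class name is just illustrative):

```python
import torch
from transformers import TrainerCallback

class EmptyCacheCallback(TrainerCallback):
    """Release cached GPU memory at the end of each training epoch."""

    def on_epoch_end(self, args, state, control, **kwargs):
        torch.cuda.empty_cache()
        return control

# Usage: trainer = Trainer(..., callbacks=[EmptyCacheCallback()])
```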
