
PT5_LoRA_Finetuning_per_prot.ipynb - memory accumulation during validation #153

Fredditeddy opened this issue Jul 4, 2024 · 1 comment

@Fredditeddy

Hi all,

I am currently experimenting with your provided code. Your plot of memory usage for the different batch sizes and max_length values fits our training setup perfectly. However, when monitoring the memory usage, two things stand out:

  1. Memory does not seem to be freed after training.
  2. Memory seems to accumulate during validation.

I could not find a solution for 1.

For 2., setting eval_accumulation_steps seems to work, since it transfers the model outputs to the CPU.
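For reference, here is a minimal sketch of where that setting goes, assuming the Hugging Face TrainingArguments used in the notebook (the output path and values are placeholders):

```python
from transformers import TrainingArguments

# eval_accumulation_steps moves the accumulated prediction tensors from
# the GPU to the CPU every N evaluation steps instead of keeping them
# all on the GPU until evaluation finishes.
training_args = TrainingArguments(
    output_dir="./results",        # placeholder output path
    per_device_eval_batch_size=4,  # placeholder value
    eval_accumulation_steps=10,    # offload predictions to CPU every 10 steps
)
```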

Do you have an idea?

Keep up the great work.

Best wishes,
Frederik

@Fredditeddy (Author)

Update:

eval_accumulation_steps does not work after all, since it accumulates all tensors in RAM.

What works so far is not returning hidden_states and attentions.
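Roughly what I mean, as a sketch assuming the Hugging Face model config used in the notebook:

```python
# Disable these outputs so the evaluation loop does not gather every
# layer's hidden states and attention maps on top of the logits.
model.config.output_hidden_states = False
model.config.output_attentions = False
```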

However, I do not understand why this is not an issue for the training loop.

I additionally added a callback that runs after each epoch and calls torch.cuda.empty_cache(), which seems to free the memory after the training loop.
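A sketch of that callback, assuming the Hugging Face Trainer/TrainerCallback API used in the notebook (the class name is just illustrative):

```python
import torch
from transformers import TrainerCallback

class EmptyCacheCallback(TrainerCallback):
    """Release cached GPU memory at the end of each training epoch."""

    def on_epoch_end(self, args, state, control, **kwargs):
        torch.cuda.empty_cache()
        return control

# Usage: trainer = Trainer(..., callbacks=[EmptyCacheCallback()])
```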
