Models initialized in workers do not get deleted after a worker completes its batch unless all workers complete their work #6415
Unanswered
mrxiaohe asked this question in Help: Other Questions
Replies: 1 comment
-
A related issue is here: #6303. Have you tried making sure garbage collection runs, that the memory pools are cleared, etc.? I guess the other option, if you want to stick to just 6 batches, is to create batches with roughly equal numbers of characters instead of equal numbers of documents. That might even out the processing time more.
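A rough sketch of both suggestions, assuming the GPU allocations are held in CuPy's memory pools (the array backend spaCy uses on GPU); the function names and the character threshold below are illustrative, not part of the original reply:

```python
# Sketch only: explicit cleanup at the end of a worker, plus batching by total
# character count instead of document count. Assumes CuPy-backed GPU memory;
# free_gpu_memory / batches_by_characters are made-up names for illustration.
import gc

import cupy


def free_gpu_memory():
    """Try to return pooled GPU memory once a batch is done."""
    gc.collect()  # drop lingering Python references so pooled blocks can be freed
    cupy.get_default_memory_pool().free_all_blocks()          # device memory pool
    cupy.get_default_pinned_memory_pool().free_all_blocks()   # pinned host memory pool


def batches_by_characters(texts, max_chars=200_000):
    """Greedily group texts so each batch has a roughly equal character count."""
    batches, current, size = [], [], 0
    for text in texts:
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches
```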
-
I am using joblib to send batches of documents to 6 workers; when a worker completes its batch, another batch is handed out, so at any given point there should be 6 active workers. In each worker, a spaCy model is loaded. I was expecting that once a worker completed its batch, the GPU memory used by that worker would be released, but that doesn't seem to be the case: after a worker finishes its batch, a new batch of documents is handed out, and because the GPU memory held by the completed worker has not been released, I end up getting an out-of-memory error. A sample of the code is below:
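(The snippet that follows is a minimal sketch of this kind of setup rather than the exact code; the model name, batch size, and placeholder corpus are assumptions.)

```python
# A minimal sketch of the setup described above, not the original snippet.
# Assumptions: spaCy running on GPU, joblib's default (loky) process backend,
# "en_core_web_trf" as the model, and a placeholder corpus.
import spacy
from joblib import Parallel, delayed


def process_batch(texts):
    spacy.require_gpu()                   # each worker process claims the GPU
    nlp = spacy.load("en_core_web_trf")   # a model is loaded inside every worker
    return [[token.text for token in doc] for doc in nlp.pipe(texts)]


if __name__ == "__main__":
    documents = ["some document text ..."] * 1200   # placeholder corpus
    batch_size = 100                                 # many small batches rather than 6 large ones
    batches = [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]
    # 6 worker processes; as soon as one finishes, joblib hands it the next batch
    results = Parallel(n_jobs=6)(delayed(process_batch)(batch) for batch in batches)
```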
As you can see in the screenshot of the nvidia-smi output, there are 10 worker processes at the moment, even though 4 of them have already completed their work but have not yet released their memory.

My question is whether I am approaching this the wrong way. Should I instead divide the documents into just 6 batches rather than into a larger number of smaller ones? The reason I went with more (smaller) batches is that I've had situations where one worker happens to get some really long documents, which causes it to keep running long after all the other workers have finished. Smaller batches seem to reduce the severity of that issue.
Thanks!