Models initialized in workers do not get deleted after a worker completes its batch unless all workers complete their work #6415
Unanswered
mrxiaohe asked this question in Help: Other Questions
Replies: 1 comment
-
A related issue is here: #6303. Have you tried making sure garbage collection runs, that the memory pools are cleared, etc.? I guess the other option, if you want to stick to just 6 batches, is to create batches with roughly equal numbers of characters instead of equal numbers of documents. That might even out the processing time more.
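A rough sketch of both suggestions, assuming the GPU allocations are held in CuPy's memory pools (the array backend spaCy uses on GPU); the function names and the character threshold below are illustrative, not part of the original reply:

```python
# Sketch only: explicit cleanup at the end of a worker, plus batching by total
# character count instead of document count. Assumes CuPy-backed GPU memory;
# free_gpu_memory / batches_by_characters are made-up names for illustration.
import gc

import cupy


def free_gpu_memory():
    """Try to return pooled GPU memory once a batch is done."""
    gc.collect()  # drop lingering Python references so pooled blocks can be freed
    cupy.get_default_memory_pool().free_all_blocks()          # device memory pool
    cupy.get_default_pinned_memory_pool().free_all_blocks()   # pinned host memory pool


def batches_by_characters(texts, max_chars=200_000):
    """Greedily group texts so each batch has a roughly equal character count."""
    batches, current, size = [], [], 0
    for text in texts:
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches
```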
-
I am using joblib to send batches of documents to 6 workers; when a worker completes its batch, another batch is handed out, so at any given point there should be 6 active workers. In each worker, a spaCy model is loaded. I was expecting that once a worker completed its batch, the GPU memory used by that worker would be released, but that doesn't seem to be the case: after a worker finishes its batch, a new batch of documents is handed out, and because the GPU memory held by the completed worker has not been released, I end up getting an out-of-memory error. A sample of the code is below:
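(The snippet that follows is a minimal sketch of this kind of setup rather than the exact code; the model name, batch size, and placeholder corpus are assumptions.)

```python
# A minimal sketch of the setup described above, not the original snippet.
# Assumptions: spaCy running on GPU, joblib's default (loky) process backend,
# "en_core_web_trf" as the model, and a placeholder corpus.
import spacy
from joblib import Parallel, delayed


def process_batch(texts):
    spacy.require_gpu()                   # each worker process claims the GPU
    nlp = spacy.load("en_core_web_trf")   # a model is loaded inside every worker
    return [[token.text for token in doc] for doc in nlp.pipe(texts)]


if __name__ == "__main__":
    documents = ["some document text ..."] * 1200   # placeholder corpus
    batch_size = 100                                 # many small batches rather than 6 large ones
    batches = [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]
    # 6 worker processes; as soon as one finishes, joblib hands it the next batch
    results = Parallel(n_jobs=6)(delayed(process_batch)(batch) for batch in batches)
```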
As you can see in the screenshot of the nvidia-smi output, there are 10 worker processes at the moment, even though 4 of them have already completed their work but have not yet released their memory.

My question is whether I am approaching this the wrong way. Should I instead divide the documents into just 6 batches rather than into a larger number of smaller ones? The reason I went with more (smaller) batches is that I've had situations where one worker happens to get some really long documents, which causes it to keep running long after all the other workers have finished. Smaller batches seem to reduce the severity of that issue.
Thanks!