Hello, I've implemented GLiNER as part of a preprocessing pipeline (first tests show very good results for my use case), but in order to run the pipeline on a large dataset I would like to use multiprocessing. Unfortunately, it seems that the model cannot be loaded simultaneously in multiple processes (the process hangs indefinitely after a few seconds). Here's some reproducible code:
```python
from multiprocessing import Pool

from gliner import GLiNER


class NER:
    def __init__(self, model_name="urchade/gliner_multi-v2.1"):
        print("Loading NER model...")
        self.model = GLiNER.from_pretrained(model_name)
        self.labels = ["name"]
        print("NER model loaded!")

    def get_entities(self, text):
        entities = [ent["text"] for ent in self.model.predict_entities(text, self.labels)]
        return entities


class MyPipeline:
    def __init__(self):
        self.ner = NER()

    def __getstate__(self):
        # Drop the model before pickling; it is re-created in __setstate__
        self.ner = None
        return self.__dict__

    def __setstate__(self, state):
        self.__dict__ = state
        self.ner = NER()

    def process_text(self, text):
        # Some dummy preprocessing
        entities = self.ner.get_entities(text)
        out = text.lower() + " ; " + ",".join(entities)
        return out

    def preprocess_lines(self, lines):
        with Pool(processes=2) as pool:
            for text_out in pool.imap(self.process_text, lines):
                print(text_out)


if __name__ == "__main__":
    corpus = ["My name is Franz Schubert.", "Ella Fitzgerald is my favorite singer."]
    pip = MyPipeline()
    pip.preprocess_lines(corpus)
```

The terminal shows:

Then it hangs indefinitely. Do you have any idea what's preventing the model from being loaded multiple times? And any idea how to enable multiprocessing in such a case?
Replies: 1 comment
I've found out that since PyTorch uses multithreading, forking processes is not possible; however, the "spawn" start method works:

```python
from multiprocessing import set_start_method

if __name__ == "__main__":
    set_start_method("spawn")
    corpus = ["My name is Franz Schubert.", "Ella Fitzgerald is my favorite singer."]
    pip = MyPipeline()
    pip.preprocess_lines(corpus)
```

In my case, I also changed my code to avoid unpickling the MyPipeline instance for each sample, but at least it is possible to do multiprocessing with GLiNER.
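That last point (loading the model once per worker instead of once per sample) can be sketched with `multiprocessing.Pool`'s `initializer` argument. This is a minimal, hypothetical sketch: `DummyNER`, `init_worker`, `_worker_ner`, and the entity heuristic are illustrative stand-ins, not GLiNER APIs; in a real pipeline `DummyNER` would be replaced by the `NER` class above.

```python
from multiprocessing import Pool, set_start_method


# Stand-in for the GLiNER-backed NER class above; a real worker would call
# GLiNER.from_pretrained(...) in its constructor instead.
class DummyNER:
    def get_entities(self, text):
        # Hypothetical extraction: capitalized words stand in for real entities.
        return [w.strip(".") for w in text.split() if w[0].isupper()]


_worker_ner = None  # one model instance per worker process


def init_worker():
    # Runs once in each spawned worker, so the model is loaded once per
    # process rather than re-created for every sample via __setstate__.
    global _worker_ner
    _worker_ner = DummyNER()


def process_text(text):
    entities = _worker_ner.get_entities(text)
    return text.lower() + " ; " + ",".join(entities)


def preprocess_lines(lines):
    with Pool(processes=2, initializer=init_worker) as pool:
        return list(pool.imap(process_text, lines))


if __name__ == "__main__":
    set_start_method("spawn")
    corpus = ["My name is Franz Schubert.", "Ella Fitzgerald is my favorite singer."]
    for out in preprocess_lines(corpus):
        print(out)
```

Because `process_text` is a module-level function that only touches the per-worker global, nothing heavy has to be pickled for each task.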