trying to make RAGGraphBuilder faster #47

snibbor · 2023-03-01T06:05:41Z

Hello,

Thank you again for this package and all your work on it.

I was experimenting with the RAGGraphBuilder on my dataset and wanted to try to make the tissue graph processing faster.

I parallelized _set_node_labels and _build_topology using ProcessPool from pathos.multiprocessing. It works on my machine (roughly a 2-4x speedup), but it is far from an elegant solution.

I am sure there is a cleaner way to write this, maybe with joblib?
https://joblib.readthedocs.io/en/latest/parallel.html

One problem is that memory consumption can spike above 50 GB depending on the data and num_workers, which could crash a job if someone isn't expecting it. I'm not sure what the workaround is in Python; maybe numba or something similar?
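One possible way to cap the memory spike, sketched with only the standard library: keep a fixed number of tasks in flight instead of submitting every node at once. This is not the code in this PR. The names majority_label, bounded_map, node_tasks, and max_in_flight are all hypothetical, and the per-node job is a toy stand-in (a majority vote over a short list), since I am only illustrating the scheduling pattern and have not measured it on real data.

```python
from collections import Counter
from concurrent.futures import FIRST_COMPLETED, Executor, ThreadPoolExecutor, wait
from itertools import islice


def majority_label(task):
    """Hypothetical per-node job: return (node_id, most common label)."""
    node_id, labels = task
    return node_id, Counter(labels).most_common(1)[0][0]


def bounded_map(fn, tasks, executor: Executor, max_in_flight: int = 4):
    """Run fn over tasks with at most max_in_flight tasks submitted at once.

    fn must return (key, value) pairs; results are collected in a dict,
    so the order in which tasks finish does not matter.
    """
    tasks = iter(tasks)
    results = {}
    # Fill the initial window; the rest of the iterator is consumed lazily.
    pending = {executor.submit(fn, t) for t in islice(tasks, max_in_flight)}
    while pending:
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            key, value = future.result()
            results[key] = value
        # Top up the window with as many new tasks as just finished.
        for t in islice(tasks, len(done)):
            pending.add(executor.submit(fn, t))
    return results


def node_tasks():
    """Toy stand-in for per-node annotation pixels, produced lazily."""
    yield 0, [1, 1, 2]
    yield 1, [3, 3, 3, 0]
    yield 2, [0, 2, 2]


# A thread pool keeps this demo portable; see the note below for processes.
with ThreadPoolExecutor(max_workers=2) as pool:
    node_labels = bounded_map(majority_label, node_tasks(), pool, max_in_flight=2)
```

Because bounded_map accepts any concurrent.futures Executor, a ProcessPoolExecutor can be swapped in for CPU-bound work (created under an `if __name__ == "__main__":` guard). In that case each task's data is pickled and copied to a worker, so max_in_flight limits how many copies exist at once, at the cost of some idle time when the window is small.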

Anyway, I thought I would pass it along. Thanks again!

Best,
Jack

snibbor added 3 commits March 1, 2023 00:48

trying to make RAGGraphBuilder faster

610c635

Update graph_builders.py

0e1bddd

Update feature_extraction.py

ac32c78
