This repository provides a template for parallelizing tasks in Python. The template can be easily adapted to your application. All methods are based on the multiprocessing module.
With the multiprocessing module, you can parallelize tasks in Python with
from multiprocessing import Pool
with Pool() as pool:
    output = pool.map(run_subtask, arguments)
and for tasks with multiple input arguments, where each element of arguments is a tuple of inputs, you can use
output = pool.starmap(run_subtask, arguments)
instead.
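As a minimal sketch of the two calls above, a complete script might look like the following. The subtasks square and power, the helper run_examples, and the pool size of 2 are hypothetical and chosen only for illustration; they are not part of this repository.

```python
from multiprocessing import Pool


def square(x):
    """Hypothetical single-argument subtask."""
    return x * x


def power(base, exponent):
    """Hypothetical two-argument subtask."""
    return base ** exponent


def run_examples():
    """Run both subtasks on a small pool and return their results."""
    with Pool(processes=2) as pool:
        # map passes each element of the list as the single argument.
        squares = pool.map(square, [1, 2, 3, 4])
        # starmap unpacks each tuple into the positional arguments.
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 0)])
    return squares, powers


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this code when
    # they import the main module (spawn and forkserver start methods).
    print(run_examples())
```

The subtasks are defined at module level because Pool sends functions to the workers by pickling them, which does not work for lambdas or nested functions.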
There are also alternatives to pool.map, such as pool.map_async, pool.imap, and pool.imap_unordered, which might be more suitable in terms of speed and memory usage depending on your application. For more details about these methods, see this Stack Overflow post or the multiprocessing documentation.
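To illustrate one of these alternatives, here is a rough sketch using pool.imap_unordered, which yields results as workers finish them rather than in input order. The subtask, the helper collect_unordered, and the chunksize of 2 are assumptions made for this example only.

```python
from multiprocessing import Pool


def run_subtask(x):
    """Hypothetical subtask; returns its input so results can be matched up."""
    return x, x * x


def collect_unordered(inputs, processes=2):
    """Collect results in whatever order the workers finish them."""
    results = {}
    with Pool(processes=processes) as pool:
        # imap_unordered returns a lazy iterator, so results are handled
        # one at a time instead of being gathered into a full list first.
        # The iterator must be consumed before the pool is closed.
        for x, y in pool.imap_unordered(run_subtask, inputs, chunksize=2):
            results[x] = y
    return results


if __name__ == "__main__":
    print(collect_unordered(range(6)))
```

Because the order of results is not guaranteed, the subtask returns its input together with the output. Whether this is faster than pool.map depends on the workload, so it is worth measuring for your own application.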
When parallelizing tasks, be especially careful with subtasks that need access to the same data structures (e.g. tasks that write to the same database). In this case it might be necessary to adjust these methods, for example by synchronizing access, to avoid collisions between the parallel processes. For more information, see the multiprocessing documentation. However, if your subtasks do not need access to shared data structures (e.g. Monte Carlo simulations), you can simply use the methods above to speed up your computations.
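One possible way to handle the shared-resource case is to hand every worker the same multiprocessing.Lock through the pool initializer and to hold that lock while writing. The sketch below assumes the shared resource is a plain text file; the names init_worker, format_record, run_subtask, and run_all are hypothetical, and a real database would usually need its own connection and transaction handling instead.

```python
import os
import tempfile
from multiprocessing import Lock, Pool

_lock = None  # set in each worker process by init_worker


def init_worker(lock):
    """Store the shared lock in a module-level variable of the worker."""
    global _lock
    _lock = lock


def format_record(x):
    """Hypothetical computation; runs in parallel without the lock."""
    return f"{x},{x * x}\n"


def run_subtask(args):
    """Compute a record, then append it to the shared file under the lock."""
    path, x = args
    record = format_record(x)
    with _lock:  # only one worker writes at a time
        with open(path, "a") as f:
            f.write(record)
    return x


def run_all(path, inputs, processes=2):
    """Run all subtasks on a pool whose workers share a single lock."""
    lock = Lock()
    with Pool(processes=processes, initializer=init_worker,
              initargs=(lock,)) as pool:
        pool.map(run_subtask, [(path, x) for x in inputs])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "results.csv")
        run_all(path, range(5))
        with open(path) as f:
            print(sorted(f.readlines()))
```

Only the write is serialized here, so the computation itself still runs in parallel. The lock is passed via initargs rather than as a map argument because a Lock can only be shared with workers when they are created.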