Template for parallelizing tasks in Python with the multiprocessing module.

JurajZelman/py-parallelization

Parallelization in Python

Summary

This repository provides a template for parallelizing tasks in Python that can be easily adapted to your application. All methods are based on the multiprocessing module.

How to parallelize tasks in Python

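Using the multiprocessing module, you can parallelize tasks in Python by mapping the subtask function (run_subtask) over an iterable of inputs (arguments) with a pool of worker processes: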

from multiprocessing import Pool

with Pool() as pool:
    output = pool.map(run_subtask, arguments)

and for tasks with multiple input arguments, where each element of arguments is a tuple of inputs, you can use

output = pool.starmap(run_subtask, arguments)

instead.
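As a minimal, self-contained sketch of the two calls above: the functions square and power below are hypothetical stand-ins for run_subtask, and the pool size of 2 is an arbitrary choice for illustration. The `if __name__ == "__main__":` guard is needed on platforms where worker processes are started by re-importing the main module.

```python
from multiprocessing import Pool


def square(x):
    """Hypothetical single-argument subtask."""
    return x * x


def power(base, exponent):
    """Hypothetical subtask with two input arguments."""
    return base ** exponent


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # map: one input per call, results returned in input order.
        squares = pool.map(square, [1, 2, 3, 4])
        # starmap: each tuple is unpacked into the function's arguments.
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 0)])

    print(squares)  # [1, 4, 9, 16]
    print(powers)   # [8, 9, 1]
```

Both calls block until every subtask has finished and return a list in the same order as the inputs.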

The pool.map method has several alternatives, such as pool.map_async, pool.imap, and pool.imap_unordered, which might be more suitable in terms of speed and memory usage depending on your application. For more details about these methods, see this Stack Overflow post or the multiprocessing documentation.
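The following sketch shows how these alternatives are called; it is an illustration under assumptions, not the repository's own code. The function slow_square is a hypothetical subtask, and the pool size, chunksize, and timeout values are arbitrary.

```python
from multiprocessing import Pool


def slow_square(x):
    """Hypothetical subtask standing in for real work."""
    return x * x


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # map_async returns an AsyncResult immediately instead of blocking.
        async_result = pool.map_async(slow_square, range(5))

        # imap yields results lazily, in input order, so the full result
        # list never has to be held in memory at once.
        ordered = list(pool.imap(slow_square, range(5), chunksize=2))

        # imap_unordered yields each result as soon as a worker finishes,
        # so the order may differ from the input order; sort if it matters.
        unordered = sorted(pool.imap_unordered(slow_square, range(5)))

        # Block until the asynchronous map has finished (or time out).
        squares = async_result.get(timeout=10)

    print(squares)
    print(ordered)
    print(unordered)
```

Note that the iterators returned by imap and imap_unordered are consumed inside the with block, because leaving the block terminates the pool.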

Warning - Shared databases or data structures

When parallelizing tasks, be especially careful with subtasks that need access to the same data structures (e.g. tasks that write to the same database). In this case, it might be necessary to adjust these methods further to avoid collisions between the parallel processes. For more information, see the multiprocessing documentation. However, if your subtasks do not need access to shared data structures (e.g. Monte Carlo simulations), you can simply use the methods above to speed up your computations.
