Why is multiprocessing forced to spawn processes rather than forking #779

mbsabath · 2025-01-09T17:49:25Z

❓ The question

Hi all, I'm working on a change to OLMo to support additional PyTorch Dataset classes in our fork, and I'm getting OOM errors because worker processes are spawned rather than forked. I'm considering making the process start method configurable, but wanted to understand the reasons for forcing all multiprocessing to use spawn before going ahead with the change.

aman-17 · 2025-01-28T21:39:23Z

The OOM errors you’re encountering might stem from the higher memory usage of spawn, since resources are duplicated when each process is initialized (not 100% sure). We use a memmap-based implementation to minimize memory usage. You could try reducing the number of workers or the batch size to ease the pressure.
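To illustrate why memory mapping helps here, below is a minimal sketch using only the standard library. It is not OLMo's implementation; `write_tokens` and `read_window` are hypothetical names, and the flat file of unsigned integers is an assumed layout. The point is that each worker reads only the window it needs instead of holding the whole dataset in its own memory, which matters more under spawn because workers do not share the parent's pages.

```python
import mmap
import os
import tempfile
from array import array

TYPECODE = "I"  # unsigned int; an assumed on-disk token format


def write_tokens(path, tokens):
    """Write token ids to a flat binary file (hypothetical helper)."""
    with open(path, "wb") as f:
        array(TYPECODE, tokens).tofile(f)


def read_window(path, start, length):
    """Return `length` token ids beginning at index `start`.

    The file is memory-mapped, so only the pages backing the requested
    slice need to be read; the rest of the file stays on disk.
    """
    itemsize = array(TYPECODE).itemsize
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            chunk = mm[start * itemsize:(start + length) * itemsize]
    out = array(TYPECODE)
    out.frombytes(chunk)
    return out.tolist()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "tokens.bin")
    write_tokens(demo_path, range(1000))
    print(read_window(demo_path, 10, 5))
```

A custom Dataset class that loads its data eagerly in `__init__` would lose this benefit, since every spawned worker would build its own full copy.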

dirkgr · 2025-01-31T18:24:14Z

The main reason is that torch isn't safe when you fork the process. It should be, but it is not.
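For anyone who still wants the start method to be configurable, here is a rough sketch of one way to do it while keeping spawn as the default. The names `resolve_start_method` and `make_context` are hypothetical and do not come from the OLMo codebase. It uses a context object from `multiprocessing.get_context` rather than changing the global start method, so the choice stays local to the caller.

```python
import multiprocessing as mp

DEFAULT_START_METHOD = "spawn"  # the safe default discussed in this thread


def resolve_start_method(requested=None):
    """Pick a start method, falling back to spawn when none is requested.

    Hypothetical helper: rejects methods the current platform lacks
    (for example, "fork" is not available on Windows).
    """
    if requested is None:
        return DEFAULT_START_METHOD
    available = mp.get_all_start_methods()
    if requested not in available:
        raise ValueError(
            f"start method {requested!r} not available; choose from {available}"
        )
    return requested


def make_context(requested=None):
    """Return a multiprocessing context bound to the chosen start method."""
    return mp.get_context(resolve_start_method(requested))


if __name__ == "__main__":
    ctx = make_context()
    print(ctx.get_start_method())
    # Assumption: a context like this could be passed to a PyTorch DataLoader
    # through its `multiprocessing_context` argument, limiting the choice to
    # data loading workers instead of the whole program.
```

Opting into fork this way would still carry the safety problem described above, so it seems best treated as an explicit, at-your-own-risk setting.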

mbsabath added the type/question label on Jan 9, 2025
