
[WIP] Use spawner processes to speed up spawning #3719

Closed
wants to merge 6 commits into from

Conversation

@Jongy Jongy commented Jun 28, 2020

SCons tends to grow large in RAM usage. It's expensive for large processes to fork().
The best solution would be vfork(), but CPython doesn't support it properly (see this issue for latest status).

Until CPython has it (and in order to support older versions as well), I added the --process-spawner SCons option, which lets each Worker create its own private "spawner" process. Further exec_subprocess() calls by that Worker send the Popen requests to the spawner via simple IPC (pickle messages over stdin/stdout), and the spawner Popens the requested process, waits for its result and sends it back to the Worker.

Overall it's quite simple, yet for RAM-heavy SCons it adds a huge speedup.
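For illustration, here is a minimal sketch of the spawner side of that IPC (the names, message format and error handling are illustrative assumptions, not the exact code in this PR, and it glosses over details such as where the spawned child's own stdout goes):

import pickle
import subprocess
import sys

def spawner_main():
    # Read pickled Popen requests from stdin, run each one to completion,
    # and pickle the exit status back over stdout.
    while True:
        try:
            cmd, env = pickle.load(sys.stdin.buffer)  # one request per pickle message
        except EOFError:
            break  # the Worker closed the pipe; nothing more to spawn
        proc = subprocess.Popen(cmd, env=env)
        pickle.dump(proc.wait(), sys.stdout.buffer)
        sys.stdout.buffer.flush()

if __name__ == "__main__":
    spawner_main()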

Discussed (with some performance benchmarking) in https://pairlist4.pair.net/pipermail/scons-users/2020-June/008286.html and #3703 (specifically this comment, this one and this)

Tasks:

  • Take a look at replacing the IPC with the multiprocessing module
  • Pipe back exceptions raised in the spawner (and add a test for that)
  • Run a full-build benchmark, on a real project (and not partial builds of MongoDB / SCons artificial benchmarks)
  • Write tests (or: modify some existing tests to use --process-spawner)
  • Make sure all spawner extra code is only executed if on POSIX

Contributor Checklist:

I still have to finish these (I just want an ack from one of the maintainers that this looks okay)

  • I have created a new test or updated the unit tests to cover the new/changed functionality.
  • I have updated CHANGES.txt (and read the README.rst)
  • I have updated the appropriate documentation

SCons' process (and all Python stuff you push into it) may become
quite heavy in RAM. It's not cheap for RAM-heavy processes to do
fork()s and exec()s.

This PoC commit lets each worker of SCons have a private "spawner"
process which has a tiny memory footprint. Instead of using
subprocess.Popen directly, exec_subprocess() asks the spawner process
private to the current thread to do the spawning.
This makes spawning much faster.
pickle.load reads one pickle message from the buffer object and returns,
so there's no need to precede the message with its size.
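A tiny standalone illustration of that framing property (not code from this PR):

import io
import pickle

buf = io.BytesIO()
pickle.dump(("gcc", "-c", "a.c"), buf)   # first message, no length prefix
pickle.dump(("gcc", "-c", "b.c"), buf)   # second message appended right after
buf.seek(0)
print(pickle.load(buf))   # reads exactly one message: ('gcc', '-c', 'a.c')
print(pickle.load(buf))   # the next call picks up where the previous one stopped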
if process_spawner:
    from SCons.Job import spawner_tls

def exec_subprocess(l, env):
Author

I modified only exec_subprocess, and not exec_popen3 / _subproc, because they are more complex to wrap (although not impossible), and the "big win" for performance comes from this function, which is the most commonly called (AFAIK).
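Roughly, the wrapped worker-side path could look like the sketch below; spawner_tls and process_spawner are names from this PR, while the spawn() helper and the fallback body are assumptions for illustration only:

def exec_subprocess(l, env):
    # Hypothetical: use the thread's private spawner process if one was set up.
    spawner = getattr(spawner_tls, "spawner", None)
    if process_spawner and spawner is not None:
        return spawner.spawn(l, env)   # forward the request over the pickle pipe
    # Otherwise behave like stock SCons.
    proc = subprocess.Popen(l, env=env, close_fds=True)
    return proc.wait()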

@@ -54,6 +54,10 @@
import SCons.Subst
import SCons.Tool

# Gets set to True in Main._main() when the '--process-spawner'
# option was given on the command-line for speeding up process spawning.
process_spawner = False
Author

Not sure if this is the right spot to place it? It's POSIX only so it should be in posix.py maybe? I followed #3703 on this one.

@bdbaddog
Contributor

Please fix sider issues.
Is this PR in your opinion ready to merge, or still a work in progress?

@Jongy
Author

Jongy commented Jun 29, 2020

Is this PR in your opinion ready to merge, or still a work in progress?

The concept is not going to change much. I am planning to add a few more features before I think it's ready to merge, though. I'll update the description with tasks and complete them later this week.

@Jongy Jongy changed the title Use spawner processes to speed up spawning WIP: Use spawner processes to speed up spawning Jun 29, 2020
@Jongy
Author

Jongy commented Jul 3, 2020

About multiprocessing: I created a modified version that uses multiprocessing.Pool with the spawn context (so workers are not heavy forks of the main SCons Python process, but lightweight scripts). You can see it here. According to my tests, it performs a bit worse (benchmarks of perl scons_bld.pl builds, default settings).

multiprocessing

real    0m50.058s
user    2m38.115s
sys     1m0.131s

current version

real    0m45.071s
user    2m32.627s
sys     0m55.708s

Given these results I'll stick with the plain version.
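For reference, a minimal sketch of the spawn-context Pool approach compared above (the function names are illustrative, not the actual branch code):

import multiprocessing
import subprocess

def _run_one(args):
    # Run a single command in a pool worker and return its exit status.
    cmd, env = args
    return subprocess.call(cmd, env=env)

if __name__ == "__main__":
    # "spawn" starts workers as fresh interpreters rather than forks of the
    # (potentially large) parent process.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=8) as pool:
        rc = pool.apply(_run_one, ((["/bin/true"], None),))
        print("exit status:", rc)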

@Jongy
Author

Jongy commented Jul 3, 2020

I benched the build of Godot:

godot with spawner

real    4m52.019s
user    32m16.345s
sys     2m14.654s

godot master

real    4m53.794s
user    32m13.860s
sys     2m27.330s

This amounts to a tiny 0.5% improvement (user+sys time). I wasn't expecting to see much because Godot's SCons doesn't use too much RAM (starts at 60MB, climbs up to 150MB). The impact becomes more obvious as RAM usage grows.

@@ -264,6 +268,37 @@ def run(self):

self.resultsQueue.put((task, ok))

class Spawner:
Author

Should move to posix.py

@bdbaddog bdbaddog changed the title WIP: Use spawner processes to speed up spawning [WIP] Use spawner processes to speed up spawning Jul 3, 2020
@bdbaddog
Contributor

I'm not seeing any compelling results above.
Do you have any examples where you have notable improvements?
Also, probably good to show peak memory usage as that's the main stated reason to use vfork?

@Jongy
Author

Jongy commented Jul 11, 2020

Do you have any examples where you have notable improvements?

I don't have an open-source example with heavy memory consumption at hand (if you know of any project that might fit, please let me know and I'll try on it).

I was planning this week to take a more accurate measurement: the problem with a simple time is that it also counts the time spent by GCC and all the other tools executed by SCons. What we really want to measure here is the time of SCons only (and, when using the process spawner, the time of SCons plus all spawner processes). This will give the percentage of improvement over current SCons, which is really the metric we seek.

Obviously, if that metric shows no notable improvement, then it's not worth it. Though, like I said, for projects with heavy memory consumption this has proved useful (I can share some results of a Godot build where I forced SCons to take, say, 500MB; the difference becomes notable).

@bdbaddog
Contributor

Did you try --debug=time ?

@Jongy
Author

Jongy commented Jul 11, 2020

Did you try --debug=time ?

Nope, cool! Will try that, thanks for the tip.

@Jongy
Author

Jongy commented Jul 20, 2020

Okay, I tried --debug=time, but I didn't completely understand how it apportions time between SCons and the executed commands, and in any case it is based on wall-clock time rather than CPU time.

So I tried the following - I added this small snippet:

import resource
import sys

r = resource.getrusage(resource.RUSAGE_SELF)
print("main", r.ru_utime, r.ru_stime, file=sys.stderr)

to SCons' main() and to the spawner.py code - right before they exit. This should give accurate accounting for CPU time for the SCons process and the child spawners.

Trying again with Godot. On this branch without --process-spawner, the result is: main 34.841785 8.357901, so a total of 43.20 CPU seconds for SCons.

On this branch with --process-spawner, the result is main 33.671169 1.498248 plus 8 spawner processes (I used -j 8):

spawner 0.095034 0.14987699999999998
spawner 0.142122 0.13424999999999998
spawner 0.103977 0.096798
spawner 0.091567 0.122125
spawner 0.12446299999999999 0.111069
spawner 0.084918 0.127081
spawner 0.107025 0.121252
spawner 0.083006 0.162874

which account for 1.85 seconds, so the total is 37.02.

I admit this doesn't look notable compared to the total build time for Godot (which is about 2k CPU seconds), but it is a notable portion of the SCons time (about 15%), and as SCons' time grows, this part grows as well. With SCons at ~700MB, the --process-spawner version cuts the SCons time roughly in half (it remains at 37-38 seconds, while the standard version climbs to 75-80).

@acmorrow
Contributor

@Jongy - The MongoDB build might be another one to try if you want some additional data points. We too would like to see SCons be faster.

@Jongy
Author

Jongy commented Jul 20, 2020

@acmorrow I tested a build of mongo-r4.2.8.

The build command line I used: SCONS_VERSION=test python3 buildscripts/scons.py MONGO_VERSION=4.2.8 mongos --disable-warnings-as-errors (I made sure to place my SCons under src/third_party/scons-test/scons-local-test).

Without --process-spawner, SCons time is main 59.585993 16.084736 - 75.67 seconds in SCons.

With --process-spawner, the result is main 56.936538 3.723293 (plus 8 spawners totaling 1.8559) - 62.51 seconds in SCons.

@Jongy
Author

Jongy commented Jul 23, 2020

@bdbaddog What do you think about these results? There is no huge effect on the build time, but that's as far as it could go, since SCons itself is only a small portion of that time. But the improvement in SCons-only time is definitely notable, and like I said this is aimed at projects consuming more RAM, where this change is much more effective.

Also, as you can see in this comment, when SCons is under stress this change has positive effects.

Waiting for your approval (or disapproval :) before I go on with this.

@bdbaddog
Contributor

Curious how much RAM you're seeing SCons take.
I've seen cmake take > 5G of RAM.
There's a chance that other things you're doing in your build are causing a larger-than-necessary SCons footprint.

@Jongy
Author

Jongy commented Jul 24, 2020

IIRC it was about 1.5G, but I don't have access to the code at the moment, so I can't verify. It's indeed very likely the build doesn't really need those 1.5G - like every other program, it can be optimized. Nonetheless, the idea proposed here had a notable effect on the build time, at practically zero cost.

@grossag
Contributor

grossag commented Feb 8, 2023

Just FYI: as of Python 3.10, on Linux, Python's subprocess module uses vfork() instead of fork() unless the caller specifically disables it. So this pull request may not be needed anymore.
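For reference, the toggles involved are private module-level flags in CPython (implementation details that may change between versions, so treat this as an assumption rather than a public API):

import subprocess

# CPython 3.10+ on Linux prefers vfork()/posix_spawn() where it is safe to do so.
# These private, undocumented flags force the old fork() path:
subprocess._USE_VFORK = False        # added in 3.10 alongside the vfork support
subprocess._USE_POSIX_SPAWN = False  # present since 3.8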

@bdbaddog
Contributor

bdbaddog commented Feb 8, 2023

@grossag - Thanks for the update and for keeping an eye on developments related to this PR. :)

@Jongy
Author

Jongy commented Feb 8, 2023

Just FYI: as of Python 3.10, on Linux, Python's subprocess module uses vfork() instead of fork() unless the caller specifically disables it. So this pull request may not be needed anymore.

Yup, as noted in the description - the tracking ticket was https://bugs.python.org/issue35823, and it got merged in 3.10. This PR is far less relevant now.

@bdbaddog
Contributor

bdbaddog commented Feb 8, 2023

@Jongy - so safe to say the recommended fix now is to use python 3.10+? If so, let's close this PR.

@Jongy
Author

Jongy commented Feb 8, 2023

@Jongy - so safe to say the recommended fix now is to use python 3.10+? If so, let's close this PR.

Yes. It will cover the SCons case (that is, the subprocess usage of SCons).

I'll close the PR; I see no need to introduce the extra complexity given that people can just upgrade to 3.10.

@Jongy Jongy closed this Feb 8, 2023
@Jongy Jongy deleted the process-spawner branch February 8, 2023 23:01
@Jongy Jongy restored the process-spawner branch February 8, 2023 23:02
@mwichmann
Collaborator

I'm generally fine with this resolution, too, but... it's not true that everyone can just upgrade to 3.10+. We keep hearing from people who can't "because" (usually an IT/Security policy restricting them to the Python provided by the Linux distribution in question). We also need to get off using os.spawnve on Windows, but that's got its own trackers...

@bdbaddog
Contributor

bdbaddog commented Feb 9, 2023

@Jongy - One question for you. On the system(s) where the spawner was making a notable difference, how much RAM did the system(s) have?

@Jongy
Author

Jongy commented Feb 9, 2023

@Jongy - One question for you. On the system(s) where the spawner was making a notable difference, how much RAM did the system(s) have?

16GB. However, the physical RAM size doesn't matter; what matters is the number of pages in the app's page tables, which is the app's RSS divided by the page size. Using hugepages, for example, will very likely reduce the slowdown drastically. The RSS is the one I described here.
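As a rough, hypothetical illustration (assuming 4 KiB pages and the ~1.5G RSS mentioned earlier):

rss_bytes = 1.5 * 1024**3            # ~1.5 GB resident in the SCons process
page_size = 4 * 1024                 # 4 KiB pages, i.e. no hugepages
print(int(rss_bytes / page_size))    # ~393,000 page-table entries each fork() must copy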
