Integrate the stubprocess.py wrapper from Parts #3703

Open · wants to merge 4 commits into base: master
Conversation

dirkbaechle (Contributor)

With this PR we integrate the stubprocess.py wrapper from Parts into our project. It is made available as an experimental feature (for now) via the '--stubprocess-wrapper' command-line option.

I considered simply always activating the wrapper, but the following reasons speak against it:

  1. It can slow down builds with low memory consumption and/or a low number of files. See also the 'man page' entry for this feature in 'doc/man/scons.xml'.
  2. Python itself is adding support for vfork starting in v3.7.2, but only for 'linux' platforms at the moment. So a few Python releases from now, this wrapper might be superfluous.
  3. I had to patch the original version a lot to make it work under Python 3, but I couldn't test my changes under macOS.
  4. So far, Python 3 has changed the interface of the subprocess.Popen._execute_child() call several times, and the wrapper handles all known variants at the moment. But as soon as Python changes it again, users' builds would break immediately, with no work-around available except patching the stubprocess wrapper (a hypothetical detection guard is sketched right after this list).
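
Purely to illustrate the kind of interface check point 4 is about, here is a minimal, hypothetical sketch (not code from this PR, and the names are my own): record the _execute_child() parameter lists the wrapper was written against, and refuse to install it on an unknown Python variant instead of letting builds break at spawn time.

    import inspect
    import subprocess

    # Hypothetical guard: parameter-name sets of the _execute_child()
    # variants a wrapper has actually been written and tested against.
    SUPPORTED_SIGNATURES = set()   # to be filled with frozensets of names

    def can_install_wrapper():
        """Return True only if the running Python's _execute_child()
        matches a variant the wrapper knows how to replace."""
        params = frozenset(
            inspect.signature(subprocess.Popen._execute_child).parameters)
        return params in SUPPORTED_SIGNATURES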

I ran the wrapper against several 'artificial' builds (created with a Perl script by E. Melski), with the following results (all runs in parallel with '-j 4'):

Small (approx 6000 files)
-------------------------

normal:    1m4.336s
wrapper:   1m38.203s

Large (approx 78000 files)
--------------------------

normal:   14m33.155s
wrapper:  14m6.347s

Very large (approx 400000 files)
--------------------------------

normal:  406m9.926s
wrapper: 111m6.048s

There are no additional test cases provided with this PR; running the whole test suite without the '--stubprocess-wrapper' option didn't show any additional failures on my machine. I also verified that the wrapper works fine in "interactive" mode.

The current implementation of the wrapper assumes a Python 3 floor version of 3.4.
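
A minimal, hypothetical guard for that floor (my own sketch, not necessarily how the PR enforces it):

    import sys

    # Refuse the experimental option on interpreters below the assumed floor.
    if sys.version_info < (3, 4):
        raise SystemExit("--stubprocess-wrapper requires Python 3.4 or newer")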

After this PR gets integrated, I plan to write a short report on how the wrapper scales with the number of cores and the number of files to build. I will then work my results into the 'WhySConsIsNotSlow' Wiki page and update it accordingly.

Contributor Checklist:

  • I have created a new test or updated the unit tests to cover the new/changed functionality.
  • I have updated CHANGES.txt (and read the README.rst)
  • I have updated the appropriate documentation

This is the original version of the 'stubprocess.py' wrapper, taken from the
git repo of the Parts project (https://bitbucket.org/sconsparts/parts.git),
commit 4f2a0e97e24b410d098df57ddc4247ebfb2aa636 (v0.15.4).
Modified the original version slightly by creating a
shadow version of subprocess.Popen that gets wrapped instead.
Also added the license.txt (MIT) from the Parts repo and a short
header explaining why the changes that were made are required.
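
The "shadow version" pattern mentioned above can be pictured roughly like this (an illustrative sketch with made-up names, not the actual patch): a subclass stands in for subprocess.Popen, and only the subclass gets patched, so other users of subprocess remain untouched.

    import subprocess

    class _ShadowPopen(subprocess.Popen):
        """Stand-in that receives the wrapper's patches; the stock
        subprocess.Popen stays untouched for everyone else."""
        pass

    def install_wrapper(wrapped_execute_child):
        # Patch only the shadow class; SCons' spawn code would then be
        # handed _ShadowPopen while the wrapper is active.
        _ShadowPopen._execute_child = wrapped_execute_child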
@Jongy commented Jun 26, 2020

Hey @dirkbaechle, I posted a question to scons-users regarding SCons process spawning performance, and was referred to your PR (see https://pairlist4.pair.net/pipermail/scons-users/2020-June/008287.html).

I also wrote a PoC to tackle this issue: it's not as feature-complete as Parts, but as far as I can tell, most of those features aren't used by SCons anyway (in exec_subprocess() and exec_popen3()). My current PoC replaces exec_subprocess(), which is the simplest case (only args and an environment), and makes it send "spawn requests" to a per-Worker spawner process. This alleviates most of the fork() and execve() memory-mapping handling and remains fairly cheap in terms of added syscalls.
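
For readers unfamiliar with the approach, the idea can be pictured roughly like this (a self-contained sketch under my own assumptions, not the actual process-spawner branch): a small helper process is forked early, before SCons grows large, and the worker then sends it pickled (args, env) requests instead of forking the big SCons process itself.

    import pickle
    import struct
    import subprocess

    def spawner_loop(rfile, wfile):
        # Runs inside the small helper process: read length-prefixed,
        # pickled (args, env) requests, spawn them, and report the status.
        while True:
            header = rfile.read(4)
            if len(header) < 4:                 # parent closed the pipe
                return
            (length,) = struct.unpack('!I', header)
            args, env = pickle.loads(rfile.read(length))
            status = subprocess.call(args, env=env)
            wfile.write(struct.pack('!i', status))
            wfile.flush()

    def request_spawn(wfile, rfile, args, env):
        # Worker side: one "spawn request" per command, one status back.
        payload = pickle.dumps((args, env))
        wfile.write(struct.pack('!I', len(payload)) + payload)
        wfile.flush()
        (status,) = struct.unpack('!i', rfile.read(4))
        return status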

I'd like to test it with the same benchmark, could you please publish the generator script here?

Comment on lines +23 to +28
- Added a new command-line option "--stubprocess-wrapper", which will try to wrap the
subprocess.Popen() call when activated. Enabling this option can give a significant speedup
for the build of large projects that consume a lot of memory and create many files.
Small builds however may experience a slow-down, which is why we declared this option
an "experimental feature" for the moment. Please also refer to the corresponding entry
in the MAN page for more information.
Review comment (Contributor):

Mention that it is specific to Mac and Linux.

Comment on lines +1677 to +1678
<para>Tries to activate the <filename>stubprocess.py</filename> wrapper from the <emphasis>Parts</emphasis>
project for speeding up builds. This wrapper uses pickling of the Actions' arguments
Review comment (Contributor):

I found this confusing because it implied that Parts is also needed. If this truly is now part of SCons, I personally think that it would be more easily understandable if you either (1) dropped mention of the Parts project or (2) mentioned that it does not require Parts.

Review comment (Contributor):

Maybe add "a contribution from the Parts project"

@bdbaddog (Contributor)

Is there no way to create a test (or tests) for this?
Maybe repurpose some general tests to run with stubprocess?

@dirkbaechle (Contributor, Author)

@bdbaddog Hmm, regarding the test: what would be the list of your acceptance criteria, i.e. what exactly would you like to test? (Sorry, I'm doing too much work with requirements in my day job at the moment. ;) )

@dirkbaechle (Contributor, Author)

@Jongy Please check out https://bitbucket.org/dirkbaechle/scons_testresults/src/default/scons230_vs_make/ where I uploaded the script scons_test.pl a few minutes ago. I will also make the script available in a new version of my "scons_testsuite" repo (currently migrating from bitbucket to github), but this will take a while.

@Jongy commented Jun 27, 2020

Thanks @dirkbaechle. I used your script to generate an environment this way:

# perl scons_test.pl -f 4000
Number of levels = 2
Number of dirs / level = 3
Number of source files / dir = 4000
Number of lookups / source file = 2
Number of compiles grouped      = 20
Expecting:
    directories:   12
    source files:  16,000
    include files: 16,004
    makefiles:     4 (1 per directory)
    total files:   32,011
    look-ups: >=   1,600
    rules: >=      840
When the build runs, 16,000 object files & 800 executable(s)
will be created, for a total of 48,811 files.

And these are the results on my Linux box, in a python:latest Docker (Python 3.8.3), running scons -j 8:

master @ e4fdc10cad4060c740763148699131be2dae3b96
real    3m6.849s
user    4m43.750s
sys     4m19.206s

process-spawner @ 9b411285fb0535376edcfc86a3cbc945c8da9b3e
real    1m40.905s
user    5m13.892s
sys     1m56.040s

stubprocess-wrapper @ 7c598af9a5b1a0ef4be618b9b56a5fb9a5ee9679
real	18m25.090s
user	48m20.900s
sys	89m10.878s

I ran with perf to identify the reasons for the unexpectedly long execution time for stubprocess: the close syscall is high in the graph, and so is seccomp filtering on syscalls (Docker adds this seccomp overhead, per syscall).

Why so many closes? SC_OPEN_MAX appears to be 1048576, and os.closerange on Linux just loops over fds and issues a million syscalls per subprocess.Popen... (while the C-side of subprocess.Popen has smarter logic for closing fds - it operates on open fds only, according to /proc/pid/fd.)
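
To make the contrast concrete, here is a small, hypothetical Python-level equivalent of what the C side does (my own sketch, not CPython or SCons code): close only the descriptors actually listed in /proc/self/fd instead of looping over the full range up to SC_OPEN_MAX.

    import os

    def close_open_fds(keep=(0, 1, 2)):
        # /proc/self/fd lists only the fds that are really open, so this
        # issues a handful of close() calls instead of a million.
        for name in os.listdir('/proc/self/fd'):
            fd = int(name)
            if fd in keep:
                continue
            try:
                os.close(fd)
            except OSError:
                pass  # e.g. the fd listdir used for the directory itself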

I then capped MAXFD at 1024 (which sounds like a fair default):

--- a/SCons/Platform/stubprocess.py
+++ b/SCons/Platform/stubprocess.py
@@ -87,7 +87,7 @@ if sys.platform in ('linux', 'darwin'):
 
     if __name__ == '__main__':
         try:
-            MAXFD = os.sysconf("SC_OPEN_MAX")
+            MAXFD = 1024
         except Exception:
             MAXFD = 256

and tried again (this time with perl scons_test.pl -f 2000):

master
real	1m22.913s
user	2m23.997s
sys	1m51.569s

process-spawner
real	0m48.841s
user	2m37.460s
sys	0m59.696s

stubprocess-wrapper
real	1m25.625s
user	7m11.600s
sys	1m54.429s

Given those results, I think the approach I presented in process-spawner should be taken into consideration, either as a substitute for this PR or as a complementary third option (so we would have the standard subprocess.Popen, the Parts wrapper, and the process spawner).

@dirkbaechle (Contributor, Author)

@Jongy If you have a solution that is even faster than stubprocess.py, I'd say go for it. I'm personally not attached to stubprocess.py... I simply took what the Parts project had to offer. My main goal wasn't to optimize the wrapper itself, but simply to make it work under SCons while modifying it as little as possible. That way, it would be easier to keep both versions in sync, contributing changes and bugfixes back and forth.

What I would be interested in are the times for your wrapper with a lower number of files (6000, or even down to 4000 in total). Is it fast enough that we get faster build times even for small projects, or at least roughly the same times? Because then we could activate your wrapper by default. Just my 2 cents.

@Jongy commented Jun 28, 2020

Cool. Later today I'll post those additional results you've requested and open a new pull request with my branch. I think that even for projects with a very small number of files (e.g. < 100), the process-spawning approach will be cheaper.

@Jongy commented Jun 28, 2020

Perhaps it can be enabled by default... I made it configurable since I think that new, experimental features like this one shouldn't be enabled by default when first added, and definitely should be disable-able.

Following are the results for perl /code/scons_test.pl -f 1000. We can see that without --process-spawner the changeset didn't affect the times, but with --process-spawner we see some improvements.

The real time is less relevant here: this test build doesn't run heavy compilations, so the CPU remains partly idle, and with --process-spawner SCons becomes more concurrent and uses multiple cores more efficiently. What's more interesting is the reduction in total user + sys time, which means SCons requires less CPU time overall.

master @ ea848745d7025d8f6885dfbeaaa92731ddd711bd (I based my branch on this commit)
real	0m35.622s
user	1m13.946s
sys	0m40.853s

process-spawner @ 758ba3d1e0a335c72e5489ec697a1a9af548fd90 (**without** --process-spawner)
real	0m35.371s
user	1m13.873s
sys	0m40.570s

process-spawner @ 758ba3d1e0a335c72e5489ec697a1a9af548fd90 (**with** --process-spawner)
real	0m23.134s
user	1m18.697s
sys	0m28.367s

I'll submit a PR soon.
