Dead simple utility for spawning a graph of processes connected by streams on a UNIX system. Each process is equipped with the specified UNIX pipes and sockets at the specified file descriptors. The whole graph description is contained in a YAML file.
pgspawn is awesome! Why? Here are some arguments for it:

- It follows the UNIX philosophy.
- It's simple and understandable.
- It uses standard syntax - YAML is well known and pretty.
- It uses standard pipe semantics - the UNIX pipe is battle-tested.
- It's language agnostic. pgspawn doesn't care about language; it just spawns processes.
- It's efficient. After the spawning phase all work is done by the OS.
The package is available from PyPI:

```shell
pip install pgspawn
```
Check the examples/ directory.
As input, pgspawn takes a YAML file with a graph description. A very simple, single-node graph looks like this:
```shell
$ cat examples/id.yml
nodes:
- command: [cat]
$ echo abc | pgspawn examples/id.yml
abc
```
It spawns the `cat` program and doesn't do anything with file descriptors, so the child process inherits the standard fds (typically stdin, stdout, stderr).

We can do something more complex. Let's write a counterpart of the `yes` program:
```shell
$ cat examples/yes.yml
nodes:
- command: [cat]
  outputs:
    1: feedback
- command: [tee, /proc/self/fd/3]
  inputs:
    0: feedback
  outputs:
    3: feedback
$ echo y | pgspawn examples/yes.yml
y
y
y
...
```
What it does is create a pipe (named internally `feedback`) and use it to feed output back into input. The section `outputs: {1: feedback}` says that file descriptor 1 used by `cat` (its stdout) is fed into our pipe. The section `inputs: {0: feedback}` says that fd 0 of the `tee` program is read from the `feedback` pipe.
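Under the hood this is ordinary pipe/fork/dup2/exec plumbing. Here is a minimal sketch of the idea in Python (a simplified illustration, not pgspawn's actual code): a single node whose stdout is remapped to a pipe that the parent reads from.

```python
import os

def spawn(argv, fd_map):
    """Fork and exec argv with child fds remapped: {child_fd: parent_fd}."""
    pid = os.fork()
    if pid == 0:
        for child_fd, parent_fd in fd_map.items():
            os.dup2(parent_fd, child_fd)   # dup2 clears close-on-exec on the new fd
        os.execvp(argv[0], argv)
    return pid

r, w = os.pipe()
spawn(["echo", "hello"], {1: w})   # echo's stdout goes into the pipe
os.close(w)                        # parent must close its copy or the reader never sees EOF
data = os.read(r, 1024)
os.close(r)
os.wait()
print(data)                        # b'hello\n'
```

pgspawn does the same kind of remapping for every fd named in a node's `inputs` and `outputs` sections, then gets out of the way.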
*(Figure: the graph drawn with explicitly connected stdin and stdout.)*
YAML allows `#` comments, so the two mix well together. Take a look at examples/executable:
```yaml
#!/usr/bin/env pgspawn
nodes:
- command: [echo, 'hashbang!']
```
And I can do this:

```shell
$ examples/executable
hashbang!
```
It's possible to use the parent program's fds in the `inputs` and `outputs` descriptions. Just give them names and roll:
```shell
$ cat examples/swap.yml
outputs:
  stdout: 1
  stderr: 2
nodes:
- command: [bash, -c, echo "I'm stdout"; echo "I'm stderr" >&2;]
  outputs:
    1: stderr
    2: stdout
$ pgspawn examples/swap.yml > /dev/null
I'm stdout
$ pgspawn examples/swap.yml 2> /dev/null
I'm stderr
```
You can do the same with `inputs` (see examples/id_explicite.yml).
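For illustration, an explicit identity graph might look like this (a hypothetical sketch mirroring the swap example above, not the actual contents of examples/id_explicite.yml):

```yaml
inputs:
  stdin: 0        # name the parent's fd 0
outputs:
  stdout: 1       # name the parent's fd 1
nodes:
- command: [cat]
  inputs:
    0: stdin      # cat reads from the parent's stdin
  outputs:
    1: stdout     # and writes to the parent's stdout
```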
A more complicated example is shown in examples/server.yml: a TCP chat with expression evaluation. A simple example using socket-connected processes is shown in examples/socket.yml.
For documentation purposes there is a pg2dot program that converts a YAML graph description into a DOT file, which Graphviz understands. To generate an image, run something like:

```shell
cat examples/yes_explicite_full.yml | pg2dot | dot -T png -o graph.png
```
pgspawn allows multiple programs to write to or read from a single fd. If you do this, you'd better know what to expect.

When multiple programs write into a single pipe, the content gets interleaved, but there are rules. You can enforce write atomicity by writing small enough chunks: POSIX guarantees that writes of at most PIPE_BUF bytes are atomic, and requires PIPE_BUF to be at least 512 bytes.
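The limit can be checked from Python; a small sketch (`os.fpathconf` reports the actual PIPE_BUF for a pipe fd, which is 4096 on Linux):

```python
import os

r, w = os.pipe()
pipe_buf = os.fpathconf(r, "PC_PIPE_BUF")   # POSIX guarantees this is >= 512
print(pipe_buf)

msg = b"x" * 100          # well under PIPE_BUF, so this write cannot be interleaved
os.write(w, msg)
got = os.read(r, 1024)
print(len(got))           # 100
os.close(r)
os.close(w)
```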
For concurrent reads the matter is worse. There are some ways to make reads atomic, but all of them (as far as I can tell) rely on the implementation, not on the standard. There is a hopeful chapter in the libc manual, but `man 3 read` says:

> The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.
Of course a socket can be modelled as a pair of unidirectional pipes. Such pipes will be connected at two fds in the child process. Some programs may not support that and expect a single fd. pgspawn can create a pair of connected, anonymous UNIX domain sockets (like `socketpair()` from `<sys/socket.h>`) and pass them to child processes. Such a connection can be shared only between two processes, unlike a unidirectional pipe, which can be used by many more.
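Python's standard library exposes the same primitive; a quick sketch of how such a socket pair behaves:

```python
import socket

# A connected pair of anonymous UNIX domain sockets, as socketpair(2) creates.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

a.sendall(b"ping")
msg1 = b.recv(4)
b.sendall(b"pong")        # the same fd carries traffic in both directions
msg2 = a.recv(4)
print(msg1, msg2)         # b'ping' b'pong'
a.close()
b.close()
```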
Tests are contained in the test.sh file and the examples directories. Running them is not so standard; I do it like this:

```shell
python setup.py develop --user
./test.sh
```