Commit 9eff6c3

Fleshing out the documentation

1 parent 41eb18c commit 9eff6c3

File tree

13 files changed: +380 -3 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -78,3 +78,4 @@ examples/simplest/
 tests/run
 tests/**/run
 *shu*
+examples/run

README.md

Lines changed: 22 additions & 0 deletions

@@ -7,6 +7,28 @@
 ![Nix flake](https://badgen.net/badge/Nix%20flake/available/blue?icon=docs)
 ](https://github.com/TyberiusPrime/pypipegraph2/blob/main/flake.nix)

+<svg width="93.8" height="20" viewBox="0 0 938 200" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="docs: available">
+<title>docs: available</title>
+<linearGradient id="FaUuN" x2="0" y2="100%">
+<stop offset="0" stop-opacity=".1" stop-color="#EEE"/>
+<stop offset="1" stop-opacity=".1"/>
+</linearGradient>
+<mask id="MaZyA"><rect width="938" height="200" rx="30" fill="#FFF"/></mask>
+<g mask="url(#MaZyA)">
+<rect width="350" height="200" fill="#555"/>
+<rect width="588" height="200" fill="#08C" x="350"/>
+<rect width="938" height="200" fill="url(#FaUuN)"/>
+</g>
+<g aria-hidden="true" fill="#fff" text-anchor="start" font-family="Verdana,DejaVu Sans,sans-serif" font-size="110">
+<text x="60" y="148" textLength="250" fill="#000" opacity="0.25">docs</text>
+<text x="50" y="138" textLength="250">docs</text>
+<text x="405" y="148" textLength="488" fill="#000" opacity="0.25">available</text>
+<text x="395" y="138" textLength="488">available</text>
+</g>
+
+</svg>
+[docs](https://tyberiusprime.github.io/pypipegraph2/)
+
 Fine-grained tracking of what goes into generated artifacts,
 and when it's necessary to recalculate them.

docs/content/_index.md

Lines changed: 12 additions & 0 deletions

@@ -21,6 +21,18 @@ If it changes, the job [invalidates](concepts#invalidation) its downstreams.
 {{< /columns >}}

 # What is tracked?
+{{< mermaid class="optional" >}}
+graph TD;
+    Input_Files-->Intermediary_and_Temporary_Files;
+    Intermediary_and_Temporary_Files-->Output_files;
+    Python_Functions-->Output_files;
+    Python_Functions-->Intermediary_and_Temporary_Files;
+    Parameters-->Output_files;
+    Parameters-->Intermediary_and_Temporary_Files;
+{{< /mermaid >}}
+
 * input files (d'oh),
 * intermediary files,

docs/content/docs/concepts/_index.md

Lines changed: 41 additions & 1 deletion

@@ -5,7 +5,7 @@ title: "Concepts"
 ---
 # Pypipegraph2 concepts

-## What is modeled
+## The directed acyclic graph

 The pypipegraph2 models 'Jobs', which are named, distinct units of computation that form a
 [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAG).

@@ -41,3 +41,43 @@ But with the rules around ephemeral jobs - they are only done if a downstream 'n
 they can invalidate downstreams - it's a fairly gnarly state machine.

# Job names (job_ids)

All our jobs have a unique name, which is how ppg2 keeps track of them.
For the file-related jobs these are (relative) paths (the constructors usually accept a pathlib.Path as well,
but the job_id is always a string). For jobs with multiple file outputs, the job_id is the sorted list of files,
concatenated with ':::'.

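The ':::' rule above can be sketched in plain Python. This is a hypothetical helper mirroring the stated rule, not ppg2's actual internal function:

```python
from pathlib import Path

def multi_file_job_id(output_files):
    """Derive a job_id for a multi-output job: paths are stringified,
    sorted, and joined with ':::' (a sketch of the rule described above)."""
    return ":::".join(sorted(str(Path(f)) for f in output_files))

# pathlib.Path inputs are accepted, but the job_id is always a string:
job_id = multi_file_job_id([Path("out/b.txt"), "out/a.txt"])
# → "out/a.txt:::out/b.txt"
```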
# Jobs vs files

A job may produce multiple files, and dependent jobs may depend on only a subset of them (using job_obj.depends_on(filename)).
This is all handled behind the scenes.

# Comparing 'outputs'

Depending on the job type, we store more than a simple hash.
For example, for file-generating jobs we store the file size and modification time as well.
This allows us to skip recalculating the hash every time.
(We do not defend against modifications of files outside ppg runs that preserve these two pieces of metadata.)

## Process management

Modern systems have many cores.
Python comes from 1992, when the number of cores was 1.
Accordingly, Python has a 'global interpreter lock' that effectively limits the concurrency of Python programs, with the exception of C extensions, to only one core.

Pypipegraph2 circumvents these limitations in two ways:

1. Jobs that change the ppg2 process run in multiple threads, and things like hashing files happen in a C extension.

2. Jobs that are supposed to be isolated from the ppg2 process (e.g. all *FileGeneratingJobs) run in a fork of the process.

The advantage of the fork is that the child process inherits
all loaded Python objects for free, and is effectively isolated against
all kinds of crashes.

The disadvantage of the fork is all the trouble of safely forking in the first place - forks only retain the main thread, file handles are troublesome, locks held across forks spell inexplicable hang-ups, etc.

It also effectively prevents ppg2 from ever running on Windows.

We also do our own process reaping - parallel to the main ppg2 process, a watcher is spawned that makes sure that on shutdown (think abort), all children spawned by any of the forked processes are terminated as well.
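Fork-based isolation can be illustrated with a minimal sketch. `run_isolated` is a hypothetical helper; the real *FileGeneratingJob machinery additionally handles error capture, stdout/stderr, and reaping via the watcher process:

```python
import os
import pickle

# A "loaded python object" the child inherits for free via fork:
big_table = {i: i * i for i in range(1000)}

def run_isolated(fn):
    """Run fn in a forked child, returning its pickled result via a pipe.

    The child sees all parent objects without copying or re-importing,
    and a crash in fn cannot take down the parent process.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: inherited big_table without any serialization
        os.close(r)
        with os.fdopen(w, "wb") as out:
            pickle.dump(fn(), out)
        os._exit(0)  # never return into the parent's control flow
    os.close(w)
    with os.fdopen(r, "rb") as inp:
        result = pickle.load(inp)
    os.waitpid(pid, 0)  # reap the child so no zombie is left behind
    return result

result = run_isolated(lambda: big_table[42])  # computed in the child
```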
Lines changed: 36 additions & 0 deletions

@@ -0,0 +1,36 @@
+++
title= "Forbidden actions"
+++

# Forbidden actions

Unfortunately, the way ppg2 runs your computation
does put a light burden of prohibited actions on the user.

Blame the POSIX standard.

## Changing the cwd

You must not change the current working directory in jobs that run inside the
ppg2 process (e.g. DataLoading, CachedDataLoading's load job, JobGeneratingJob).

This is because these run multi-threaded.

There is detection for this in ppg2, but at that point the cat's already out of
the bag.

Note that changing the cwd within a forked job (*FileGenerating) is fine.

See [process management](../#process-management) for more details.
## Holding a lock across forks

The forking nature of ppg2 means that in-process jobs (e.g. DataLoading,
CachedDataLoading's load job, JobGeneratingJob) must be ready to be forked at
any moment.

That means they must not hold any locks.

That applies to logging - if you call the Python logging functions from within a
DataLoadingJob, you are liable to hang forked processes.
(Which is a bit of a shame, since the stdout of DataLoadingJobs is shared with all other in-process jobs, so you can't just print to stdout either. PRs welcome.)
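The hazard can be demonstrated safely with a non-blocking probe (a sketch; a real in-process job would simply hang on a blocking acquire):

```python
import os
import threading

lock = threading.Lock()

def forked_child_probes_held_lock():
    """Fork while `lock` is held and report whether the child could take it.

    The child inherits the lock in its locked state. In a real hang, the
    holder is another thread that does not survive the fork, so nothing
    ever releases it and a blocking acquire waits forever; here we probe
    non-blockingly and report the result via the exit code instead.
    """
    with lock:
        pid = os.fork()
        if pid == 0:  # child
            got_it = lock.acquire(blocking=False)
            os._exit(0 if got_it else 1)
        _, status = os.waitpid(pid, 0)  # reap the child
        return os.waitstatus_to_exitcode(status)

# exit code 1: the inherited lock was still locked inside the child
```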

docs/content/docs/faq/_index.md

Lines changed: 29 additions & 0 deletions

@@ -22,4 +22,33 @@ for ii, element in enumerate(whatever):

 to get the correct element.

## I'm experiencing the weirdest hangs

Your jobs are not using CPU time, but not returning either?

Chances are a lock is stuck across the fork all FileGeneratingJobs perform.

See the [lock](../concepts/forbidden/#holding-a-lock-across-forks) section of the forbidden actions page.


## My pipegraph evaluation fails with an internal error

You see something like this:

```Internal error. Something in the pipegraph2 engine is wrong. Graph execution aborted.```

This is a bug in pypipegraph. Please report it on our [github issue tracker](https://github.com/TyberiusPrime/pypipegraph2/issues).

Background: the [ephemeral jobs](../concepts/#job-types) push the complexity of deciding whether a job needs to be done from something fairly trivial into nightmarish territory. It's not yet perfect.

And the bugs always happen when you have a few ten thousand nodes in the graph - but every single one of them has boiled down to a small example.

If this happens, there are other options besides
sending us a complete snapshot of your project
(e.g. `graph.dump_subgraph_for_debug` and `graph.dump_subgraph_for_debug_at_run`).

Contact the authors, and we will walk you through them.

In the meantime, you can often get the ppg2 unstuck
by deleting the right output files.

Lines changed: 47 additions & 0 deletions

@@ -0,0 +1,47 @@
+++
title= "Logging"
+++

# Console

The console log is (by default) very brief.

It outputs when a job's input history changed
(so you know why a job was rerun).

It outputs a short stack trace when jobs fail,
and the name of the error log file (see below),
which contains more details.

And it shows a counter for

* T: Total jobs
* D: Done jobs
* R: Running jobs
* W: Waiting jobs
* F: Failed jobs

# Log file(s)

The log file (.ppg/logs/latest.messages) contains more info.

Compared to the console log, it omits the counters,
but logs every job start/stop (together with its runtime).

Its messages contain references into the ppg2 source; these
are abstracted away into latest.lookup to denoise the log.

You can increase the amount of logging by passing a lower
log level to [ppg2.new()](../managing-pypipegraph#pypipegraph2new).

# Error log files

For each failed job, an error log is created in .ppg/errors/latest/<job_number>_exception.txt.

It contains the full stack trace (including locals!), the message of the exception, and - if available - stdout and stderr.
