@@ -41,3 +41,43 @@ But with the rules around ephemeral jobs - they are only done if a downstream 'n
they can invalidate downstreams - it's a fairly gnarly state machine.

# Job names (job_ids)

All our jobs have a unique name, which is how ppg2 keeps track of them.
For the file-related jobs, these are (relative) paths. (The constructors can usually take a pathlib.Path as well,
but the job_id is always a string.) For jobs with multiple file outputs, the job_id is the sorted list of files,
concatenated with ':::'.
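
The naming rule above can be sketched in a few lines. This is only an illustration: `make_job_id` is a hypothetical helper, not a ppg2 function, and the real constructors may normalize paths differently.

```python
from pathlib import Path


def make_job_id(outputs):
    """Hypothetical helper (not part of ppg2): derive a job_id string."""
    if isinstance(outputs, (str, Path)):
        # Single output: the job_id is simply the path as a string.
        return str(outputs)
    # Multiple outputs: stringify, sort, then join with ':::'.
    return ":::".join(sorted(str(p) for p in outputs))


single = make_job_id(Path("results/table.tsv"))
multi = make_job_id(["results/b.txt", Path("results/a.txt")])
print(single)
print(multi)
```

Because the file names are sorted before joining, the order in which the outputs are listed does not change the resulting job_id.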

# Jobs vs files
A job may produce multiple files, and dependent jobs may depend on only a subset of them (using job_obj.depends_on(filename)).
This is all handled behind the scenes.

# Comparing 'outputs'
Depending on the job type, we store more than a simple hash.
For example, for file-generating jobs we store the file size and modification time as well.
This allows us to skip calculating the hash every time.
(We do not defend against modifications of files outside ppg runs that preserve both of these metadata values.)
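
A minimal sketch of this idea, assuming a stored record of size, modification time and hash. The helper names, the record layout and the use of SHA-256 are assumptions for illustration; ppg2's actual storage format and hash function may differ.

```python
import hashlib
import os
import tempfile


def describe_file(path):
    """Hypothetical record: size, mtime (ns) and a content hash."""
    st = os.stat(path)
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return {"size": st.st_size, "mtime": st.st_mtime_ns, "hash": digest}


def file_unchanged(path, stored):
    """Cheap metadata check first; hash only if the metadata differs."""
    st = os.stat(path)
    if st.st_size == stored["size"] and st.st_mtime_ns == stored["mtime"]:
        return True  # metadata matches, so hashing is skipped
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest() == stored["hash"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.txt")
    with open(path, "w") as fh:
        fh.write("hello")
    stored = describe_file(path)
    unchanged_untouched = file_unchanged(path, stored)

    # Same content, later mtime: metadata differs, but the hash still matches.
    later = stored["mtime"] + 10**9
    os.utime(path, ns=(later, later))
    unchanged_after_touch = file_unchanged(path, stored)

    # Different content: the hash comparison reports a change.
    with open(path, "w") as fh:
        fh.write("hello world")
    unchanged_after_edit = file_unchanged(path, stored)
```

The trade-off is the one noted above: a file edited in a way that keeps both size and modification time identical would pass the cheap check without ever being hashed.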


## Process management

Modern systems have many cores.
Python dates back to 1991, when the number of cores was 1.
Accordingly, Python has a 'global interpreter lock' that effectively limits the concurrency of Python programs, with the exception of C extensions, to only one core.

Pypipegraph2 circumvents these limitations in two ways:

1. Jobs that change the ppg2 process are run in multiple threads, and things like hashing files happen in a C extension.

2. Jobs that are supposed to be isolated from the ppg2 process (e.g. all *FileGeneratingJobs) run in a fork of the process.

The advantage of the fork is that the child process inherits
all loaded Python objects for free, and that it effectively isolates the ppg2 process against
all kinds of crashes.
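
Both properties can be seen in a small POSIX-only sketch using plain `os.fork`. The helper `run_in_fork` and the two job functions are hypothetical names for illustration; this is not how ppg2 itself launches jobs.

```python
import os
import signal

big_object = {"answer": 42}  # loaded once, in the parent process


def run_in_fork(func):
    """Run func() in a forked child and return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: sees a copy of the parent's memory, including big_object.
        try:
            func()
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return status


def good_job():
    # No reloading needed: the object was inherited through the fork.
    assert big_object["answer"] == 42


def crashing_job():
    # Simulate a hard crash that no try/except could catch.
    os.kill(os.getpid(), signal.SIGKILL)


ok_status = run_in_fork(good_job)
crash_status = run_in_fork(crashing_job)
print("good job exit status:", ok_status)
print("crashing job killed by signal:", os.WTERMSIG(crash_status))
```

The parent keeps running after the second child dies and can inspect the wait status to report the failure.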

The disadvantage of the fork is all the trouble of safely forking in the first place: forks only retain the forking thread, file handles are troublesome, locks held across forks cause inexplicable hangs, etc.

It also effectively prevents ppg2 from ever running on Windows.

We also do our own process reaping: parallel to the main ppg2 process, a watcher is spawned that makes sure that on shutdown (think abort), all children spawned by any of the forked processes are terminated as well.

Unfortunately, the way ppg2 runs your computation
does place a light burden of prohibited actions on the user.

Blame the POSIX standard.

## Changing the cwd

You must not change the current working directory in jobs that run inside the
ppg2 process (e.g. DataLoading, CachedDataLoading's load job, JobGeneratingJob).

This is because these jobs run multi-threaded, and the working directory is shared by all threads of a process.

There is detection for this in ppg2, but by that point the cat's already out of
the bag.

Note that changing the cwd within a forked job (*FileGenerating) is fine.

See [process management](../#process-management) for more details.
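
The underlying problem can be reproduced without ppg2: the working directory belongs to the process, not to a thread. The sketch below uses only the standard library; the function names are made up for illustration.

```python
import os
import tempfile
import threading

start_dir = os.getcwd()
seen_by_other_thread = []


def change_dir(target):
    # Stands in for an in-process job that (wrongly) calls os.chdir.
    os.chdir(target)


def report_cwd():
    # Stands in for an unrelated job running in another thread.
    seen_by_other_thread.append(os.getcwd())


with tempfile.TemporaryDirectory() as tmp:
    t1 = threading.Thread(target=change_dir, args=(tmp,))
    t1.start()
    t1.join()
    t2 = threading.Thread(target=report_cwd)
    t2.start()
    t2.join()
    changed_dir = os.path.realpath(tmp)
    os.chdir(start_dir)  # restore before the temporary directory is removed

print("second thread saw:", seen_by_other_thread[0])
```

The second thread never called `os.chdir`, yet it observes the directory set by the first one. Any relative path it opens at that moment resolves somewhere unexpected.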

## Holding a lock across forks

The forking nature of ppg2 means that in-process jobs (e.g. DataLoading,
CachedDataLoading's load job, JobGeneratingJob) must be ready to be forked at
any moment.

That means they must not hold any locks.

This applies to logging as well: if you call the Python logging functions from within a
DataLoadingJob, you risk hanging forked processes.
(Which is a bit of a shame, since the stdout of DataLoadingJobs is shared with all other in-process jobs, so you can't just print to stdout either. PRs welcome.)
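
The general mechanism behind such hangs can be shown with a plain `threading.Lock` on a POSIX system. This is a sketch of the failure mode only, not of ppg2 internals; all names are illustrative. Recent Python versions may emit a DeprecationWarning about forking a multi-threaded process, which is exactly the situation being demonstrated.

```python
import os
import threading

lock = threading.Lock()
holding = threading.Event()
release = threading.Event()


def hold_lock():
    # Stands in for an in-process job that holds a lock while working.
    with lock:
        holding.set()
        release.wait()


worker = threading.Thread(target=hold_lock)
worker.start()
holding.wait()  # the worker thread now holds the lock

pid = os.fork()
if pid == 0:
    # Child: the lock was copied in its locked state, but the worker thread
    # that would release it does not exist here. Without the timeout this
    # acquire would block forever.
    got_it = lock.acquire(timeout=1.0)
    os._exit(0 if got_it else 3)

_, status = os.waitpid(pid, 0)
release.set()
worker.join()
child_exit_code = os.WEXITSTATUS(status)
print("child exit code:", child_exit_code)
```

The child exits with code 3 because it never obtains the lock. A real job would not use a timeout and would simply sit there, using no CPU time and never returning.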

docs/content/docs/faq/_index.md (29 additions & 0 deletions)
@@ -22,4 +22,33 @@ for ii, element in enumerate(whatever):
to get the correct element.

## I'm experiencing the weirdest hangs

Your jobs are not using CPU time, but are not returning either?

Chances are you have a lock that's stuck across the fork that all FileGeneratingJobs perform.

See the [lock](../concepts/forbidden/#holding-a-lock-across-forks) section of the forbidden actions page.


## My pipegraph evaluation fails with an internal error.

You see something like this:

``` Internal error. Something in the pipegraph2 engine is wrong. Graph execution aborted.```

This is a bug in pypipegraph2. Please report it on our [GitHub issue tracker](https://github.com/TyberiusPrime/pypipegraph2/issues).

Background: The [ephemeral jobs](../concepts/#job-types) turn the decision of whether a job needs to be done from something fairly trivial into something nightmarishly complex. It's not yet perfect.

And the bugs always happen when you have a few tens of thousands of nodes in the graph, but every single one of them so far has boiled down to a small example.

If this happens, there are other options besides
sending us a complete snapshot of your project
(e.g. `graph.dump_subgraph_for_debug` and `graph.dump_subgraph_for_debug_at_run`).

Contact the authors, and we will walk you through them.

In the meantime, you can often get the ppg2 unstuck