Related to #1310, but actually a more general discussion.
The R1 architecture is not really amenable to being parallelized or distributed, ever (apart from local and specific improvements inside workers). Part of the problem is the worker concept itself. At the moment, each worker is a massive opaque monolith of logic. Caracal itself has no idea about any given worker's inputs or outputs. They just happen to match up because one worker uses the same naming convention for its outputs that another uses for its inputs, but this is a helpful "accident of the universe" which often goes wrong, and one the pipeline manager itself is completely ignorant of.
We can't ever hope to make this work with a task scheduler, because there's no information exposed about what the tasks actually are, or how data flows between them.
For similar reasons, it's been hard to contemplate things like #1310.
And yet, looking at the code, I can see there was a far more grand and pure intent at the beginning. If I had to guess at the historical process, I think workers were originally just supposed to assemble and return Stimela recipes, which the pipeline itself would then run. But Stimela v1 wasn't feature-rich enough to support all the niggly, twisted logic that a typical worker required, hence they evolved into these massive obelisks of code which just happen to run some Stimela recipes behind the scenes, most of the time. Am I interpreting the history correctly @SpheMakh?
Anyway, I would propose the following architectural changes for R2:
A worker is a processing step with a set of config parameters and, crucially, a well-defined set of inputs and outputs.
The config file tells the system how to chain workers together and how to plumb outputs into inputs (with reasonable defaults "auto-connecting" them).
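Purely to make the chaining idea concrete, here is a notional sketch of what that plumbing could look like. None of this is an existing Caracal schema: the worker names, the `worker.output` reference notation and the `auto_connect` helper are all hypothetical, and the config (which would live in YAML in practice) is written as a Python dict just to keep the example runnable.

```python
# Notional sketch only: the schema, reference notation and helper below do not
# exist in Caracal today. A config declares each worker's explicit plumbing;
# anything left unplumbed is auto-connected by matching output/input names.

config = {
    "obsinfo":  {"inputs": {"ms": "mydata.ms"}},    # literal file to start from
    "prep":     {"inputs": {"ms": "obsinfo.ms"}},   # explicit "worker.output" plumbing
    "crosscal": {"inputs": {}},                     # left entirely to auto-connect
}

# hypothetical registry of declared worker interfaces
WORKER_IO = {
    "obsinfo":  {"inputs": ["ms"],            "outputs": ["ms", "obsinfo"]},
    "prep":     {"inputs": ["ms"],            "outputs": ["ms"]},
    "crosscal": {"inputs": ["ms", "obsinfo"], "outputs": ["gaintables"]},
}

def auto_connect(config, worker_io):
    """Fill unspecified inputs by wiring them to the most recent upstream
    worker that declares an output of the same name."""
    produced = {}  # output name -> "worker.output" of its most recent producer
    for name, cfg in config.items():
        io = worker_io[name]
        for inp in io["inputs"]:
            cfg["inputs"].setdefault(inp, produced.get(inp))
        for out in io["outputs"]:
            produced[out] = f"{name}.{out}"
    return config

# crosscal ends up with {"ms": "prep.ms", "obsinfo": "obsinfo.obsinfo"}
print(auto_connect(config, WORKER_IO))
```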
The simplest kind of worker just returns a Stimela recipe (see Recipes of recipes, and turtles all the way down ratt-ru/Stimela-classic#698), which the pipeline manager can decide how (and where) to run. This is also the best kind of worker ("class A"), since we can expect some magic scheduler in the future to know how to distribute the individual steps of a recipe efficiently.
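A class A worker could then collapse to something like the sketch below: a plain function that assembles a recipe from its declared inputs and hands it back, leaving the decision of how and where to run it to the pipeline manager. Everything here is a placeholder assumed for illustration: the `worker` decorator, the `Recipe` stand-in and the cab names are not an existing Caracal or Stimela API.

```python
# Illustrative only: "worker", "Recipe" and the cab names are placeholders.

from dataclasses import dataclass, field

@dataclass
class Recipe:
    """Stand-in for a Stimela recipe: an ordered list of (cab, params) steps."""
    name: str
    steps: list = field(default_factory=list)

    def add(self, cab, params):
        self.steps.append((cab, params))

def worker(inputs, outputs):
    """Declare a worker's interface so the pipeline manager can see it."""
    def wrap(fn):
        fn.inputs, fn.outputs = inputs, outputs
        return fn
    return wrap

@worker(inputs=["ms"], outputs=["flagged_ms"])
def flag_worker(ms, config):
    """Class A: assemble and *return* a recipe; the pipeline manager runs it."""
    recipe = Recipe("flagging")
    recipe.add("cab/autoflagger", {"ms": ms, "strategy": config["strategy"]})
    recipe.add("cab/flagstats",   {"ms": ms})
    return recipe
```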
A more complex worker could have Python logic on top of recipes (or any other code), but still strictly define its inputs and outputs. This is the second-best ("class B") kind of worker, since you can only distribute and run it as an entire unit.
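A class B worker, by contrast, might wrap arbitrary Python around recipe runs, say a loop whose iteration count is only known at runtime, which is exactly why the scheduler can only ever treat it as one opaque unit. Continuing the placeholder sketch above (`run_recipe` is a hypothetical callable the pipeline manager would inject):

```python
@worker(inputs=["ms", "model"], outputs=["gaintables"])
def peel_worker(ms, model, config, run_recipe):
    """Class B: the worker decides at runtime how many recipes to run, so it
    must execute them itself (via the injected run_recipe callable) rather
    than returning a single recipe for the scheduler to distribute."""
    gaintables = []
    for source in config["sources"]:
        recipe = Recipe(f"peel-{source}")
        recipe.add("cab/solver", {"ms": ms, "model": model, "dir": source})
        result = run_recipe(recipe)
        gaintables.append(result["gaintable"])
        if result["residual_rms"] < config["threshold"]:
            break  # Python-level control flow a plain recipe cannot express
    return gaintables
```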
A complex worker can also be a composition (and/or iteration) of individual workers. See Recipes of recipes, and turtles all the way down ratt-ru/Stimela-classic#698 for thoughts on this. Selfcal is a prime candidate. Transform+prep+flag is another ("run this sequence of steps over all targets and all MSs"; note the scope for automatic parallelization here). This is still a "class A" worker, since the pipeline manager has all the information it needs to distribute and run the constituent steps.
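For the composition case, the point is that the pipeline manager still sees every constituent step, so a "for each MS, for each target" wrapper stays class A and its independent branches are fair game for parallel execution. Again a notional sketch built on the same placeholder `worker`/`Recipe` definitions, with made-up cab names:

```python
@worker(inputs=["ms_list", "targets"], outputs=["prepared_ms"])
def transform_prep_flag(ms_list, targets, config):
    """Composition: expand (transform, prep, flag) over every MS and target.
    The result is still a single recipe of recipes, so the scheduler remains
    free to run the independent (ms, target) branches in parallel."""
    outer = Recipe("transform-prep-flag")
    for ms in ms_list:
        for target in targets:
            for step in ("transform", "prep", "flag"):
                outer.add(f"cab/{step}", {"ms": ms, "field": target})
    return outer
```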
In the interest of a smooth transition, old-style monolithic workers ("class C") must remain supported, but they can only be treated as black boxes with no known inputs or outputs, only documented naming conventions that their inputs and outputs are expected to follow. So, you can have a pipeline use a mix of glorious new workers and ugly old dinosaur workers, but then the config has to work around the dinosaurs to make sure the files are named as they expect.
This suggests a clear transition path from dinosaur to new-style. First, define the worker's inputs and outputs, and expose those definitions to the pipeline. This turns it into a class B worker. Then, start thinking about how to break it up into smaller class A chunks...
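The first step of that transition could be as small as bolting an interface declaration onto the existing monolith, without touching its internals. A hypothetical sketch, reusing the placeholder `worker` decorator from above (the names and the declared inputs/outputs are invented for illustration):

```python
def legacy_crosscal(ms, obsinfo, config):
    """Stand-in for the existing monolithic (class C) worker body."""
    ...

@worker(inputs=["ms", "obsinfo"], outputs=["caltables", "flagged_ms"])
def crosscal_worker(ms, obsinfo, config):
    # Step 1: internals unchanged, but the interface is now declared -> class B.
    return legacy_crosscal(ms, obsinfo, config)

# Step 2 (later): carve the body into smaller, recipe-returning class A workers,
# one per logical stage (set model, delay cal, bandpass, gain cal, ...).
```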
No global state allowed. Global state is the crack cocaine of software-inclined astronomers, and inevitably leads to deformities and mental health issues. Confiscate the crack pipe once and for all. For example, the "obsinfo"-derived stuff we currently put in the global pipeline object and (very haphazardly and non-systematically) access from all over the place should be an output of an "obsinfo" worker (which could perhaps even be merged with the prep worker), and any workers that need it must declare it as an input.
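Concretely, the no-global-state rule just means the observation metadata travels along declared edges instead of living on a shared pipeline object. One last sketch in the same placeholder style (the cab name and output keys are illustrative, not a real interface):

```python
@worker(inputs=["ms"], outputs=["ms", "obsinfo"])
def obsinfo_worker(ms, config):
    """Produce observation metadata as a declared output rather than stashing
    it on a global pipeline object."""
    recipe = Recipe("obsinfo")
    recipe.add("cab/msutils", {"ms": ms, "command": "summary"})
    return recipe

# Anything downstream that needs the metadata declares it explicitly, e.g.
# @worker(inputs=["ms", "obsinfo"], outputs=[...]) as in the crosscal sketch above,
# and the config (or auto-connection) plumbs obsinfo_worker's output into it.
```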
Anyway, putting it out there for discussion. Notional "new-style" config examples to follow (in particular, thinking about how to break up crosscal, polcal and selfcal into smaller chunks...)