
Executor Refactor #286

Closed · mpeck opened this issue Feb 23, 2016 · 6 comments

@mpeck (Contributor) commented Feb 23, 2016

The Great Executor Refactor

The executor is in need of a refactor. Currently almost everything is crammed into a single module, which makes it difficult to maintain. To improve maintainability and open up the possibility of future improvements, we have some suggestions for a refactor.

Proposal - move templating out of the executor

Templating should be done independently of the executor. Specifically, we are thinking about moving templating up to the initializer level. The initializer is a named process that starts executor processes. If we move templating up to this level, and perhaps rename the initializer to something more appropriate (pipeline_coordinator?), it will not only make things a bit cleaner, it will give us a place to more clearly implement things like error templating. It also makes it easier to eventually introduce parallel command execution and recursive alias expansion, because the executor can simply return JSON.
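
As a rough illustration (module and function names here are hypothetical sketches, not Cog's actual API), the presentation step could live entirely outside the executor:

```elixir
# Hypothetical sketch: the executor returns plain JSON and the
# coordinator level decides how to present it, including errors.
defmodule Templating do
  # Stand-ins for a real template engine.
  def render(_template, json), do: {:ok, inspect(json)}
  def render_error(reason), do: {:ok, "Whoops! An error occurred: #{inspect(reason)}"}
end

defmodule Presentation do
  # Called at the coordinator level once the executor finishes.
  def present(template, {:ok, json}), do: Templating.render(template, json)
  def present(_template, {:error, reason}), do: Templating.render_error(reason)
end
```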

Proposal - move redirects out of the executor

In the same vein as the templating move: by moving redirects out of the executor, we create a further separation of responsibilities for command execution.
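
A minimal sketch of redirection as a standalone concern (names are hypothetical; `send_fn` stands in for whatever adapter call actually delivers output):

```elixir
# Hypothetical sketch: redirection as its own module, invoked at the
# coordinator level after execution rather than inside the executor.
defmodule Redirection do
  # Fan the final output out to each redirect destination.
  def redirect(output, destinations, send_fn) do
    Enum.each(destinations, fn destination -> send_fn.(destination, output) end)
  end
end

# Usage:
# Redirection.redirect(json, ["#ops", "@alice"], fn dest, out ->
#   IO.puts("#{dest}: #{inspect(out)}")
# end)
```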

Proposal - break the executor into two parts

The real meat of the executor lives in the prepare_or_finish function. Over time the function has grown and become difficult to follow. A major issue revolves around the fact that we iterate over multiple lists. At the top level we have a list of invocations, the pipeline. We iterate over the list of invocations, executing each command in turn. Depending on its type, a command can be executed once or multiple times, and we further iterate over command input and output lists based on that execution type.

We can simplify this by breaking the executor into two distinct components: the pipeline executor (iterates over the invocation list) and the command executor (manages how commands are executed). The pipeline executor's main purpose is to maintain the top-level list of invocations, iterate over them, and spin up a new command executor to run each invocation. The command executor is responsible for identifying the type of command to be executed and properly executing said command.
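
A sketch of the split, with hypothetical names, under the assumption that each command's output feeds the next command's input:

```elixir
# Hypothetical sketch of the two-part split.
defmodule CommandExecutor do
  # Stand-in for validating and executing a single invocation.
  def execute(invocation, input), do: {:ok, %{ran: invocation, with: input}}
end

defmodule PipelineExecutor do
  # Walks the invocation list, threading each command's output into the
  # next command's input, and halting on the first error.
  def run(invocations, initial_input \\ nil) do
    Enum.reduce_while(invocations, {:ok, initial_input}, fn invocation, {:ok, acc} ->
      case CommandExecutor.execute(invocation, acc) do
        {:ok, output} -> {:cont, {:ok, output}}
        error -> {:halt, error}
      end
    end)
  end
end
```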

This would allow us to do some interesting things with command execution, namely things like parallel execution. It greatly simplifies the whole input/output buffer handling, and it also allows us to do actual recursive expansion of aliases. Currently we just expand the alias and prepend the resulting pipeline to the invocation list. Under this design we can instead spin up another pipeline executor as a child of the command executor. When that pipeline executor returns JSON, we just continue processing as normal. We can easily track how deep we go down the alias expansion rabbit hole and add configurable limits, which should protect us from things like cyclic references.
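
A depth-limited expansion might look like the following sketch (hypothetical data shapes: aliases as a map of name to replacement pipeline):

```elixir
# Hypothetical sketch of depth-limited alias expansion.
defmodule AliasExpansion do
  @max_depth 5

  # Refuse to expand past a configurable depth; this is what protects
  # us from cyclic references.
  def expand(_name, _aliases, depth) when depth > @max_depth,
    do: {:error, :alias_depth_exceeded}

  def expand(name, aliases, depth) do
    case Map.fetch(aliases, name) do
      # A plain command, not an alias: nothing to expand.
      :error -> {:ok, [name]}
      # An alias: expand its pipeline recursively, passing the depth along
      # (stand-in for spinning up a child pipeline executor).
      {:ok, pipeline} ->
        Enum.reduce_while(pipeline, {:ok, []}, fn inv, {:ok, acc} ->
          case expand(inv, aliases, depth + 1) do
            {:ok, cmds} -> {:cont, {:ok, acc ++ cmds}}
            error -> {:halt, error}
          end
        end)
    end
  end
end

# A cyclic alias trips the limit instead of looping forever:
# AliasExpansion.expand("a", %{"a" => ["a"]}, 0)  #=> {:error, :alias_depth_exceeded}
```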

I anticipate this issue spawning multiple additional issues. Also any discussion or feedback is welcome.

@kevsmith (Member) commented:

There are a couple of things that would help me consider these changes:

  1. A description of the desired end state wrt the responsibilities of each component.
  2. A rough sequencing of tasks to get us from where we are now to where we want to be.

@mpeck (Contributor, Author) commented Feb 23, 2016

I envision things being broken out like so (a sketch of the coordinator process follows the list):

  • The pipeline_executor - responsible for parsing the pipeline and iterating over invocations
  • The command_executor - responsible for validating and executing the command
  • The pipeline_coordinator - named process responsible for spinning up pipeline_executor processes and calling out for any post-processing, for example templating or redirection, after the pipeline has been executed.
  • templating - responsible for templating
  • redirection - responsible for output redirection
  • Probably some other components that I'm not thinking about.
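
To make the process structure concrete, here is a rough sketch (hypothetical names; error handling and supervision elided) of the pipeline_coordinator as a named process that spins up a short-lived process per pipeline:

```elixir
defmodule PipelineCoordinator do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, Keyword.put_new(opts, :name, __MODULE__))
  end

  def init(:ok), do: {:ok, %{}}

  # One short-lived pipeline_executor process per request keeps the
  # named coordinator from becoming a bottleneck.
  def handle_cast({:execute, payload}, state) do
    {:ok, _pid} = Task.start(fn -> run_pipeline(payload) end)
    {:noreply, state}
  end

  # Stand-in for parse -> execute -> post-process.
  defp run_pipeline(payload), do: IO.inspect(payload, label: "pipeline")
end
```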

Terms

For clarity I'll define some terms (illustrated with hypothetical example data after the list):

  • command string - What the user actually types into their chat client to invoke Cog. For example: echo foo
  • pipeline - The parsed command string.
  • invocation - An individual command invocation. A list of invocations makes up the pipeline.
  • execution pipeline - The process by which a command or series of commands is executed by Cog.
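
```elixir
# Illustrative data shapes only, not Cog's actual structs.
# command string: what the user types
command_string = "echo foo | echo bar"

# pipeline: the parsed command string, a list of invocations
pipeline = [
  %{command: "echo", args: ["foo"]},  # invocation 1
  %{command: "echo", args: ["bar"]}   # invocation 2
]
```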

Example flow

Ultimately a full run through the execution pipeline might look something like this.

  1. The pipeline_coordinator receives a message from an adapter.
  2. The pipeline_coordinator spins up a new pipeline_executor process and passes it the payload.
  3. The pipeline_executor parses the command string and stores the resulting pipeline in its internal state.
  4. The pipeline_executor pulls the first invocation off of the pipeline and spins up a new command_executor, passing it the invocation along with whatever additional metadata it needs, e.g. user info and previous command output.
  5. The command_executor queries the database for the command model and then runs any validations, e.g. the command exists, the options are valid, and the user has the proper permissions to execute it.
  6. The command is executed based on its type: multi or single, bound or all.
  7. The command_executor loops if necessary based on the command type.
  8. The command_executor returns the command_response to the pipeline_executor and shuts down.
  9. The pipeline_executor takes the response and either spins up a new command_executor process for the next invocation (in which case we repeat steps 4-8) or returns the command response to the pipeline_coordinator and shuts down.
  10. The pipeline_coordinator receives the response from the pipeline_executor and then runs post-processing: templating, redirection, etc.
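
Steps 5-7 in sketch form (hypothetical names; validation stubbed out), showing how the command's type drives the execution loop:

```elixir
defmodule CommandExecutor do
  # Validate, then execute once or once-per-input based on command type.
  def run(invocation, inputs, user) do
    with :ok <- validate(invocation, user) do
      case invocation.type do
        :single -> [execute(invocation, inputs)]                 # one call, all input
        :multi  -> Enum.map(inputs, &execute(invocation, [&1]))  # one call per item
      end
    end
  end

  # Stand-in for the real checks: command exists, options are valid,
  # user holds the required permissions.
  defp validate(_invocation, _user), do: :ok

  defp execute(invocation, inputs), do: %{command: invocation.command, input: inputs}
end

# CommandExecutor.run(%{type: :multi, command: "echo"}, [1, 2, 3], %{})
```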

Advantages

This process provides us with a number of advantages. A few examples:

  • aliases: If an alias is encountered in the command_executor, the command_executor queries the database for the associated command string and then spins up a new pipeline_executor. The alias is then executed just like any other command string. This makes tracking the level of recursion for aliases as simple as passing the level to the pipeline_executor, and thus helps protect us from cyclic references.
  • parallel execution: As an optimization, if a command is of the multi type, the command_executor could spin up multiple processes to execute the command in parallel (see the sketch after this list). Furthermore, we could even have the pipeline_executor spin up multiple command_executors in parallel if command invocations aren't dependent on one another.
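
For the multi case, one possible fan-out sketch uses Task.async_stream (hypothetical `execute/2`; bounded concurrency so a large input list can't start unbounded processes):

```elixir
defmodule ParallelExecution do
  # Run one "multi" invocation over its inputs in parallel, preserving
  # input order in the results.
  def run_multi(invocation, inputs) do
    inputs
    |> Task.async_stream(fn input -> execute(invocation, input) end,
         max_concurrency: System.schedulers_online(), ordered: true)
    |> Enum.map(fn {:ok, result} -> result end)
  end

  # Stand-in for the real command execution.
  defp execute(invocation, input), do: {invocation, input}
end
```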

Implementation sequencing

  1. I think the logical place to start is by moving the post-processing bits out of the executor and up to the pipeline_coordinator, what we now call the initializer.
     Issue list:
       • Rework the initializer to send executor responses to the user and rename it to pipeline_coordinator
       • Move templating logic to a separate module and invoke it in the initializer
       • Move redirection logic to its own module and invoke it in the initializer
  2. The next major bit is to actually break the executor into two parts, the pipeline_executor and the command_executor. I'm not sure of the best way to break this part up. I think that for the most part it's a copy and paste, with the exception of aliases, which would need a bit of a rework.
     Issue list:
       • Break the executor into a pipeline_executor and a command_executor
       • Optimize

Concerns

For the most part I think this will be a much more flexible and maintainable solution. I am a bit unsure about the extra work we would be putting on the pipeline_coordinator, though. It being a static named process, it could be a bit of a bottleneck. We could possibly create another dynamic process called a post_processor? Once the executor returns, the coordinator could spin up a post_processor process that is responsible for presentation and redirection. Thoughts?
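
A sketch of that idea, assuming a Task.Supervisor named PostProcessor.Supervisor is started elsewhere under the application's supervision tree (all names hypothetical):

```elixir
defmodule PostProcessor do
  # The named coordinator stays thin by handing each executor response
  # to a short-lived, supervised process.
  def process_async(response) do
    Task.Supervisor.start_child(PostProcessor.Supervisor, fn ->
      response
      |> template()
      |> redirect()
    end)
  end

  # Stand-ins for the real presentation and redirection steps.
  defp template(response), do: response
  defp redirect(response), do: response
end
```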

@kevsmith (Member) commented:

Summing up a recent offline conversation between @mpeck, @christophermaier, and me about how to approach restructuring the pipeline executor.

High Level Summary

  1. The conceptual model of an interpreter fits well with our current and known future needs wrt pipeline execution. Interpreters have state represented by the call stack and call graph. We need to track state analogous to both of these concepts to effectively manage pipeline execution.
  2. Re-structuring the executor as an interpreter allows us to reframe the design challenges in a productive way and suggests potential component factorings.
  3. The biggest challenge/obstacle of working with the executor code today is the deep mingling of concerns within a single module. The current state is roughly analogous to writing a language interpreter but, instead of factoring out responsibilities like memory management and bytecode execution into separate components, these components have been "inlined" in the same shared module. The resulting tangle is difficult to understand and challenging to modify.
  4. The interpreter model is our ultimate goal and one which we should proceed directly towards. That said, the most pragmatic way to achieve it is by first extracting standalone logic, such as template execution and output redirection, into their own modules. We believe this will reveal the core of the executor and simplify recasting it as an interpreter.

Why An Interpreter?

  1. Pipeline execution and alias expansion can be thought of as destructuring and structuring operations on call stacks. By monitoring call stack depth trends (is the stack strictly growing?) we can identify one class of looping behavior and implement some protections against non-terminating pipelines.
  2. Alias expansion can be translated to pushing call frames onto the top (front) of the call stack. This gives us an intercept point where we can enforce alias expansion limits to manage resource consumption.
  3. Maintaining a map of pipeline execution history, in the form of a call graph, allows us to detect another class of looping behavior and implement another set of non-terminating pipeline protections.
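
The two protections in sketch form (hypothetical data shapes: the call stack as a list of frames, the call graph simplified here to a set of visited pipelines):

```elixir
defmodule PipelineInterpreter do
  @max_stack_depth 10

  # Alias expansion pushes a frame onto the call stack; refuse pipelines
  # whose stack grows past a configurable limit.
  def push_frame(stack, _frame) when length(stack) >= @max_stack_depth,
    do: {:error, :stack_depth_exceeded}
  def push_frame(stack, frame), do: {:ok, [frame | stack]}

  # Track visited pipelines to catch cyclic references before they loop
  # forever; a real call graph would carry more structure than a MapSet.
  def enter(graph, pipeline) do
    if MapSet.member?(graph, pipeline) do
      {:error, :cycle_detected}
    else
      {:ok, MapSet.put(graph, pipeline)}
    end
  end
end
```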

@mpeck (Contributor, Author) commented Mar 9, 2016

Reference #282 for executor timeouts

@mpeck (Contributor, Author) commented Mar 9, 2016

Reference #229 for anonymous users and safe commands

@mpeck (Contributor, Author) commented Mar 9, 2016

Reference #208 for executor timings
