
Executor Refactor #286

Closed · mpeck opened this issue Feb 23, 2016 · 6 comments

@mpeck (Contributor) commented Feb 23, 2016

The Great Executor Refactor

The executor is in need of a refactor. Currently almost everything is crammed into a single module, which makes it difficult to maintain. To improve maintainability and open up the possibility of future improvements, we have some suggestions for a refactor.

Proposal - move templating out of the executor

Templating should be done independently of the executor. Specifically, we are thinking about moving templating up to the initializer level. The initializer is a named process that starts executor processes. If we move templating up to this level, and perhaps rename the initializer to something more appropriate (pipeline_coordinator?), it will not only make things a bit cleaner, it will give us a place to more clearly implement things like error templating. It also makes it easier to eventually introduce parallel command execution and recursive alias expansion, because the executor can simply return JSON.
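
As a rough illustration (module and function names here are hypothetical sketches, not Cog's actual API), the presentation step could live entirely outside the executor:

```elixir
# Hypothetical sketch: the executor returns plain JSON and the
# coordinator level decides how to present it, including errors.
defmodule Templating do
  # Stand-ins for a real template engine.
  def render(_template, json), do: {:ok, inspect(json)}
  def render_error(reason), do: {:ok, "Whoops! An error occurred: #{inspect(reason)}"}
end

defmodule Presentation do
  # Called at the coordinator level once the executor finishes.
  def present(template, {:ok, json}), do: Templating.render(template, json)
  def present(_template, {:error, reason}), do: Templating.render_error(reason)
end
```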

Proposal - move redirects out of the executor

In the same vein as the templating move: by moving redirects out of the executor, we create a further separation of responsibilities for command execution.
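
A minimal sketch of redirection as a standalone concern (names are hypothetical; `send_fn` stands in for whatever adapter call actually delivers output):

```elixir
# Hypothetical sketch: redirection as its own module, invoked at the
# coordinator level after execution rather than inside the executor.
defmodule Redirection do
  # Fan the final output out to each redirect destination.
  def redirect(output, destinations, send_fn) do
    Enum.each(destinations, fn destination -> send_fn.(destination, output) end)
  end
end

# Usage:
# Redirection.redirect(json, ["#ops", "@alice"], fn dest, out ->
#   IO.puts("#{dest}: #{inspect(out)}")
# end)
```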

Proposal - break the executor into two parts

The real meat of the executor lives in the prepare_or_finish function. Over time the function has grown and become difficult to follow. A major issue revolves around the fact that we iterate over multiple lists. At the top level we have a list of invocations, the pipeline. We iterate over the list of invocations, executing each command in turn. Depending on its type, a command can be executed once or multiple times, and we further iterate over command input and output lists based on that execution type.

We can simplify this by breaking the executor into two distinct components: the pipeline executor (iterates over the invocation list) and the command executor (manages how commands are executed). The pipeline executor's main purpose is to maintain the top-level list of invocations, iterate over them, and spin up a new command executor to run each invocation. The command executor is responsible for identifying the type of command to be executed and properly executing said command.
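
A sketch of the split, with hypothetical names, under the assumption that each command's output feeds the next command's input:

```elixir
# Hypothetical sketch of the two-part split.
defmodule CommandExecutor do
  # Stand-in for validating and executing a single invocation.
  def execute(invocation, input), do: {:ok, %{ran: invocation, with: input}}
end

defmodule PipelineExecutor do
  # Walks the invocation list, threading each command's output into the
  # next command's input, and halting on the first error.
  def run(invocations, initial_input \\ nil) do
    Enum.reduce_while(invocations, {:ok, initial_input}, fn invocation, {:ok, acc} ->
      case CommandExecutor.execute(invocation, acc) do
        {:ok, output} -> {:cont, {:ok, output}}
        error -> {:halt, error}
      end
    end)
  end
end
```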

This would allow us to do some interesting things with command execution, namely things like parallel execution. It greatly simplifies the whole input/output buffer handling, and it also allows us to do actual recursive expansion of aliases. Currently we just expand the alias and prepend the resulting pipeline to the invocation list. Under this design we can instead spin up another pipeline executor as a child of the command executor. When that pipeline executor returns JSON, we just continue processing as normal. We can easily track how deep we go down the alias expansion rabbit hole and add configurable limits, which should protect us from things like cyclic references.
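
A depth-limited expansion might look like the following sketch (hypothetical data shapes: aliases as a map of name to replacement pipeline):

```elixir
# Hypothetical sketch of depth-limited alias expansion.
defmodule AliasExpansion do
  @max_depth 5

  # Refuse to expand past a configurable depth; this is what protects
  # us from cyclic references.
  def expand(_name, _aliases, depth) when depth > @max_depth,
    do: {:error, :alias_depth_exceeded}

  def expand(name, aliases, depth) do
    case Map.fetch(aliases, name) do
      # A plain command, not an alias: nothing to expand.
      :error -> {:ok, [name]}
      # An alias: expand its pipeline recursively, passing the depth along
      # (stand-in for spinning up a child pipeline executor).
      {:ok, pipeline} ->
        Enum.reduce_while(pipeline, {:ok, []}, fn inv, {:ok, acc} ->
          case expand(inv, aliases, depth + 1) do
            {:ok, cmds} -> {:cont, {:ok, acc ++ cmds}}
            error -> {:halt, error}
          end
        end)
    end
  end
end

# A cyclic alias trips the limit instead of looping forever:
# AliasExpansion.expand("a", %{"a" => ["a"]}, 0)  #=> {:error, :alias_depth_exceeded}
```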

I anticipate this issue spawning multiple additional issues. Also any discussion or feedback is welcome.

@kevsmith (Member) commented:

There are a couple of things that would help me consider these changes:

  1. A description of the desired end state wrt the responsibilities of each component.
  2. A rough sequencing of tasks to get us from where we are now to where we want to be.

@mpeck (Contributor, Author) commented Feb 23, 2016

I envision things being broken out like so (a sketch of the coordinator process follows the list):

  • The pipeline_executor - responsible for parsing the pipeline and iterating over invocations
  • The command_executor - responsible for validating and executing the command
  • The pipeline_coordinator - named process responsible for spinning up pipeline_executor processes and calling out for any post-processing, for example templating or redirection, after the pipeline has been executed.
  • templating - responsible for templating
  • redirection - responsible for output redirection
  • Probably some other components that I'm not thinking about.
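
To make the process structure concrete, here is a rough sketch (hypothetical names; error handling and supervision elided) of the pipeline_coordinator as a named process that spins up a short-lived process per pipeline:

```elixir
defmodule PipelineCoordinator do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, Keyword.put_new(opts, :name, __MODULE__))
  end

  def init(:ok), do: {:ok, %{}}

  # One short-lived pipeline_executor process per request keeps the
  # named coordinator from becoming a bottleneck.
  def handle_cast({:execute, payload}, state) do
    {:ok, _pid} = Task.start(fn -> run_pipeline(payload) end)
    {:noreply, state}
  end

  # Stand-in for parse -> execute -> post-process.
  defp run_pipeline(payload), do: IO.inspect(payload, label: "pipeline")
end
```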

Terms

For clarity I'll define some terms (illustrated with hypothetical example data after the list):

  • command string - What the user actually types into their chat client to invoke Cog. For example: echo foo
  • pipeline - The parsed command string.
  • invocation - An individual command invocation. A list of invocations makes up the pipeline.
  • execution pipeline - The process by which a command or series of commands is executed by Cog.
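
```elixir
# Illustrative data shapes only, not Cog's actual structs.
# command string: what the user types
command_string = "echo foo | echo bar"

# pipeline: the parsed command string, a list of invocations
pipeline = [
  %{command: "echo", args: ["foo"]},  # invocation 1
  %{command: "echo", args: ["bar"]}   # invocation 2
]
```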

Example flow

Ultimately a full run through the execution pipeline might look something like this.

  1. The pipeline_coordinator receives a message from an adapter.
  2. The pipeline_coordinator spins up a new pipeline_executor process and passes it the payload.
  3. The pipeline_executor parses the command string and stores the resulting pipeline in its internal state.
  4. The pipeline_executor pulls the first invocation off of the pipeline and spins up a new command_executor, passing it the invocation along with whatever additional metadata it needs, e.g. user info and previous command output.
  5. The command_executor queries the database for the command model and then runs any validations, e.g. the command exists, the options are valid, and the user has the proper permissions to execute it.
  6. The command is executed based on its type: multi or single, bound or all.
  7. The command_executor loops if necessary based on the command type.
  8. The command_executor returns the command_response to the pipeline_executor and shuts down.
  9. The pipeline_executor takes the response and either spins up a new command_executor process for the next invocation (in which case we repeat steps 4-8) or returns the command response to the pipeline_coordinator and shuts down.
  10. The pipeline_coordinator receives the response from the pipeline_executor and then runs post-processing: templating, redirection, etc.
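
Steps 5-7 in sketch form (hypothetical names; validation stubbed out), showing how the command's type drives the execution loop:

```elixir
defmodule CommandExecutor do
  # Validate, then execute once or once-per-input based on command type.
  def run(invocation, inputs, user) do
    with :ok <- validate(invocation, user) do
      case invocation.type do
        :single -> [execute(invocation, inputs)]                 # one call, all input
        :multi  -> Enum.map(inputs, &execute(invocation, [&1]))  # one call per item
      end
    end
  end

  # Stand-in for the real checks: command exists, options are valid,
  # user holds the required permissions.
  defp validate(_invocation, _user), do: :ok

  defp execute(invocation, inputs), do: %{command: invocation.command, input: inputs}
end

# CommandExecutor.run(%{type: :multi, command: "echo"}, [1, 2, 3], %{})
```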

Advantages

This process provides us with a number of advantages. A few examples:

  • aliases: If an alias is encountered in the command_executor, the command_executor queries the database for the associated command string and then spins up a new pipeline_executor. The alias is then executed just like any other command string. This makes tracking the level of recursion for aliases as simple as passing the level to the pipeline_executor, and thus helps protect us from cyclic references.
  • parallel execution: As an optimization, if a command is of the multi type, the command_executor could spin up multiple processes to execute the command in parallel (see the sketch after this list). Furthermore, we could even have the pipeline_executor spin up multiple command_executors in parallel if command invocations aren't dependent on one another.
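
For the multi case, one possible fan-out sketch uses Task.async_stream (hypothetical `execute/2`; bounded concurrency so a large input list can't start unbounded processes):

```elixir
defmodule ParallelExecution do
  # Run one "multi" invocation over its inputs in parallel, preserving
  # input order in the results.
  def run_multi(invocation, inputs) do
    inputs
    |> Task.async_stream(fn input -> execute(invocation, input) end,
         max_concurrency: System.schedulers_online(), ordered: true)
    |> Enum.map(fn {:ok, result} -> result end)
  end

  # Stand-in for the real command execution.
  defp execute(invocation, input), do: {invocation, input}
end
```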

Implementation sequencing

  1. I think the logical place to start is by moving the post-processing bits out of the executor and up to the pipeline_coordinator, what we now call the initializer.
     Issue list:
       • Rework the initializer to send executor responses to the user and rename it to pipeline_coordinator
       • Move templating logic to a separate module and invoke it in the initializer
       • Move redirection logic to its own module and invoke it in the initializer
  2. The next major bit is to actually break the executor into two parts, the pipeline_executor and the command_executor. I'm not sure of the best way to break this part up. I think that for the most part it's a copy and paste, with the exception of aliases, which would need a bit of a rework.
     Issue list:
       • Break the executor into a pipeline_executor and a command_executor
       • Optimize

Concerns

For the most part I think this will be a much more flexible and maintainable solution. I am a bit unsure about the extra work we would be putting on the pipeline_coordinator, though. It being a static named process, it could be a bit of a bottleneck. We could possibly create another dynamic process called a post_processor? Once the executor returns, the coordinator could spin up a post_processor process that is responsible for presentation and redirection. Thoughts?
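
A sketch of that idea, assuming a Task.Supervisor named PostProcessor.Supervisor is started elsewhere under the application's supervision tree (all names hypothetical):

```elixir
defmodule PostProcessor do
  # The named coordinator stays thin by handing each executor response
  # to a short-lived, supervised process.
  def process_async(response) do
    Task.Supervisor.start_child(PostProcessor.Supervisor, fn ->
      response
      |> template()
      |> redirect()
    end)
  end

  # Stand-ins for the real presentation and redirection steps.
  defp template(response), do: response
  defp redirect(response), do: response
end
```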

@kevsmith (Member) commented:

Summing up a recent offline conversation between @mpeck, @christophermaier, and me about how to approach restructuring the pipeline executor.

High Level Summary

  1. The conceptual model of an interpreter fits well with our current and known future needs wrt pipeline execution. Interpreters have state represented by the call stack and call graph. We need to track state analogous to both of these concepts to effectively manage pipeline execution.
  2. Re-structuring the executor as an interpreter allows us to reframe the design challenges in a productive way and suggests potential component factorings.
  3. The biggest challenge/obstacle of working with the executor code today is the deep mingling of concerns within a single module. The current state is roughly analogous to writing a language interpreter but, instead of factoring out responsibilities like memory management and bytecode execution into separate components, these components have been "inlined" in the same shared module. The resulting tangle is difficult to understand and challenging to modify.
  4. The interpreter model is our ultimate goal and one which we should proceed directly towards. That said, the most pragmatic way to achieve it is by first extracting standalone logic, such as template execution and output redirection, into their own modules. We believe this will reveal the core of the executor and simplify recasting it as an interpreter.

Why An Interpreter?

  1. Pipeline execution and alias expansion can be thought of as destructuring and structuring operations on call stacks. By monitoring call stack depth trends (is the stack strictly growing?) we can identify one class of looping behavior and implement some protections against non-terminating pipelines.
  2. Alias expansion can be translated to pushing call frames onto the top (front) of the call stack. This gives us an intercept point where we can enforce alias expansion limits to manage resource consumption.
  3. Maintaining a map of pipeline execution history, in the form of a call graph, allows us to detect another class of looping behavior and implement another set of non-terminating pipeline protections.
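
The two protections in sketch form (hypothetical data shapes: the call stack as a list of frames, the call graph simplified here to a set of visited pipelines):

```elixir
defmodule PipelineInterpreter do
  @max_stack_depth 10

  # Alias expansion pushes a frame onto the call stack; refuse pipelines
  # whose stack grows past a configurable limit.
  def push_frame(stack, _frame) when length(stack) >= @max_stack_depth,
    do: {:error, :stack_depth_exceeded}
  def push_frame(stack, frame), do: {:ok, [frame | stack]}

  # Track visited pipelines to catch cyclic references before they loop
  # forever; a real call graph would carry more structure than a MapSet.
  def enter(graph, pipeline) do
    if MapSet.member?(graph, pipeline) do
      {:error, :cycle_detected}
    else
      {:ok, MapSet.put(graph, pipeline)}
    end
  end
end
```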

@mpeck (Contributor, Author) commented Mar 9, 2016

Reference #282 for executor timeouts

@mpeck (Contributor, Author) commented Mar 9, 2016

Reference #229 for anonymous users and safe commands

@mpeck (Contributor, Author) commented Mar 9, 2016

Reference #208 for executor timings
