Added roadmap for containerization
maximilianreimer committed Nov 19, 2021
1 parent 18a6810 commit 8ab4d16
Showing 2 changed files with 72 additions and 0 deletions.
72 changes: 72 additions & 0 deletions dacbench/container/Container Roadmap.md
@@ -0,0 +1,72 @@
# Container Roadmap

This document describes how we want to use containerization and what needs to be implemented.

There is also a project in the repo called [Containerization](https://github.com/automl/DACBench/projects/2),
containing more fine-grained tasks and descriptions. This document serves as an overview of the project.

## Purpose / Requirements
We want to use containers, more precisely [Singularity containers](https://singularity.hpcng.org/), in order to:

1. Make experiments (more) reproducible: reduce the dependency on external tools such as compilers and interpreters, and on hardware
2. Make DACBench easier to run: no need to install everything manually; just download DACBench, and it will automatically install the container on request
3. Ensure the same versions of dependencies and DACBench for the same experiments: publish container versions for each experiment / publication
4. Allow benchmarks with conflicting dependencies to coexist: through separate containers

This includes:

* The benchmarks
* The baselines

Additional requirements are:
* The user should not have to deal with the container directly (except installing the container system)
* No need for `root` to run the container (rules out Docker)
* Low overhead

## Architecture
To fulfill these requirements we adapt the architecture introduced in [HPOBench](https://github.com/automl/HPOBench).

For questions and support, ask the following people, who kindly offered their help:
* Philipp Mueller ([email protected])
* Katharina Eggensperger ([email protected])

The main idea is to run the components that either have complicated dependencies or are crucial for reproducibility inside a container, together with a server that exposes the objects via HTTP / sockets to the outside. On the user side, a wrapper for these objects automatically retrieves and starts the relevant container and acts as a proxy, so that the user does not notice that they are communicating with a component within a container.

![architecture overview](architecture.png)
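
The proxy idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the DACBench implementation: `ToyEnvironment`, `EnvironmentProxy`, `start_server`, and `run_episode` are hypothetical names, the "container" is just a background thread, and XML-RPC over HTTP is swapped in as a stand-in for whatever transport is eventually chosen.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ToyEnvironment:
    """Hypothetical stand-in for an environment that would live inside the container."""

    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0]

    def step(self, action):
        self.steps += 1
        reward = -abs(action - 1)  # best action is 1
        done = self.steps >= 3     # fixed episode length of 3 steps
        return [float(self.steps)], reward, done


def start_server(env):
    """Expose the environment's public methods over HTTP on a free local port."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
    server.register_instance(env)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class EnvironmentProxy:
    """Client-side wrapper: offers the same methods and forwards each call to the server."""

    def __init__(self, url):
        self._remote = ServerProxy(url, allow_none=True)

    def reset(self):
        return self._remote.reset()

    def step(self, action):
        # The tuple returned by the server arrives as a list; unpack it again.
        state, reward, done = self._remote.step(action)
        return state, reward, done


def run_episode(env, action=0):
    """Works identically with a local environment or a proxy."""
    env.reset()
    total, done = 0, False
    while not done:
        _, reward, done = env.step(action)
        total += reward
    return total


if __name__ == "__main__":
    server = start_server(ToyEnvironment())
    proxy = EnvironmentProxy("http://127.0.0.1:%d" % server.server_address[1])
    print(run_episode(proxy), run_episode(ToyEnvironment()))
    server.shutdown()
    server.server_close()
```

Because `run_episode` only relies on `reset()` and `step()`, the calling code cannot tell whether it talks to the local object or to the proxy, which is the property the architecture aims for.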

Workflow of remote benchmark execution:
```python
from dacbench.benchmarks import SigmoidBenchmark

# RemoteRunner is planned as dacbench.container.RemoteRunner (see class list below);
# agent_creation_function is a user-supplied placeholder.

benchmark = SigmoidBenchmark()
# adapt the default config or load it from a file
benchmark.set_seed(42)

# Gets and starts the container for the benchmark version of a specific experiment.
# This also defines what is logged and which wrappers are used
# (may be improved / made configurable later).
remote_runner = RemoteRunner(benchmark, experiement_identifier="exp:0.01")

# Set up an agent. For the baselines we also need a containerized version (todo).
agent = agent_creation_function(remote_runner.get_environment())

# Run the experiment for n episodes.
# Logs are written to a local file and are retrievable afterwards.
remote_runner.run(agent, number_of_episodes=10)
```

(todo):
Classes:
* `dacbench.container.RemoteRunner`
* `dacbench.container.RemoteRunnerServer`
* `dacbench.container.RemoteEnvironmentClient`
* `dacbench.container.RemoteEnvironmentServer`

Todos:
* [ ] Implement container setup and download for benchmarks
* [ ] Unify the way serialization is handled (currently in the benchmark and in the environment)
* [ ] Switch communication to sockets (currently via HTTP)
* [ ] Set up a container registry
* [ ] Make dependencies separately installable for each benchmark and remove all benchmark dependencies from the default install, since the default is to run in a container? (Or add a container extra.)
* [ ] Add a command line interface for the remote runner / integrate with `dacbench.runner.run()`. Proposed solution: add a common base class for `Runner` and `RemoteRunner` that handles argument parsing and defines the interface for the method `run()`
* [ ] Improve experiment setup (currently only one experiment is hardcoded in `RemoteRunnerServer.get_environment()`)
* [ ] Measure performance of containerized version vs. non-containerized version
* [ ] Add a guide on how to build your own containers (also useful for internal usage)
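
The common base class proposed in the todo list above could look roughly like the following sketch. All names (`BaseRunner`, `LocalRunner`, `ContainerRunner`, and the `--episodes` / `--seed` flags) are assumptions made for illustration; they are not part of the existing DACBench API, and the `run()` bodies only return a description of what would be executed.

```python
import argparse
from abc import ABC, abstractmethod


class BaseRunner(ABC):
    """Hypothetical shared base: handles argument parsing and fixes the run() interface."""

    def __init__(self, number_of_episodes, seed):
        self.number_of_episodes = number_of_episodes
        self.seed = seed

    @staticmethod
    def build_parser():
        # Flag names are illustrative assumptions, not an existing CLI.
        parser = argparse.ArgumentParser(description="Run a benchmark (sketch)")
        parser.add_argument("--episodes", type=int, default=10)
        parser.add_argument("--seed", type=int, default=42)
        return parser

    @classmethod
    def from_args(cls, argv=None):
        """Build a runner from command line arguments (argv=None reads sys.argv)."""
        args = cls.build_parser().parse_args(argv)
        return cls(number_of_episodes=args.episodes, seed=args.seed)

    @abstractmethod
    def run(self, agent):
        """Run the agent for the configured number of episodes."""


class LocalRunner(BaseRunner):
    def run(self, agent):
        # A real implementation would step the environment in-process.
        return {"mode": "local", "episodes": self.number_of_episodes, "seed": self.seed}


class ContainerRunner(BaseRunner):
    def run(self, agent):
        # A real implementation would forward the call to the container.
        return {"mode": "container", "episodes": self.number_of_episodes, "seed": self.seed}


if __name__ == "__main__":
    print(ContainerRunner.from_args(["--episodes", "3"]).run(agent=None))
```

With this shape, the local and the containerized runner share one argument parser, so a single command line entry point could select between them without duplicating option handling.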
Binary file added dacbench/container/architecture.png
