GitHub - novonordisk-research/ProcessOptimizer: A tool to optimize real world problems

  _____                              ____        _   _           _              
 |  __ \                            / __ \      | | (_)         (_)             
 | |__) | __ ___   ___ ___  ___ ___| |  | |_ __ | |_ _ _ __ ___  _ _______ _ __ 
 |  ___/ '__/ _ \ / __/ _ \/ __/ __| |  | | '_ \| __| | '_ ` _ \| |_  / _ \ '__|
 | |   | | | (_) | (_|  __/\__ \__ \ |__| | |_) | |_| | | | | | | |/ /  __/ |   
 |_|   |_|  \___/ \___\___||___/___/\____/| .__/ \__|_|_| |_| |_|_/___\___|_|   
                                          | |                                   
                                          |_|

ProcessOptimizer

ProcessOptimizer is intended and tailored for optimization of real world processes. This could e.g. be some complex chemical reaction where no reliable analytical model mapping input variables to the output is readily available. Functionality includes Bayesian optimization, space filling, Design of Experiment algorithms, multi-objective optimization, and more.

ProcessOptimizer is tailored to perform well when observables have non-negligible noise but where the underlying relations between factors and responses follow regular real world behavior.

Installation

ProcessOptimizer can be installed using pip install ProcessOptimizer
The repository and examples can be found at https://github.com/novonordisk-research/ProcessOptimizer
ProcessOptimizer can also be installed by running pip install -e . in the top directory of the cloned repository.

How to get started

Below is an illustrative example of minimization of the 2-dimensional Booth function using the ProcessOptimizer package. Notice that in real world applications, we would not know this function beforehand, i.e., it would be "black-box" (and typically we would also have more than 2 input factors).
In this example, uniformly distributed random noise of up to ±5% of the function value is added using np.random. The function is defined as follows:

import numpy as np

def Booth(x0, x1):
    booth = (x0 + 2 * x1 - 7)**2 + (2 * x0 + x1 - 5)**2
    noise = 1 + 0.05 * (2 * np.random.rand() - 1)
    return (booth * noise)
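For reference when judging what the optimizer finds, the sketch below separates the exact Booth function from its noise factor. The names booth_exact and booth_noisy are illustrative and not part of the package, and the standard library random module is used in place of np.random so that the snippet has no dependencies.

```python
import random

def booth_exact(x0, x1):
    """Noise-free Booth function."""
    return (x0 + 2 * x1 - 7) ** 2 + (2 * x0 + x1 - 5) ** 2

def booth_noisy(x0, x1, rng=random):
    """Booth function scaled by a uniform factor between 0.95 and 1.05."""
    noise = 1 + 0.05 * (2 * rng.random() - 1)
    return booth_exact(x0, x1) * noise

# Both squared terms vanish at (1, 3), so this is the exact minimum.
print(booth_exact(1, 3))  # prints 0

# The noise is multiplicative, so no reading deviates more than 5%
# from the exact value at the same point.
exact = booth_exact(3.75, 3.75)
readings = [booth_noisy(3.75, 3.75) for _ in range(1000)]
print(all(abs(r - exact) <= 0.05 * exact + 1e-9 for r in readings))
```

Because the noise scales with the function value, readings close to the minimum are also the least noisy ones in absolute terms.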

You are given the task of finding the minimum of the function without knowing its analytical form. You can perform "experiments", where you provide x0 and x1 and obtain the noisy value of the function. You want to do as few experiments as possible.
Working with the ProcessOptimizer package, you define the experimental Space and create an Optimizer object. In this specific case, we have two continuous numerical dimensions both ranging from 0.0 to 5.0.

import ProcessOptimizer as po

SPACE = po.Space([[0.0, 5.0], [0.0, 5.0]])

The Optimizer defined below uses "GP" (Gaussian Process) for Bayesian optimization. Before the Bayesian part of the optimization begins, a number of initial "experiments" (n_initial_points) is run to obtain some initial data. After these initial "experiments" and every time new data is added afterwards, a Gaussian Process regression model is fitted to the data we have obtained so far. Based on this model (and an acquisition function that determines our search strategy), the optimizer suggests the next point to evaluate.

opt = po.Optimizer(SPACE, base_estimator = "GP", n_initial_points = 2)
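To make the role of the acquisition function more concrete, here is a minimal sketch of expected improvement, one common acquisition function for minimization. This text does not state which acquisition function the Optimizer uses by default, so treat this as an illustration of the idea and not as the package's implementation. The candidate predictions below are made-up numbers.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Expected improvement over f_best (minimization) for a Gaussian
    prediction with mean mu and standard deviation sigma."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (f_best - mu) * cdf + sigma * pdf

# Hypothetical model predictions (mean, standard deviation) at three candidates.
candidates = {
    "near best, well explored": (10.0, 0.1),
    "slightly worse, uncertain": (11.0, 3.0),
    "much worse, well explored": (30.0, 0.1),
}
f_best = 10.0  # best (lowest) value observed so far

scores = {
    name: expected_improvement(mu, sigma, f_best)
    for name, (mu, sigma) in candidates.items()
}
next_point = max(scores, key=scores.get)
print(next_point)  # prints "slightly worse, uncertain"
```

The uncertain candidate wins even though its predicted mean is worse than the best observation, which is how an acquisition function trades off exploring unknown regions against exploiting known good ones.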

The optimizer can be used in steps by calling the .ask() function, evaluating the function at the given point and using .tell() to feed back the result to the Optimizer. In practice it would work like this. First ask the optimizer for the next point to perform an experiment:

opt.ask()
>>> [3.75, 3.75]

Now go to the laboratory or wherever the experiment can be performed and use the values obtained when calling ask(). In this example the experiment can simply be performed by evaluating the Booth function using the values above:

Booth(3.75, 3.75)
>>> 59.313996676981354

When a result has been obtained the user needs to tell the output to the Optimizer. This is done using the .tell() function:

opt.tell([3.75, 3.75], 59.313996676981354)
result = opt.get_result()

po.plot_objective(result)

The result obtained from get_result() contains a model of the Gaussian Process predicted mean. This model can be plotted using plot_objective(result). Below is a gif of the search after 2 initial points and until 20 points have been sampled in total. The orange dots visualise each evaluation of the function. Besides the 2D color plot, there are also 1D plots for each input variable. These show how the function depends on each input variable with the other input variables kept constant at the best sampled data point.

Notice that this is an optimization tool and not a modelling tool. This means that the optimizer finds an approximate solution for the global minimum quickly. It does, however, not guarantee that the obtained model is accurate on the entire domain.
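The ask/tell cycle described above is a simple protocol between the optimizer and the experimenter, independent of the model behind it. The sketch below mimics that protocol with a hypothetical stand-in class, RandomSearchStandIn, which samples points uniformly at random and fits no model at all. It is not part of ProcessOptimizer and only illustrates how suggestions and results flow back and forth and how the best observation can be tracked.

```python
import random

class RandomSearchStandIn:
    """Hypothetical stand-in with an ask/tell interface.
    It suggests uniformly random points and fits no model."""

    def __init__(self, bounds, seed=0):
        self.bounds = bounds
        self.rng = random.Random(seed)
        self.points = []  # every point reported through tell()
        self.values = []  # the measured result for each point

    def ask(self):
        # Suggest the next experiment: one value per dimension.
        return [self.rng.uniform(low, high) for low, high in self.bounds]

    def tell(self, x, y):
        # Record the outcome of an experiment.
        self.points.append(list(x))
        self.values.append(y)

    def best(self):
        # Lowest measured value so far and the point that produced it.
        i = min(range(len(self.values)), key=self.values.__getitem__)
        return self.points[i], self.values[i]

def experiment(x0, x1):
    # Stands in for the laboratory measurement (noise-free Booth function).
    return (x0 + 2 * x1 - 7) ** 2 + (2 * x0 + x1 - 5) ** 2

opt = RandomSearchStandIn([(0.0, 5.0), (0.0, 5.0)])
for _ in range(20):
    x = opt.ask()
    y = experiment(*x)
    opt.tell(x, y)

best_x, best_y = opt.best()
print(best_x, best_y)
```

A model-based optimizer differs only in how ask() chooses the next point: it uses the data passed to tell() instead of ignoring it.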

A full minimal example of use can be found below:

import numpy as np
import ProcessOptimizer as po

def Booth(x0, x1):
    booth = (x0 + 2 * x1 - 7)**2 + (2 * x0 + x1 - 5)**2
    noise = 1 + 0.05 * (2 * np.random.rand() - 1)
    return (booth * noise)

SPACE = po.Space([[0.0, 5.0], [0.0, 5.0]])
opt = po.Optimizer(SPACE,
                   base_estimator = "GP",
                   n_initial_points = 2)

for i in range(20):
    x = opt.ask()
    y = Booth(*x)
    opt.tell(x, y)

result = opt.get_result()
po.plot_objective(result)

Examples

An introductory walkthrough of the package can be found here.
Various examples on use and functionality can be found here.

Contributions

Feel free to play around with the algorithm. Should you encounter errors while using ProcessOptimizer, please report them at https://github.com/novonordisk-research/ProcessOptimizer/issues.
To help solve the issues, please:

Provide a minimal amount of code to reproduce the error
State versions of ProcessOptimizer, sklearn, numpy, ...
Describe the expected behavior of the code

If you would like to contribute by making anything from documentation to feature-additions, THANK YOU. Please open a pull request marked as WIP as early as possible and describe the issue you seek to solve and outline your planned solution.

Related work

ProcessOptimizer is a fork of scikit-optimize. ProcessOptimizer will fundamentally function like scikit-optimize, yet developments are focussed on bringing improvements that help optimize real world processes, like chemistry or baking.

Brownie Bee is a web-based platform for Bayesian process optimization intended for non-coders. It uses ProcessOptimizer as the underlying optimization engine.

Citation

If you use the package in relation to published works, please cite: https://doi.org/10.5281/zenodo.5155295 and https://pubs.acs.org/doi/full/10.1021/acs.jcim.4c02240
Please also cite the underlying package (scikit-optimize).

