ProcessOptimizer
ProcessOptimizer is intended and tailored for optimization of real world processes. This could e.g. be some complex chemical reaction where no reliable analytical model mapping input variables to the output is readily available. Functionality includes Bayesian optimization, space filling, Design of Experiment algorithms, multi-objective optimization, and more.
ProcessOptimizer is tailored to perform well when observables have non-negligible noise but the underlying relations between factors and responses follow regular real-world behavior.
ProcessOptimizer can be installed using pip install ProcessOptimizer
The repository and examples can be found at https://github.com/novonordisk-research/ProcessOptimizer
ProcessOptimizer can also be installed by running pip install -e . in the top directory of the cloned repository.
Below is an illustrative example of minimization of the 2-dimensional Booth function using the ProcessOptimizer package. Notice that in real world applications, we would not know this function beforehand, i.e., it would be "black-box" (and typically we would also have more than 2 input factors).
In this example, uniformly distributed random noise of up to ±5% of the function value is added using np.random. The function is defined as follows:
import numpy as np

def Booth(x0, x1):
    booth = (x0 + 2 * x1 - 7)**2 + (2 * x0 + x1 - 5)**2
    noise = 1 + 0.05 * (2 * np.random.rand() - 1)
    return (booth * noise)
You are given the task of finding the minimum of the function without knowing its analytical form. You can perform "experiments", where you provide x0 and x1 and obtain the noisy value of the function. You want to do as few experiments as possible.
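For reference only (in a real application this would be unknown), the noise-free Booth expression above has its minimum at x0 = 1, x1 = 3, where both squared terms are zero. The sketch below checks this and shows how repeated "experiments" at one point scatter around the true value. It is an illustration, not part of the package: the helper names booth_noise_free and booth_noisy are hypothetical, and Python's standard random module is swapped in for np.random so that the snippet has no dependencies.

```python
import random

def booth_noise_free(x0, x1):
    # Same expression as Booth above, without the noise factor.
    return (x0 + 2 * x1 - 7) ** 2 + (2 * x0 + x1 - 5) ** 2

def booth_noisy(x0, x1, rng):
    # Multiplicative noise factor drawn uniformly from [0.95, 1.05).
    noise = 1 + 0.05 * (2 * rng.random() - 1)
    return booth_noise_free(x0, x1) * noise

rng = random.Random(0)  # fixed seed, only so the sketch is reproducible

# Both squared terms vanish at (1, 3), so the noise-free minimum is 0 there.
minimum = booth_noise_free(1.0, 3.0)

# Repeating the same "experiment" gives values scattered around the true one.
true_value = booth_noise_free(3.75, 3.75)
repeats = [booth_noisy(3.75, 3.75, rng) for _ in range(5)]
print(minimum, true_value, repeats)
```

Because the noise is multiplicative, its absolute size shrinks as the function value approaches zero near the minimum.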
Working with the ProcessOptimizer package, you define the experimental Space and create an Optimizer object. In this specific case, we have two continuous numerical dimensions both ranging from 0.0 to 5.0.
import ProcessOptimizer as po
SPACE = po.Space([[0.0, 5.0], [0.0, 5.0]])
The Optimizer defined below uses "GP" (Gaussian Process) for Bayesian optimization. Before the Bayesian part of the optimization begins, a number of initial "experiments" (n_initial_points) is run to obtain some initial data. After these initial "experiments" and every time new data is added afterwards, a Gaussian Process regression model is fitted to the data we have obtained so far. Based on this model (and an acquisition function that determines our search strategy), the optimizer suggests the next point to evaluate.
opt = po.Optimizer(SPACE, base_estimator = "GP", n_initial_points = 2)
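To make the role of the acquisition function more concrete, here is a minimal sketch of one common choice, expected improvement (EI) for minimization. This is an assumption for illustration: the acquisition function actually used by ProcessOptimizer may differ, and the candidate predictions below are made-up numbers, not output of a fitted model. Given a predicted mean and standard deviation at a candidate point, EI is large either when the predicted mean is better than the best observation so far or when the prediction is very uncertain.

```python
import math

def expected_improvement(mu, sigma, y_best):
    """Expected improvement (minimization) for a prediction Normal(mu, sigma**2)."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (y_best - mu) * cdf + sigma * pdf

# Hypothetical model predictions (mean, standard deviation) at three candidates.
candidates = {
    "A": (4.8, 0.1),  # slightly better than the best so far, very certain
    "B": (5.5, 3.0),  # predicted worse, but very uncertain
    "C": (6.0, 0.1),  # predicted worse and very certain
}
y_best = 5.0  # best (lowest) value observed so far

scores = {name: expected_improvement(mu, sd, y_best)
          for name, (mu, sd) in candidates.items()}
next_point = max(scores, key=scores.get)
print(scores, next_point)
```

In this made-up case the uncertain candidate scores highest, which illustrates how an acquisition function can trade off exploiting good predictions against exploring poorly known regions.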
The optimizer can be used in steps by calling the .ask() function, evaluating the function at the given point and using .tell() to feed back the result to the Optimizer. In practice it would work like this. First ask the optimizer for the next point at which to perform an experiment:
opt.ask()
>>> [3.75, 3.75]
Now go to the laboratory or wherever the experiment can be performed and use the values obtained when calling ask(). In this example the experiment can simply be performed by evaluating the Booth function using the values above:
Booth(3.75, 3.75)
>>> 59.313996676981354
When a result has been obtained the user needs to tell the output to the Optimizer. This is done using the .tell() function:
opt.tell([3.75, 3.75], 59.313996676981354)
result = opt.get_result()
po.plot_objective(result)
The result (obtained above with get_result()) contains a model of the Gaussian Process predicted mean. This model can be plotted using plot_objective(result). Below is a gif of the search after 2 initial points and until 20 points have been sampled in total. The orange dots visualise each evaluation of the function. Besides the 2D color plot, there are also 1D plots for each input variable. These show how the function depends on each input variable with the other input variables kept constant at the best sampled data point.
Notice that this is an optimization tool and not a modelling tool. This means that the optimizer finds an approximate solution for the global minimum quickly. It does, however, not guarantee that the obtained model is accurate on the entire domain.
A full minimal example of use can be found below:
import numpy as np
import ProcessOptimizer as po

def Booth(x0, x1):
    booth = (x0 + 2 * x1 - 7)**2 + (2 * x0 + x1 - 5)**2
    noise = 1 + 0.05 * (2 * np.random.rand() - 1)
    return (booth * noise)

SPACE = po.Space([[0.0, 5.0], [0.0, 5.0]])
opt = po.Optimizer(SPACE,
                   base_estimator = "GP",
                   n_initial_points = 2)

for i in range(20):
    x = opt.ask()
    y = Booth(*x)
    opt.tell(x, y)

result = opt.get_result()
po.plot_objective(result)
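The ask/tell loop above does not depend on how the suggestions are produced, so the same pattern can be tried without installing anything. The sketch below uses a hypothetical RandomSearch class as a stand-in for the optimizer (it is not part of ProcessOptimizer and does no Bayesian modelling) and keeps track of the best observation. Python's standard random module replaces np.random here, and the names RandomSearch, best and booth are assumptions made for this illustration.

```python
import random

def booth(x0, x1, rng):
    # Booth function with a multiplicative noise factor in [0.95, 1.05).
    value = (x0 + 2 * x1 - 7) ** 2 + (2 * x0 + x1 - 5) ** 2
    return value * (1 + 0.05 * (2 * rng.random() - 1))

class RandomSearch:
    """Hypothetical stand-in exposing ask()/tell(); suggests uniform random points."""

    def __init__(self, bounds, seed=0):
        self.bounds = bounds
        self.rng = random.Random(seed)
        self.x_history = []
        self.y_history = []

    def ask(self):
        # Suggest the next point: one uniform draw per dimension.
        return [self.rng.uniform(low, high) for low, high in self.bounds]

    def tell(self, x, y):
        # Record the outcome of the experiment performed at x.
        self.x_history.append(x)
        self.y_history.append(y)

    def best(self):
        # Return the point with the lowest observed (noisy) value.
        i = min(range(len(self.y_history)), key=self.y_history.__getitem__)
        return self.x_history[i], self.y_history[i]

noise_rng = random.Random(1)
opt = RandomSearch([(0.0, 5.0), (0.0, 5.0)])
for _ in range(20):
    x = opt.ask()
    y = booth(*x, noise_rng)
    opt.tell(x, y)

best_x, best_y = opt.best()
print(best_x, best_y)
```

Swapping RandomSearch for the Optimizer from the example above leaves the loop body unchanged; only the quality of the suggestions differs.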
An introductory walkthrough of the package can be found here
Various examples on use and functionality can be found here.
Feel free to play around with the algorithm. Should you encounter errors while using ProcessOptimizer, please report them at https://github.com/novonordisk-research/ProcessOptimizer/issues.
To help solve the issues, please:
- Provide a minimal amount of code to reproduce the error
- State versions of ProcessOptimizer, sklearn, numpy, ...
- Describe the expected behavior of the code
If you would like to contribute by making anything from documentation to feature-additions, THANK YOU. Please open a pull request marked as WIP as early as possible and describe the issue you seek to solve and outline your planned solution.
ProcessOptimizer is a fork of scikit-optimize. ProcessOptimizer will fundamentally function like scikit-optimize, yet developments are focussed on bringing improvements that help optimize real world processes, like chemistry or baking.
Brownie Bee is a web-based platform for Bayesian process optimization intended for non-coders. It uses ProcessOptimizer as the underlying optimization engine.
If you use the package in relation to published works, please cite: https://doi.org/10.5281/zenodo.5155295 and https://pubs.acs.org/doi/full/10.1021/acs.jcim.4c02240
Please also cite the underlying package (scikit-optimize).