Feasibility Prediction

Blackbox feasibility prediction with machine learning to optimize a CMA-ES algorithm

.
├── code
│   ├── figures        <- plots generated by python
│   ├── misc           <- .png illustrations shown in notebooks/.md 
│   ├── old notebooks  <- Deprecated notebooks
│   ├── utils          <- .py files with classes
│   ├── .ipynb         <- One for each model explored
│   └── README.md
├── data
│   ├── .csv           <- raw data as received from client
│   └── README.md
├── misc               <- .png illustrations shown in notebooks/.md 
├── models
│   └── results        <- results for each model
├── requirements.txt   <- use pip install -r requirements.txt
└── README.md


Project structure based on the cookiecutter data science project template

Executive summary

To solve a constrained non-linear optimization problem, an evolutionary algorithm is used. Due to the nature of the problem, labeling solutions as feasible or infeasible turns out to be computationally heavy. This project therefore investigates whether machine learning algorithms could be used to filter out infeasible solutions. For this, a wide range of both supervised and unsupervised machine learning methods were studied. Unfortunately, no method yielded fruitful results. This is most likely because the evolutionary algorithm advances towards the optimum in iteration steps that are too long, forcing the ML algorithms to extrapolate their predictions, in combination with the non-linear constraints being too complex.

Introduction

The problem, as described by the client, follows below. A more thorough introductory walkthrough of the problem and data can be found in introduction.ipynb.

In each iteration of the evolutionary optimization algorithm, 1024 unique solutions are generated, and after evaluation of the constraints each solution gets a non-negative breach value per constraint. If any of the constraints has a non-zero breach value for a solution, the solution is labeled as infeasible. The model typically has around 3500 dimensions, and a full optimization solve typically takes 10k iterations. In each iteration of an advanced model, a large proportion of the solutions, up to 85%, can be infeasible and therefore of no use for improving the objective value. The cost of determining this through full evaluation of each solution is high, so it is of interest to explore the possibility of using an ML model, trained on-the-fly during the optimization while collecting data, to classify solutions as either feasible or infeasible without a full evaluation.
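
A minimal sketch of that train-and-filter loop, in Python with scikit-learn, is shown below. The names `sample_solutions` and `evaluate_constraints` are hypothetical stand-ins for the client's CMA-ES sampler and constraint evaluator, which are not part of this repository, and `SGDClassifier` is just one possible choice of incrementally trainable classifier.

```python
# Hedged sketch of on-the-fly training and filtering. `sample_solutions` and
# `evaluate_constraints` are hypothetical stand-ins for the client's code.
import numpy as np
from sklearn.linear_model import SGDClassifier

POP_SIZE = 1024        # solutions generated per iteration
N_ITERATIONS = 10_000  # a full solve typically takes ~10k iterations

clf = SGDClassifier(loss="log_loss")  # any classifier with partial_fit works
classes = np.array([0, 1])            # 0 = feasible, 1 = infeasible
warmed_up = False

for _ in range(N_ITERATIONS):
    X = sample_solutions(POP_SIZE)  # hypothetical: (1024, ~3500) array from CMA-ES

    # Once the classifier has seen some data, use it as a cheap pre-filter
    # and only fully evaluate solutions predicted to be feasible.
    candidates = X[clf.predict(X) == 0] if warmed_up else X

    # Expensive step: one non-negative breach value per constraint; a solution
    # is infeasible if any breach value is non-zero.
    breaches = evaluate_constraints(candidates)  # hypothetical, (n, n_constraints)
    y = (breaches > 0).any(axis=1).astype(int)

    # Train on-the-fly with the labels just collected.
    clf.partial_fit(candidates, y, classes=classes)
    warmed_up = True
```

In such a loop, the filter's false negatives (feasible solutions discarded before full evaluation) matter as much as the compute saved, which is one reason balanced accuracy is the metric reported below.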

Results

The following table summarizes the results for the best model in every notebook. The performance metric used is balanced accuracy (the mean of sensitivity and specificity, so 50% corresponds to random guessing), and the 95% prediction interval quantifies the uncertainty of that value; a sketch of how such an interval can be computed follows after the table. For more information, read the notebooks.

| Notebook | Model | Balanced accuracy | 95% Prediction interval |
|---|---|---|---|
| 01-SVM | SVM w. RBF kernel | 50.17% | (49.31%, 51.03%) |
| 02-RBFSampler | SGDClassifier + squared_hinge loss | 51.25% | (50.30%, 52.20%) |
| 03-Deeplearning | k-NNs 2-layers + regularization | 50.17% | (50.04%, 50.30%) |
| 04-Nearest Neighbor | 31 neighbors | 50.26% | (49.51%, 51.01%) |
| 05-Random Forest | Random Forest | 50.17% | (49.75%, 50.58%) |
| 06-Logistic Regression | Logistic Regression | 50.43% | (49.89%, 50.97%) |
| 07-Gradient-Boosting-Machines | Depth unlimited, 31 maximum leaves, λ = 0.01 | 50.62% | (49.69%, 51.55%) |
| 08-Autoencoder | Two-layer, fully-connected AE | 50.08% | (49.82%, 50.33%) |
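
For reference, the sketch below shows one generic way to produce a balanced accuracy estimate with a 95% prediction interval, using cross-validated scores and a normal approximation. It illustrates the kind of computation behind the table, not necessarily the exact procedure used in the notebooks.

```python
# Generic sketch: balanced accuracy with a 95% prediction interval from
# cross-validated scores (normal approximation). Not necessarily the exact
# procedure used in the notebooks.
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold

def balanced_accuracy_interval(model, X, y, n_splits=10, z=1.96):
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits, shuffle=True).split(X, y):
        model.fit(X[train_idx], y[train_idx])
        scores.append(balanced_accuracy_score(y[test_idx], model.predict(X[test_idx])))
    scores = np.asarray(scores)
    # Prediction interval for a single future score: wider than a confidence
    # interval for the mean by a factor of sqrt(1 + 1/n).
    half_width = z * scores.std(ddof=1) * np.sqrt(1 + 1 / n_splits)
    return scores.mean(), (scores.mean() - half_width, scores.mean() + half_width)
```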

Model Tester

As the project consists of exploring a range of ML methods, we have developed a dedicated library to be able to compare our results in a meaningful way. If there is a need to explore new methods on similarly structured data, we strongly encourage the reader to use this library. An introduction to it and instructions for how to use it can be found here.
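
As a purely hypothetical illustration (this is not the library's actual API; see the linked instructions for real usage), a side-by-side comparison of the kind the library enables could look like:

```python
# Hypothetical illustration only -- not the library's actual API.
# Reuses balanced_accuracy_interval from the sketch above; X and y are
# assumed to be the feature matrix and feasibility labels.
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

models = {
    "baseline": DummyClassifier(strategy="stratified"),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    mean, (lo, hi) = balanced_accuracy_interval(model, X, y)
    print(f"{name}: {mean:.2%} ({lo:.2%}, {hi:.2%})")
```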

Authors

Adam Wuilmart

August Regnell

Eric Bojs

Ludvig Wärnberg Gerdin