
Add C backend for bayesian optimization #47

Open · wants to merge 3 commits into base: master

Conversation

SimonKLind (Contributor)

Note that this C backend is currently not really adapted for HyperMapper
and is only plugged in through a very crude wrapper.
The C backend is also currently very limited in its functionality.
More functionality can be added later.
Thus, work to be done includes tweaking the backend and wrapper to fit
better into the rest of HyperMapper, and adding functionality present in
the other optimizer implementations.

@SimonKLind (Contributor, Author)

Hi guys.
As I stated in the commit message, this C backend is currently added in a rather crude way,
and I expect that some (or many) changes will need to be made before you consider merging it.
I'm creating this pull request now in order to get your input on how you want it integrated into HM, and what functionality you want present before a potential merge.

@SimonKLind (Contributor, Author) commented Jul 6, 2021

I should also mention the specifics of what this C backend currently supports:

  • No client-server mode. (Should be fairly easy to add)
  • Only single-objective functions. (Shouldn't be too difficult to add multi-objective)
  • Only real-valued parameters. (Support for other types might be more extensive to add)
  • Only latin-hypercube initialization. (Definitely easy to add others)
  • No customization of BO parameters other than DoE samples and optimization iterations. (Definitely easy to add)

I think those are the most important restrictions that may need to be addressed before potentially merging.

@arturluis arturluis self-requested a review July 9, 2021 19:54
@arturluis (Collaborator) left a comment


Code looks good, to my somewhat limited understanding of C. Left just some minor comments. I'll comment on the features and overall thoughts in a separate comment.

Good job!

c_src/bayesian.c (outdated):
```c
RandomForest forest;
randomforest_fit(&forest, NUMBER_OF_TREES, points, dims, i);

optimize_forest(&forest, lower, upper, points[i]->x, dims, best_val, workspace_points);
```
@arturluis (Collaborator):

What is this optimize forest part? Is it the acquisition function optimization?

@SimonKLind (Contributor, Author):

Yes!

Owner:

It'd be better to have a more standard BO naming here. Can we simply call it acquisition function or similar?

@SimonKLind (Contributor, Author):

Renamed optimize_forest() -> optimize_acquisition_function()

hypermapper/schema.json (outdated, resolved)
@arturluis (Collaborator)

Overall, I think it looks really good, and I really like this C backend initiative. Good job! I think the acquisition function optimization will be changed a bit based on a meeting we had earlier today, but other than that the rest seems fine. Some higher-level takes:

  • I'd vote for moving the c_src directory to inside the hypermapper directory, to keep all of the hypermapper source code together in the same place.
  • Like we discussed earlier today, we are not writing results to a CSV output file currently, right? I think this is important to add, even to help us with our future experiments and tests.
  • A minor point for now, but later down the pipeline I think it would be interesting to add an example for the C backend.

Commenting on the not-yet-implemented features:

  • No client-server mode. I think this is actually quite important, client-server mode is a pretty popular and useful feature of HM.
  • Only single-objective functions. I think it's fine to leave that for later.
  • Only real-valued parameters. This is a tricky one: I think we actually want the other parameter types, since they are another important feature of HM, but given that they are hard to implement, we may decide to push them back a bit for now. I think it's good to keep this in mind, though, so that future features don't make this even harder to implement later. If we think that will be the case, it may be a good idea to implement the other variable types now.
  • Only latin-hypercube initialization. If it's easy, I'd implement random sampling as well since it's the default for HM (and I think you already have code for randomly sampling configurations, right?), but it's not that important. In previous experiments, Latin Hypercube and Random Sampling have performed similarly, if I remember right. If anything, I'd expect latin hypercube to perform a bit better in some applications.
  • No customization of BO parameters other than DoE samples and optimization iterations. It might make things easier down the line to just add the applicable customizations now, so that it is easier to keep them updated as more features are added, but otherwise this can definitely wait.

There are some other (many of them minor) features of HM that are not implemented. To keep track (correct me if I'm wrong):

  • Constrained Bayesian Optimization
  • Prior injection
  • GP model
  • Resume optimization
  • UCB and TS acquisition functions
  • Batch BO
  • RF parameter importance

Most of these are minor features that I think we can leave for later, with really low priority; I'm adding them here mostly to have them written down for future reference. The exception may be constrained BO, which might not be very hard to add and is an often-needed feature. Prior injection and GPs are nice, but will probably be tricky to add, and I think they are not mandatory.

I think the things I'd vote we work on next would be:

  • writing results to a csv file
  • the acquisition function optimization
  • client-server mode

and later down the line:

  • different parameter types
  • constrained BO

@luinardi what do you think?

@arturluis arturluis requested a review from luinardi July 9, 2021 21:16
@SimonKLind
Copy link
Contributor Author

Thank you for your comments @arturluis.

Originally, when discussing this C backend with @luinardi, I believe the idea was for it to mostly be a smaller edge-case kind of thing: essentially just a small, fast backend you can use if it fits your application. So I'm not sure how much effort we're willing to put into this. Of course we should add things like client-server mode and CSV logging, but perhaps we should discuss the more advanced additions like the GP model, prior injection, etc.

That being said, some of the extra changes you mentioned should be perfectly feasible without too much work.
Resuming optimization, batch BO, and extra acquisition functions should all be relatively simple additions.
When it comes to prior injection and constrained BO, I don't know these concepts in enough detail (yet) to say how difficult they are to add.

A GP model might be feasible for me to add. I'm currently reading up on GPs anyway, so if I have the time I might just implement it in C.

All in all, I definitely agree that we should start with client-server mode, CSV logging, and the acquisition optimization.

@arturluis (Collaborator)

arturluis commented Jul 12, 2021

Good to know. I left those as my impressions and overall thoughts, for reference in case we want to expand this later. Even then, some of these features (like GPs and prior injection) we may want to leave for much further down the line. For now, though, I'm fully on board with having this as an edge case for specific scenarios. I'm not sure how much effort we should or want to put into this, but for a minimal version, I think we just need the acquisition optimization and CSV results. To be honest, for a truly minimal backend, we could just have the CSV results and use the same acquisition function optimization as the Python implementation.

The rest I'd say are all nice features, but not requirements.

Cheers!

Simon Kristoffersson Lind added 2 commits July 23, 2021 10:16
Update to treat DoE as separate from optimization_iterations.
Rename optimize_forest() -> optimize_acquisition_function().
Separate "use_c_backend" config parameter.
When changing to treat DoE and optimization_iterations as separate,
I accidentally messed up indexing so DoE samples were overwritten.
This change fixes that bug.
3 participants