
Add C backend for bayesian optimization #47

Open · wants to merge 3 commits into base: master

Conversation

SimonKLind (Contributor)

Note that this C backend is currently not really adapted for HyperMapper
and is only plugged in through a very crude wrapper.
The C backend is also currently very limited in its functionality.
More functionality can be added later.
Thus, work to be done includes tweaking the backend and wrapper to fit
better into the rest of HyperMapper, and adding functionality present in
the other optimizer implementations.

@SimonKLind (Contributor, Author)

Hi guys.
As I stated in the commit message, this C backend is currently added in a rather crude way,
and I expect that some (or many) changes will need to be made before you consider merging it.
I'm creating this pull request now in order to get your input on how you want it integrated into HM, and what functionality you want present before a potential merge.

@SimonKLind (Contributor, Author) commented Jul 6, 2021

I should also mention the specifics of what this C backend currently supports:

  • No client-server mode. (Should be fairly easy to add)
  • Only single-objective functions. (Shouldn't be too difficult to add multi-objective)
  • Only real-valued parameters. (Support for other types might be more extensive to add)
  • Only latin-hypercube initialization. (Definitely easy to add others)
  • No customization of BO parameters other than DoE samples and optimization iterations. (Definitely easy to add)

I think those are the most important restrictions that may need to be addressed before potentially merging.

@arturluis arturluis self-requested a review July 9, 2021 19:54
@arturluis (Collaborator) left a comment


Code looks good, to my somewhat limited understanding of C. Left just some minor comments. I'll comment on the features and overall thoughts in a separate comment.

Good job!

c_src/bayesian.c (outdated):
```c
RandomForest forest;
randomforest_fit(&forest, NUMBER_OF_TREES, points, dims, i);

optimize_forest(&forest, lower, upper, points[i]->x, dims, best_val, workspace_points);
```
@arturluis (Collaborator):

What is this optimize forest part? Is it the acquisition function optimization?

@SimonKLind (Contributor, Author):

Yes!

Owner:

It'd be better to have a more standard BO naming here. Can we simply call it acquisition function or similar?

@SimonKLind (Contributor, Author):

Renamed optimize_forest() -> optimize_acquisition_function()

hypermapper/schema.json (outdated, resolved)
@arturluis (Collaborator)

Overall, I think it looks really good, and I really like this C backend initiative. Good job! I think the acquisition function optimization will be changed a bit based on a meeting we had earlier today, but other than that the rest seems fine. Some higher-level takes:

  • I'd vote for moving the c_src directory to inside the hypermapper directory, to keep all of the hypermapper source code together in the same place.
  • Like we discussed earlier today, we are not writing results to a CSV output file currently, right? I think this is important to add, even to help us with our future experiments and tests.
  • A minor point for now, but later down the pipeline I think it would be interesting to add an example for the C backend.

Commenting on the not-yet-implemented features:

  • No client-server mode. I think this is actually quite important, client-server mode is a pretty popular and useful feature of HM.
  • Only single-objective functions. I think it's fine to leave that for later.
  • Only real-valued parameters. This is a tricky one: I think we actually want the other parameter types, since they are another important feature of HM, but given that they are hard to implement, we may decide to push them back a bit for now. I think it's good to keep this in mind, though, so that future features don't make this even harder to implement later. If we think that will be the case, it may be a good idea to implement the other variable types now.
  • Only latin-hypercube initialization. If it's easy, I'd implement random sampling as well since it's the default for HM (and I think you already have code for randomly sampling configurations, right?), but it's not that important. In previous experiments, Latin Hypercube and Random Sampling have performed similarly, if I remember right. If anything, I'd expect latin hypercube to perform a bit better in some applications.
  • No customization of BO parameters other than DoE samples and optimization iterations. It might make things easier down the line to just add the applicable customizations now, so that it is easier to keep them updated as more features are added, but otherwise this can definitely wait.

There are some other (many of them minor) features of HM that are not implemented. To keep track (correct me if I'm wrong):

  • Constrained Bayesian Optimization
  • Prior injection
  • GP model
  • Resume optimization
  • UCB and TS acquisition functions
  • Batch BO
  • RF parameter importance

Most of these are minor features that I think we can leave for later, with really low priority; I'm adding them here mostly to have them written down for future reference. The exception may be constrained BO, which might not be very hard to add and is an often-needed feature. Prior injection and GPs are nice, but will probably be tricky to add, and I think they are not mandatory.

I think the things I'd vote we work on next would be:

  • writing results to a csv file
  • the acquisition function optimization
  • client-server mode

and later down the line:

  • different parameter types
  • constrained BO

@luinardi what do you think?

@arturluis arturluis requested a review from luinardi July 9, 2021 21:16
@SimonKLind
Copy link
Contributor Author

Thank you for your comments @arturluis.

Originally, when discussing this C backend with @luinardi, I believe the idea was for it to mostly be a smaller edge-case kind of thing: essentially just a small, fast backend you can use if it fits your application. So I'm not sure how much effort we're willing to put into this. Of course we should add things like client-server mode and CSV logging, but perhaps we should discuss the more advanced additions like the GP model, prior injection, etc.

That being said, some of the extra changes you mentioned should be perfectly feasible without too much work.
Resuming optimization, batch BO, and extra acquisition functions should all be relatively simple additions.
When it comes to prior injection and constrained BO, I don't know these concepts in enough detail (yet) to say how difficult they are to add.

A GP model might be feasible for me to add. I'm currently reading up on GPs anyway, so if I have the time I might just implement it in C.

All in all, I definitely agree that we should start with client-server mode, CSV logging, and the acquisition optimization.

@arturluis (Collaborator)

arturluis commented Jul 12, 2021

Good to know. I left those as my impressions and overall thoughts, for reference in case we want to expand this later. Even then, some of these features (like GPs and prior injection) we may want to leave for much further down the line. For now, though, I'm fully on board with having this as an edge case for specific scenarios. I'm not sure how much effort we should or want to put into this, but for a minimal version, I think we just need the acquisition optimization and CSV results. To be honest, for a truly minimal backend, we could just have the CSV results and use the same acquisition function optimization as the Python implementation.

The rest I'd say are all nice features, but not requirements.

Cheers!

Simon Kristoffersson Lind added 2 commits July 23, 2021 10:16
Update to treat DoE as separate from optimization_iterations.
Rename optimize_forest() -> optimize_acquisition_function().
Separate "use_c_backend" config parameter.
When changing to treat DoE and optimization_iterations as separate,
I accidentally messed up indexing so DoE samples were overwritten.
This change fixes that bug.
3 participants