XGBoost rewrite on Julia using Metal.jl #167

Open
Roh-codeur opened this issue Feb 4, 2023 · 8 comments

@Roh-codeur

Hi

Thanks for the work on this library. Since you all are experienced with XGBoost, I was wondering if you have any thoughts on rewriting XGBoost in Julia, potentially using Metal.jl? I am sure the Apple M1 would bring a considerable boost in performance.
Thoughts, please!

ta!

@tylerjthomas9
Contributor

EvoTrees.jl is a pure-Julia implementation of gradient boosting with phenomenal CPU performance. This library is just a wrapper around the XGBoost C library, so it wouldn't be possible to make those changes here. Maybe XGBoost will support more GPU backends at some point.
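
A minimal sketch of the EvoTrees.jl workflow, roughly following its README (the keyword names here are from memory and may differ between package versions, so treat this as an illustration rather than a recipe):

```julia
# Hypothetical example following the EvoTrees.jl README; keyword names
# (e.g. x_train/y_train) may vary across package versions.
using EvoTrees

x_train, y_train = randn(10_000, 25), randn(10_000)   # toy data

config = EvoTreeRegressor(nrounds=100, eta=0.1, max_depth=6)
model  = fit_evotree(config; x_train, y_train)
preds  = model(x_train)   # fitted EvoTrees models are callable for prediction
```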

@Roh-codeur
Author

> EvoTrees.jl is a pure-Julia implementation of gradient boosting with phenomenal CPU performance. This library is just a wrapper around the XGBoost C library, so it wouldn't be possible to make those changes here. Maybe XGBoost will support more GPU backends at some point.

Thanks, @tylerjthomas9! I will take a look at EvoTrees.jl. I noticed their benchmarks, and it seems that even EvoTrees would benefit from a GPU. :)

I understand your point about the scope of this project; I have very little knowledge about XGBoost and programming in general, so I was hoping you all would be able to advise on the complexity of porting XGBoost to Julia so it would support Apple M1 silicon. The thing is, I have to run multiple models, and the current XGBoost implementation takes a very long time to run, so I am hoping the Apple M1 GPU would be able to help.

ta!

@ExpandingMan
Collaborator

Originally, when I started working on this package, it was mostly because it seemed to have a very high reward-to-effort ratio, which I think has mostly been borne out. However, at some point I'm going to start looking to replace it with EvoTrees.jl anywhere I might use it, and see how close it is to parity. EvoTrees.jl does seem to have had a lot of recent work done.

@Roh-codeur
Author

Sure, I understand! Thanks again for all your work on this package; I know a lot of users, like me, sincerely appreciate your help.

As I write this, I am looking at EvoTrees as well. The benchmarks look quite impressive.

thanks!

@tylerjthomas9
Contributor

It might be worth writing methods to convert EvoTrees.jl models to and from XGBoost. There's an issue for it on EvoTrees.jl:

Evovest/EvoTrees.jl#179

@bobaronoff

I too have M1 Apple silicon and share the original poster's pain. However, such is the state of affairs: Apple silicon is still only at tier-2 support, and Metal.jl describes itself as a work in progress, i.e., not ready for production work. Many great people are doing great work, but this will take time. That said, there are some maneuvers that help reduce computation times; they will not come close to a GPU but can be significant. I am kind of curious how many rows/columns are involved.

I presume the datasets are quite large. Here are some options to consider.

  1. 'Turn off' the watchlist. I have found that computation time is reduced by about 40% without it. Of course, you'll need to compute those evaluations an alternate way, but Julia and MLJ are quick, and it can be done less frequently, e.g. every 20 rounds. You are probably already doing this.
  2. If there are a lot of rows, i.e. 10^5–10^6, you can reduce the subsample fraction and also consider the hist tree method as opposed to the exact tree method. I have not used the hist method, but it is intended to be computationally efficient.
  3. If there are a lot of columns, i.e. 10^2–10^3, take advantage of the column sub-sampling parameters. There are three, which work cumulatively by tree, level, and node. This also reduces computation time (see the sketch after this list).
  4. Multi-threading for a single model is not extremely helpful, as rounds are iterative. However, threading can help with concurrent CV-fold processing and grid search. I've used this with R but not Julia; I believe that MLJ has facilities for this.
  5. Don't shoot the messenger here, but if the need is great, one can always borrow a machine with an Intel CPU and an Nvidia GPU.
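
A rough sketch of how options 1–3 might look through XGBoost.jl's keyword pass-through of booster parameters. The exact keywords (particularly watchlist) are assumptions and may vary by package version:

```julia
# Illustrative only; keyword names (especially `watchlist`) are assumptions
# and may differ across XGBoost.jl versions.
using XGBoost

X, y = randn(100_000, 300), randn(100_000)   # toy "large" dataset

booster = xgboost((X, y);
    num_round        = 1_000,
    watchlist        = Dict(),   # option 1: drop per-round evaluation logging
    subsample        = 0.5,      # option 2: row subsampling per tree
    tree_method      = "hist",   # option 2: histogram-based split finding
    colsample_bytree = 0.5,      # option 3: column subsampling (also *_bylevel, *_bynode)
    max_depth        = 6,
    eta              = 0.1,
)
```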

I have found xgboost to be quite fast compared to the 'gbm' package in R and Dr. Friedman's MART, so I am a glass-half-full guy. My datasets are relatively small (i.e. 10,000 rows / 25 columns); at this size, a 10-fold CV using exact trees, watchlist 'on', and 1,000 rounds takes about 25 seconds. For me this is acceptable; others' needs will vary.

@tylerjthomas9
Contributor

> 2. ... consider the hist tree method as opposed to the exact tree method. I have not used the hist method, but it is intended to be computationally efficient.

Using tree_method="hist" should make a large speed difference. I always use hist when training on CPUs.
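
For reference, assuming the same keyword pass-through of booster parameters mentioned above, switching methods is a one-keyword change:

```julia
# Assumption: XGBoost.jl forwards keyword arguments as booster parameters.
booster = xgboost((X, y); num_round=1_000, tree_method="hist")
```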

@jeremiedb

I'd definitely appreciate user input on EvoTrees :)
AFAICT, it now fares quite well on CPU, but I'm pretty sure the GPU story could be improved.
I haven't put any effort into the distributed setup or multi-GPU either; those are all areas where I'm less knowledgeable.

Potential "low hanging fruits" I was considering shorter term was random forest mode as well as oblivious trees (tree structure used by CatBoost).

That being said, it's really nice to see the recent efforts to bring more robust wrappers around go-to libraries like XGBoost and CatBoost. These are definitely important for raising Julia's credibility in the general ML space.
