[spark] Make xgboost spark support large model size #10984
Signed-off-by: Weichen Xu <[email protected]>
LGTM for the functionality, except for the CI issue.
Could you run |
I can't fully understand the linter error:
@wbo4958 any ideas?
@WeichenXu123 XGBoost's Python package uses Python typehint. In the following line: booster = booster.save_raw("json").decode("utf-8") The |
LGTM if CI passes.
@trivialfis Can we make a patch release that includes this fix? Several of our customers are hitting this issue. Thanks!
Co-authored-by: WeichenXu <[email protected]>
A Spark RDD cannot hold a single row with very long content.
To make training, saving, and loading work for large models,
I split the model JSON string into chunks when collecting the model during training, and updated the saving and loading code to match.
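
The chunking idea described above can be illustrated with a minimal sketch. This is not the code from this PR: the helper names `split_into_chunks` and `join_chunks`, the chunk size, and the stand-in JSON string are all hypothetical, chosen only to show how a long string can be cut into bounded pieces and reassembled without loss.

```python
from typing import Iterable, Iterator


def split_into_chunks(text: str, chunk_size: int) -> Iterator[str]:
    """Yield consecutive slices of `text`, each at most `chunk_size` characters.

    Hypothetical helper for illustration; not part of the XGBoost API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def join_chunks(chunks: Iterable[str]) -> str:
    """Reassemble the original string from chunks kept in their original order."""
    return "".join(chunks)


# Stand-in for a serialized model; a real model JSON would be far larger.
model_json = '{"learner": {"trees": [' + ",".join(str(i) for i in range(50)) + "]}}"

# Assumed chunk size, kept tiny here so the example produces several chunks.
chunks = list(split_into_chunks(model_json, 16))
restored = join_chunks(chunks)
```

Each chunk can then travel as its own row, so no single row exceeds the chosen size. The chunks must be concatenated in their original order for the round trip to be lossless.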