Forward-merge branch-25.04 into branch-25.06 #6435

Merged: 55 commits merged into branch-25.06 from branch-25.04 on Apr 10, 2025

Conversation

@rapids-bot rapids-bot bot commented Mar 13, 2025

Forward-merge triggered by push to branch-25.04 that creates a PR to keep branch-25.06 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.

csadorf and others added 11 commits February 12, 2025 16:02
Reduce the UMAP logging verbosity. Avoids printing potentially large
arrays.
PRs being backported: 

- [x] #6234
- [x] #6306
- [x] #6320
- [x] #6319
- [x] #6327
- [x] #6333
- [x] #6142 
- [x] #6223
- [x] #6235
- [x] #6317 
- [x] #6331
- [x] #6326
- [x] #6332
- [x] #6347
- [x] #6348
- [x] #6337
- [x] #6355
- [x] #6354
- [x] #6322
- [x] #6353
- [x] #6359
- [x] #6364
- [x] #6363
- [x] [FIL BATCH_TREE_REORG fix for SM90, 100 and 120](a3e419a)

---------

Co-authored-by: William Hicks <[email protected]>
This PR removes an incorrect `click` library option that was present in the CLI
functionality.
Due to a bug in the import code, experimental FIL was previously not making use of the `align_bytes` argument correctly. The effect was not just a failure to take advantage of cache line boundaries but a severe pessimization in which padding nodes were inserted in the forest structure at highly non-optimal places.

This PR corrects this, resulting in a substantial performance improvement. It also introduces the `layered` layout type, in which nodes of the same depth are stored together. This allows for a moderate performance improvement in some models. It also allows CPU FIL to intelligently set the number of threads rather than accepting the highly non-optimal default, which provides a significant performance improvement for small batch sizes.
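The following is an illustrative sketch, not code from this PR: it shows how the `align_bytes` argument and the new `layered` layout are meant to be exercised through `cuml.experimental.fil.fil.ForestInference`. The loader call, argument values, and model path are assumptions; check the experimental FIL documentation for the exact signature.

```python
# Illustrative sketch (assumed API usage, not code from this PR).
import numpy as np
from cuml.experimental.fil.fil import ForestInference

# "xgboost_model.json" is a placeholder path to a pre-trained model file.
fil_model = ForestInference.load(
    "xgboost_model.json",
    layout="layered",    # new layout: nodes of the same depth are stored together
    align_bytes=128,     # pad node storage toward cache-line boundaries
)

X = np.random.rand(1000, 32).astype(np.float32)
preds = fil_model.predict(X)
```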

Authors:
  - William Hicks (https://github.com/wphicks)

Approvers:
  - Philip Hyunsu Cho (https://github.com/hcho3)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - https://github.com/jakirkham

URL: #6397
@rapids-bot rapids-bot bot requested review from a team as code owners March 13, 2025 20:14
@rapids-bot rapids-bot bot requested review from AyodeAwe, dantegd and cjnolet March 13, 2025 20:14
@github-actions github-actions bot added the conda, Cython / Python, CMake, and CUDA/C++ labels Mar 13, 2025
rapids-bot bot (Author) commented Mar 13, 2025

FAILURE - Unable to forward-merge due to an error, manual merge is necessary. Do not use the Resolve conflicts option in this PR, follow these instructions https://docs.rapids.ai/maintainers/forward-merger/

IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the /merge comment). Instead, an admin must manually merge by changing the merging strategy to Create a Merge Commit. Otherwise, history will be lost and the branches will become incompatible.

`shellcheck` is a fast, static analysis tool for shell scripts. It's good at
flagging up unused variables, unintentional glob expansions, and other potential
execution and security headaches that arise from the wonders of `bash` (and
other shell languages).

This PR adds a `pre-commit` hook to run `shellcheck` on all of the `sh-lang`
files in the `ci/` directory, and applies the changes requested by `shellcheck`
to make the existing files pass the check.

xref: rapidsai/build-planning#135

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - James Lamb (https://github.com/jameslamb)

URL: #6246
@rapids-bot rapids-bot bot requested a review from a team as a code owner March 14, 2025 14:06
@github-actions github-actions bot added the ci label Mar 14, 2025
wphicks and others added 4 commits March 14, 2025 21:49
… RF (#6387)

If both results are NaNs, pass the test rather than attempting to `ASSERT_NEAR` on NaN values.
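The actual change is in the C++ test code; the snippet below is only an analogous guard written in Python to illustrate the idea (the helper name is made up):

```python
import math

def assert_near_or_both_nan(a, b, tol=1e-6):
    """Accept two NaN results as equal; otherwise require |a - b| <= tol."""
    if math.isnan(a) and math.isnan(b):
        # A plain ASSERT_NEAR-style comparison fails here because NaN != NaN.
        return
    assert abs(a - b) <= tol, f"{a} and {b} differ by more than {tol}"
```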

Authors:
  - William Hicks (https://github.com/wphicks)
  - Jim Crist-Harif (https://github.com/jcrist)
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Simon Adorf (https://github.com/csadorf)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #6387
This fixes `test_accuracy_score` to still work when `cudf.pandas` is active. The failure had gone unnoticed since `cudf.pandas` builds are currently optional and have been flaky long enough that I've stopped inspecting them when they're red :/. More motivation to fix our test issues and make that test run non-optional.
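For context, the affected code path looks roughly like the sketch below, with `cudf.pandas` patching pandas before anything else is imported; the data and the exact call are illustrative, not the test itself.

```python
# Illustrative sketch of the kind of call exercised by test_accuracy_score.
import cudf.pandas
cudf.pandas.install()  # must run before pandas is imported

import pandas as pd
from cuml.metrics import accuracy_score

y_true = pd.Series([0, 1, 1, 0])  # cuDF-backed proxy objects when cudf.pandas is active
y_pred = pd.Series([0, 1, 0, 0])
print(accuracy_score(y_true, y_pred))  # 0.75
```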

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #6439
…6447)

This PR adds a filter to skip CUDA 11.4 jobs on PRs as a precursor to enabling them in shared-workflows.
Once the 11.4 issues are fixed, this matrix filter should be removed so 11.4 gets tested on PRs.

xref: rapidsai/build-planning#164

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #6447
AFAICT these are no longer failing. Some of the disabled tests were reenabled a while ago, but these were missed. After this PR, everything disabled due to #5441 has been reenabled.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: #6446
dantegd and others added 29 commits March 19, 2025 14:35
We're deprecating `cuml-cpu` in favor of `cuml.accel`. This adds a deprecation warning on import of `cuml-cpu` builds, notifying users of the deprecation and linking them to the relevant docs to learn more.

Fixes #6458.
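Not the actual implementation, but a minimal sketch of emitting a deprecation notice at import time (the wording and docs pointer are placeholders):

```python
# Minimal sketch of an import-time deprecation notice, e.g. in the
# __init__.py of CPU-only builds; the real message and link differ.
import warnings

warnings.warn(
    "cuml-cpu is deprecated in favor of cuml.accel; "
    "see the cuML documentation for migration details.",
    FutureWarning,
)
```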

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: #6466
`sklearn` ensemble estimators are valid sequences of estimators. Supporting `__getitem__` and `__iter__` is _hard_ with our current implementation, but `__len__` is easy and lets more of the sklearn compatibility tests pass.

Fixes #6465.
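A minimal sketch of the idea (class and attribute names are illustrative, not the actual cuML implementation):

```python
class EnsembleLike:
    """Illustrative stand-in for an ensemble estimator."""

    def __init__(self, n_estimators=100):
        self.n_estimators = n_estimators

    def __len__(self):
        # sklearn ensembles behave as sequences of their sub-estimators,
        # so len(est) should report how many there are.
        return self.n_estimators


assert len(EnsembleLike(n_estimators=10)) == 10
```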

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: #6468
Solve conflicts of #6313

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Simon Adorf (https://github.com/csadorf)
  - Jake Awe (https://github.com/AyodeAwe)

Approvers:
  - Simon Adorf (https://github.com/csadorf)
  - Divye Gala (https://github.com/divyegala)

URL: #6385
Port all conda-build recipes over to use `rattler-build` instead.

Contributes to rapidsai/build-planning#47

- To satisfy `rattler-build`, this changes all the licenses in the `pyproject.toml` files to the SPDX-compliant `Apache-2.0` instead of `Apache 2.0`

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #6440
…6471)

The `estimator` attribute is used in the scikit-learn tags machinery to figure out what tags the meta estimator has. We pass a default constructed instance as it isn't actually used.

This was found as part of #6438 but can be fixed standalone. The problem isn't caused by the new scikit-learn version; we just discovered it while testing with it.

cc @viclafargue
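A hedged sketch of the pattern described above; the class name and default estimator are illustrative, not the cuML code:

```python
from sklearn.linear_model import LogisticRegression

class MetaEstimatorLike:
    """Illustrative meta-estimator, not the actual cuML class."""

    def __init__(self, estimator=None):
        # A default-constructed instance is enough for scikit-learn's tag
        # machinery to derive this meta-estimator's tags; it is never fit.
        self.estimator = (
            estimator if estimator is not None else LogisticRegression()
        )
```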

Authors:
  - Tim Head (https://github.com/betatim)

Approvers:
  - Jim Crist-Harif (https://github.com/jcrist)
  - Victor Lafargue (https://github.com/viclafargue)

URL: #6471
…rkflows (#6447)" (#6470)

Now that nightlies are passing, we should be able to test these jobs in PRs.

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ray Douglass (https://github.com/raydouglass)

URL: #6470
We recently changed the `num_segments` argument to take `int64_t` in order to support larger segments.

Authors:
  - Michael Schellenberger Costa (https://github.com/miscco)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #6459
As seen [here](https://github.com/rapidsai/cuml/actions/runs/13964745379/job/39092533532#step:9:2176), the test mentioned in the title OOMs on L4s. This PR attempts to fix that by ensuring the failing test is allowed to use 100% of the available memory.

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #6474
This log happens when the working set cannot be filled fully by new elements, which is not unexpected and not something worth alerting a user about.

Fixes #5721. cc @aamijar for quick review.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Tim Head (https://github.com/betatim)
  - Victor Lafargue (https://github.com/viclafargue)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #6477
For unknown reasons conda is unintentionally preferring an old build of `rapids-dask-dependency` that relies on `dask` nightlies rather than the current pin of `2025.2.0`. Since the current plan is to no longer install dask nightlies in project CI, removing the dask nightlies channel should prevent this problem going forward.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: #6485
A recent refactor (#6089) made `sklearn` accidentally required to import `cuml`. This fixes that.

I've tested that `cuml` can be imported now without `sklearn` installed. I'll push up a follow-up PR adding a minimal build import check to CI, but for now I believe this fixup should be sufficient to resolve the issue before release.
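A generic sketch of keeping `sklearn` optional at import time (not the actual cuML fix):

```python
# Generic optional-import pattern; the real fix in cuml may differ.
try:
    import sklearn  # noqa: F401
    HAS_SKLEARN = True
except ImportError:
    HAS_SKLEARN = False


def require_sklearn():
    """Raise a clear error only when sklearn-backed functionality is used."""
    if not HAS_SKLEARN:
        raise ImportError("This feature requires scikit-learn to be installed.")
```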

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Tim Head (https://github.com/betatim)
  - Victor Lafargue (https://github.com/viclafargue)

URL: #6483
Previously this notebook used a couple of internal `cuml` APIs. This PR switches them to public APIs instead.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Tim Head (https://github.com/betatim)

URL: #6488
This PR adds support for handling sparse input arrays in the KMeans algorithm by dispatching to CPU implementation when sparse arrays are detected during fitting. It also updates the sparse array detection utilities to be more robust and consistent across the codebase.

Fixes scikit-learn test `test_kmeans_results[float64-lloyd-sparse_array]` in combination with #6442 .

## Changes
- Added `_should_dispatch_cpu` method to KMeans to handle sparse input arrays
- Updated `is_sparse` utility function to use `issparse` instead of `isspmatrix` for better compatibility (see the example below)
- Updated sparse array detection in `input_utils.py` to use the new `issparse` method
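The difference matters because `scipy.sparse.isspmatrix` only recognizes the legacy `spmatrix` classes, while `issparse` also recognizes the newer sparse-array containers (the example assumes a SciPy version that provides `csr_array`):

```python
import scipy.sparse as sp

mat = sp.csr_matrix([[1, 0], [0, 1]])  # legacy sparse matrix
arr = sp.csr_array([[1, 0], [0, 1]])   # newer sparse array

print(sp.isspmatrix(mat), sp.isspmatrix(arr))  # True False
print(sp.issparse(mat), sp.issparse(arr))      # True True
```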

## Testing
- Verified that KMeans correctly dispatches to CPU implementation when sparse arrays are detected

Authors:
  - Simon Adorf (https://github.com/csadorf)
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Victor Lafargue (https://github.com/viclafargue)
  - Jim Crist-Harif (https://github.com/jcrist)

URL: #6448
This fixes a failure in `test_to_sparse_dask_array` with dask main. It seems the issues the previous workarounds addressed have since been fixed in cupy / dask, so those workarounds can now be removed from cuml.

xref rapidsai/dask-upstream-testing#37, specifically the failure [here](https://github.com/rapidsai/dask-upstream-testing/actions/runs/14053066285/job/39346850200#step:10:933).

Not sure if anyone has the context to say for sure, but I'm curious how well we think the existing test suite would catch any regressions here. I haven't done any kind of performance / memory profiling to make sure there aren't any more subtle regressions.

Authors:
  - Tom Augspurger (https://github.com/TomAugspurger)

Approvers:
  - Jim Crist-Harif (https://github.com/jcrist)

URL: #6489
This PR promotes experimental FIL to the new stable FIL. This is purely a Python-level change. `cuml.fil.fil.ForestInference` now resolves to a thin wrapper around `cuml.experimental.fil.fil.ForestInference` with warnings about upcoming changes to the output shape of FIL predictions. Random forest estimators continue to use legacy FIL because of their usage of `TreeliteModel`, an obsolete implementation detail of legacy FIL. A future change should switch this to Treelite's native `treelite.Model` wrapper.

The legacy FIL implementation has been moved to `cuml.legacy.fil.fil.ForestInference`. This can be removed in 25.06. The thin wrapper around `cuml.experimental.fil.fil.ForestInference` can also be removed in 25.06 once users have a deprecation cycle to adapt to new output shapes.

This is marked as a breaking change because it removes the `shape_str` attribute from `ForestInference` objects. This attribute is not used anywhere in cuML and appears to have existed primarily for debugging.

Resolve #6460.
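Schematically, the thin wrapper behaves like the sketch below; this is illustrative only and not the code in `cuml.fil.fil`:

```python
import warnings


class ThinWrapperLike:
    """Illustrative forward-to-new-implementation wrapper with a warning."""

    def __init__(self, new_impl):
        warnings.warn(
            "The output shape of FIL predictions will change in an "
            "upcoming release; see the release notes for details.",
            FutureWarning,
        )
        self._impl = new_impl

    def predict(self, X):
        return self._impl.predict(X)
```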

Authors:
  - William Hicks (https://github.com/wphicks)
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Jim Crist-Harif (https://github.com/jcrist)
  - Simon Adorf (https://github.com/csadorf)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #6464
Co-authored-by: Simon Adorf <[email protected]>
Co-authored-by: Jake Awe <[email protected]>
Co-authored-by: William Hicks <[email protected]>
This PR adds the `conda-python-scikit-learn-accel-tests` job to the nightly test workflow. This ensures that scikit-learn acceleration tests are run as part of the nightly test suite, matching the behavior in the PR workflow.

Authors:
  - Simon Adorf (https://github.com/csadorf)
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Tim Head (https://github.com/betatim)
  - James Lamb (https://github.com/jameslamb)

URL: #6457
Adds the `.solver_` estimated attribute in addition to the `.solver` hyperparameter.

Switches the default cuml `solver` hyperparameter from "eig" to "auto" (backwards-compatible).
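A sketch of the intended usage, assuming the change applies to an estimator such as `cuml.Ridge` (the affected estimators and the values reported by the new attribute are assumptions here):

```python
import numpy as np
from cuml import Ridge

X = np.random.rand(100, 4).astype(np.float32)
y = np.random.rand(100).astype(np.float32)

model = Ridge()       # solver defaults to "auto" after this change
model.fit(X, y)
print(model.solver)   # the configured hyperparameter ("auto")
print(model.solver_)  # the solver actually selected during fit
```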

Authors:
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Tim Head (https://github.com/betatim)

URL: #6415
Skip the flaky `test_rf_classification_seed` test when run in combination with cudf.pandas.

Authors:
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Victor Lafargue (https://github.com/viclafargue)

URL: #6500
## Overview
This PR adds limited support for array-like inputs (lists and tuples) in cuML's API ingestion framework and the `CumlArray` class when the accelerator is active. This enhancement improves usability by allowing users to directly pass Python lists and tuples as inputs without requiring explicit conversion to NumPy arrays.

## Changes
- Added `support_array_like` function in `api_decorators.py` to automatically convert list/tuple inputs to NumPy arrays when the accelerator is active
- Modified `CumlArray` initialization to handle list/tuple inputs by converting them to NumPy arrays
- Added comprehensive tests for both new functionalities in `test_array_like_input.py`

## Example Usage
In combination with the accel mode(!):
```python
import numpy as np
from cuml.linear_model import LinearRegression

# Before: Required explicit conversion
X = [[1, 2], [3, 4]]
model = LinearRegression()
model.fit(np.array(X), [1, 2])  # Had to convert list to array

# After: Works directly with lists
model.fit(X, [1, 2])  # Lists are automatically converted
```

## Testing
The new functionality is tested in `test_array_like_input.py` with test cases covering:
- List and tuple inputs
- Nested structures
- Mixed list/tuple inputs
- Different data types (int, float)
- Edge cases (empty lists/tuples, single elements)

The tests are designed to run only when the accelerator is active, ensuring compatibility with the existing codebase.

<details>
<summary>sklearn tests fixed with a0735a4</summary>

```diff
--- passing-with-2eff6289c7f9de8942bee4b061cfd4e62876aced.txt	2025-03-19 13:34:57.927489370 -0500
+++ passing-with-a0735a44d18bc1265c0e5baf99c4a1b0899937de.txt	2025-03-19 15:09:46.581269583 -0500
@@ -11,9 +11,13 @@
 sklearn.ensemble.tests.test_voting::test_get_features_names_out_classifier[kwargs0-expected_names0]
 sklearn.ensemble.tests.test_voting::test_get_features_names_out_classifier[kwargs1-expected_names1]
 sklearn.ensemble.tests.test_voting::test_get_features_names_out_classifier_error
+sklearn.feature_selection.tests.test_from_model::test_max_features_array_like[<lambda>]
+sklearn.feature_selection.tests.test_from_model::test_max_features_array_like[2]
 sklearn.linear_model.tests.test_base::test_linear_regression_positive
 sklearn.linear_model.tests.test_coordinate_descent::test_lasso_positive_constraint
 sklearn.linear_model.tests.test_coordinate_descent::test_enet_positive_constraint
+sklearn.linear_model.tests.test_coordinate_descent::test_lasso_non_float_y[ElasticNet]
+sklearn.linear_model.tests.test_coordinate_descent::test_lasso_non_float_y[Lasso]
 sklearn.model_selection.tests.test_validation::test_cross_val_predict_input_types[coo_matrix]
 sklearn.tests.test_common::test_estimators[CalibratedClassifierCV(estimator=LogisticRegression(C=1))-check_classifiers_train]
 sklearn.tests.test_common::test_estimators[CalibratedClassifierCV(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True)]
```

```
sklearn.cluster.tests.test_dbscan::test_input_validation
sklearn.decomposition.tests.test_pca::test_pca_check_projection_list[full]
sklearn.decomposition.tests.test_pca::test_pca_check_projection_list[covariance_eigh]
sklearn.decomposition.tests.test_pca::test_pca_check_projection_list[arpack]
sklearn.decomposition.tests.test_pca::test_pca_check_projection_list[randomized]
sklearn.decomposition.tests.test_pca::test_pca_check_projection_list[auto]
sklearn.ensemble.tests.test_forest::test_regressor_attributes[RandomForestRegressor]
sklearn.ensemble.tests.test_voting::test_n_features_in[VotingRegressor]
sklearn.ensemble.tests.test_voting::test_n_features_in[VotingClassifier]
sklearn.ensemble.tests.test_voting::test_get_features_names_out_regressor
sklearn.ensemble.tests.test_voting::test_get_features_names_out_classifier[kwargs0-expected_names0]
sklearn.ensemble.tests.test_voting::test_get_features_names_out_classifier[kwargs1-expected_names1]
sklearn.ensemble.tests.test_voting::test_get_features_names_out_classifier_error
sklearn.feature_selection.tests.test_from_model::test_max_features_array_like[<lambda>]
sklearn.feature_selection.tests.test_from_model::test_max_features_array_like[2]
sklearn.linear_model.tests.test_base::test_linear_regression_positive
sklearn.linear_model.tests.test_coordinate_descent::test_lasso_positive_constraint
sklearn.linear_model.tests.test_coordinate_descent::test_enet_positive_constraint
sklearn.linear_model.tests.test_coordinate_descent::test_lasso_non_float_y[ElasticNet]
sklearn.linear_model.tests.test_coordinate_descent::test_lasso_non_float_y[Lasso]
sklearn.model_selection.tests.test_validation::test_cross_val_predict_input_types[coo_matrix]
sklearn.tests.test_common::test_estimators[CalibratedClassifierCV(estimator=LogisticRegression(C=1))-check_classifiers_train]
sklearn.tests.test_common::test_estimators[CalibratedClassifierCV(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_estimators[CalibratedClassifierCV(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_estimators[MultiOutputRegressor(estimator=Ridge())-check_regressors_train]
sklearn.tests.test_common::test_estimators[MultiOutputRegressor(estimator=Ridge())-check_regressors_train(readonly_memmap=True)]
sklearn.tests.test_common::test_estimators[MultiOutputRegressor(estimator=Ridge())-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_estimators[OneVsRestClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train]
sklearn.tests.test_common::test_estimators[OneVsRestClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_estimators[OneVsRestClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_estimators[OutputCodeClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train]
sklearn.tests.test_common::test_estimators[OutputCodeClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_estimators[OutputCodeClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_grid={'ridge__alpha':[0.1,1.0]})-check_regressors_train]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_grid={'ridge__alpha':[0.1,1.0]})-check_regressors_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_grid={'ridge__alpha':[0.1,1.0]})-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_grid={'logisticregression__C':[0.1,1.0]})-check_classifiers_train]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_grid={'logisticregression__C':[0.1,1.0]})-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_grid={'logisticregression__C':[0.1,1.0]})-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),min_resources='smallest',param_grid={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),min_resources='smallest',param_grid={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),min_resources='smallest',param_grid={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),min_resources='smallest',param_grid={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),min_resources='smallest',param_grid={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),min_resources='smallest',param_grid={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
```
</details>

<details>
<summary>sklearn tests fixed with 2eff628</summary>

```
sklearn.cluster.tests.test_dbscan::test_input_validation
sklearn.decomposition.tests.test_pca::test_pca_check_projection_list[full]
sklearn.decomposition.tests.test_pca::test_pca_check_projection_list[covariance_eigh]
sklearn.decomposition.tests.test_pca::test_pca_check_projection_list[arpack]
sklearn.decomposition.tests.test_pca::test_pca_check_projection_list[randomized]
sklearn.decomposition.tests.test_pca::test_pca_check_projection_list[auto]
sklearn.ensemble.tests.test_forest::test_regressor_attributes[RandomForestRegressor]
sklearn.ensemble.tests.test_voting::test_n_features_in[VotingRegressor]
sklearn.ensemble.tests.test_voting::test_n_features_in[VotingClassifier]
sklearn.ensemble.tests.test_voting::test_get_features_names_out_regressor
sklearn.ensemble.tests.test_voting::test_get_features_names_out_classifier[kwargs0-expected_names0]
sklearn.ensemble.tests.test_voting::test_get_features_names_out_classifier[kwargs1-expected_names1]
sklearn.ensemble.tests.test_voting::test_get_features_names_out_classifier_error
sklearn.linear_model.tests.test_base::test_linear_regression_positive
sklearn.linear_model.tests.test_coordinate_descent::test_lasso_positive_constraint
sklearn.linear_model.tests.test_coordinate_descent::test_enet_positive_constraint
sklearn.model_selection.tests.test_validation::test_cross_val_predict_input_types[coo_matrix]
sklearn.tests.test_common::test_estimators[CalibratedClassifierCV(estimator=LogisticRegression(C=1))-check_classifiers_train]
sklearn.tests.test_common::test_estimators[CalibratedClassifierCV(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_estimators[CalibratedClassifierCV(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_estimators[MultiOutputRegressor(estimator=Ridge())-check_regressors_train]
sklearn.tests.test_common::test_estimators[MultiOutputRegressor(estimator=Ridge())-check_regressors_train(readonly_memmap=True)]
sklearn.tests.test_common::test_estimators[MultiOutputRegressor(estimator=Ridge())-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_estimators[OneVsRestClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train]
sklearn.tests.test_common::test_estimators[OneVsRestClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_estimators[OneVsRestClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_estimators[OutputCodeClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train]
sklearn.tests.test_common::test_estimators[OutputCodeClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_estimators[OutputCodeClassifier(estimator=LogisticRegression(C=1))-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_grid={'ridge__alpha':[0.1,1.0]})-check_regressors_train]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_grid={'ridge__alpha':[0.1,1.0]})-check_regressors_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_grid={'ridge__alpha':[0.1,1.0]})-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_grid={'logisticregression__C':[0.1,1.0]})-check_classifiers_train]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_grid={'logisticregression__C':[0.1,1.0]})-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[GridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_grid={'logisticregression__C':[0.1,1.0]})-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),min_resources='smallest',param_grid={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),min_resources='smallest',param_grid={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),min_resources='smallest',param_grid={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),min_resources='smallest',param_grid={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),min_resources='smallest',param_grid={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[HalvingGridSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),min_resources='smallest',param_grid={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[RandomizedSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('ridge',Ridge())]),param_distributions={'ridge__alpha':[0.1,1.0]},random_state=0)-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True)]
sklearn.tests.test_common::test_search_cv[HalvingRandomSearchCV(cv=2,error_score='raise',estimator=Pipeline(steps=[('pca',PCA()),('logisticregression',LogisticRegression())]),param_distributions={'logisticregression__C':[0.1,1.0]},random_state=0)-check_classifiers_train(readonly_memmap=True,X_dtype=float32)]
```
</details>

Authors:
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Victor Lafargue (https://github.com/viclafargue)

URL: #6442
- Update base linear model prediction to handle multi-target scenarios
- Improve handling of intercept for multi-target predictions
- Add CPU dispatch method for Ridge regression with multi-target support (see the sketch below)
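A minimal sketch of what multi-target support enables, using scikit-learn's `Ridge` as the estimator that `cuml.accel` would intercept (data and shapes are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge  # intercepted by cuml.accel when active

X = np.random.rand(200, 5)
Y = np.random.rand(200, 3)  # three regression targets fit at once

model = Ridge(alpha=1.0).fit(X, Y)
print(model.predict(X[:2]).shape)  # (2, 3): one column of predictions per target
```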

---

21b2894..86afee1

## Test Summaries

|Metric|Baseline|Current|
|------|--------|--------|
|total|1845|1845|
|failures|318|278|
|errors|0|0|
|skipped|140|140|
|time|56.615|52.674|

<details>
<summary> Tests that failed in baseline but passed in current </summary>

```
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-8-asarray-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-8-asarray-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-8-csr_array-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-8-csr_matrix-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-20-csr_matrix-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-8-csr_matrix-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-20-csr_array-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-8-csr_matrix-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-8-asarray-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-20-csr_array-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-20-asarray-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-8-csr_matrix-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-8-csr_matrix-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-8-csr_array-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-20-csr_matrix-eigen]
linear_model.tests.test_ridge.test_ridge_intercept
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-20-asarray-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-20-csr_array-eigen]
tests.test_kernel_ridge.test_kernel_ridge_multi_output
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-8-csr_array-svd]
linear_model.tests.test_ridge.test_dense_sparse[csr_matrix-_test_multi_ridge_diabetes]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-20-csr_array-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-8-csr_matrix-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-20-csr_array-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-8-csr_array-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-20-csr_matrix-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-20-csr_matrix-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-20-asarray-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-20-csr_matrix-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-8-asarray-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-20-csr_matrix-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-8-asarray-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-20-asarray-svd]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-20-csr_array-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-20-asarray-svd]
linear_model.tests.test_ridge.test_dense_sparse[csr_array-_test_multi_ridge_diabetes]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-20-asarray-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape1-True-20.0-8-csr_array-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape3-False-30.0-8-asarray-eigen]
linear_model.tests.test_ridge.test_ridge_gcv_sample_weights[y_shape2-True-150.0-8-csr_array-eigen]
```
</details>

## Summary
|Category|Count|
|--------|-----|
|Regressions|0|
|Fixes|40|
|Skip Changes|0|
|Added Tests|0|
|Removed Tests|0|

Authors:
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Jim Crist-Harif (https://github.com/jcrist)

URL: #6414
This PR addresses two issues that are currently blocking the cuML CI on
the 25.04 release branch:

1. Out-of-memory (OOM) errors occurring in SVM tests on CUDA 11.8:
- Several SVM-related tests, particularly `test_svc_methods`, are
failing with OOM errors and segmentation faults
- This only surfaces with CUDA 11.8 and is likely due to memory
allocation patterns
- As a temporary workaround, we skip these tests on CUDA 11.8 while the
root cause is investigated

2. XGBoost test dependency compatibility:
- XGBoost 3.0.0 has a known issue that manifests when using older NVIDIA
drivers with recent CUDA toolkit versions
(dmlc/xgboost#11397)
- To maintain stability, we constrain the XGBoost test dependency to
versions < 3.0.0
- This ensures consistent test behavior across different driver/toolkit
combinations

We expect to remove the constraint on the xgboost version once the issue
is resolved in a future xgboost release.

We expect to be able to address the SVM test issue by reducing its
memory footprint (see #6514),
however here we are taking a more conservative approach to ensure that
the CI pipeline is stable.

The remaining failing CI job is _optional_; the issue will be addressed on branch-25.06.
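For illustration, the CUDA 11.8 skip can be expressed as a version-gated pytest marker like the one below; the version helper and test name here are assumptions, not the exact change in this PR:

```python
import pytest
import cupy

# cupy reports the CUDA runtime as major * 1000 + minor * 10, e.g. 11080 for 11.8.
ON_CUDA_11_8 = cupy.cuda.runtime.runtimeGetVersion() // 10 == 1108

@pytest.mark.skipif(ON_CUDA_11_8, reason="OOM/segfaults on CUDA 11.8; see PR description")
def test_svc_methods():
    ...
```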
@AyodeAwe AyodeAwe merged commit 3a85501 into branch-25.06 Apr 10, 2025
86 of 87 checks passed
Labels
ci, CMake, conda, CUDA/C++, Cython / Python