Commit fba79e4

Unblock CI for 25.04 (#6519)
This PR addresses two issues that are currently blocking the cuML CI on the 25.04 release branch:

1. Out-of-memory (OOM) errors occurring in SVM tests on CUDA 11.8:
   - Several SVM-related tests, particularly `test_svc_methods`, are failing with OOM errors and segmentation faults
   - This only surfaces with CUDA 11.8 and is likely due to memory allocation patterns
   - As a temporary workaround, we skip these tests on CUDA 11.8 while the root cause is investigated

2. XGBoost test dependency compatibility:
   - XGBoost 3.0.0 has a known issue that manifests when using older NVIDIA drivers with recent CUDA toolkit versions (dmlc/xgboost#11397)
   - To maintain stability, we constrain the XGBoost test dependency to versions < 3.0.0
   - This ensures consistent test behavior across different driver/toolkit combinations

We expect to remove the constraint on the xgboost version once the issue is resolved in a future xgboost release.

We expect to be able to address the SVM test issue by reducing its memory footprint (see #6514); here, however, we are taking a more conservative approach to ensure that the CI pipeline is stable.

The remaining failing CI job is _optional_; the issue is going to be addressed on branch-25.06.
1 parent 3975ae8 commit fba79e4

7 files changed: +15 -6 lines changed

conda/environments/all_cuda-118_arch-aarch64.yaml

Lines changed: 1 addition & 1 deletion
@@ -77,5 +77,5 @@ dependencies:
 - sysroot_linux-aarch64==2.28
 - treelite==4.4.1
 - umap-learn==0.5.6
-- xgboost>=2.1.0
+- xgboost>=2.1.0,<3.0.0
 name: all_cuda-118_arch-aarch64

conda/environments/all_cuda-118_arch-x86_64.yaml

Lines changed: 1 addition & 1 deletion
@@ -77,5 +77,5 @@ dependencies:
 - sysroot_linux-64==2.28
 - treelite==4.4.1
 - umap-learn==0.5.6
-- xgboost>=2.1.0
+- xgboost>=2.1.0,<3.0.0
 name: all_cuda-118_arch-x86_64

conda/environments/all_cuda-128_arch-aarch64.yaml

Lines changed: 1 addition & 1 deletion
@@ -73,5 +73,5 @@ dependencies:
 - sysroot_linux-aarch64==2.28
 - treelite==4.4.1
 - umap-learn==0.5.6
-- xgboost>=2.1.0
+- xgboost>=2.1.0,<3.0.0
 name: all_cuda-128_arch-aarch64

conda/environments/all_cuda-128_arch-x86_64.yaml

Lines changed: 1 addition & 1 deletion
@@ -73,5 +73,5 @@ dependencies:
 - sysroot_linux-64==2.28
 - treelite==4.4.1
 - umap-learn==0.5.6
-- xgboost>=2.1.0
+- xgboost>=2.1.0,<3.0.0
 name: all_cuda-128_arch-x86_64

dependencies.yaml

Lines changed: 1 addition & 1 deletion
@@ -493,7 +493,7 @@ dependencies:
 - pytest-xdist
 - seaborn
 - *scikit_learn
-- &xgboost xgboost>=2.1.0
+- &xgboost xgboost>=2.1.0,<3.0.0
 - statsmodels
 - umap-learn==0.5.6
 - pynndescent
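
The new pin `xgboost>=2.1.0,<3.0.0` admits any 2.x release from 2.1.0 onward and rejects 3.0.0 and later. The following is a simplified sketch of that range check; the helper names `parse_release` and `satisfies_xgboost_pin` are hypothetical, and the code only handles plain dotted numeric versions. Real resolvers (pip, conda) apply fuller rules, including pre-release handling, which this sketch ignores.

```python
def parse_release(version: str, width: int = 3) -> tuple:
    """Turn a dotted numeric version such as "2.1.4" into a tuple of ints.

    Short versions are zero-padded so that "2.1" compares equal to "2.1.0".
    Assumption: no pre-release, post-release, or local segments are present.
    """
    parts = tuple(int(part) for part in version.split("."))
    return parts + (0,) * max(0, width - len(parts))


def satisfies_xgboost_pin(version: str) -> bool:
    """Return True if the version falls inside >=2.1.0,<3.0.0."""
    return (2, 1, 0) <= parse_release(version) < (3, 0, 0)
```

Under these assumptions, `satisfies_xgboost_pin("2.1.4")` is true, while `satisfies_xgboost_pin("3.0.0")` and `satisfies_xgboost_pin("2.0.3")` are false.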

python/cuml/cuml/tests/test_device_selection.py

Lines changed: 9 additions & 0 deletions
@@ -72,6 +72,7 @@
 from cuml.internals.safe_imports import cpu_only_import
 
 np = cpu_only_import("numpy")
+cp = gpu_only_import("cupy")
 pd = cpu_only_import("pandas")
 cudf = gpu_only_import("cudf")
 
@@ -1121,6 +1122,14 @@ def test_svc_methods(
     class_type,
     probability,
 ):
+    # Skip on CUDA 11.8 due to segfaults in decision_function
+    # See: https://github.com/rapidsai/cuml/issues/6480
+    if 11080 <= cp.cuda.runtime.runtimeGetVersion() < 11090:
+        pytest.skip(
+            "Skipping test_svc_methods on CUDA 11.8 due to segfaults in "
+            "decision_function (#6480)"
+        )
+
     if class_type == "single_class":
         X_train = X_train_class
         y_train = y_train_class
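
The range check in this hunk relies on how the CUDA runtime reports its version: as a single integer equal to 1000 × major + 10 × minor, so CUDA 11.8 is reported as 11080. The sketch below shows that decoding in isolation, without requiring a GPU. The helper names `decode_cuda_version` and `is_cuda_11_8` are hypothetical and are not part of cuML or CuPy; in the test itself the integer comes from `cp.cuda.runtime.runtimeGetVersion()`.

```python
def decode_cuda_version(version: int) -> tuple:
    """Split a CUDA runtime version integer into (major, minor).

    Assumes the 1000 * major + 10 * minor encoding used by the CUDA runtime.
    """
    major, remainder = divmod(version, 1000)
    return major, remainder // 10


def is_cuda_11_8(version: int) -> bool:
    """Mirror the half-open range check used in the skip condition."""
    return 11080 <= version < 11090
```

The half-open range `11080 <= v < 11090` matches every value that decodes to (11, 8), which is presumably why the test uses a range rather than an equality check.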

python/cuml/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -142,7 +142,7 @@ test = [
     "seaborn",
     "statsmodels",
     "umap-learn==0.5.6",
-    "xgboost>=2.1.0",
+    "xgboost>=2.1.0,<3.0.0",
 ] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
 
 [project.urls]
