Commit 5a9aa41

Merge pull request #198 from adrinjalali/dev-api
2 parents b534b86 + 0855997 commit 5a9aa41

1 file changed: +141 -0 lines changed

_posts/2024-12-05-dev-api.md

@@ -0,0 +1,141 @@
---
#### Blog Post Template ####

#### Post Information ####
title: "Changes and development of scikit-learn's developer API"
date: December 12, 2024

#### Post Category and Tags ####
# Format in titlecase without dashes (Ex. "Open Source" instead of "open-source")
categories:
  - Updates
tags:
  - Open Source
  - Machine Learning
  - License

#### Featured Image ####
featured-image: BSD_watermark.svg

#### Author Info ####
# Can accommodate multiple authors
# Add SQUARE Author Image to /assets/images/author_images/ folder
postauthors:
  - name: Adrin Jalali
    website: https://adrin.info/
    image: adrin-jalali.jpeg
---
<div>
  <img src="/assets/images/posts_images/{{ page.featured-image }}" alt="">
  {% include postauthor.html %}
</div>

Historically, scikit-learn's API has been divided into public and private parts. The
public API is intended for users, while the private API is used internally in
scikit-learn to develop new features and estimators. However, many of those private
functionalities have become essential for third parties who develop scikit-learn
estimators outside the scikit-learn codebase.

When it comes to our public API, we hold very strict standards on backward
compatibility. The rule of thumb is that no change should force users to change their
code unless we have warned about it for two release cycles, which gives users a year
to update their code.

On the other hand, we have no such guarantees or constraints on our private API. This
is a problem for third party developers who would like to use the same methods that
scikit-learn developers use to develop their estimators. A private API that changes
constantly and without prior warning makes that work difficult, which is not ideal.

As a result, we've been working on creating a developer API which sits somewhere
between our public and private API in terms of backward compatibility. That means we
intend to keep that API stable and, if changes are needed, introduce them with a
warning of one release cycle.

In the past few releases, we've slowly introduced more functionalities under this
umbrella. `__sklearn_clone__` and `__sklearn_is_fitted__` are two examples.

In the 1.6 release, we focused on the testing infrastructure and the estimator tag
system. Estimator tags used to be private, and we were not sure about their design. The
1.6 release introduces new tags, and using them looks like the following:

```python
from sklearn.base import BaseEstimator, ClassifierMixin

class MyEstimator(ClassifierMixin, BaseEstimator):

    ...

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        # modify tags here
        tags.non_deterministic = True
        return tags
```

The new tags mostly follow the same structure as the old tags, with certain changes.
The main change is that the old `_xfail_checks` tag is no longer present. That tag was
used to tell the common testing tools which tests are known to fail and are to be
skipped. That information is now passed directly to the testing functions. The old way
of skipping a test was the following:

```python
from sklearn.base import BaseEstimator, ClassifierMixin

class MyEstimator(ClassifierMixin, BaseEstimator):

    ...

    def _more_tags(self):
        return {
            "_xfail_checks": {
                "check_to_skip_name": "this check is known to fail",
                # ...
            }
        }
```

Calling `check_estimator`, or using `parametrize_with_checks` with `pytest`, would then
automatically ignore those tests for the estimator.

Instead, in this release, you pass that information directly to those functions:

```python
from sklearn.utils.estimator_checks import check_estimator, parametrize_with_checks

# MyEstimator is the estimator class defined earlier in this post.
CHECKS_EXPECTED_TO_FAIL = {
    "check_to_skip_name": "this check is known to fail",
    # ...
}

# Using check_estimator
def test_with_check_estimator():
    check_estimator(MyEstimator(), expected_failed_checks=CHECKS_EXPECTED_TO_FAIL)

# Using parametrize_with_checks
@parametrize_with_checks(
    [MyEstimator()],
    expected_failed_checks=lambda est: CHECKS_EXPECTED_TO_FAIL
)
def test_with_parametrize_with_checks(estimator, check):
    check(estimator)
```

While working on the testing infrastructure, we have also been improving the tests
themselves, which means this release has a particularly high number of changes in their
names and in what they do. These changes will make it easier for developers to fix
issues with their estimators. Note that you can now pass `legacy=False` to both
`check_estimator` and `parametrize_with_checks` to include only the strictly API
related tests.

The above changes mean developers need to update their estimators and, depending on
what they use, write scikit-learn version specific code to support multiple
scikit-learn versions. To make that process easier, we've worked on a package called
[`sklearn_compat`](https://github.com/sklearn-compat/sklearn-compat/). You can either
depend on it as a package dependency, or vendor a single file inside your project. At
the moment this project is in its infancy and might change in the future, but hopefully
it helps developers out there.
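
To give a rough idea of what version specific code can look like without any helper package, one possible approach is to define both the old and the new tag hooks on the same class. This is a sketch under the assumption that each scikit-learn version only reads the hook it knows about. It does not use `sklearn_compat` and is not taken from its API.

```python
from sklearn.base import BaseEstimator, ClassifierMixin


class MyEstimator(ClassifierMixin, BaseEstimator):
    def _more_tags(self):
        # Assumed to be read only by scikit-learn versions before 1.6.
        return {"non_deterministic": True}

    def __sklearn_tags__(self):
        # Read by scikit-learn 1.6 and later; older versions never call it,
        # so the super() call is not reached there.
        tags = super().__sklearn_tags__()
        tags.non_deterministic = True
        return tags


est = MyEstimator()
```

The drawback is that every tag has to be declared twice and kept in sync, which is the kind of duplication a compatibility layer aims to remove.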

If you think there are missing functionalities in the developer API, please let us know
and give us feedback on our
[issue tracker](https://github.com/scikit-learn/scikit-learn/issues).
