-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Open
Labels
Type: EnhancementIndicates new feature requestsIndicates new feature requests
Milestone
Description
Description
All the following classes use n_neighbors
:
ADASYN
OneSidedSelection
NeighbourhoodCleaningRule
NearMiss
AllKNN
RepeatedEditedNearestNeighbours
EditedNearestNeighbours
CondensedNearestNeighbour
Whereas k_neighbors
is used with SMOTE
and all its variants.
This poses a problem with duck-typing and pipelines.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import ADASYN
from imblearn.over_sampling import SMOTE
X, y = ...
smote = SMOTE()
adasyn = ADASYN()
logreg = LogisticRegression()
smote_pipe = Pipeline([('sampler', smote), ('classifier', logreg)])
adasyn_pipe = Pipeline([('sampler', adasyn), ('classifier', logreg)])
params = dict(sampler__n_neighbors=range(3, 6))
smote_grid = GridSearchCV(smote_pipe, params)
adasyn_grid = GridSearchCV(adasyn_pipe, params)
# fails due to k_neighbors instead of n_neighbors
# I am forced to make a new params dict
smote_grid.fit(X, y)
# succeeds
adasyn_grid.fit(X, y)
Expected Results
SMOTE
would benefit using n_neighbors
to have consistent API.
Versions
Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.1
SciPy 1.3.1
Scikit-Learn 0.21.3
Imbalanced-Learn 0.5.0
Metadata
Metadata
Assignees
Labels
Type: EnhancementIndicates new feature requestsIndicates new feature requests
Type
Projects
Milestone
Relationships
Development
Select code repository
Activity
glemaitre commentedon Sep 18, 2019
I see. Could make sense. It would take 2 versions for the deprecation. However, you still have some other neighbors params in the smote variants as well. It could also be an issue.
You could always create you grid on the fly:
MattEding commentedon Sep 19, 2019
I would argue that the extra
m_neighbors
parameters inSVMSMOTE
andBorderlineSMOTE
have different meaning than then/k_neighbors
found in other algorithms (and themselves). Then/k_neighbors
are used only for finding neighbors, whereasm_neighbors
looks to me that its usage is for flagging samples as'danger'
or'noise'
.I know this is a minor issue that has simple workarounds, but I felt that it was worth marking as an issue nonetheless.
glemaitre commentedon Nov 17, 2019
We could think about modifying this in 1.X since that we will have more freedom to break the API
MattEding commentedon Nov 19, 2019
Additionally, I recently noticed the inconsistency also occurs with
self.nn_
vsself.nn_k_
for non-SMOTE and SMOTE repsectively.rola93 commentedon Feb 3, 2020
hey! come here from #680
Thanks for your answer.
I know it's more or less complex and need some time for this cycle (waiting for two releases) but, is it going to start?
Thanks