Data augmentation via Nearest Neighbour algorithms #246

Catarina-Alves · 2021-06-09T14:59:55Z

It would be nice to include a class that encapsulates data augmentation via Nearest Neighbour-inspired algorithms such as SMOTE (Synthetic Minority Over-sampling Technique) and ADASYN (Adaptive Synthetic Sampling). @tallamjr developed some code for this, which is saved in utils/imblearn_augment.py.
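For readers unfamiliar with the idea, here is a minimal standard-library sketch of SMOTE-style interpolation. It is not the code in utils/imblearn_augment.py (which is not reproduced in this issue); the function name `smote_like_oversample` and its parameters are made up for illustration. In practice the imbalanced-learn package provides `SMOTE` and `ADASYN` classes with a `fit_resample` method, which would likely be the better starting point.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority-class
    samples and one of their k nearest minority-class neighbours.

    Hypothetical helper for illustration only; not part of snaugment.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic
```

Because every synthetic point lies on a segment between two real samples, it carries information from both of them, which is worth keeping in mind when deciding which samples are allowed to seed the augmentation.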

I propose implementing this data augmentation method in snaugment, which involves testing the code and writing unit tests. Note that in a previous analysis we found that SMOTE augmentation led to information leaks in the classification step, so this must be checked when implementing the augmentation.
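This issue does not say how the leak arose. One common cause with oversampling is augmenting the full dataset before splitting it, so that synthetic points derived from test samples end up in the training set. Assuming that is the relevant failure mode here, a guard could look roughly like the sketch below. The name `split_then_augment` and the `augment` callable are hypothetical and not part of snaugment.

```python
import random


def split_then_augment(samples, labels, augment, test_fraction=0.25, seed=0):
    """Hold out a test set first, then augment only the training portion.

    `augment` is a hypothetical callable taking (x_train, y_train) and
    returning the augmented (x, y) lists.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = max(1, int(round(test_fraction * len(indices))))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]

    # Synthetic samples are derived from training data only, so the
    # augmentation step cannot pass test-set information to the classifier.
    x_aug, y_aug = augment(x_train, y_train)
    return x_aug, y_aug, x_test, y_test
```

A unit test along these lines could then assert that no held-out sample was passed to the augmentation step.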

Files: snaugment.py, utils/imblearn_augment.py


Catarina-Alves · 2021-06-16T12:01:49Z

Although we did not find this code to work for our class-imbalance problem, it might be useful for someone else.

Catarina-Alves added the "feature" label (To add a new feature, new standalone files. High level) on Jun 9, 2021

Catarina-Alves mentioned this issue Jun 16, 2021

Improve snaugment #243

Merged

This was referenced Jul 10, 2021

[FEATURE] Additional augmentation techniques to be added to snaugment.py [RFC] #124

Closed

Add catch to prevent data leakage when resampling data [feature/124/augmentation] #179

Closed
