Data augmentation via Nearest Neighbour algorithms #246

Open
Catarina-Alves opened this issue Jun 9, 2021 · 1 comment
Labels
feature To add a new feature, new standalone files. (High level)

Comments

Catarina-Alves (Collaborator) commented Jun 9, 2021

It would be useful to include a class that encapsulates data augmentation via Nearest Neighbour-inspired algorithms such as SMOTE (Synthetic Minority Over-sampling Technique) and ADASYN. @tallamjr developed some code for this, which is saved in utils/imblearn_augment.py.
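For readers unfamiliar with the idea, here is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the code in utils/imblearn_augment.py; the function name `smote_sketch` and its parameters are hypothetical, and it only illustrates the core step of placing synthetic points between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and one of their k nearest minority neighbours (SMOTE-style).

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already spanned by the minority class. ADASYN follows the same interpolation idea but, roughly speaking, generates more points around minority samples that are harder to classify.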

I propose to implement this data augmentation methodology in snaugment. This involves testing the code and developing unit tests. Note that, in a previous analysis, we found that SMOTE augmentation led to information leaks in the classification step, so this must be checked when implementing the augmentation.
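The issue does not say what caused the leak. One common cause, assumed here, is augmenting before the train/test split, so that synthetic training samples are interpolated from test objects. The sketch below shows one way a unit test could check for that: split first, augment only the training rows, record the parents of each synthetic sample, and assert that none of them is in the test set. All function names are hypothetical, and `augment_with_parents` is a simplified stand-in for a SMOTE-like augmenter (it interpolates between random training pairs rather than nearest neighbours).

```python
import random


def split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = max(1, int(round(n * test_fraction)))
    return idx[n_test:], idx[:n_test]


def augment_with_parents(features, train_idx, n_new, seed=0):
    """Interpolate between random pairs of training rows.

    Returns the synthetic rows and, for each one, the pair of original
    row indices it was built from.
    """
    rng = random.Random(seed)
    synthetic, parents = [], []
    for _ in range(n_new):
        a, b = rng.sample(train_idx, 2)
        gap = rng.random()
        row = tuple(x + gap * (y - x) for x, y in zip(features[a], features[b]))
        synthetic.append(row)
        parents.append((a, b))
    return synthetic, parents


def leaked_parents(parents, test_idx):
    """Return the sorted test indices that were used to build synthetic rows."""
    test = set(test_idx)
    return sorted({p for pair in parents for p in pair if p in test})


# Toy feature table with eight rows.
features = [(float(i), float(i * i)) for i in range(8)]
train_idx, test_idx = split_indices(len(features))
synthetic, parents = augment_with_parents(features, train_idx, n_new=5)
leaks = leaked_parents(parents, test_idx)
```

In a unit test, `leaks` should be empty. Calling `leaked_parents` on provenance that does include a test index, for example `leaked_parents([(0, 1)], [1])`, returns that index, which is how the check would flag an augmenter that was run before the split.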

File: snaugment.py, utils/imblearn_augment.py

@Catarina-Alves Catarina-Alves added the feature To add a new feature, new standalone files. (High level) label Jun 9, 2021
Catarina-Alves (Collaborator, Author) commented

Although we did not find this code to work for our imbalanced problem, it might be useful for someone else.
