Add catch to prevent data leakage when resampling data [feature/124/augmentation] #179
Labels
enhancement
Improvement to existing functionality or implementation, including adding new functions/methods.
pre-v2.0.0
Issues that should be completed prior to public release of v2.0.0
We need to ensure that when resampling is done via SMOTE or other techniques, there is no risk of data leaking into the test set. Otherwise, when the models are evaluated, they may be tested on examples that also exist in the training set (or, with SMOTE, on examples that synthetic training samples were derived from).
This could be done by resampling a copy of the original data rather than modifying it in place, and perhaps by checking whether augmentation has already occurred elsewhere in the pipeline.
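As a rough illustration of the idea, here is a minimal sketch of what such a catch might look like. It is not the project's implementation: the function names (`split_data`, `oversample`, `assert_no_leakage`) and the `augmented` flag are hypothetical, and plain random oversampling stands in for SMOTE so the example needs only the standard library. It assumes the split happens before any resampling and that only a copy of the training split is resampled.

```python
import copy
import random


def split_data(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle and split into train/test BEFORE any resampling happens."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def subset(idx):
        # Deep copies keep the original data untouched by later steps.
        return {
            "rows": [copy.deepcopy(rows[i]) for i in idx],
            "labels": [labels[i] for i in idx],
            "augmented": False,  # hypothetical flag checked before resampling
        }

    return subset(train_idx), subset(test_idx)


def oversample(train, seed=0):
    """Return a class-balanced copy of the training split (random oversampling)."""
    if train["augmented"]:
        raise ValueError("training data has already been augmented")
    rng = random.Random(seed)
    out = copy.deepcopy(train)
    counts = {}
    for y in train["labels"]:
        counts[y] = counts.get(y, 0) + 1
    target = max(counts.values())
    for label, count in sorted(counts.items()):
        # Draw duplicates only from the original training rows of this class.
        pool = [r for r, y in zip(train["rows"], train["labels"]) if y == label]
        for _ in range(target - count):
            out["rows"].append(copy.deepcopy(rng.choice(pool)))
            out["labels"].append(label)
    out["augmented"] = True
    return out


def assert_no_leakage(train, test):
    """Raise if any test row also appears verbatim in the training rows."""
    overlap = {tuple(r) for r in train["rows"]} & {tuple(r) for r in test["rows"]}
    if overlap:
        raise ValueError(f"{len(overlap)} test row(s) also appear in the training set")


# Toy data: 12 unique rows with a 9:3 class imbalance.
rows = [[i, i * 2] for i in range(12)]
labels = [0] * 9 + [1] * 3

train, test = split_data(rows, labels)
balanced = oversample(train)
assert_no_leakage(balanced, test)
```

Note that an exact-match check like this only catches verbatim duplicates. Synthetic samples interpolated by SMOTE would not match any test row exactly, so splitting first and resampling only the training copy is what actually prevents the leak; the check is a safety net.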