Orange-OpenSource/mislabeled

mislabeled

Model-probing mislabeled examples detection in machine learning datasets

A ModelProbingDetector assigns trust_scores to the training examples $(x, y)$ of a dataset by probing an Ensemble of machine learning models.
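As a rough illustration of the idea (not the library's implementation), the sketch below treats a stack of class-probability arrays, one per ensemble member, as the ensemble. It uses the probability given to the observed label as the probe and averages it over members to obtain a score. The function name `mean_label_probability` and the toy arrays are hypothetical.

```python
import numpy as np


def mean_label_probability(member_probas, y):
    """Average, over ensemble members, the probability given to each label.

    member_probas: array of shape (n_members, n_samples, n_classes)
    y: integer labels of shape (n_samples,)
    """
    member_probas = np.asarray(member_probas, dtype=float)
    y = np.asarray(y)
    n_samples = y.shape[0]
    # Probe: probability that each member assigns to the observed label.
    probes = member_probas[:, np.arange(n_samples), y]  # (n_members, n_samples)
    # Aggregate: mean over members; in this sketch, low values look suspicious.
    return probes.mean(axis=0)


# Toy ensemble of two members on three examples with two classes.
member_probas = [
    [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]],
    [[0.8, 0.2], [0.3, 0.7], [0.9, 0.1]],
]
y = [0, 1, 1]  # the third label disagrees with both members
trust_scores = mean_label_probability(member_probas, y)
```

In this toy case the third example gets the lowest score because both members favor the other class.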

Install

pip install git+https://github.com/orange-opensource/mislabeled

Find suspicious digits in MNIST

1. Train an MLP on MNIST

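from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
X, y = fetch_openml("mnist_784", return_X_y=True, as_frame=False)
y = LabelEncoder().fit_transform(y)  # string digit labels -> integers 0..9
mlp = make_pipeline(MinMaxScaler(), MLPClassifier())  # scale pixels to [0, 1]
mlp.fit(X, y)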

2. Compute Representer values of the MLP

probe = Representer()
representer_values = probe(mlp, X, y)

3. Inspect your training data

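import matplotlib.pyplot as plt
import numpy as np
top_k = 10  # number of examples to inspect (example value)
suspicious = np.argsort(-representer_values)[:top_k]  # largest values first
for i in suspicious:
    plt.imshow(X[i].reshape(28, 28))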
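To make the ranking step concrete, here is a small self-contained sketch on made-up scores (the array `scores` and the value of `top_k` are illustrative, not output of the library). `np.argsort` on the negated scores returns indices from the largest to the smallest score, so the first `top_k` entries are the examples to look at first.

```python
import numpy as np

# Made-up probe values for six training examples (illustrative only).
scores = np.array([0.10, 2.50, 0.30, 1.70, 0.05, 0.90])
top_k = 3

# Negating the scores turns the ascending argsort into a descending ranking.
ranking = np.argsort(-scores)
suspicious = ranking[:top_k]
print(suspicious)  # indices of the top_k largest scores, largest first
```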

4. Want the variance of the Representer values during training?

detector = ModelProbingDetector(mlp, Representer(), ProgressiveEnsemble(), "var")
var_representer_values = detector.trust_scores(X, y)
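The idea of tracking a probe across training can be sketched without the library. The example below is an illustration under stated assumptions, not how `ProgressiveEnsemble` is implemented: it trains a small logistic regression by full-batch gradient descent on synthetic data, records the probability of the observed label after every epoch, and takes the per-example variance over epochs. The function name `probe_variance_during_training`, the toy data, and the learning rate are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def probe_variance_during_training(X, y, n_epochs=50, lr=0.5):
    """Variance over epochs of the probability given to each label.

    X: features of shape (n_samples, n_features); y: binary labels in {0, 1}.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    history = []
    for _ in range(n_epochs):
        p = sigmoid(X @ w + b)  # predicted probability of class 1
        # Full-batch gradient step on the mean log loss.
        w -= lr * (X.T @ (p - y)) / n_samples
        b -= lr * np.mean(p - y)
        # Probe after this epoch: probability of the observed label.
        p = sigmoid(X @ w + b)
        history.append(np.where(y == 1, p, 1.0 - p))
    history = np.stack(history)  # (n_epochs, n_samples)
    return history.var(axis=0)


rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
y[0] = 1.0  # flip one label to simulate annotation noise
variances = probe_variance_during_training(X, y)
```

Each training epoch plays the role of one ensemble member here, and the variance is just one possible way to aggregate the recorded probe values.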

Predefined detectors

| Detector | Paper | Code (from mislabeled.detect.detectors) |
| --- | --- | --- |
| Area Under the Margin (AUM) | NeurIPS 2020 | import AreaUnderMargin |
| Influence | Paper 1974 | import InfluenceDetector |
| Representer | Paper 1972 | import RepresenterDetector |
| TracIn | NeurIPS 2020 | import TracIn |
| Forget Scores | ICLR 2019 | import ForgetScores |
| VoG | CVPR 2022 | import VoLG, VoSG, LinearVoSG |
| Small Loss | ICML 2018 | import SmallLoss |
| CleanLab | JAIR 2021 | import ConfidentLearning |
| Consensus (C-Scores) | Applied Intelligence 2011 | import ConsensusConsistency |
| AGRA | ECML 2023 | import AGRA |

Many other combinations can be built by using ModelProbingDetector with any probe and any Ensemble from the library.
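To give a feel for what one of these detectors computes, here is a minimal NumPy sketch of the Area Under the Margin statistic as it is usually defined: at each epoch the margin is the logit of the assigned label minus the largest other logit, and the AUM is the average margin over epochs. It works on a precomputed array of logits and is not the library's `AreaUnderMargin` implementation; the function name `area_under_margin` and the toy logits are hypothetical.

```python
import numpy as np


def area_under_margin(logits, y):
    """Average margin over epochs.

    logits: array of shape (n_epochs, n_samples, n_classes)
    y: integer labels of shape (n_samples,)
    """
    logits = np.asarray(logits, dtype=float)
    y = np.asarray(y)
    idx = np.arange(y.shape[0])
    assigned = logits[:, idx, y]  # (n_epochs, n_samples)
    # Mask the assigned class so the max runs over the other classes only.
    others = logits.copy()
    others[:, idx, y] = -np.inf
    largest_other = others.max(axis=2)
    margins = assigned - largest_other
    return margins.mean(axis=0)


# Two epochs, two examples, three classes (made-up logits).
logits = [
    [[2.0, 0.5, 0.1], [0.2, 1.0, 0.3]],
    [[3.0, 0.4, 0.2], [0.1, 2.0, 0.5]],
]
y = [0, 2]  # the second example is labeled 2 but the logits favor class 1
aum = area_under_margin(logits, y)
```

A negative average margin, as for the second toy example, means the model consistently preferred another class over the assigned label.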

Tutorials

For more details and examples, check the notebooks!

Paper

If you use this library in a research project, please consider citing the corresponding paper with the following bibtex entry:

@article{george2024mislabeled,
  title={Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark},
  author={Thomas George and Pierre Nodet and Alexis Bondu and Vincent Lemaire},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2024},
  url={https://openreview.net/forum?id=3YlOr7BHkx},
  note={}
}

Development

Install hatch.

To format and lint:

hatch fmt

To run tests:

hatch test
