mislabeled

Model-probing mislabeled examples detection in machine learning datasets

A ModelProbingDetector assigns trust_scores to training examples $(x, y)$ from a dataset by probing an Ensemble of machine learning models.
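To make the idea concrete, here is a minimal NumPy-only sketch of "probe an ensemble, then aggregate". It is not the library's implementation: the nearest-centroid model, the bootstrap ensemble, and the names `fit_centroids`, `margin_probe` and `bootstrap_trust_scores` are all assumptions made for illustration.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Toy 'model' (hypothetical): one centroid per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def margin_probe(centroids, X, y):
    """Distance to the nearest other-class centroid minus distance to the
    assigned-class centroid. Negative when another class is closer."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    rows = np.arange(len(y))
    to_assigned = dist[rows, y]
    dist[rows, y] = np.inf  # mask the assigned class before taking the min
    return dist.min(axis=1) - to_assigned

def bootstrap_trust_scores(X, y, n_members=20, seed=0):
    """Probe every example with each bootstrap member, then average."""
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    probes = []
    for _ in range(n_members):
        sample = rng.integers(0, len(y), size=len(y))  # resample with replacement
        centroids = fit_centroids(X[sample], y[sample], n_classes)
        probes.append(margin_probe(centroids, X, y))
    return np.mean(probes, axis=0)

# Synthetic demo: two well-separated blobs with 10 labels flipped on purpose.
rng = np.random.default_rng(0)
y_clean = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 2)) + np.where(y_clean[:, None] == 0, -3.0, 3.0)
flipped = rng.choice(200, size=10, replace=False)
y_noisy = y_clean.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]
scores = bootstrap_trust_scores(X, y_noisy)
```

In this toy setting the flipped examples receive lower scores on average than the untouched ones, which is the behaviour a trust score is meant to have.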

Install

pip install git+https://github.com/orange-opensource/mislabeled

Find suspicious digits in MNIST

1. Train an MLP on MNIST

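from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
X, y = fetch_openml("mnist_784", return_X_y=True, as_frame=False)
y = LabelEncoder().fit_transform(y)
mlp = make_pipeline(MinMaxScaler(), MLPClassifier())
mlp.fit(X, y)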

2. Compute Representer values of the MLP

probe = Representer()
representer_values = probe(mlp, X, y)

3. Inspect your training data

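import matplotlib.pyplot as plt
import numpy as np
top_k = 10  # number of most suspicious examples to inspect (example value)
suspicious = np.argsort(-representer_values)[:top_k]
for i in suspicious:
  plt.imshow(X[i].reshape(28, 28))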
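Step 3 draws each digit with its own plt.imshow call, which, depending on the environment, may leave only the last image visible. One possible alternative is to lay the selected digits out on a grid. The helpers `rank_suspicious` and `plot_digits` below are hypothetical names, not part of the library, and the ranking assumes that larger scores mean more suspicious, as in step 3.

```python
import numpy as np

def rank_suspicious(scores, top_k=10):
    """Indices of the top_k largest scores, largest first."""
    scores = np.asarray(scores, dtype=float)
    return np.argsort(-scores, kind="stable")[:top_k]

def plot_digits(X, indices, y=None, side=28, n_cols=5):
    """Show the selected rows of X as side x side images on one figure."""
    import matplotlib.pyplot as plt  # imported lazily: ranking works without it
    n_rows = -(-len(indices) // n_cols)  # ceiling division
    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(2 * n_cols, 2 * n_rows), squeeze=False
    )
    for ax in axes.ravel():
        ax.axis("off")  # hide ticks, including on unused cells
    for ax, i in zip(axes.ravel(), indices):
        ax.imshow(X[i].reshape(side, side), cmap="gray")
        if y is not None:
            ax.set_title(f"#{i}: label {y[i]}")
    fig.tight_layout()
    return fig

# Tiny ranking demo on made-up scores.
demo_scores = [0.1, 0.9, 0.5, 0.7]
demo_top = rank_suspicious(demo_scores, top_k=2)
```

With the variables from the steps above, `plot_digits(X, rank_suspicious(representer_values), y)` would show the ten highest-scoring digits together with their assigned labels.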

4. Want the variance of the Representer values during training?

detector = ModelProbingDetector(mlp, Representer(), ProgressiveEnsemble(), "var")
var_representer_values = detector.trust_scores(X, y)
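As a rough illustration of what "variance of a probe during training" means, the sketch below records a probe after every epoch of a hand-written logistic regression and takes the variance across epochs. It is a conceptual stand-in only: it does not use ProgressiveEnsemble or Representer, the probe here is simply the predicted probability of the assigned label, and `probe_history` is a hypothetical helper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def probe_history(X, y, n_epochs=50, lr=0.5):
    """Full-batch gradient descent on logistic loss for binary labels y.
    After each epoch, record p(assigned label | x) for every example."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    history = np.empty((n_epochs, n))
    for epoch in range(n_epochs):
        residual = sigmoid(X @ w + b) - y  # gradient of the loss w.r.t. the logit
        w -= lr * (X.T @ residual) / n
        b -= lr * residual.mean()
        p1 = sigmoid(X @ w + b)
        history[epoch] = np.where(y == 1, p1, 1.0 - p1)
    return history

# Synthetic two-class data.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 2)) + np.where(y[:, None] == 0, -1.0, 1.0)

history = probe_history(X, y)      # shape (n_epochs, n_examples)
var_scores = history.var(axis=0)   # one variance per training example
```

Replacing `history.var(axis=0)` with another reduction over the epoch axis, for example `history.mean(axis=0)`, gives a different score from the same recorded probes.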

Predefined detectors

| Detector | Paper | Code (`from mislabeled.detect.detectors`) |
| --- | --- | --- |
| Area Under the Margin (AUM) | NeurIPS 2020 | `import AreaUnderMargin` |
| Influence | Paper 1974 | `import InfluenceDetector` |
| Representer | Paper 1972 | `import RepresenterDetector` |
| TracIn | NeurIPS 2020 | `import TracIn` |
| Forget Scores | ICLR 2019 | `import ForgetScores` |
| VoG | CVPR 2022 | `import VoLG, VoSG, LinearVoSG` |
| Small Loss | ICML 2018 | `import SmallLoss` |
| CleanLab | JAIR 2021 | `import ConfidentLearning` |
| Consensus (C-Scores) | Applied Intelligence 2011 | `import ConsensusConsistency` |
| AGRA | ECML 2023 | `import AGRA` |

Many other combinations are possible by using ModelProbingDetector with any probe and Ensemble from the library.
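To show how two of the scores in the table reduce to "a probe per training checkpoint plus an aggregation", here is a standalone NumPy sketch of the usual definitions of Area Under the Margin and of forgetting events. It works on a precomputed array of per-epoch logits and does not call the library; the function names and the toy logits are assumptions, and examples that are never learned are not treated specially here.

```python
import numpy as np

def area_under_margin(logits, y):
    """logits: float array (n_epochs, n_examples, n_classes); y: int labels.
    Margin = assigned-class logit minus the largest other logit,
    averaged over epochs. Low values point to suspicious labels."""
    rows = np.arange(logits.shape[1])
    assigned = logits[:, rows, y]          # (n_epochs, n_examples)
    others = logits.copy()
    others[:, rows, y] = -np.inf           # mask the assigned class
    return (assigned - others.max(axis=2)).mean(axis=0)

def forget_counts(logits, y):
    """Number of epoch-to-epoch transitions from correct to incorrect."""
    correct = logits.argmax(axis=2) == y   # (n_epochs, n_examples)
    forgotten = correct[:-1] & ~correct[1:]
    return forgotten.sum(axis=0)

# Toy history: 3 epochs, 2 examples, 2 classes.
toy_logits = np.array([
    [[2.0, 0.0], [0.0, 1.0]],   # epoch 0
    [[3.0, 0.0], [1.0, 0.0]],   # epoch 1: example 1 becomes misclassified
    [[4.0, 0.0], [0.0, 1.0]],   # epoch 2: example 1 is correct again
])
toy_y = np.array([0, 1])

aum = area_under_margin(toy_logits, toy_y)   # margins (2, 3, 4) and (1, -1, 1)
forgets = forget_counts(toy_logits, toy_y)   # example 1 is forgotten once
```

Both functions consume the same per-epoch logits and differ only in the probe (margin versus correctness) and in the aggregation (mean versus count of transitions).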

Tutorials

For more details and examples, check the notebooks!

Paper

If you use this library in a research project, please consider citing the corresponding paper with the following bibtex entry:

@article{george2024mislabeled,
  title={Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark},
  author={Thomas George and Pierre Nodet and Alexis Bondu and Vincent Lemaire},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2024},
  url={https://openreview.net/forum?id=3YlOr7BHkx},
  note={}
}

Development

Install hatch.

To format and lint:

hatch fmt

To run tests:

hatch test
