
MLRC 2022: If you like Shapley, then you'll love the core

This repository contains the code used to reproduce the results of the paper "If You Like Shapley Then You’ll Love the Core" for the ML Reproducibility Challenge 2022.

Getting Started

We use Python version 3.10 for this repository.

We use Poetry, specifically version 1.2.0, for dependency management.

After installing Poetry, run the following command to create a virtual environment and install all dependencies:

poetry install

You can then activate the virtual environment using:

poetry shell
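
Before running `poetry install`, it can help to confirm that the interpreter and Poetry are what this README expects. The following is a minimal sketch, not part of the repository: the helper names `python_matches` and `missing_tools` are hypothetical, and it assumes that an exact match on Python 3.10 is what you want to check (whether newer versions also work is not stated here).

```python
import shutil
import sys

# Python version named in this README.
EXPECTED_PYTHON = (3, 10)


def python_matches(version_info=None, expected=EXPECTED_PYTHON):
    """Return True if the major.minor version equals the expected one."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == expected


def missing_tools(tools=("poetry",)):
    """Return the subset of command-line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not python_matches():
        print(f"Expected Python 3.10, found {sys.version_info[0]}.{sys.version_info[1]}")
    for tool in missing_tools():
        print(f"'{tool}' was not found on PATH")
```

Running the script prints nothing when both checks pass. It only inspects the environment and does not install or modify anything.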

Experiments

We use DVC to run the experiments and track their results.

To reproduce all results use:

dvc repro

Feature Valuation

Least Core

To reproduce the results of this experiment use:

dvc repro feature-valuation-least-core

You can find the results under output/feature_valuation_least_core.

Data Valuation

Synthetic Data

To reproduce the results of this experiment use:

dvc repro data-valuation-synthetic

You can find the results under output/data_valuation_synthetic.

Dog vs Fish Dataset

Note: This experiment requires downloading the imagenet-1k dataset from Hugging Face Datasets. To do so, first create a Hugging Face account and then log in using the huggingface-cli tool.

To reproduce the results of this experiment use:

dvc repro data-valuation-dog-vs-fish

You can find the results under output/data_valuation_dog_vs_fish.

Fixing Mislabeled Data

To reproduce the results of this experiment use:

dvc repro fixing-mislabeled-data

You can find the results under output/fixing_mislabeled_data.

Noisy Data

To reproduce the results of this experiment use:

dvc repro noisy-data

You can find the results under output/noisy_data.
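
The stage names and output directories listed above can be collected in one place, which is convenient when scripting a subset of the experiments. The sketch below is an illustration rather than code from this repository: `STAGE_OUTPUTS`, `repro_command`, and `run_stage` are hypothetical names, and it assumes `dvc` is available on `PATH` inside the activated virtual environment.

```python
import subprocess

# Stage names and output directories exactly as listed in this README.
STAGE_OUTPUTS = {
    "feature-valuation-least-core": "output/feature_valuation_least_core",
    "data-valuation-synthetic": "output/data_valuation_synthetic",
    "data-valuation-dog-vs-fish": "output/data_valuation_dog_vs_fish",
    "fixing-mislabeled-data": "output/fixing_mislabeled_data",
    "noisy-data": "output/noisy_data",
}


def repro_command(stage):
    """Build the `dvc repro <stage>` command for a known stage."""
    if stage not in STAGE_OUTPUTS:
        raise ValueError(f"Unknown stage: {stage!r}")
    return ["dvc", "repro", stage]


def run_stage(stage):
    """Run one stage with DVC and return the directory holding its results."""
    subprocess.run(repro_command(stage), check=True)
    return STAGE_OUTPUTS[stage]


if __name__ == "__main__":
    # Only print the commands here; call run_stage(...) to execute one.
    for stage, output_dir in STAGE_OUTPUTS.items():
        print(" ".join(repro_command(stage)), "->", output_dir)
```

For example, `run_stage("noisy-data")` would invoke `dvc repro noisy-data` and return `output/noisy_data`. Keep in mind that the Dog vs Fish stage needs the Hugging Face login described above before it can download its data.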

Contributing

Make sure to install the pre-commit hooks:

pre-commit install