Towards Quantifying the Effect of Datasets for Benchmarking

This repository contains the source code to reproduce the results and analysis of the paper:

Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning
Ravin Kohli, Matthias Feurer, Bernd Bischl, Katharina Eggensperger, Frank Hutter
Data-centric Machine Learning Research (DMLR) Workshop at ICLR 2024

The code is provided as-is and we will neither maintain it nor provide bug fixes.

Installation

git clone https://github.com/automl/dmlr-iclr24-datasets-for-benchmarking
cd dmlr-iclr24-datasets-for-benchmarking
conda create -n tabular_data_experiments python=3.10
conda activate tabular_data_experiments
conda install swig

# Install for usage
pip install .

# Install for development
make install-dev
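
After installing, a quick check can confirm that the active environment sees the package. The following is a minimal sketch, not part of the repository: it assumes the package is importable under the name `tabular_data_experiments` (taken from the folder of the same name), and `check_environment` is a hypothetical helper introduced here for illustration.

```python
import importlib.util
import sys

# Assumption: the installed package is importable under the same name as the
# `tabular_data_experiments` folder in the repository.
PACKAGE_NAME = "tabular_data_experiments"


def check_environment(package_name: str = PACKAGE_NAME) -> dict:
    """Report the Python version and whether the package can be located."""
    # find_spec returns None when a top-level package cannot be found.
    spec = importlib.util.find_spec(package_name)
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "package": package_name,
        "installed": spec is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `installed` is reported as false, the conda environment created above is probably not the active one, or the `pip install .` step did not complete.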

3rd-party source code

Our code is heavily inspired by the source code published alongside the paper "Why do tree-based models still outperform deep learning on tabular data?" by Léo Grinsztajn, Edouard Oyallon and Gaël Varoquaux.

Data

The raw data can be found here.

Visualizations

We provide the following notebooks for visualization:

dataset_and_suites.ipynb

Contains code that creates the table used throughout the paper.

result_analysis.ipynb

Contains code that creates the figures used throughout the paper.
