Code and results accompanying the DMLR@ICLR'24 paper "Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning"

License

Apache-2.0 (see LICENSE and LICENSE.txt)
automl/dmlr-iclr24-datasets-for-benchmarking

Towards Quantifying the Effect of Datasets for Benchmarking

This repository contains the source code to reproduce the results and analysis of the paper

"Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning"
Ravin Kohli, Matthias Feurer, Bernd Bischl, Katharina Eggensperger, Frank Hutter
Data-centric Machine Learning Research (DMLR) Workshop at ICLR 2024

The code is provided as-is and we will neither maintain it nor provide bug fixes.

Installation

git clone https://github.com/automl/dmlr-iclr24-datasets-for-benchmarking
cd dmlr-iclr24-datasets-for-benchmarking
conda create -n tabular_data_experiments python=3.10
conda activate tabular_data_experiments
conda install swig

# Install for usage
pip install .

# Install for development
make install-dev
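
Before running the installation steps, it can help to confirm that the command-line tools they rely on are available. The following is a minimal sketch, not part of the repository: check_tool is a hypothetical helper name, and the list of tools (git, conda, make) is only inferred from the commands shown in this section.

```shell
# Hypothetical helper (not part of the repository): report whether a
# command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Tools used by the installation commands in this section (assumed list).
for tool in git conda make; do
  check_tool "$tool"
done
```

Any tool reported as missing needs to be installed before the corresponding step will work.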

3rd-party source code

Our code is heavily inspired by the great source code published alongside the paper "Why do tree-based models still outperform deep learning on tabular data?" by Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux.

Data

The raw data can be found here.

Visualizations

We provide the following notebooks for visualization:

dataset_and_suites.ipynb

Contains code that creates the table used throughout the paper.

result_analysis.ipynb

Contains code that creates the figures used throughout the paper.
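
The notebooks can also be executed non-interactively. The sketch below assumes that Jupyter (with nbconvert) is installed in the active environment and that it is run from the directory containing the notebooks; neither is stated in this README, and run_notebooks is a hypothetical helper name. The notebooks may additionally expect the raw data to be available locally.

```shell
# Hypothetical helper (not part of the repository): execute each given
# notebook and write an executed copy with an "executed_" prefix.
# Assumes it is run from the directory that contains the notebooks.
run_notebooks() {
  for nb in "$@"; do
    if [ -f "$nb" ]; then
      jupyter nbconvert --to notebook --execute "$nb" \
        --output "executed_${nb%.ipynb}"
    else
      echo "not found: $nb"
    fi
  done
}

run_notebooks dataset_and_suites.ipynb result_analysis.ipynb
```

Writing to a separate output file leaves the original notebooks unchanged, which keeps the repository state clean when re-running the analysis.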
