TabCSDI: Diffusion models for missing value imputation in tabular data

This is the repository for the workshop paper "Diffusion models for missing value imputation in tabular data", available on OpenReview.

Setup

pip install -r requirements.txt

Running experiments

We provide three datasets: Breast (original), Breast (diagnostic), and Census. For the Census dataset, three methods of handling categorical variables are provided.

Run experiments on the purely numerical datasets:

Breast (original) dataset

python exe_breast.py

Breast (diagnostic) dataset

python exe_breastD.py

Run mixed-datatype experiments on the Census dataset:

Using feature tokenization for categorical variables

python exe_census_ft.py
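
For readers unfamiliar with feature tokenization, the idea is that each categorical value is looked up in a per-column embedding table, so every column becomes a dense vector. The sketch below is only an illustration of that idea and is not taken from this repository: the class name `FeatureTokenizer`, its parameters, and the random (untrained) tables are assumptions. In a real model the tables would be trainable parameters.

```python
import numpy as np


class FeatureTokenizer:
    """Hypothetical illustration: one embedding table per categorical column."""

    def __init__(self, cardinalities, dim, seed=0):
        rng = np.random.default_rng(seed)
        # One (num_categories, dim) table per column; random here, learned in practice.
        self.tables = [rng.normal(scale=0.02, size=(c, dim)) for c in cardinalities]

    def __call__(self, codes):
        codes = np.asarray(codes, dtype=np.int64)
        # Look up each column's integer code in that column's table.
        tokens = [table[codes[:, j]] for j, table in enumerate(self.tables)]
        # Result has shape (batch, num_columns, dim).
        return np.stack(tokens, axis=1)


# Two categorical columns with 3 and 5 categories, embedded into 4 dimensions.
tokenizer = FeatureTokenizer(cardinalities=[3, 5], dim=4)
batch = np.array([[0, 4], [2, 4]])
embedded = tokenizer(batch)
```

Rows that share a category in a column receive the same vector for that column, which is what lets the model treat categories as points in a continuous space.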

Using analog bits encoding for categorical variables

python exe_census_analog.py
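
As a rough illustration of analog bits encoding, each integer category index is written in binary and every bit is mapped to a real value of -1.0 or +1.0, so a continuous model can operate on it. Decoding thresholds the values at zero. This is a minimal sketch under those assumptions, not the code used in this repository; the function names `to_analog_bits` and `from_analog_bits` are hypothetical.

```python
import numpy as np


def to_analog_bits(codes, num_categories):
    """Encode integer category codes as vectors of -1.0 / +1.0 values."""
    codes = np.asarray(codes, dtype=np.int64)
    # Bits needed to represent the largest category index (at least one).
    num_bits = max(1, (num_categories - 1).bit_length())
    # Extract bits, most significant first.
    shifts = np.arange(num_bits - 1, -1, -1)
    bits = (codes[:, None] >> shifts) & 1
    return bits.astype(np.float64) * 2.0 - 1.0


def from_analog_bits(analog, num_categories):
    """Threshold real-valued bits at zero and convert back to category codes."""
    bits = (np.asarray(analog, dtype=np.float64) > 0).astype(np.int64)
    weights = 1 << np.arange(bits.shape[1] - 1, -1, -1)
    codes = bits @ weights
    # Bit patterns beyond the last category are clipped to a valid index.
    return np.minimum(codes, num_categories - 1)


encoded = to_analog_bits([0, 3, 4], num_categories=5)
# Small perturbations do not change the decoded category.
decoded = from_analog_bits(encoded + 0.3, num_categories=5)
```

With five categories, three bits suffice, so each categorical column occupies three real-valued inputs instead of the five that one-hot encoding would need.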

Using one-hot encoding for categorical variables

python exe_census_onehot.py
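
For comparison, one-hot encoding represents a category as a vector with a single 1 at the category's index, and an imputed real-valued vector can be mapped back to a category by taking the position of its largest entry. The following is a minimal sketch of that idea with hypothetical function names; it is not the repository's preprocessing code.

```python
import numpy as np


def one_hot_encode(codes, num_categories):
    """Map integer category codes to one-hot rows."""
    codes = np.asarray(codes, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for category i.
    return np.eye(num_categories, dtype=np.float64)[codes]


def one_hot_decode(values):
    """Pick the category whose entry is largest in each row."""
    return np.argmax(np.asarray(values, dtype=np.float64), axis=1)


encoded = one_hot_encode([2, 0], num_categories=3)
# A real-valued (imputed) row is decoded to its highest-scoring category.
decoded = one_hot_decode([[0.1, 0.7, 0.2]])
```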

Acknowledgements

The code repo is built upon the CSDI repo.

Reference

If you find our code useful or use it in your work, please cite the following paper:

@inproceedings{tashiro2021csdi,
  title={Diffusion models for missing value imputation in tabular data},
  author={Zheng, Shuhan and Charoenphakdee, Nontawat},
  booktitle={NeurIPS Table Representation Learning (TRL) Workshop},
  year={2022}
}

