NOTE: The code for our method is in src, and the Python script for running experiments using our method is fairtabddpm_opt.py.
The PyTorch version we used in this project is 2.3.0+cu121. You can install the required packages by running the following commands:
conda create -n ai python=3.10
source activate ai
pip install -r requirements.txt
pip install dgl -f https://data.dgl.ai/wheels/torch-2.3/cu121/repo.html
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.3.0+cu121.html
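After installation, it can help to confirm that the environment has the PyTorch build named above, since the DGL and PyG wheel URLs are tied to it. The following is a minimal sketch, not part of the repository; the function names are hypothetical, and it assumes the installed package reports its version as the string 2.3.0+cu121.

```python
from importlib import metadata
from typing import Optional

# Version string stated in this README.
EXPECTED_TORCH = "2.3.0+cu121"


def torch_version_matches(installed: str, expected: str = EXPECTED_TORCH) -> bool:
    """Return True when the installed version string equals the expected one."""
    return installed.strip() == expected


def installed_torch_version() -> Optional[str]:
    """Read the installed torch version from package metadata (no torch import)."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed in this environment")
    elif torch_version_matches(found):
        print(f"torch {found} matches the expected build")
    else:
        print(f"torch {found} differs from the expected {EXPECTED_TORCH}")
```

A mismatch does not necessarily mean the code will fail, but the torch-scatter, torch-sparse, and dgl wheels above were selected for this specific build.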
To download and preprocess the datasets, run the following command:
python build.py
Under the root directory, run the following commands to reproduce the results of our method:
# run experiments for our method
bash fairtabddpm.sh
To reproduce the results of baseline methods, run the following commands:
# go to baselines directory
cd baselines
# run experiments for baselines
bash codi.sh
bash fairsmote.sh
bash fairtabgan.sh
bash goggle.sh
bash great.sh
bash smote.sh
bash stasy.sh
bash tabddpm.sh
bash tabsyn.sh
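The baseline scripts above can also be launched in sequence from a small Python driver. This is only a sketch under the assumption that each script is independent and is run from inside the baselines directory; the helper names build_commands and run_all are hypothetical and do not exist in the repository.

```python
import subprocess

# Script names copied from the list of baseline commands in this README.
BASELINE_SCRIPTS = (
    "codi.sh", "fairsmote.sh", "fairtabgan.sh", "goggle.sh", "great.sh",
    "smote.sh", "stasy.sh", "tabddpm.sh", "tabsyn.sh",
)


def build_commands(scripts=BASELINE_SCRIPTS):
    """Turn each script name into an argument list for subprocess.run."""
    return [["bash", name] for name in scripts]


def run_all(baselines_dir="baselines", dry_run=True):
    """Run every baseline script in order; stop at the first failure."""
    for cmd in build_commands():
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
            continue
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(cmd, cwd=baselines_dir, check=True)
```

Calling run_all() prints the commands without executing them; run_all(dry_run=False) executes them from the root directory of the repository.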
The datasets we used in this project are as follows:
- Adult
- COMPASS
- German Credit
- Bank Marketing
The baseline methods we used in this project are as follows (sorted alphabetically):
- CoDi
- Goggle
- GReaT
- SMOTE
- STaSy
- TabDDPM
- TabSyn
- Fair Class Balancing (FCB)
- FairTGAN
Avoid repetition to improve code quality:
- Replace exp_config['home'] by importing EXPS_PATH from constant.py in all running scripts
- Replace data_config['path'] by importing DB_PATH from constant.py in all running scripts
- Delete the experiment home and dataset path entries from all config.toml files
- Add a new argument --method to the optimization scripts and merge all optimization scripts into one
- Find commonly used functions in all running scripts and move them to utils.py
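The items above could look roughly like the following sketch. It is an illustration only: the values of EXPS_PATH and DB_PATH, the list of method names, the --dataset argument, and the helper experiment_dir are all assumptions, not the repository's actual code. In the planned layout the two path constants would live in constant.py and be imported by each script.

```python
import argparse
from pathlib import Path

# Assumed contents of constant.py; the real directories may differ.
EXPS_PATH = Path("exps")
DB_PATH = Path("db")

# Method names inferred from the script names in this README (assumption).
METHODS = (
    "fairtabddpm", "codi", "fairsmote", "fairtabgan", "goggle",
    "great", "smote", "stasy", "tabddpm", "tabsyn",
)


def build_parser() -> argparse.ArgumentParser:
    """Single parser for one merged optimization script."""
    parser = argparse.ArgumentParser(description="Run optimization for one method")
    parser.add_argument("--method", required=True, choices=METHODS)
    # Hypothetical extra argument, included only to make the sketch complete.
    parser.add_argument("--dataset", default="adult")
    return parser


def experiment_dir(method: str, dataset: str) -> Path:
    """Derive the output directory from the shared constant, not from config.toml."""
    return EXPS_PATH / dataset / method
```

A merged script would then call build_parser().parse_args() once and dispatch on args.method, for example with --method tabsyn, instead of keeping one optimization script per method.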
Organize the code:
- Move fairtabddpm.sh, fairtabddpm_run.py, and fairtabddpm_opt.py to the baselines directory, rename the baselines directory to methods, and edit readme.md accordingly
- Move src/evaluate/metrics.py out to the root directory because it is specific to the project
Automate the experiments and evaluations:
- Refactor and reorganize assess/present.ipynb in a functional programming style
- Rewrite all the code in the assess directory in a functional programming style
Correct the errors:
- The implementation of TabSyn in baselines is incorrect