PrADA: Privacy-preserving Federated Adversarial Domain Adaptation over Feature Groups for Interpretability

We walk through the steps for running experiments on the Census Income data. Experiments on the PPD loan default data follow the same procedure.

0. Download Data

Census Income Data
- You can download the Census Income data from https://archive.ics.uci.edu/ml/datasets/Census-Income+(KDD).
- A more detailed feature description can be found at https://docs.1010data.com/Tutorials/MachineLearningExamples/CensusIncomeDataSet.html

PPD Loan Default
- You can download the PPD loan default data from https://github.com/yankang18/ppd_data

1. Prepare Census Income Data

Update the census_data_creation_config.py file.

census_data_creation = {
    "original_data_dir": "YOUR ORIGINAL DATA DIR",
    "processed_data_dir": "YOUR PROCESSED DATA DIR",
    "train_data_file_name": "census-income.data",
    "test_data_file_name": "census-income.test",
    "positive_sample_ratio": 0.04,
    "number_target_samples": 4000,
    "data_tag": "all4000pos004"
}

Then, run:

python census_prepare_data.py

This produces the processed data in the directory you specified in "processed_data_dir".
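
As a rough illustration of how the configuration values above relate to each other, the sketch below derives the number of positive and negative target samples and rebuilds the example data_tag. The helper describe_target_sample is hypothetical, and two things are assumptions inferred from the example config rather than taken from census_prepare_data.py: that positive_sample_ratio is the fraction of positives among the target samples, and that the tag follows the pattern "all" + sample count + "pos" + ratio digits.

```python
def describe_target_sample(number_target_samples, positive_sample_ratio):
    """Hypothetical helper, not part of the repository.

    Assumes the ratio is the share of positive samples in the target set
    and that data_tag is built as all<count>pos<ratio digits>.
    """
    num_pos = round(number_target_samples * positive_sample_ratio)
    num_neg = number_target_samples - num_pos
    # "0.04" -> "004": drop the decimal point to mimic the example tag
    ratio_digits = str(positive_sample_ratio).replace(".", "")
    tag = f"all{number_target_samples}pos{ratio_digits}"
    return num_pos, num_neg, tag


num_pos, num_neg, tag = describe_target_sample(4000, 0.04)
print(num_pos, num_neg, tag)  # 160 3840 all4000pos004
```

If the preparation script samples differently, only the counts change; the point is that data_tag appears to encode the sampling settings, so it is worth keeping the two in sync when you edit the config.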

2. Perform Experiments

In this section, we show the steps for running the following four variants of PrADA:

PrADA: applies feature group (FG) based domain adversarial adaptation (DA) with feature group interaction (IR).
PrADA w/o IR: applies feature group based domain adversarial adaptation without feature group interaction.
PrADA w/o FG&IR: applies domain adversarial adaptation, but without feature grouping and feature group interaction.
PrADA w/o DA&FG&IR: trains without domain adversarial adaptation, feature grouping, or feature group interaction.

2.1 Run PrADA

Go to directory: prada/experiments/income_census/
Configure the hyperparameters in the train_config.py file.
- using_interaction must be set to True.

pre_train_hyperparameters = {
    "using_interaction": True,
    "momentum": 0.99,
    "weight_decay": 0.00001,
    "lr": 6e-4,
    "batch_size": 128,
    "max_epochs": 600,
    "epoch_patience": 3,
    "valid_metric": ('ks', 'auc')
}

fine_tune_hyperparameters = {
    "using_interaction": True,
    "load_global_classifier": False,
    "momentum": 0.99,
    "weight_decay": 0.0,
    "lr": 8e-4,
    "batch_size": 128,
    "valid_metric": ('ks', 'auc')
}

First, run the pretrain task:

python train_census_fg_adapt_pretrain.py

Once training completes, a pretrain task ID is returned, e.g.:

20210730_census_fg_adapt_all4000pos004_intrTrue_lr0.0005_bs128_me600_ts1627606557

Run the finetune task with the pretrain task ID as input:

python train_census_fg_target_finetune.py --pretrain_task_id 20210730_census_fg_adapt_all4000pos004_intrTrue_lr0.0005_bs128_me600_ts1627606557

The finetune task outputs the test AUC and test KS.
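
The task IDs shown above appear to encode the run settings (date, interaction flag, learning rate, batch size, max epochs, timestamp). The following is a minimal sketch of a parser for that naming pattern. It is inferred only from the example IDs in this README; parse_task_id and the field names in its result are hypothetical and not part of the repository, and IDs with a different shape simply return None.

```python
import re

# Pattern inferred from example IDs such as
# 20210730_census_fg_adapt_all4000pos004_intrTrue_lr0.0005_bs128_me600_ts1627606557
TASK_ID_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<name>.+?)"
    r"(?:_intr(?P<intr>True|False))?"      # optional interaction flag
    r"_lr(?P<lr>[0-9.eE+-]+)_bs(?P<bs>\d+)"
    r"(?:_me(?P<me>\d+))?"                 # optional max epochs
    r"_ts(?P<ts>\d+)$"
)


def parse_task_id(task_id):
    """Split a task ID into its settings; return None if it does not match."""
    match = TASK_ID_PATTERN.match(task_id)
    if match is None:
        return None
    g = match.groupdict()
    return {
        "date": g["date"],
        "name": g["name"],
        "using_interaction": None if g["intr"] is None else g["intr"] == "True",
        "lr": float(g["lr"]),
        "batch_size": int(g["bs"]),
        "max_epochs": None if g["me"] is None else int(g["me"]),
        "timestamp": int(g["ts"]),
    }


info = parse_task_id(
    "20210730_census_fg_adapt_all4000pos004_intrTrue_lr0.0005_bs128_me600_ts1627606557"
)
print(info["using_interaction"], info["lr"], info["batch_size"])
```

A parser like this can help confirm that the pretrain task ID you pass to the finetune script was produced with the settings you intended.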

2.2 Run PrADA w/o IR

Go to directory: prada/experiments/income_census/
Configure the hyperparameters in the train_config.py file.
- using_interaction must be set to False.

pre_train_hyperparameters = {
    "using_interaction": False,
    "momentum": 0.99,
    "weight_decay": 0.00001,
    "lr": 5e-4,
    "batch_size": 128,
    "max_epochs": 600,
    "epoch_patience": 3,
    "valid_metric": ('ks', 'auc')
}

fine_tune_hyperparameters = {
    "using_interaction": False,
    "load_global_classifier": False,
    "momentum": 0.99,
    "weight_decay": 0.0,
    "lr": 8e-4,
    "batch_size": 128,
    "valid_metric": ('ks', 'auc')
}

First, run the pretrain task:

python train_census_fg_adapt_pretrain.py

Once training completes, a pretrain task ID is returned, e.g.:

20210730_census_fg_adapt_all4000pos004_intrFalse_lr0.0005_bs128_me600_ts1627606557

Run the finetune task with the pretrain task ID as input:

python train_census_fg_target_finetune.py --pretrain_task_id 20210730_census_fg_adapt_all4000pos004_intrFalse_lr0.0005_bs128_me600_ts1627606557

The finetune task outputs the test AUC and test KS.
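
For readers unfamiliar with the two reported metrics, here is a small dependency-free sketch of how AUC and KS are commonly computed from binary labels and predicted scores. It uses the usual definitions (AUC as the area under the ROC curve, KS as the maximum gap between true positive rate and false positive rate); the evaluation code in this repository may compute them differently, and the function names roc_points and auc_and_ks are illustrative only.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points obtained by lowering the decision threshold.

    Assumes labels are 0/1 and that both classes are present.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample sharing this score so ties move together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_and_ks(labels, scores):
    """AUC via the trapezoidal rule; KS as max(TPR - FPR) over thresholds."""
    points = roc_points(labels, scores)
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    ks = max(tpr - fpr for fpr, tpr in points)
    return auc, ks


auc, ks = auc_and_ks([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc, ks)  # 0.75 0.5
```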

2.3 Run PrADA w/o FG&IR

Go to directory: prada/experiments/income_census/
Configure the hyperparameters in the train_config.py file.
- NOTE: using_interaction is not used in this setting because no feature grouping is applied, so leave it at its default value.

pre_train_hyperparameters = {
    "using_interaction": False,
    "momentum": 0.99,
    "weight_decay": 0.00001,
    "lr": 5e-4,
    "batch_size": 128,
    "max_epochs": 600,
    "epoch_patience": 3,
    "valid_metric": ('ks', 'auc')
}

fine_tune_hyperparameters = {
    "using_interaction": False,
    "load_global_classifier": False,
    "momentum": 0.99,
    "weight_decay": 0.0,
    "lr": 8e-4,
    "batch_size": 128,
    "valid_metric": ('ks', 'auc')
}

First, run the pretrain task:

python train_census_no_fg_adapt_pretrain.py

Once training completes, a pretrain task ID is returned, e.g.:

20210730_census_no_fg_adapt_all4000pos004_lr0.0005_bs128_me600_ts1627612696

Run the finetune task with the pretrain task ID as input:

python train_census_no_fg_target_finetune.py --pretrain_task_id 20210730_census_no_fg_adapt_all4000pos004_lr0.0005_bs128_me600_ts1627612696

The finetune task outputs the test AUC and test KS.
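
Since the feature-group and no-feature-group variants follow the same pretrain-then-finetune pattern with different scripts, the two commands can be assembled programmatically. The sketch below only builds the command lines and does not run them. The script names and the --pretrain_task_id flag come from the steps above; the build_commands helper and the "fg" / "no_fg" dictionary keys are hypothetical conveniences.

```python
import shlex

# Script pairs taken from sections 2.1-2.3; the dictionary keys are illustrative.
SCRIPTS = {
    "fg": (
        "train_census_fg_adapt_pretrain.py",
        "train_census_fg_target_finetune.py",
    ),
    "no_fg": (
        "train_census_no_fg_adapt_pretrain.py",
        "train_census_no_fg_target_finetune.py",
    ),
}


def build_commands(variant, pretrain_task_id):
    """Return (pretrain_cmd, finetune_cmd) as argument lists for the variant."""
    pretrain_script, finetune_script = SCRIPTS[variant]
    pretrain_cmd = ["python", pretrain_script]
    finetune_cmd = [
        "python", finetune_script,
        "--pretrain_task_id", pretrain_task_id,
    ]
    return pretrain_cmd, finetune_cmd


pretrain_cmd, finetune_cmd = build_commands(
    "no_fg",
    "20210730_census_no_fg_adapt_all4000pos004_lr0.0005_bs128_me600_ts1627612696",
)
print(shlex.join(pretrain_cmd))
print(shlex.join(finetune_cmd))
```

The argument lists can be passed to subprocess.run from within prada/experiments/income_census/ once you have the pretrain task ID from the first step.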

2.4 Run PrADA w/o DA&FG&IR

Go to directory: prada/experiments/income_census/
Configure the hyperparameters in the train_config.py file.
- train_data_tag specifies whether to train on all samples (source + target) or on target samples only.
- NOTE: apply_feature_group specifies whether to apply feature grouping. In this setting, always set it to False.

no_adaptation_hyperparameters = {
    "apply_feature_group": False,
    "train_data_tag": 'all',  # can be either 'all' or 'tgt'
    "momentum": 0.99,
    "weight_decay": 0.00001,
    "lr": 5e-4,
    "batch_size": 128,
    "max_epochs": 600,
    "epoch_patience": 2,
    "valid_metric": ('ks', 'auc')
}

Run the task:

python train_census_no_adaptation.py

Once training completes, a task ID is returned, e.g.:

20210730_all4000pos004v8_census_no_ad_wo_fg_all_lr0.0005_bs128_ts1627613954_ve0/

The task outputs the test AUC and test KS.
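
The configs above combine max_epochs with epoch_patience, which suggests patience-based early stopping on the validation metric. The sketch below illustrates that idea under explicit assumptions: epoch_patience is taken to be the number of consecutive epochs without improvement that triggers a stop, and a single scalar validation score stands in for the ('ks', 'auc') pair. How the repository actually counts patience or combines the two metrics is not stated here, and simulate_early_stopping is a hypothetical name.

```python
def simulate_early_stopping(valid_scores, epoch_patience):
    """Replay per-epoch validation scores and apply patience-based stopping.

    Returns (best_epoch, best_score, last_epoch_run), with 0-based epochs.
    Assumes a higher score is better.
    """
    best_score = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, score in enumerate(valid_scores):
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= epoch_patience:
                # Stop: the metric has not improved for epoch_patience epochs.
                return best_epoch, best_score, epoch
    return best_epoch, best_score, len(valid_scores) - 1


# The score peaks at epoch 1, then fails to improve for two epochs.
result = simulate_early_stopping([0.60, 0.65, 0.64, 0.63, 0.70], epoch_patience=2)
print(result)  # (1, 0.65, 3)
```

Under this reading, a small epoch_patience can stop training well before max_epochs is reached, as in the example, where the later score of 0.70 is never seen.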

2.5 Run Test

The test AUC and test KS on the target test data are reported once training completes, as shown above. You can also evaluate a trained model on the target test data with a separate command, as follows:

Go to directory: prada/experiments/income_census/

task_id specifies the training task of the model that you want to test.
model_tag specifies the variant of the PrADA model. It can be fg (with feature grouping), no_fg (without feature grouping), or no_ad (without adaptation and without feature grouping).

python test_census_target.py --task_id 20210731_census_fg_adapt_all4000pos004_intrFalse_lr0.0005_bs128_me600_ts1627682125@target_20210731_rt_glr_lr0.0008_bs128_ts1627683284 --model_tag fg
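
The example task_id above looks like two IDs joined by "@": the pretrain task ID followed by a finetune task ID. That reading is an inference from the example, not documented behavior. The sketch below splits such an ID and builds the test command with a check on model_tag. The script name, flags, and the three allowed tags come from this section; split_task_id and build_test_command are hypothetical helpers.

```python
VALID_MODEL_TAGS = {"fg", "no_fg", "no_ad"}


def split_task_id(task_id):
    """Split '<pretrain_id>@<finetune_id>'; finetune part is None if absent."""
    pretrain_id, sep, finetune_id = task_id.partition("@")
    return pretrain_id, (finetune_id if sep else None)


def build_test_command(task_id, model_tag):
    """Return the argument list for test_census_target.py."""
    if model_tag not in VALID_MODEL_TAGS:
        raise ValueError(f"model_tag must be one of {sorted(VALID_MODEL_TAGS)}")
    return [
        "python", "test_census_target.py",
        "--task_id", task_id,
        "--model_tag", model_tag,
    ]


task_id = (
    "20210731_census_fg_adapt_all4000pos004_intrFalse_lr0.0005_bs128_me600_ts1627682125"
    "@target_20210731_rt_glr_lr0.0008_bs128_ts1627683284"
)
pretrain_id, finetune_id = split_task_id(task_id)
print(pretrain_id)
print(finetune_id)
print(build_test_command(task_id, "fg"))
```

Validating model_tag up front avoids launching a test run with a tag that does not match the variant the model was trained as.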

3. Citation

This work was accepted for publication in IEEE Transactions on Big Data, 2022. Please cite our paper if you find this code useful for your research.

@article{kang2022prada,
  author={Kang, Yan and He, Yuanqin and Luo, Jiahuan and Fan, Tao and Liu, Yang and Yang, Qiang},
  journal={IEEE Transactions on Big Data}, 
  title={Privacy-preserving Federated Adversarial Domain Adaptation over Feature Groups for Interpretability}, 
  year={2022},
  volume={},
  number={},
  pages={1-12},
  doi={10.1109/TBDATA.2022.3188292}}
