$DATA denotes the location where datasets are installed, e.g.
$DATA/
|–– office31/
|–– office_home/
|–– visda17/
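For example, if you keep your datasets under ~/data, you can point $DATA there in your shell before running any of the commands below (the exact path is up to you),
export DATA=~/data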
Office-31 download link: https://people.eecs.berkeley.edu/~jhoffman/domainadapt/#datasets_code.
File structure:
office31/
|–– amazon/
| |–– back_pack/
| |–– bike/
| |–– ...
|–– dslr/
| |–– back_pack/
| |–– bike/
| |–– ...
|–– webcam/
| |–– back_pack/
| |–– bike/
| |–– ...
Note that within each domain folder you need to move all class folders out of the images/ folder and then delete the images/ folder.
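If you prefer not to move the class folders by hand, the Python sketch below performs the same reorganization. It is a convenience sketch rather than part of Dassl, and assumes that $DATA is set in your environment and that each freshly extracted domain folder contains a single images/ subfolder holding the class folders.
# Sketch: move class folders out of images/ and delete the emptied images/ folder.
import glob
import os
import shutil

root = os.path.join(os.environ["DATA"], "office31")
for domain in ["amazon", "dslr", "webcam"]:
    images_dir = os.path.join(root, domain, "images")
    for class_dir in glob.glob(os.path.join(images_dir, "*")):
        shutil.move(class_dir, os.path.join(root, domain, os.path.basename(class_dir)))
    os.rmdir(images_dir)  # raises an error if anything was left behind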
Office-Home download link: http://hemanthdv.org/OfficeHome-Dataset/.
File structure:
office_home/
|–– art/
|–– clipart/
|–– product/
|–– real_world/
VisDA17 download link: http://ai.bu.edu/visda-2017/. The dataset can also be downloaded using our script at datasets/da/visda17.sh. Run the following command in your terminal under Dassl.pytorch/datasets/da,
sh visda17.sh $DATA
Once the download is finished, the file structure will look like
visda17/
|–– train/
|–– test/
|–– validation/
To prepare CIFAR-STL, run the following command in your terminal under Dassl.pytorch/datasets/da,
python cifar_stl.py $DATA/cifar_stl
This will create a folder named cifar_stl under $DATA. The file structure will look like
cifar_stl/
|–– cifar/
| |–– train/
| |–– test/
|–– stl/
| |–– train/
| |–– test/
Note that only 9 classes shared by both datasets are kept.
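As a quick sanity check (this snippet is not part of Dassl), you can list the class folders that the script created; both domains should end up with the same 9 class folders,
# List the class folders kept for each domain; both should show the same 9 classes.
import os

root = os.path.join(os.environ["DATA"], "cifar_stl")
for domain in ["cifar", "stl"]:
    classes = sorted(os.listdir(os.path.join(root, domain, "train")))
    print(domain, len(classes), classes)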
Create a folder $DATA/digit5 and download the dataset from here into this folder. This should give you
digit5/
|–– Digit-Five/
Then, run the following command in your terminal under Dassl.pytorch/datasets/da,
python digit5.py $DATA/digit5
This will extract the data and organize the file structure as
digit5/
|–– Digit-Five/
|–– mnist/
|–– mnist_m/
|–– usps/
|–– svhn/
|–– syn/
DomainNet download link: http://ai.bu.edu/M3SDA/ (please download the cleaned version of the split files).
File structure:
domainnet/
|–– clipart/
|–– infograph/
|–– painting/
|–– quickdraw/
|–– real/
|–– sketch/
|–– splits/
| |–– clipart_train.txt
| |–– clipart_test.txt
| |–– ...
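For reference, each split file lists one image per line as a relative image path followed by an integer label. The sketch below shows how such a file could be read; read_split is a hypothetical helper used only for illustration, not a Dassl function,
# Read a DomainNet split file, assuming each line holds "relative/path.jpg label".
import os

def read_split(split_file, image_root):
    items = []
    with open(split_file) as f:
        for line in f:
            impath, label = line.strip().split(" ")
            items.append((os.path.join(image_root, impath), int(label)))
    return items

root = os.path.join(os.environ["DATA"], "domainnet")
items = read_split(os.path.join(root, "splits", "clipart_train.txt"), root)
print(len(items), items[0])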
You need to download the DomainNet dataset first. The miniDomainNet split files can be downloaded from this google drive. After the zip file is extracted, you should have the folder $DATA/domainnet/splits_mini/.
PACS download link: google drive.
File structure:
pacs/
|–– images/
|–– splits/
You do not necessarily have to download this dataset manually. Once you run tools/train.py, the code will detect whether the dataset exists and, if it is missing, automatically download it to $DATA. This also applies to VLCS, Office-Home-DG, and Digits-DG.
VLCS download link: google drive (credit to https://github.com/fmcarlucci/JigenDG#vlcs).
File structure:
VLCS/
|–– CALTECH/
|–– LABELME/
|–– PASCAL/
|–– SUN/
Office-Home-DG download link: google drive.
File structure:
office_home_dg/
|–– art/
|–– clipart/
|–– product/
|–– real_world/
Digits-DG download link: google drive.
File structure:
digits_dg/
|–– mnist/
|–– mnist_m/
|–– svhn/
|–– syn/
Follow the steps for Digit-5 to organize the dataset.
First download the CIFAR-10-C dataset from https://zenodo.org/record/2535967#.YFxHEWQzb0o to, e.g., $DATA, and extract the file under the same directory. Then, navigate to Dassl.pytorch/datasets/dg and run the following command in your terminal,
python cifar_c.py $DATA/CIFAR-10-C
where the first argument denotes the path to the (uncompressed) CIFAR-10-C dataset. The script will extract images from the .npy files and save them to cifar10_c/ created under $DATA. The file structure will look like
cifar10_c/
|–– brightness/
| |–– 1/ # 5 intensity levels in total
| |–– 2/
| |–– 3/
| |–– 4/
| |–– 5/
|–– ... # 19 corruption types in total
Note that cifar10_c/ only contains the test images. The training images are the normal CIFAR-10 images. See CIFAR10/100 and SVHN for how to prepare the CIFAR-10 dataset.
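For reference, the conversion is conceptually simple: each corruption .npy file in CIFAR-10-C stores 50,000 test images ordered by severity (10,000 per level), with the labels in labels.npy. The sketch below illustrates the idea for a single corruption type; the per-class folder layout under each severity level is an assumption made here, so treat datasets/dg/cifar_c.py as the authoritative version,
# Rough sketch for one corruption type; see datasets/dg/cifar_c.py for the real script.
# Assumes brightness.npy has shape (50000, 32, 32, 3) ordered by severity level, and
# (an assumption, not confirmed) that images are grouped into per-class folders.
import os
import numpy as np
from PIL import Image

src = os.path.join(os.environ["DATA"], "CIFAR-10-C")
dst = os.path.join(os.environ["DATA"], "cifar10_c")
images = np.load(os.path.join(src, "brightness.npy"))
labels = np.load(os.path.join(src, "labels.npy"))

for level in range(1, 6):
    for i in range((level - 1) * 10000, level * 10000):
        class_dir = os.path.join(dst, "brightness", str(level), str(labels[i]))
        os.makedirs(class_dir, exist_ok=True)
        Image.fromarray(images[i]).save(os.path.join(class_dir, f"{i}.png"))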
First download the CIFAR-100-C dataset from https://zenodo.org/record/3555552#.YFxpQmQzb0o to, e.g., $DATA, and extract the file under the same directory. Then, navigate to Dassl.pytorch/datasets/dg and run the following command in your terminal,
python cifar_c.py $DATA/CIFAR-100-C
where the first argument denotes the path to the (uncompressed) CIFAR-100-C dataset. The script will extract images from the .npy files and save them to cifar100_c/ created under $DATA. The file structure will look like
cifar100_c/
|–– brightness/
| |–– 1/ # 5 intensity levels in total
| |–– 2/
| |–– 3/
| |–– 4/
| |–– 5/
|–– ... # 19 corruption types in total
Note that cifar100_c/ only contains the test images. The training images are the normal CIFAR-100 images. See CIFAR10/100 and SVHN for how to prepare the CIFAR-100 dataset.
No action is required to preprocess the WILDS datasets. The code will automatically download the data.
To prepare CIFAR-10, CIFAR-100 and SVHN, run the following command in your terminal under Dassl.pytorch/datasets/ssl,
python cifar10_cifar100_svhn.py $DATA
This will create three folders under $DATA, i.e.
cifar10/
|–– train/
|–– test/
cifar100/
|–– train/
|–– test/
svhn/
|–– train/
|–– test/
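As a quick sanity check (not part of Dassl), you can count the class folders that were created. Assuming the script writes one subfolder per class under train/ and test/ (an assumption, check your actual output), you should see 10 classes for cifar10 and svhn and 100 for cifar100,
# Count class folders per split; assumes one subfolder per class (not confirmed).
import os

root = os.environ["DATA"]
for name, expected in [("cifar10", 10), ("cifar100", 100), ("svhn", 10)]:
    for split in ["train", "test"]:
        n = len(os.listdir(os.path.join(root, name, split)))
        print(name, split, n, "class folders, expected", expected)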
To prepare STL-10, run the following command in your terminal under Dassl.pytorch/datasets/ssl,
python stl10.py $DATA/stl10
This will create a folder named stl10 under $DATA and extract the data into three folders, i.e. train, test and unlabeled. Then, download the "Binary files" from http://ai.stanford.edu/~acoates/stl10/ and extract them under stl10. The file structure will look like
stl10/
|–– train/
|–– test/
|–– unlabeled/
|–– stl10_binary/