Add more datasets #53

Open
wants to merge 26 commits into
base: master
1 change: 1 addition & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
filelists
18 changes: 18 additions & 0 deletions .gitignore
@@ -0,0 +1,18 @@
.idea/
__pycache__/
*/__pycache__/
filelists/CUB/CUB_200_2011.tgz
filelists/CUB/CUB_200_2011/
filelists/CUB/*.json
filelists/miniImagenet/ILSVRC2015_CLS-LOC.tar.gz
filelists/miniImagenet/ILSVRC2015_CLS-LOC/
filelists/miniImagenet/*.csv
filelists/miniImagenet/*.json
filelists/emnist/emnist.zip
filelists/emnist/emnist/
filelists/emnist/*.json
filelists/omniglot/images_background/
filelists/omniglot/images_evaluation/
filelists/omniglot/*.txt
filelists/omniglot/*.zip
filelists/omniglot/*.json
6 changes: 6 additions & 0 deletions Dockerfile
@@ -0,0 +1,6 @@
FROM ubuntu:18.04

RUN apt-get update && apt-get install -y python3 python3-pip wget
WORKDIR /repo
COPY requirements-cpu.txt requirements-cpu.txt
RUN pip3 install -r requirements-cpu.txt -f https://download.pytorch.org/whl/torch_stable.html
6 changes: 6 additions & 0 deletions Dockerfile-gpu
@@ -0,0 +1,6 @@
FROM nvidia/cuda:10.2-cudnn7-runtime-ubuntu16.04

RUN apt-get update && apt-get install -y python3 python3-pip wget
WORKDIR /repo
COPY requirements-gpu.txt requirements-gpu.txt
RUN pip3 install -r requirements-gpu.txt
26 changes: 22 additions & 4 deletions README.md
@@ -17,10 +17,17 @@ year={2019}

## Environment
- Python3
- [Pytorch](http://pytorch.org/) before 0.4 (for newer version, please see issue #3 )
- [Pytorch](http://pytorch.org/) >= 1.0
- json

To install the dependencies use `pip3 install -r requirements-cpu.txt -f https://download.pytorch.org/whl/torch_stable.html` or `pip3 install -r requirements-gpu.txt`.

## Getting started
### CIFARFS
* Change directory to `./filelists/CIFARFS`
* run `source ./download_Cifar.sh`
* run `python3 create-dataset.py`; edit this script to create a different dataset by choosing other classes

### CUB
* Change directory to `./filelists/CUB`
* run `source ./download_CUB.sh`
@@ -52,24 +59,35 @@ See test.json for reference

## Train
Run
```python ./train.py --dataset [DATASETNAME] --model [BACKBONENAME] --method [METHODNAME] [--OPTIONARG]```
```python3 ./train.py --dataset [DATASETNAME] --model [BACKBONENAME] --method [METHODNAME] [--OPTIONARG]```

For example, run `python3 ./train.py --dataset miniImagenet --model Conv4 --method baseline --train_aug`.
The commands below follow this example; refer to `io_utils.py` for additional options.

## Save features
Save the features extracted before the classification layer to speed up testing. This step does not apply to MAML but is required for the other methods.
Run
```python ./save_features.py --dataset miniImagenet --model Conv4 --method baseline --train_aug```
```python3 ./save_features.py --dataset miniImagenet --model Conv4 --method baseline --train_aug```

## Test
Run
```python ./test.py --dataset miniImagenet --model Conv4 --method baseline --train_aug```
```python3 ./test.py --dataset miniImagenet --model Conv4 --method baseline --train_aug```

## Results
* The test results will be recorded in `./record/results.txt`
* For all the pre-computed results, please see `./record/few_shot_exp_figures.xlsx`. This will be helpful for including your own results for a fair comparison.

## Docker
If you want to use Docker, build the image with `docker build -t closerlookfewshot .`
and execute commands with `docker run -v $(pwd):/repo closerlookfewshot [command]`,
e.g. `docker run -v $(pwd):/repo closerlookfewshot python3 /repo/train.py --dataset CUB --model Conv4 --method baseline --train_aug`.

If you have a GPU with CUDA and cuDNN installed, build the image with `nvidia-docker build -t closerlookfewshot -f Dockerfile-gpu .`
and execute commands with `nvidia-docker run -v $(pwd):/repo closerlookfewshot [command]`,
e.g. `nvidia-docker run -v $(pwd):/repo closerlookfewshot python3 /repo/train.py --dataset CUB --model Conv4 --method baseline --train_aug`.
If your CUDA version is not 10.2, change the base image tag `10.2-cudnn7-runtime-ubuntu16.04` in `Dockerfile-gpu` to match it.


## References
Our testbed builds upon several publicly available code bases. Specifically, we have modified and integrated the following code into this project:

5 changes: 3 additions & 2 deletions configs.py
@@ -1,6 +1,7 @@
-save_dir = '/work/newriver/wyharveychen/CloserLookFewShot/'
+save_dir = './record/'
 data_dir = {}
-data_dir['CUB'] = './filelists/CUB/'
+data_dir['CUB'] = './filelists/CUB/'
+data_dir['CIFARFS'] = './filelists/CIFARFS/'
 data_dir['miniImagenet'] = './filelists/miniImagenet/'
 data_dir['omniglot'] = './filelists/omniglot/'
 data_dir['emnist'] = './filelists/emnist/'
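
A minimal sketch of how a `data_dir` entry could be turned into the path of a split file. The helper name `split_file` is hypothetical, and the `base.json` / `val.json` / `novel.json` file names are assumed from what `generate_cross_domain_split.py` writes in this PR; the repository's own scripts may build these paths differently.

```python
import os

# Mirrors part of the mapping in configs.py (subset shown for brevity).
data_dir = {
    'CUB': './filelists/CUB/',
    'CIFARFS': './filelists/CIFARFS/',
}

def split_file(dataset, split):
    """Hypothetical helper: path of the JSON file list for one split."""
    if dataset not in data_dir:
        raise KeyError(f"unknown dataset {dataset!r}; known: {sorted(data_dir)}")
    if split not in ('base', 'val', 'novel'):
        raise ValueError(f"unknown split {split!r}")
    return os.path.join(data_dir[dataset], split + '.json')

print(split_file('CIFARFS', 'base'))
```

Failing early on an unknown dataset name gives a clearer error than a missing-file exception later in training.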
57 changes: 57 additions & 0 deletions filelists/CIFARFS/cifar_fs_preprocessing.py
@@ -0,0 +1,57 @@
# Dataloader of Gidaris & Komodakis, CVPR 2018
# Adapted from:
# https://github.com/gidariss/FewShotWithoutForgetting/blob/master/dataloader.py
from __future__ import print_function

import os
import os.path
import numpy as np
import pickle



from PIL import Image

# Set the appropriate paths of the datasets here.
_CIFAR_FS_DATASET_DIR = 'Cifar100BySuperclass/'

def load_data(file):
    try:
        with open(file, 'rb') as fo:
            data = pickle.load(fo)
        return data
    except UnicodeDecodeError:  # pickle written by Python 2: retry with latin1 decoding
        with open(file, 'rb') as f:
            u = pickle._Unpickler(f)
            u.encoding = 'latin1'
            data = u.load()
        return data

def split_in_superclasses():
    d1 = load_data(os.path.join(
        _CIFAR_FS_DATASET_DIR,
        'cifar-100-python/test'))
    d2 = load_data(os.path.join(
        _CIFAR_FS_DATASET_DIR,
        'cifar-100-python/train'))
    meta = load_data(os.path.join(
        _CIFAR_FS_DATASET_DIR,
        'cifar-100-python/meta'))

    number_of_superclasses = 20
    fine_labels = np.concatenate((np.array(d1['fine_labels']), np.array(d2['fine_labels'])))
    data = np.concatenate((np.array(d1['data']), np.array(d2['data'])))
    coarse_labels = np.concatenate((np.array(d1['coarse_labels']), np.array(d2['coarse_labels'])))
    for i in range(number_of_superclasses):
        superclass_mask = coarse_labels == i  # boolean mask selecting one superclass
        y_of_superclass = fine_labels[superclass_mask]
        assert len(y_of_superclass) == 3000 # 5 * 600
        x_of_superclass = data[superclass_mask, :]
        superclass_name = meta["coarse_label_names"][i]
        with open(f"superclass_{superclass_name}.pickle", 'wb') as f:
            pickle.dump({
                "superclass": superclass_name,
                "labels": y_of_superclass,
                "data": x_of_superclass
            }, f)
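
The grouping done by `split_in_superclasses` can be illustrated on synthetic labels. This is only a sketch: it uses the standard library instead of NumPy boolean masks, and the helper name `group_by_superclass` and the sample data are made up for illustration.

```python
from collections import defaultdict

def group_by_superclass(coarse_labels, fine_labels, data):
    """Collect fine labels and data rows per coarse label (superclass)."""
    groups = defaultdict(lambda: {"labels": [], "data": []})
    for coarse, fine, row in zip(coarse_labels, fine_labels, data):
        groups[coarse]["labels"].append(fine)
        groups[coarse]["data"].append(row)
    return dict(groups)

# Synthetic stand-in for CIFAR-100: 2 superclasses, 3 samples each.
coarse = [0, 1, 0, 1, 0, 1]
fine = [4, 30, 55, 72, 4, 95]
rows = [[i] * 3 for i in range(6)]

groups = group_by_superclass(coarse, fine, rows)
for superclass, members in sorted(groups.items()):
    print(superclass, members["labels"])
```

In the real script each group holds 3000 samples (5 fine classes with 600 images each), which is what the `assert` there checks.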

6 changes: 6 additions & 0 deletions filelists/CIFARFS/create-dataset.py
@@ -0,0 +1,6 @@
# Create train, val, test split
from generate_cross_domain_split import generate_cross_domain_split
base_classes = ["beaver", "dolphin", "otter"]
val_classes = ["seal"]
novel_classes = ["whale"]
generate_cross_domain_split(base_classes, val_classes, novel_classes)
3 changes: 3 additions & 0 deletions filelists/CIFARFS/download_Cifar.sh
@@ -0,0 +1,3 @@
#!/usr/bin/env bash
git clone https://github.com/MkuuWaUjinga/Cifar100BySuperclass.git
python3 write_CIFARFS_filelist.py
48 changes: 48 additions & 0 deletions filelists/CIFARFS/generate_cross_domain_split.py
@@ -0,0 +1,48 @@
import random
import numpy as np
import pickle
import os
from os import listdir
from os.path import isfile, isdir, join
from collections import defaultdict
import json


base_path = os.getcwd()

def write_json(class_list, name, class_name_to_path, class_name_to_label):
    d = defaultdict(list)
    d["label_names"] = list(class_name_to_path.keys())
    for base_class in class_list:
        path = class_name_to_path[base_class]
        label = class_name_to_label[base_class]
        for file in listdir(path):
            if ".jpg" in file:
                d["image_labels"].append(int(label))
                d["image_names"].append(join(path, file))
    with open(f"{name}.json", 'w') as f:
        json.dump(d, f)

def check_input(class_list, all_classes):
    assert all([cls in all_classes for cls in class_list]), f"Only classes for {all_classes} are allowed"


def generate_cross_domain_split(base_classes, val_classes, novel_classes):
    """
    Generate base, val and novel json-files on the fly.
    :param base_classes: the list of classes to be used for training
    :param val_classes: the list of classes to be used for validation
    :param novel_classes: the list of classes to be used for testing
    :return:
    """
    with open("class_name_to_label.pickle", 'rb') as f:
        class_name_to_label = pickle.load(f)
    with open("class_name_to_path.pickle", 'rb') as f:
        class_name_to_path = pickle.load(f)
    class_list = list(class_name_to_path.keys())
    check_input(base_classes, class_list)
    check_input(val_classes, class_list)
    check_input(novel_classes, class_list)
    write_json(base_classes, "base", class_name_to_path, class_name_to_label)
    write_json(val_classes, "val", class_name_to_path, class_name_to_label)
    write_json(novel_classes, "novel", class_name_to_path, class_name_to_label)
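
The structure of the JSON file lists written by `write_json` can be sketched on a throwaway directory. The helper name `build_filelist`, the sorted directory listing, and the two-class layout below are assumptions made for illustration only; they are not part of the PR.

```python
import json
import os
import tempfile
from collections import defaultdict

def build_filelist(class_list, class_name_to_path, class_name_to_label):
    """Build the same dict layout as write_json, without writing a file."""
    d = defaultdict(list)
    d["label_names"] = list(class_name_to_path.keys())
    for name in class_list:
        path = class_name_to_path[name]
        for file in sorted(os.listdir(path)):  # sorted only to make the order deterministic
            if file.endswith(".jpg"):
                d["image_labels"].append(int(class_name_to_label[name]))
                d["image_names"].append(os.path.join(path, file))
    return dict(d)

with tempfile.TemporaryDirectory() as root:
    paths, labels = {}, {}
    for label, name in enumerate(["beaver", "seal"]):
        class_dir = os.path.join(root, name)
        os.makedirs(class_dir)
        for i in (1, 2):
            # Empty placeholder files stand in for real images.
            open(os.path.join(class_dir, f"{name}_{i}.jpg"), "wb").close()
        paths[name], labels[name] = class_dir, label
    text = json.dumps(build_filelist(["seal"], paths, labels))

loaded = json.loads(text)
print(loaded["label_names"], loaded["image_labels"])
```

Note that `label_names` lists every known class, while `image_names` and `image_labels` only cover the classes selected for the split.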
10 changes: 10 additions & 0 deletions filelists/CIFARFS/meta
@@ -0,0 +1,10 @@
(Binary Python pickle, not human-readable in the diff view. It stores a dict with two keys: `fine_label_names`, the 100 CIFAR-100 class names from `apple` to `worm`, and `coarse_label_names`, the 20 superclass names from `aquatic_mammals` to `vehicles_2`.)
58 changes: 58 additions & 0 deletions filelists/CIFARFS/write_CIFARFS_filelist.py
@@ -0,0 +1,58 @@
from os import listdir
from os.path import isdir, join
import os
from PIL import Image
import pickle
from pathlib import Path
from collections import defaultdict

base_path = os.getcwd()
file_list_path = join(base_path, "CifarFS")
data_path = join(base_path,'Cifar100BySuperclass/')
savedir = './'
dataset_list = ['base','val','novel']


def load_data(file):
    try:
        with open(file, 'rb') as fo:
            data = pickle.load(fo)
        return data
    except UnicodeDecodeError:  # pickle written by Python 2: retry with latin1 decoding
        with open(file, 'rb') as f:
            u = pickle._Unpickler(f)
            u.encoding = 'latin1'
            data = u.load()
        return data

# 1. Write pickled images to files (Structure: Superclass >> Class >> Images.jpg)
fine_label_names = load_data(join(base_path, 'meta'))["fine_label_names"]
assert len(fine_label_names) == 100
all_pickled_files = [f for f in listdir(data_path) if ".pickle" in f]
assert len(all_pickled_files) == 20

class_name_to_label = {}
class_name_to_path = {}
for pickled_file in all_pickled_files:
    superclass_data = load_data(join(data_path, pickled_file))
    superclass_name = superclass_data["superclass"]
    labels = superclass_data["labels"]
    data = superclass_data["data"]
    superclass_path = join(file_list_path, superclass_name)
    Path(superclass_path).mkdir(parents=True, exist_ok=True)
    num_data_points_per_class = defaultdict(int)
    for label, data_point in zip(labels, data):
        num_data_points_per_class[label] += 1
        class_name = fine_label_names[label]
        class_path = join(superclass_path, class_name)
        class_name_to_label[class_name] = label
        class_name_to_path[class_name] = class_path
        Path(class_path).mkdir(parents=True, exist_ok=True)
        # Each row is channel-major (3 x 32 x 32); PIL expects height x width x channel.
        im = Image.fromarray(data_point.reshape(3, 32, 32).transpose(1, 2, 0))
        im.save(join(class_path, f"{class_name}_{num_data_points_per_class[label]}.jpg"))

with open("class_name_to_label.pickle", 'wb') as f:
    pickle.dump(class_name_to_label, f)

with open("class_name_to_path.pickle", 'wb') as f:
    pickle.dump(class_name_to_path, f)
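
The `reshape(3, 32, 32).transpose(1, 2, 0)` step in the script above relies on the CIFAR layout, where each flat image row stores all red values first, then all green, then all blue. A minimal pure-Python sketch of the same index mapping on a tiny 2x2 image follows; the helper name `chw_flat_to_hwc` and the sample values are made up for illustration.

```python
def chw_flat_to_hwc(flat, channels=3, height=32, width=32):
    """Convert a flat channel-major row into nested [row][col][channel] pixels."""
    assert len(flat) == channels * height * width
    plane = height * width  # number of values per colour channel
    return [
        [
            [flat[c * plane + y * width + x] for c in range(channels)]
            for x in range(width)
        ]
        for y in range(height)
    ]

# Tiny 2x2 "image": four red values, then four green, then four blue.
flat = [1, 2, 3, 4, 11, 12, 13, 14, 21, 22, 23, 24]
pixels = chw_flat_to_hwc(flat, channels=3, height=2, width=2)
print(pixels[0][1])  # pixel in row 0, column 1
```

Each output pixel gathers one value from each colour plane, which is the layout `Image.fromarray` expects for an RGB image.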
2 changes: 1 addition & 1 deletion filelists/CUB/download_CUB.sh
100644 → 100755
@@ -1,4 +1,4 @@
 #!/usr/bin/env bash
 wget http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
 tar -zxvf CUB_200_2011.tgz
-python write_CUB_filelist.py
+python3 write_CUB_filelist.py
4 changes: 2 additions & 2 deletions filelists/emnist/download_emnist.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
 wget https://github.com/NanqingD/DAOSL/raw/master/data/emnist.zip
 unzip emnist.zip
-python invert_emnist.py
-python write_cross_char_valnovel_filelist.py
+python3 invert_emnist.py
+python3 write_cross_char_valnovel_filelist.py
4 changes: 2 additions & 2 deletions filelists/miniImagenet/download_miniImagenet.sh
@@ -4,5 +4,5 @@ wget https://raw.githubusercontent.com/twitter/meta-learning-lstm/master/data/mi
 wget https://raw.githubusercontent.com/twitter/meta-learning-lstm/master/data/miniImagenet/test.csv
 wget http://image-net.org/image/ILSVRC2015/ILSVRC2015_CLS-LOC.tar.gz
 tar -zxvf ILSVRC2015_CLS-LOC.tar.gz
-python write_miniImagenet_filelist.py
-python write_cross_filelist.py
+python3 write_miniImagenet_filelist.py
+python3 write_cross_filelist.py
6 changes: 3 additions & 3 deletions filelists/omniglot/download_omniglot.sh
@@ -14,6 +14,6 @@ mv $DATADIR/images_evaluation/* $DATADIR/
 rmdir $DATADIR/images_background
 rmdir $DATADIR/images_evaluation

-python rot_omniglot.py
-python write_omniglot_filelist.py
-python write_cross_char_base_filelist.py
+python3 rot_omniglot.py
+python3 write_omniglot_filelist.py
+python3 write_cross_char_base_filelist.py
4 changes: 2 additions & 2 deletions io_utils.py
@@ -16,9 +16,9 @@

 def parse_args(script):
     parser = argparse.ArgumentParser(description= 'few-shot script %s' %(script))
-    parser.add_argument('--dataset' , default='CUB', help='CUB/miniImagenet/cross/omniglot/cross_char')
+    parser.add_argument('--dataset' , default='CIFARFS', help='CIFARFS/CUB/miniImagenet/cross/omniglot/cross_char')
     parser.add_argument('--model' , default='Conv4', help='model: Conv{4|6} / ResNet{10|18|34|50|101}') # 50 and 101 are not used in the paper
-    parser.add_argument('--method' , default='baseline', help='baseline/baseline++/protonet/matchingnet/relationnet{_softmax}/maml{_approx}') #relationnet_softmax replace L2 norm with softmax to expedite training, maml_approx use first-order approximation in the gradient for efficiency
+    parser.add_argument('--method' , default='protonet', help='baseline/baseline++/protonet/matchingnet/relationnet{_softmax}/maml{_approx}') #relationnet_softmax replace L2 norm with softmax to expedite training, maml_approx use first-order approximation in the gradient for efficiency
     parser.add_argument('--train_n_way' , default=5, type=int, help='class num to classify for training') #baseline and baseline++ would ignore this parameter
     parser.add_argument('--test_n_way' , default=5, type=int, help='class num to classify for testing (validation) ') #baseline and baseline++ only use this parameter in finetuning
     parser.add_argument('--n_shot' , default=5, type=int, help='number of labeled data in each class, same as n_support') #baseline and baseline++ only use this parameter in finetuning
10 changes: 6 additions & 4 deletions methods/baselinefinetune.py
@@ -6,6 +6,8 @@
 import torch.nn.functional as F
 from methods.meta_template import MetaTemplate

+device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
+
 class BaselineFinetune(MetaTemplate):
     def __init__(self, model_func, n_way, n_support, loss_type = "softmax"):
         super(BaselineFinetune, self).__init__( model_func, n_way, n_support)
@@ -22,26 +24,26 @@ def set_forward_adaptation(self,x,is_feature = True):
         z_query = z_query.contiguous().view(self.n_way* self.n_query, -1 )

         y_support = torch.from_numpy(np.repeat(range( self.n_way ), self.n_support ))
-        y_support = Variable(y_support.cuda())
+        y_support = Variable(y_support.to(device))

         if self.loss_type == 'softmax':
             linear_clf = nn.Linear(self.feat_dim, self.n_way)
         elif self.loss_type == 'dist':
             linear_clf = backbone.distLinear(self.feat_dim, self.n_way)
-        linear_clf = linear_clf.cuda()
+        linear_clf = linear_clf.to(device)

         set_optimizer = torch.optim.SGD(linear_clf.parameters(), lr = 0.01, momentum=0.9, dampening=0.9, weight_decay=0.001)

         loss_function = nn.CrossEntropyLoss()
-        loss_function = loss_function.cuda()
+        loss_function = loss_function.to(device)

         batch_size = 4
         support_size = self.n_way* self.n_support
         for epoch in range(100):
             rand_id = np.random.permutation(support_size)
             for i in range(0, support_size , batch_size):
                 set_optimizer.zero_grad()
-                selected_id = torch.from_numpy( rand_id[i: min(i+batch_size, support_size) ]).cuda()
+                selected_id = torch.from_numpy( rand_id[i: min(i+batch_size, support_size) ]).to(device)
                 z_batch = z_support[selected_id]
                 y_batch = y_support[selected_id]
                 scores = linear_clf(z_batch)