This is the official PyTorch implementation for Learning to Caption Images through a Lifetime by Asking Questions. For technical details, please refer to:
Learning to Caption Images through a Lifetime by Asking Questions
Kevin Shen, Amlan Kar, Sanja Fidler
ICCV 2019
NeurIPS 2018, VIGIL Workshop
[Paper] [Demo]
All the code has been run and tested on:
- Python 2.7.15 (coco-caption requires 2.7)
- PyTorch 1.0.0
- CUDA 9.0
- TITAN X/Xp and GTX 1080Ti GPUs
First clone the repository:
git clone https://github.com/shenkev/Caption-Images-through-a-Lifetime-by-Asking-Questions.git
- Go into the downloaded code directory
- Add the project to PYTHONPATH
cd <path_to_downloaded_directory>
export PYTHONPATH=$PWD
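If imports fail later, it can help to confirm that the export took effect. The sketch below is only a sanity check. The helper name `on_pythonpath` is hypothetical and not part of the repository, and it assumes the project root is the directory you exported.

```python
import os


def on_pythonpath(project_dir, pythonpath=None):
    """Return True if project_dir is one of the PYTHONPATH entries."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = os.path.realpath(project_dir)
    # PYTHONPATH entries are separated by os.pathsep (":" on Linux).
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return any(os.path.realpath(p) == target for p in entries)


if __name__ == "__main__":
    # Run from the downloaded code directory after the export above.
    print(on_pythonpath(os.getcwd()))
```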
- To run only the demo, skip the setup and instead:
- See section "Download pretrained modules" and follow instructions
- Download pretrained resnet model from here and place in Utils/preprocess/checkpoint
Run demo.py, which runs inference on the file cat.png. Feel free to use your own image; update the file path in demo.py accordingly.
python demo.py
chmod +x setup.sh
./setup.sh
This will:
- Install python dependencies
- Download Stanford NLP package for parsing part-of-speech
- Download coco-caption
- Download pyciderevalcap
- Download the images from this link. We need the 2014 training images and the 2014 val images.
- Put train2014/ and val2014/ in a directory of your choice, denoted as $IMAGE_ROOT.
- Download pretrained resnet model from here and place in Utils/preprocess/checkpoint
- Preprocess the images by running
python Utils/preprocess/preprocess_imgs.py --input_json Data/annotation/dataset_coco.json --output_dir $IMAGE_ROOT/features --images_root $IMAGE_ROOT
Warning: the preprocessing script will fail with the default MSCOCO data because one of its images is corrupted. See this issue for the fix, which involves manually replacing one image in the dataset.
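Before starting a long preprocessing run, you may want to look for suspicious files yourself. The sketch below is a rough heuristic, not the repository's method: it flags .jpg files that do not begin and end with the standard JPEG markers. It can flag valid files that carry trailing bytes, and it will not catch every kind of corruption. The function names are hypothetical; the train2014/ and val2014/ folder names come from the steps above.

```python
import os

JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_like_jpeg(path):
    """Heuristic: file starts with SOI and ends with EOI."""
    with open(path, "rb") as f:
        head = f.read(2)
        f.seek(0, os.SEEK_END)
        if f.tell() < 4:
            return False
        f.seek(-2, os.SEEK_END)
        tail = f.read(2)
    return head == JPEG_SOI and tail == JPEG_EOI


def find_suspect_images(images_root, subdirs=("train2014", "val2014")):
    """Return paths of .jpg/.jpeg files that fail the marker check."""
    suspects = []
    for sub in subdirs:
        folder = os.path.join(images_root, sub)
        if not os.path.isdir(folder):
            continue  # skip folders that have not been downloaded
        for name in sorted(os.listdir(folder)):
            if name.lower().endswith((".jpg", ".jpeg")):
                path = os.path.join(folder, name)
                if not looks_like_jpeg(path):
                    suspects.append(path)
    return suspects
```

Calling `find_suspect_images(os.environ["IMAGE_ROOT"])` returns a list of paths to inspect by hand; an empty list does not guarantee that every image decodes.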
- Download training data here
- Unzip it into Data/annotation
- Precompute indexes for CIDEr
python Utils/preprocess/preprocess_cider.py --data_file Data/annotation/cap_train.p --output_file Data/annotation/coco-words
- Prepare lifelong learning data splits
python Utils/preprocess/preprocess_llsplits.py --data_file Data/annotation/cap_train.p --output_file Data/annotation/train3_split --warmup 3 --num_splits 4 --num_caps 2
- You can play with the chunk sizes and number of chunks using the warmup and num_splits parameters
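To make the role of the two parameters concrete, here is a hypothetical illustration of carving a dataset into one warmup chunk followed by num_splits further chunks. This is not the logic of preprocess_llsplits.py: the function name, the choice to treat warmup as a percentage of the data, and the handling of leftover items are all assumptions made for illustration.

```python
def make_chunks(items, warmup_percent, num_splits):
    """Split items into a warmup chunk plus num_splits later chunks.

    Assumption: warmup_percent is the share of the data (0-100)
    reserved for the warmup chunk.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    n_warmup = int(len(items) * warmup_percent // 100)
    warmup, rest = items[:n_warmup], items[n_warmup:]
    base, extra = divmod(len(rest), num_splits)
    chunks, start = [], 0
    for i in range(num_splits):
        # The first `extra` chunks take one more item so nothing is dropped.
        size = base + (1 if i < extra else 0)
        chunks.append(rest[start:start + size])
        start += size
    return warmup, chunks
```

Under these assumptions, `make_chunks(list(range(100)), 10, 4)` gives a 10-item warmup chunk and four chunks of 23, 23, 22, and 22 items. A larger warmup leaves less data for the later chunks, and a larger num_splits makes each chunk smaller.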
- You can either download the trained caption, question generator, and VQA modules or train them yourself
- To download the trained caption, question generator, and VQA modules:
- Download model checkpoints here
- Place in Data/model_checkpoints
- The captioning module was trained using 10% warmup data
- Train caption module
- In Experiments/caption.json, change exp_dir to the working directory and img_dir to $IMAGE_ROOT
python Scripts/train_caption3.py --experiment Experiments/caption3.json
- Train VQA module
- In Experiments/vqa.json, change exp_dir to the working directory and img_dir to $IMAGE_ROOT
python Scripts/train_vqa.py --experiment Experiments/vqa.json
- Train question generator module
- In Experiments/question3.json, change exp_dir to the working directory, img_dir to $IMAGE_ROOT, vqa_path to the VQA model checkpoint, and cap_path to the caption model checkpoint
python Scripts/train_quegen.py --experiment Experiments/question3.json
- In Experiments/lifelong3.json, change exp_dir to the working directory, img_dir to $IMAGE_ROOT, vqa_path to the VQA model checkpoint, cap_path to the caption model checkpoint, and quegen_path to the question generator model checkpoint
- You can play with the parameters H, lamda, k
python Scripts/train_lifelong.py --experiment Experiments/lifelong3.json
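The experiment files above all need the same kind of edit, so a small script can save some hand editing. The sketch below is one possible way to do it and is not part of the repository: the helper name `update_experiment` is hypothetical, and it assumes that exp_dir, img_dir, and the checkpoint paths are top-level keys of the JSON file.

```python
import json


def update_experiment(path, **overrides):
    """Overwrite existing top-level keys of an experiment JSON file in place."""
    with open(path) as f:
        config = json.load(f)
    unknown = sorted(k for k in overrides if k not in config)
    if unknown:
        # Refuse to add new keys: a typo would otherwise go unnoticed.
        raise KeyError("keys not present in %s: %s" % (path, ", ".join(unknown)))
    config.update(overrides)
    with open(path, "w") as f:
        # Note: this rewrites the file, so key order and spacing may change.
        json.dump(config, f, indent=4, sort_keys=True)
    return config
```

For example, `update_experiment("Experiments/lifelong3.json", exp_dir="/path/to/workdir", img_dir="/path/to/images")` would set both entries, where the two paths are placeholders for your own directories. If the keys turn out to be nested in your copy of the file, edit the file by hand instead.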
- Track training
cd Results/lifelong
tensorboard --logdir tensorboard/
- Visualize qualitative results
cd Results/lifelong/lifelong3