
DAN-VisDial

PyTorch implementation for the EMNLP 2019 paper "Dual Attention Networks for Visual Reference Resolution in Visual Dialog".
On the VisDial v1.0 dataset, our single model achieved state-of-the-art performance on NDCG, MRR, and R@1.

If you use this code in your published research, please consider citing:

@inproceedings{kang2019dual,
  title={Dual Attention Networks for Visual Reference Resolution in Visual Dialog},
  author={Kang, Gi-Cheon and Lim, Jaeseo and Zhang, Byoung-Tak},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  year={2019}
}

Setup and Dependencies

This starter code is implemented using PyTorch v0.3.1 with CUDA 8 and cuDNN 7.
It is recommended to set up this source code using Anaconda or Miniconda.

  1. Install the Anaconda or Miniconda distribution (based on Python 3.6+) from their downloads site.
  2. Clone this repository and create an environment:
git clone https://github.com/gicheonkang/DAN-VisDial
conda create -n dan_visdial python=3.6

# activate the environment and install all dependencies
conda activate dan_visdial
cd DAN-VisDial/
pip install -r requirements.txt

Download Features

  1. We use image features extracted by a Faster R-CNN pre-trained on Visual Genome. Download the image features below, and put each feature file under the $PROJECT_ROOT/data/{SPLIT_NAME}_feature directory. The image_id-to-bounding-box-index file ({SPLIT_NAME}_imgid2idx.pkl) is needed because the number of bounding boxes per image is not fixed (it ranges from 10 to 100); see the sketch after this list.
  2. Download the GloVe pretrained word vectors from here, and keep glove.6B.300d.txt under the $PROJECT_ROOT/data/glove directory.
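For reference, here is a minimal sketch of how the imgid2idx mapping is typically used to look up one image's variable-length box features. The feature file name, the HDF5 container, and the dataset key are assumptions for illustration; the actual layout depends on the downloaded feature archives.

import pickle

import h5py  # assumed container format for the RCNN features

SPLIT = "train"  # one of the {SPLIT_NAME} values

# image_id -> row index into the feature table (needed because the
# number of boxes per image varies between 10 and 100)
with open(f"data/{SPLIT}_feature/{SPLIT}_imgid2idx.pkl", "rb") as f:
    imgid2idx = pickle.load(f)

with h5py.File(f"data/{SPLIT}_feature/{SPLIT}.h5", "r") as feats:  # hypothetical file name
    idx = imgid2idx[42]  # 42 is a placeholder image_id
    image_features = feats["image_features"][idx]  # hypothetical dataset key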

Data preprocessing & Word embedding initialization

# data preprocessing
cd DAN-VisDial/data/
python prepro.py

# Word embedding vector initialization (GloVe)
cd ../utils
python utils.py
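utils.py initializes the word embedding matrix from the downloaded GloVe file. As a rough sketch of the standard recipe (the function and variable names here are illustrative, not the script's actual API): words found in glove.6B.300d.txt take their pretrained 300-d vector, while out-of-vocabulary words keep a random initialization.

import numpy as np

def build_embeddings(word2ind, glove_path="data/glove/glove.6B.300d.txt", dim=300):
    # start from a small random init for words missing from GloVe
    emb = np.random.normal(scale=0.1, size=(len(word2ind), dim)).astype(np.float32)
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            tokens = line.rstrip().split(" ")
            word, vec = tokens[0], tokens[1:]
            if word in word2ind:
                emb[word2ind[word]] = np.asarray(vec, dtype=np.float32)
    return emb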

Training

Simple run

python train.py 

Saving model checkpoints

By default, the model saves a checkpoint at every epoch. You can change this with the -save_step option.
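For example, to save a checkpoint every other epoch (assuming -save_step counts epochs between saves):

python train.py -save_step 2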

Logging

The training log at checkpoints/start/time/log.txt records the epoch, loss, and learning rate.

Evaluation

A trained model checkpoint can be evaluated as follows:

python evaluate.py -load_path /path/to/.pth -split val

Validation scores can be checked in an offline setting, but checking test-split scores requires submitting a JSON file to the online evaluation server. You can generate the JSON file with the -save_ranks=True option.
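For example, a test-split run that writes the ranks file for submission could look like this (combining the options above):

python evaluate.py -load_path /path/to/.pth -split test -save_ranks=True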

Results

Performance on v1.0 test-std (trained on v1.0 train):

Model   NDCG     MRR      R@1     R@5     R@10    Mean
DAN     0.5759   0.6320   49.63   79.75   89.35   4.30
