[feat] Visual Genome Support (#140)
* [feature] Extractor for various filetypes

* [feat] Add builder for visual genome

- Fixes #82
- Automatically downloads features and other files required for the
dataset, and extracts them as well

* [chores] Extra things for .gitignore as per new scripts

* [feat] Support for loading _info.npy files for each image

* [feat] Load jsonl files in image database and scene graph database

* [feat] Visual Genome dataset, with various options for loading scene
graphs, etc. (see the loading sketch below)
- You can load scene_graphs, feature info, objects and relationships
separately
- QA is loaded by default

* [chores] Update README and docs

* [fix] Address comments
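
The bullets above describe loaders for per-image `_info.npy` metadata files and `.jsonl` scene-graph/QA annotations. As a rough, illustrative sketch only (this is not Pythia's actual API; the paths and field names below are hypothetical), reading such files with NumPy and the standard library could look like:

```python
import json

import numpy as np


def load_feature_info(info_path):
    # Hypothetical per-image "_info.npy" file holding metadata (for example,
    # bounding boxes and object classes) saved alongside extracted features.
    return np.load(info_path, allow_pickle=True).item()


def load_jsonl(jsonl_path):
    # Each line of a .jsonl file is an independent JSON record, for example
    # one scene graph or one QA annotation per image.
    with open(jsonl_path, "r") as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative usage with placeholder paths:
info = load_feature_info("data/visual_genome/features/1_info.npy")
scene_graphs = load_jsonl("data/visual_genome/scene_graphs.jsonl")
print(sorted(info), len(scene_graphs))
```

In the commit itself this logic lives in the new image database and scene graph database classes; the snippet only mirrors the file formats involved.
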
apsdehal committed Aug 7, 2019
1 parent b60e31d commit 12f67cd
Showing 20 changed files with 575 additions and 70 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@
*.swp
.idea/*
**/__pycache__/*
**/output/*
data/.DS_Store
docs/build
results/*
19 changes: 11 additions & 8 deletions README.md
@@ -97,13 +97,15 @@ wget imdb_link
tar xf [imdb].tar.gz
```

| Dataset | Key | Task | ImDB Link | Features Link | Features checksum |
|--------------|-----|-----|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------------|---------|
| TextVQA | textvqa | vqa | [TextVQA 0.5 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz) | [OpenImages](https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz) | `b22e80997b2580edaf08d7e3a896e324` |
| VQA 2.0 | vqa2 | vqa | [VQA 2.0 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981` |
| VizWiz | vizwiz | vqa | [VizWiz ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz) | [VizWiz](https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz) | `9a28d6a9892dda8519d03fba52fb899f` |
| VisualDialog | visdial | dialog | Coming soon! | Coming soon! | Coming soon! |
| MS COCO | coco | captioning | [COCO Caption](https://dl.fbaipublicfiles.com/pythia/data/imdb/coco_captions.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981`|
| Dataset | Key | Task | ImDB Link | Features Link | Features checksum | Notes|
|--------------|-----|-----|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------------|---------|-----|
| TextVQA | textvqa | vqa | [TextVQA 0.5 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz) | [OpenImages](https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz) | `b22e80997b2580edaf08d7e3a896e324` ||
| VQA 2.0 | vqa2 | vqa | [VQA 2.0 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981` ||
| VizWiz | vizwiz | vqa | [VizWiz ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz) | [VizWiz](https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz) | `9a28d6a9892dda8519d03fba52fb899f` ||
| VisualDialog | visdial | dialog | Coming soon! | Coming soon! | Coming soon! | |
| VisualGenome | visual_genome | vqa | Automatically downloaded | Automatically downloaded | Coming soon! | Also supports scene graphs|
| CLEVR | clevr | vqa | Automatically downloaded | Automatically downloaded | | |
| MS COCO | coco | captioning | [COCO Caption](https://dl.fbaipublicfiles.com/pythia/data/imdb/coco_captions.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981`| |
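
The Visual Genome and CLEVR rows above list their ImDBs and features as automatically downloaded. A minimal sketch of what such a download-extract-verify step can look like, using only the Python standard library (the URL, paths, and checksum are placeholders, not the actual Pythia values):

```python
import hashlib
import tarfile
import urllib.request


def download_and_extract(url, archive_path, extract_dir, expected_md5=None):
    # Fetch the archive, optionally verify its md5 checksum against the value
    # from the table above, then unpack it into extract_dir.
    urllib.request.urlretrieve(url, archive_path)

    if expected_md5 is not None:
        md5 = hashlib.md5()
        with open(archive_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                md5.update(chunk)
        if md5.hexdigest() != expected_md5:
            raise ValueError("Checksum mismatch; the download may be corrupt.")

    with tarfile.open(archive_path) as tar:
        tar.extractall(extract_dir)
```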

After downloading the features, verify the download by checking the md5sum using

@@ -119,8 +121,9 @@ supported by the models in Pythia's model zoo.

| Model | Key | Supported Datasets | Pretrained Models | Notes |
|--------|-----------|-----------------------|-------------------|-----------------------------------------------------------|
| Pythia | pythia | vqa2, vizwiz, textvqa | [vqa2 train+val](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth), [vqa2 train only](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth), [vizwiz](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vizwiz/pythia_pretrained_vqa2.pth) | VizWiz model has been pretrained on VQAv2 and transferred |
| Pythia | pythia | vqa2, vizwiz, textvqa, visual_genome | [vqa2 train+val](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth), [vqa2 train only](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth), [vizwiz](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vizwiz/pythia_pretrained_vqa2.pth) | VizWiz model has been pretrained on VQAv2 and transferred |
| LoRRA | lorra | vqa2, vizwiz, textvqa | [textvqa](https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth) | |
| CNN LSTM | cnn_lstm | clevr | | Features are calculated on the fly. |
| BAN | ban | vqa2, vizwiz, textvqa | Coming soon! | Support is preliminary and has not been tested thoroughly. |
| BUTD | butd | coco | [coco](https://dl.fbaipublicfiles.com/pythia/pretrained_models/coco_captions/butd.pth) | |
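
The pretrained models above are distributed as `.pth` checkpoints. As an illustrative aside (actual inference and evaluation should go through Pythia's own tooling), such a downloaded file can typically be inspected with `torch.load`:

```python
import torch

# Illustrative only: peek inside a downloaded checkpoint such as
# pythia_train_val.pth; run actual evaluation through Pythia itself.
checkpoint = torch.load("pythia_train_val.pth", map_location="cpu")
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))
```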

70 changes: 37 additions & 33 deletions docs/source/tutorials/concepts.md
@@ -31,33 +31,37 @@ to refer to it in the command line arguments.
The following table shows the tasks and their datasets:

```eval_rst
+--------+------------+------------------------+
|**Task**| **Key** | **Datasets** |
+--------+------------+------------------------+
| VQA | vqa | VQA2.0, VizWiz, TextVQA|
+--------+------------+------------------------+
| Dialog | dialog | VisualDialog |
+--------+------------+------------------------+
| Caption| captioning | MS COCO |
+--------+------------+------------------------+
+--------+------------+---------------------------------------------+
|**Task**| **Key** | **Datasets** |
+--------+------------+---------------------------------------------+
| VQA | vqa | VQA2.0, VizWiz, TextVQA, VisualGenome, CLEVR|
+--------+------------+---------------------------------------------+
| Dialog | dialog | VisualDialog |
+--------+------------+---------------------------------------------+
| Caption| captioning | MS COCO |
+--------+------------+---------------------------------------------+
```

The following table shows the inverse of the above: datasets along with their tasks and keys:

```eval_rst
+--------------+---------+-----------+--------------------+
| **Datasets** | **Key** | **Task** |**Notes** |
+--------------+---------+-----------+--------------------+
| VQA 2.0 | vqa2 | vqa | |
+--------------+---------+-----------+--------------------+
| TextVQA | textvqa | vqa | |
+--------------+---------+-----------+--------------------+
| VizWiz | vizwiz | vqa | |
+--------------+---------+-----------+--------------------+
| VisualDialog | visdial | dialog | Coming soon! |
+--------------+---------+-----------+--------------------+
| MS COCO | coco | captioning| |
+--------------+---------+-----------+--------------------+
+--------------+---------------+-----------+--------------------+
| **Datasets** | **Key** | **Task** |**Notes** |
+--------------+---------------+-----------+--------------------+
| VQA 2.0 | vqa2 | vqa | |
+--------------+---------------+-----------+--------------------+
| TextVQA | textvqa | vqa | |
+--------------+---------------+-----------+--------------------+
| VizWiz | vizwiz | vqa | |
+--------------+---------------+-----------+--------------------+
| VisualDialog | visdial | dialog | Coming soon! |
+--------------+---------------+-----------+--------------------+
| VisualGenome | visual_genome | vqa | |
+--------------+---------------+-----------+--------------------+
| CLEVR | clevr | vqa | |
+--------------+---------------+-----------+--------------------+
| MS COCO | coco | captioning| |
+--------------+---------------+-----------+--------------------+
```
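
Pythia resolves these keys internally when they are passed in configs or on the command line. As a toy illustration of the general idea only (this is not Pythia's actual registry code), a key-to-builder mapping could look like:

```python
# Toy illustration: map dataset keys from the tables above to builder classes,
# so that a key such as "visual_genome" can be resolved to the right loader.
DATASET_BUILDERS = {}


def register_builder(key):
    def wrapper(cls):
        DATASET_BUILDERS[key] = cls
        return cls
    return wrapper


@register_builder("visual_genome")
class VisualGenomeBuilder:
    """Hypothetical builder class for the Visual Genome dataset."""


print(DATASET_BUILDERS["visual_genome"])
```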

## Models
@@ -75,17 +79,17 @@ reference in configuration and command line arguments. The following table shows each model's
key name and the datasets it can be run on.

```eval_rst
+-----------+---------+-----------------------+
| **Model** | **Key** | **Datasets** |
+-----------+---------+-----------------------+
| LoRRA | lorra | textvqa, vizwiz |
+-----------+---------+-----------------------+
| Pythia | pythia | textvqa, vizwiz, vqa2 |
+-----------+---------+-----------------------+
| BAN | ban | textvqa, vizwiz, vqa2 |
+-----------+---------+-----------------------+
| BUTD | butd | coco |
+-----------+---------+-----------------------+
+-----------+---------+--------------------------------------+
| **Model** | **Key** | **Datasets** |
+-----------+---------+--------------------------------------+
| LoRRA | lorra | textvqa, vizwiz |
+-----------+---------+--------------------------------------+
| Pythia | pythia | textvqa, vizwiz, vqa2, visual_genome |
+-----------+---------+--------------------------------------+
| BAN | ban | textvqa, vizwiz, vqa2 |
+-----------+---------+--------------------------------------+
| BUTD | butd | coco |
+-----------+---------+--------------------------------------+
```

```eval_rst
24 changes: 13 additions & 11 deletions docs/source/tutorials/pretrained_models.md
@@ -6,17 +6,19 @@ predictions for EvalAI evaluation. This section expects that you have already installed the
required data as explained in [quickstart](./quickstart).

```eval_rst
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
| Model | Model Key | Supported Datasets | Pretrained Models | Notes |
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
| Pythia | pythia | vqa2, vizwiz, textvqa | `vqa2 train+val`_, `vqa2 train only`_, `vizwiz`_ | VizWiz model has been pretrained on VQAv2 and transferred |
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
| LoRRA | lorra | vqa2, vizwiz, textvqa | `textvqa`_ | |
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
| BAN | ban | vqa2, vizwiz, textvqa | Coming soon! | Support is preliminary and haven't been tested throughly. |
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
| BUTD | butd | coco | `coco`_ | |
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| Model | Model Key | Supported Datasets | Pretrained Models | Notes |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| Pythia | pythia | vqa2, vizwiz, textvqa, visual_genome | `vqa2 train+val`_, `vqa2 train only`_, `vizwiz`_ | VizWiz model has been pretrained on VQAv2 and transferred |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| LoRRA | lorra | vqa2, vizwiz, textvqa | `textvqa`_ | |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| CNNLSTM| cnn_lstm | clevr | | Features are calculated on the fly. |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| BAN | ban | vqa2, vizwiz, textvqa | Coming soon! | Support is preliminary and has not been tested thoroughly. |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| BUTD | butd | coco | `coco`_ | |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
.. _vqa2 train+val: https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth
.. _vqa2 train only: https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth