[feat] Visual Genome Support (#140)
* [feature] Extractor for various filetypes

* [feat] Add builder for visual genome

- Fixes #82
- Automatically downloads features and other files required for the
dataset, and extracts them as well

* [chores] Extra things for .gitignore as per new scripts

* [feat] Support for loading _info.npy files for each image

* [feat] Load jsonl files in image database and scene graph database

* [feat] Visual Genome dataset, with various options for loading scene
graphs, etc. (see the loading sketch below)
- You can load scene_graphs, feature info, objects and relationships
separately
- QA is loaded by default

* [chores] Update README and docs

* [fix] Address comments
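
The bullets above describe loaders for per-image `_info.npy` metadata files and `.jsonl` scene-graph/QA annotations. As a rough, illustrative sketch only (this is not Pythia's actual API; the paths and field names below are hypothetical), reading such files with NumPy and the standard library could look like:

```python
import json

import numpy as np


def load_feature_info(info_path):
    # Hypothetical per-image "_info.npy" file holding metadata (for example,
    # bounding boxes and object classes) saved alongside extracted features.
    return np.load(info_path, allow_pickle=True).item()


def load_jsonl(jsonl_path):
    # Each line of a .jsonl file is an independent JSON record, for example
    # one scene graph or one QA annotation per image.
    with open(jsonl_path, "r") as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative usage with placeholder paths:
info = load_feature_info("data/visual_genome/features/1_info.npy")
scene_graphs = load_jsonl("data/visual_genome/scene_graphs.jsonl")
print(sorted(info), len(scene_graphs))
```

In the commit itself this logic lives in the new image database and scene graph database classes; the snippet only mirrors the file formats involved.
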
apsdehal committed Aug 7, 2019
1 parent b60e31d commit 12f67cd
Showing 20 changed files with 575 additions and 70 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@
*.swp
.idea/*
**/__pycache__/*
**/output/*
data/.DS_Store
docs/build
results/*
19 changes: 11 additions & 8 deletions README.md
@@ -97,13 +97,15 @@ wget imdb_link
tar xf [imdb].tar.gz
```

| Dataset | Key | Task | ImDB Link | Features Link | Features checksum |
|--------------|-----|-----|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------------|---------|
| TextVQA | textvqa | vqa | [TextVQA 0.5 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz) | [OpenImages](https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz) | `b22e80997b2580edaf08d7e3a896e324` |
| VQA 2.0 | vqa2 | vqa | [VQA 2.0 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981` |
| VizWiz | vizwiz | vqa | [VizWiz ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz) | [VizWiz](https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz) | `9a28d6a9892dda8519d03fba52fb899f` |
| VisualDialog | visdial | dialog | Coming soon! | Coming soon! | Coming soon! |
| MS COCO | coco | captioning | [COCO Caption](https://dl.fbaipublicfiles.com/pythia/data/imdb/coco_captions.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981`|
| Dataset | Key | Task | ImDB Link | Features Link | Features checksum | Notes|
|--------------|-----|-----|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------------|---------|-----|
| TextVQA | textvqa | vqa | [TextVQA 0.5 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz) | [OpenImages](https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz) | `b22e80997b2580edaf08d7e3a896e324` ||
| VQA 2.0 | vqa2 | vqa | [VQA 2.0 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981` ||
| VizWiz | vizwiz | vqa | [VizWiz ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz) | [VizWiz](https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz) | `9a28d6a9892dda8519d03fba52fb899f` ||
| VisualDialog | visdial | dialog | Coming soon! | Coming soon! | Coming soon! | |
| VisualGenome | visual_genome | vqa | Automatically downloaded | Automatically downloaded | Coming soon! | Also supports scene graphs|
| CLEVR | clevr | vqa | Automatically downloaded | Automatically downloaded | | |
| MS COCO | coco | captioning | [COCO Caption](https://dl.fbaipublicfiles.com/pythia/data/imdb/coco_captions.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981`| |
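
The Visual Genome and CLEVR rows above list their ImDBs and features as automatically downloaded. A minimal sketch of what such a download-extract-verify step can look like, using only the Python standard library (the URL, paths, and checksum are placeholders, not the actual Pythia values):

```python
import hashlib
import tarfile
import urllib.request


def download_and_extract(url, archive_path, extract_dir, expected_md5=None):
    # Fetch the archive, optionally verify its md5 checksum against the value
    # from the table above, then unpack it into extract_dir.
    urllib.request.urlretrieve(url, archive_path)

    if expected_md5 is not None:
        md5 = hashlib.md5()
        with open(archive_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                md5.update(chunk)
        if md5.hexdigest() != expected_md5:
            raise ValueError("Checksum mismatch; the download may be corrupt.")

    with tarfile.open(archive_path) as tar:
        tar.extractall(extract_dir)
```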

After downloading the features, verify the download by checking the md5sum using

@@ -119,8 +121,9 @@ supported by the models in Pythia's model zoo.

| Model | Key | Supported Datasets | Pretrained Models | Notes |
|--------|-----------|-----------------------|-------------------|-----------------------------------------------------------|
| Pythia | pythia | vqa2, vizwiz, textvqa | [vqa2 train+val](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth), [vqa2 train only](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth), [vizwiz](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vizwiz/pythia_pretrained_vqa2.pth) | VizWiz model has been pretrained on VQAv2 and transferred |
| Pythia | pythia | vqa2, vizwiz, textvqa, visual_genome | [vqa2 train+val](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth), [vqa2 train only](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth), [vizwiz](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vizwiz/pythia_pretrained_vqa2.pth) | VizWiz model has been pretrained on VQAv2 and transferred |
| LoRRA | lorra | vqa2, vizwiz, textvqa | [textvqa](https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth) | |
| CNN LSTM | cnn_lstm | clevr | | Features are calculated on the fly. |
| BAN | ban | vqa2, vizwiz, textvqa | Coming soon! | Support is preliminary and has not been tested thoroughly. |
| BUTD | butd | coco | [coco](https://dl.fbaipublicfiles.com/pythia/pretrained_models/coco_captions/butd.pth) | |
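
The pretrained models above are distributed as `.pth` checkpoints. As an illustrative aside (actual inference and evaluation should go through Pythia's own tooling), such a downloaded file can typically be inspected with `torch.load`:

```python
import torch

# Illustrative only: peek inside a downloaded checkpoint such as
# pythia_train_val.pth; run actual evaluation through Pythia itself.
checkpoint = torch.load("pythia_train_val.pth", map_location="cpu")
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))
```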

70 changes: 37 additions & 33 deletions docs/source/tutorials/concepts.md
@@ -31,33 +31,37 @@ to refer to it in the command line arguments.
The following table shows the tasks and their datasets:

```eval_rst
+--------+------------+------------------------+
|**Task**| **Key** | **Datasets** |
+--------+------------+------------------------+
| VQA | vqa | VQA2.0, VizWiz, TextVQA|
+--------+------------+------------------------+
| Dialog | dialog | VisualDialog |
+--------+------------+------------------------+
| Caption| captioning | MS COCO |
+--------+------------+------------------------+
+--------+------------+---------------------------------------------+
|**Task**| **Key** | **Datasets** |
+--------+------------+---------------------------------------------+
| VQA | vqa | VQA2.0, VizWiz, TextVQA, VisualGenome, CLEVR|
+--------+------------+---------------------------------------------+
| Dialog | dialog | VisualDialog |
+--------+------------+---------------------------------------------+
| Caption| captioning | MS COCO |
+--------+------------+---------------------------------------------+
```

The following table shows the inverse of the above: datasets along with their tasks and keys:

```eval_rst
+--------------+---------+-----------+--------------------+
| **Datasets** | **Key** | **Task** |**Notes** |
+--------------+---------+-----------+--------------------+
| VQA 2.0 | vqa2 | vqa | |
+--------------+---------+-----------+--------------------+
| TextVQA | textvqa | vqa | |
+--------------+---------+-----------+--------------------+
| VizWiz | vizwiz | vqa | |
+--------------+---------+-----------+--------------------+
| VisualDialog | visdial | dialog | Coming soon! |
+--------------+---------+-----------+--------------------+
| MS COCO | coco | captioning| |
+--------------+---------+-----------+--------------------+
+--------------+---------------+-----------+--------------------+
| **Datasets** | **Key** | **Task** |**Notes** |
+--------------+---------------+-----------+--------------------+
| VQA 2.0 | vqa2 | vqa | |
+--------------+---------------+-----------+--------------------+
| TextVQA | textvqa | vqa | |
+--------------+---------------+-----------+--------------------+
| VizWiz | vizwiz | vqa | |
+--------------+---------------+-----------+--------------------+
| VisualDialog | visdial | dialog | Coming soon! |
+--------------+---------------+-----------+--------------------+
| VisualGenome | visual_genome | vqa | |
+--------------+---------------+-----------+--------------------+
| CLEVR | clevr | vqa | |
+--------------+---------------+-----------+--------------------+
| MS COCO | coco | captioning| |
+--------------+---------------+-----------+--------------------+
```
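
Pythia resolves these keys internally when they are passed in configs or on the command line. As a toy illustration of the general idea only (this is not Pythia's actual registry code), a key-to-builder mapping could look like:

```python
# Toy illustration: map dataset keys from the tables above to builder classes,
# so that a key such as "visual_genome" can be resolved to the right loader.
DATASET_BUILDERS = {}


def register_builder(key):
    def wrapper(cls):
        DATASET_BUILDERS[key] = cls
        return cls
    return wrapper


@register_builder("visual_genome")
class VisualGenomeBuilder:
    """Hypothetical builder class for the Visual Genome dataset."""


print(DATASET_BUILDERS["visual_genome"])
```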

## Models
@@ -75,17 +79,17 @@ reference in configuration and command line arguments. The following table shows each model's
key name and the datasets it can be run on.

```eval_rst
+-----------+---------+-----------------------+
| **Model** | **Key** | **Datasets** |
+-----------+---------+-----------------------+
| LoRRA | lorra | textvqa, vizwiz |
+-----------+---------+-----------------------+
| Pythia | pythia | textvqa, vizwiz, vqa2 |
+-----------+---------+-----------------------+
| BAN | ban | textvqa, vizwiz, vqa2 |
+-----------+---------+-----------------------+
| BUTD | butd | coco |
+-----------+---------+-----------------------+
+-----------+---------+--------------------------------------+
| **Model** | **Key** | **Datasets** |
+-----------+---------+--------------------------------------+
| LoRRA | lorra | textvqa, vizwiz |
+-----------+---------+--------------------------------------+
| Pythia | pythia | textvqa, vizwiz, vqa2, visual_genome |
+-----------+---------+--------------------------------------+
| BAN | ban | textvqa, vizwiz, vqa2 |
+-----------+---------+--------------------------------------+
| BUTD | butd | coco |
+-----------+---------+--------------------------------------+
```

```eval_rst
24 changes: 13 additions & 11 deletions docs/source/tutorials/pretrained_models.md
@@ -6,17 +6,19 @@ predictions for EvalAI evaluation. This section expects that you have already installed the
required data as explained in [quickstart](./quickstart).

```eval_rst
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
| Model | Model Key | Supported Datasets | Pretrained Models | Notes |
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
| Pythia | pythia | vqa2, vizwiz, textvqa | `vqa2 train+val`_, `vqa2 train only`_, `vizwiz`_ | VizWiz model has been pretrained on VQAv2 and transferred |
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
| LoRRA | lorra | vqa2, vizwiz, textvqa | `textvqa`_ | |
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
| BAN | ban | vqa2, vizwiz, textvqa | Coming soon! | Support is preliminary and haven't been tested throughly. |
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
| BUTD | butd | coco | `coco`_ | |
+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| Model | Model Key | Supported Datasets | Pretrained Models | Notes |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| Pythia | pythia | vqa2, vizwiz, textvqa, visual_genome | `vqa2 train+val`_, `vqa2 train only`_, `vizwiz`_ | VizWiz model has been pretrained on VQAv2 and transferred |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| LoRRA | lorra | vqa2, vizwiz, textvqa | `textvqa`_ | |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| CNNLSTM| cnn_lstm | clevr | | Features are calculated on the fly. |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| BAN | ban | vqa2, vizwiz, textvqa | Coming soon! | Support is preliminary and has not been tested thoroughly. |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
| BUTD | butd | coco | `coco`_ | |
+--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
.. _vqa2 train+val: https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth
.. _vqa2 train only: https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth