PaddleMM provides processing for multi-modal data, including text and images. The folder that stores a dataset is organized as follows (see the loading sketch after this list):
- images (the original images of the dataset)
- img_feat.npy (image region features extracted by Faster-RCNN)
- img_box.npy (bounding-box locations of the image regions extracted by Faster-RCNN)
- dataset.json (metadata of the original dataset, such as the text, dataset split, labels, etc.; see paddlemm/datasets/reader for how it is read)
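For reference, here is a minimal sketch of reading such a directory with plain NumPy/JSON. The `data/coco` root and the array shapes are assumptions for illustration; the toolkit's actual reading logic lives in paddlemm/datasets/reader.

```python
import json
import numpy as np

root = "data/coco"  # hypothetical dataset root following the layout above

# Pre-extracted Faster-RCNN region features and their bounding boxes.
img_feat = np.load(f"{root}/img_feat.npy")  # assumed shape: (num_images, num_regions, feat_dim)
img_box = np.load(f"{root}/img_box.npy")    # assumed shape: (num_images, num_regions, 4)

# dataset.json holds the text, dataset split, labels, etc.
with open(f"{root}/dataset.json", "r", encoding="utf-8") as f:
    meta = json.load(f)

print("feature array:", img_feat.shape)
print("box array:", img_box.shape)
print("metadata entries:", len(meta))
```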
To obtain the standard data loading format of the toolkit, the MS-COCO dataset needs to be processed as follows:
- Step 1. Download the COCO2014 Train/Val images and captions here, and merge the training set images and validation set images into the 'images' folder.
- Step 2. Download the COCO preprocessing and split files provided by Andrej Karpathy here.
- Step 3. Download the COCO region features and location information extracted by Faster-RCNN here.
- Step 4. Run paddlemm/scripts/coco_region.py and paddlemm/scripts/coco_label.py to process the original data and generate the image features and labels.
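After Step 4, a quick sanity check can confirm that the generated feature and box files are consistent. This is only an illustrative check, not part of the toolkit; the file paths and shapes are assumptions based on the layout described above.

```python
import numpy as np

# Outputs produced by the Step 4 scripts (the exact paths are assumptions).
feat = np.load("data/coco/img_feat.npy")
box = np.load("data/coco/img_box.npy")

# Every image should have the same number of region features and boxes.
assert feat.shape[0] == box.shape[0], "feature/box files cover different numbers of images"
assert box.shape[-1] == 4, "each box is expected to hold 4 coordinates"
print(f"{feat.shape[0]} images, {feat.shape[1]} regions per image, feature dim {feat.shape[-1]}")
```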
If you want to try the visualization module of the fusion task, please download the dataset and modify the configuration as follows: