TextOCR Dataset

Data Downloading

The TextOCR dataset: Official Website | Download Link

After downloading the images and annotations, unzip the files. The directory structure should then look as follows (ignoring the archive files):

TextOCR
  |--- train_val_images
  |    |--- <image_name>.jpg
  |    |--- <image_name>.jpg
  |    |--- ...
  |--- TextOCR_0.1_train.json
  |--- TextOCR_0.1_val.json
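
Before running the converter, it can help to confirm that the unzipped folder matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_textocr_items` is hypothetical, and it only checks for the three entries shown in the tree plus at least one `.jpg` image.

```python
from pathlib import Path


def missing_textocr_items(root):
    """Return the expected TextOCR entries that are absent under `root`.

    Hypothetical helper: it mirrors the directory tree shown above and
    returns paths relative to `root`. An empty list means the layout is OK.
    """
    root = Path(root)
    expected = [
        root / "train_val_images",
        root / "TextOCR_0.1_train.json",
        root / "TextOCR_0.1_val.json",
    ]
    missing = [str(p.relative_to(root)) for p in expected if not p.exists()]

    # The image folder should contain at least one .jpg file.
    images_dir = root / "train_val_images"
    if images_dir.is_dir() and not any(images_dir.glob("*.jpg")):
        missing.append("train_val_images/*.jpg")
    return missing
```

Calling `missing_textocr_items("path/to/TextOCR")` returns an empty list when everything is in place, and otherwise names the entries that still need to be downloaded or unzipped.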

Data Preparation

For Detection Task

To prepare the data for text detection, run the following command:

python tools/dataset_converters/convert.py \
    --dataset_name textocr --task det \
    --image_dir path/to/TextOCR/train_val_images/ \
    --label_dir path/to/TextOCR/TextOCR_0.1_train.json \
    --output_path path/to/TextOCR/det_gt.txt
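
If the conversion is scripted, the same command can be assembled from Python. This is only a sketch: `build_convert_command` is a hypothetical helper, it uses just the flags shown above, and the `split` and `output_name` parameters are assumptions added for illustration (for example, pointing `--label_dir` at `TextOCR_0.1_val.json` assumes the converter handles the validation file the same way).

```python
import sys


def build_convert_command(root, split="train", output_name="det_gt.txt"):
    """Build the argument list for the conversion command shown above.

    Hypothetical helper: `root` is the TextOCR folder, `split` selects
    TextOCR_0.1_<split>.json, and `output_name` is the file written
    under `root`.
    """
    return [
        sys.executable, "tools/dataset_converters/convert.py",
        "--dataset_name", "textocr",
        "--task", "det",
        "--image_dir", f"{root}/train_val_images/",
        "--label_dir", f"{root}/TextOCR_0.1_{split}.json",
        "--output_path", f"{root}/{output_name}",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failed conversion raises an error instead of passing silently.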

The generated standard annotation file det_gt.txt will now be placed under the folder TextOCR/.
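
To give a rough idea of what such a conversion involves, here is an illustrative sketch that is not the actual `convert.py` implementation. It assumes the TextOCR JSON has top-level `imgs`, `anns`, and `imgToAnns` keys, with each annotation carrying a flat `points` list and a `utf8_string` transcription, and it assumes an output format of one line per image: the image path, a tab, then a JSON list of labels. Check both assumptions against the downloaded files and the generated `det_gt.txt`.

```python
import json


def textocr_to_det_lines(data):
    """Convert a TextOCR-style dict into one annotation line per image.

    Assumed input keys: "imgs", "anns", "imgToAnns".
    Assumed output: "<file_name>\t<JSON list of {transcription, points}>".
    """
    lines = []
    for img_id, img in data["imgs"].items():
        labels = []
        for ann_id in data["imgToAnns"].get(img_id, []):
            ann = data["anns"][ann_id]
            flat = ann["points"]
            # Regroup the flat [x1, y1, x2, y2, ...] list into [x, y] pairs.
            polygon = [[flat[i], flat[i + 1]] for i in range(0, len(flat), 2)]
            labels.append({"transcription": ann["utf8_string"], "points": polygon})
        lines.append(img["file_name"] + "\t" + json.dumps(labels, ensure_ascii=False))
    return lines


# Tiny synthetic example (not real TextOCR data).
sample = {
    "imgs": {"abc": {"file_name": "train/abc.jpg"}},
    "anns": {"abc_1": {"points": [0, 0, 10, 0, 10, 5, 0, 5], "utf8_string": "STOP"}},
    "imgToAnns": {"abc": ["abc_1"]},
}
det_lines = textocr_to_det_lines(sample)
```

Each entry in `det_lines` pairs an image path with the polygons and transcriptions of every word annotated in that image.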

Back to dataset converters
