Alphanumeric Text Spotting

A model that detects and recognizes alphanumeric text (digits and letters of the English alphabet).

| Model Name | Complexity (GFLOPs) | Size (Mp) | Detection F1-score (ICDAR'15) | Word Spotting F1-score (ICDAR'15) | Links | GPU_NUM |
| --- | --- | --- | --- | --- | --- | --- |
| text-spotting-0003 | 190.5 | 27.76 | 86.60% | 64.71% | model template, snapshot | 1 |

Training pipeline

0. Change the directory in your terminal to text_spotting.

cd <training_extensions>/pytorch_toolkit/text_spotting

If you have not created a virtual environment yet:

./init_venv.sh

Otherwise:

. venv/bin/activate

or, if you use conda:

conda activate <environment_name>
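
To confirm that the environment is active, you can check which Python interpreter is in use (a quick sanity check; the second line assumes the init script installed PyTorch into the environment):

which python
python -c "import torch; print(torch.__version__)"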

1. Select a model template file and instantiate it in some directory.

export MODEL_TEMPLATE=`realpath ./model_templates/alphanumeric-text-spotting/text-spotting-0003/template.yaml`
export WORK_DIR=/tmp/my_model
python ../tools/instantiate_template.py ${MODEL_TEMPLATE} ${WORK_DIR}
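
After instantiation, ${WORK_DIR} should contain the files used in the steps below (a quick check; the exact file list may vary between template versions):

ls ${WORK_DIR}
# expected to include train.py, eval.py, export.py, template.yaml and snapshot.pth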

2. Download datasets

To be able to train networks and/or get quality metrics for pre-trained ones, it is necessary to download at least one of the supported datasets: COCO-Text, ICDAR 2013, ICDAR 2015, ICDAR 2017, ICDAR 2019 ART, ICDAR 2019 MLT, or MSRA-TD500 (see the folder structure below).

3. Convert datasets

Extract the downloaded datasets to the ${DATA_DIR}/text-dataset folder.

export DATA_DIR=${WORK_DIR}/data

Convert them to the format that is used internally and split them into train and test parts.

  • Training annotation
python3 ./tools/create_dataset.py \
    --config ./model_templates/alphanumeric-text-spotting/dataset_train.json \
    --output ${DATA_DIR}/text-dataset/dataset_train.json \
    --root ${DATA_DIR}/text-dataset/
export TRAIN_ANN_FILE=${DATA_DIR}/text-dataset/dataset_train.json
export TRAIN_IMG_ROOT=${DATA_DIR}/text-dataset
  • Testing annotation
python3 ./tools/create_dataset.py \
    --config ./model_templates/alphanumeric-text-spotting/dataset_test.json \
    --output ${DATA_DIR}/text-dataset/dataset_val.json \
    --root ${DATA_DIR}/text-dataset/
export VAL_ANN_FILE=${DATA_DIR}/text-dataset/dataset_val.json
export VAL_IMG_ROOT=${DATA_DIR}/text-dataset

Example JSON files for the train and test dataset configuration can be found in alphanumeric-text-spotting/datasets. If you do not want to use all of the datasets listed above, edit their content accordingly.
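
To see which datasets a configuration references before converting, you can pretty-print it with the standard-library json.tool module (the exact schema is defined by create_dataset.py):

python3 -m json.tool ./model_templates/alphanumeric-text-spotting/dataset_train.json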

The structure of the folder with datasets:

${DATA_DIR}/text-dataset
    ├── coco-text
    ├── icdar2013
    ├── icdar2015
    ├── icdar2017
    ├── icdar2019_art
    ├── icdar2019_mlt
    ├── MSRA-TD500
    ├── IC13TRAIN_IC15_IC17_IC19_MSRATD500_COCOTEXT.json
    └── IC13TEST.json
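
To sanity-check the converted annotations, you can count the images they reference (a sketch; it assumes the internal format is COCO-style JSON with an images list, in line with the MS-COCO metrics used for evaluation below):

python3 -c "import json; print(len(json.load(open('${TRAIN_ANN_FILE}'))['images']), 'train images')"
python3 -c "import json; print(len(json.load(open('${VAL_ANN_FILE}'))['images']), 'val images')"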

4. Change the current directory to the directory where the model template has been instantiated.

cd ${WORK_DIR}

5. Training and Fine-tuning

Try both of the following variants and select the best one:

  • Training from scratch or from pre-trained weights. Use this only if you have a lot of data, say tens of thousands of images or more. This variant assumes a long training process that starts from a large learning rate and gradually decreases it according to a training schedule.

  • Fine-tuning from pre-trained weights. If the dataset is not big enough, the model tends to overfit quickly, forgetting the data that was used for pre-training and losing generalization ability. Hence, a small starting learning rate and a short training schedule are recommended.

  • If you would like to start training from pre-trained weights, use the --load-weights parameter.

    python train.py \
       --load-weights ${WORK_DIR}/snapshot.pth \
       --train-ann-files ${TRAIN_ANN_FILE} \
       --train-data-roots ${TRAIN_IMG_ROOT} \
       --val-ann-files ${VAL_ANN_FILE} \
       --val-data-roots ${VAL_IMG_ROOT} \
       --save-checkpoints-to ${WORK_DIR}/outputs

    You can also override parameters such as --epochs, --batch-size, --gpu-num, and --base-learning-rate; otherwise, default values will be loaded from ${MODEL_TEMPLATE} (see the example after this list).

  • If you would like to start fine-tuning from pre-trained weights, use the --resume-from parameter. The value of --epochs has to exceed the value stored inside the ${MODEL_TEMPLATE} file; otherwise, training will end immediately. Here we add 5 additional epochs.

    export ADD_EPOCHS=5
    export EPOCHS_NUM=$((`grep epochs ${MODEL_TEMPLATE} | tr -dc '0-9'` + ${ADD_EPOCHS}))
    
    python train.py \
       --resume-from ${WORK_DIR}/snapshot.pth \
       --train-ann-files ${TRAIN_ANN_FILE} \
       --train-data-roots ${TRAIN_IMG_ROOT} \
       --val-ann-files ${VAL_ANN_FILE} \
       --val-data-roots ${VAL_IMG_ROOT} \
       --save-checkpoints-to ${WORK_DIR}/outputs \
       --epochs ${EPOCHS_NUM}
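
For example, a training run that overrides the default hyperparameters mentioned above might look as follows (the values are purely illustrative; sensible settings depend on your dataset and hardware):

python train.py \
   --load-weights ${WORK_DIR}/snapshot.pth \
   --train-ann-files ${TRAIN_ANN_FILE} \
   --train-data-roots ${TRAIN_IMG_ROOT} \
   --val-ann-files ${VAL_ANN_FILE} \
   --val-data-roots ${VAL_IMG_ROOT} \
   --save-checkpoints-to ${WORK_DIR}/outputs \
   --epochs 25 \
   --batch-size 4 \
   --gpu-num 1 \
   --base-learning-rate 0.001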

6. Evaluation

The evaluation procedure computes quality metrics and complexity numbers such as the number of parameters and FLOPs.

To compute MS-COCO metrics and save the computed values to ${WORK_DIR}/metrics.yaml, run:

python eval.py \
   --load-weights ${WORK_DIR}/outputs/latest.pth \
   --test-ann-files ${VAL_ANN_FILE} \
   --test-data-roots ${VAL_IMG_ROOT} \
   --save-metrics-to ${WORK_DIR}/metrics.yaml
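
Since the metrics are stored as plain YAML, you can inspect the computed values directly (the exact set of reported metrics is determined by eval.py):

cat ${WORK_DIR}/metrics.yaml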

You can also save images with predicted bounding boxes using the --save-output-to parameter.

python eval.py \
   --load-weights ${WORK_DIR}/outputs/latest.pth \
   --test-ann-files ${VAL_ANN_FILE} \
   --test-data-roots ${VAL_IMG_ROOT} \
   --save-metrics-to ${WORK_DIR}/metrics.yaml \
   --save-output-to ${WORK_DIR}/output_images

7. Export PyTorch* model to the OpenVINO™ format

To convert the PyTorch* model to the OpenVINO™ IR format, run the export.py script:

python export.py \
   --load-weights ${WORK_DIR}/outputs/latest.pth \
   --save-model-to ${WORK_DIR}/export

This produces the model definition model.xml and weights model.bin in single-precision floating-point format (FP32). The obtained model expects a normalized image in planar BGR format.
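
A quick way to verify that the export succeeded is to list the output directory (both IR files should be present):

ls ${WORK_DIR}/export
# expected to include model.xml and model.bin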

8. Validation of IR

Instead of passing snapshot.pth, pass the path to model.bin (or model.xml).

python eval.py \
   --load-weights ${WORK_DIR}/export/model.bin \
   --test-ann-files ${VAL_ANN_FILE} \
   --test-data-roots ${VAL_IMG_ROOT} \
   --save-metrics-to ${WORK_DIR}/metrics.yaml
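
To compare the IR against the original PyTorch checkpoint, you can save the IR metrics to a separate file and diff the two reports (the metrics_ir.yaml name is illustrative; it also avoids overwriting the metrics.yaml produced in step 6):

python eval.py \
   --load-weights ${WORK_DIR}/export/model.bin \
   --test-ann-files ${VAL_ANN_FILE} \
   --test-data-roots ${VAL_IMG_ROOT} \
   --save-metrics-to ${WORK_DIR}/metrics_ir.yaml
diff ${WORK_DIR}/metrics.yaml ${WORK_DIR}/metrics_ir.yaml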