This document provides tutorials on how to train recognition networks using custom datasets, including the training of recognition networks in Chinese and English languages.
Currently, MindOCR recognition network supports two input formats, namely
Common Dataset
:A file format that stores images and text files. It is read by RecDataset.LMDB Dataset
: A file format provided by LMDB. It is read by LMDBDataset.
The following tutorials take the use of the Common Dataset
file format as an example.
Please place all training images in a single folder, and specify a txt file at a higher directory to label all training image names and corresponding labels. An example of the txt file is as follows:
# File Name # Corresponding label
word_421.png 菜肴
word_1657.png 你好
word_1814.png cathay
Note: Please separate image names and labels using \tab, and avoid using spaces or other delimiters.
The final training set will be stored in the following format:
|-data
|- gt_training.txt
|- training
|- word_001.png
|- word_002.jpg
|- word_003.jpg
| ...
Similarly, please place all validation images in a single folder, and specify a txt file at a higher directory to label all validation image names and corresponding labels. The final validation set will be stored in the following format:
|-data
|- gt_validation.txt
|- validation
|- word_001.png
|- word_002.jpg
|- word_003.jpg
| ...
To train recognition networks for different languages, users need to configure corresponding dictionaries. Only characters that exist in the dictionary will be correctly predicted by the model. MindOCR currently provides three dictionaries, corresponding to Default, Chinese and English respectively.
Default Dictionary
:includes lowercase English letters and numbers only. If users do not configure the dictionay, this one will be used by default.English Dictionary
:includes uppercase and lowercase English letters, numbers and punctuation marks, it is place atmindocr/utils/dict/en_dict.txt
.Chinese Dictionary
:includes commonly used Chinese characters, uppercase and lowercase English letters, numbers, and punctuation marks, it is placed atmindocr/utils/dict/ch_dict.txt
.
Currently, MindOCR does not provide a dictionary configuration for other languages. This feature will be released in a upcoming version.
To configure the corresponding configuration file for a specific network architecture, users need to provide the necessary settings. As an example, we can take CRNN (with backbone Resnet34) as an example.
Please select configs/rec/crnn/crnn_resnet34.yaml
as the initial configuration file and modify the train.dataset
and eval.dataset
fields in it.
...
train:
...
dataset:
type: RecDataset # File reading method. Here we use the `Common Dataset` format
dataset_root: dir/to/data/ # Root directory of the data
data_dir: training/ # Training dataset directory. It will be concatenated with `dataset_root` to form a complete path.
label_file: gt_training.txt # Path of the training label. It will be concatenated with `dataset_root` to form a complete path.
...
eval:
dataset:
type: RecDataset # File reading method. Here we use the `Common Dataset` format
dataset_root: dir/to/data/ # Root directory of the data
data_dir: validation/ # Validation dataset directory. It will be concatenated with `dataset_root` to form a complete path.
label_file: gt_validation.txt # Path of the validation label. It will be concatenated with `dataset_root` to form a complete path.
...
And also modify the corresponding dictionary location to the the English dictionary path.
...
common:
character_dict_path: &character_dict_path mindocr/utils/dict/en_dict.txt
...
To use the complete English dictionary, users need to modify the common:num_classes
attribute in the corresponding configuration file, as the initial configuration file’s dictionary only includes lowercase English and numbers.
...
common:
num_classes: &num_classes 95 # The number is equal to the number of dictionary characters plus 1
...
If the network needs to output spaces, it is necessary to modify the common.use_space_char
attribute and the common: num_classes
attribute as follows:
...
common:
num_classes: &num_classes 96 # The number must be equal to the number of characters in the dictionary plus the number of spaces plus 1.
use_space_char: &use_space_char True # Output `space` character additonaly
...
The user can add, delete, or modify characters within the dictionary as needed. It is important to note that characters must be separated by newline characters \n
, and it is necessary to avoid having duplicate characters in the same dictionary. Additionally, the user must also modify the common: num_classes
attribute in the configuration file to ensure that it is equal to the number of characters in the dictionary plus 1 (in the case of a seq2seq model, it is equal to the number of characters in the dictionary plus 2).
Please select configs/rec/crnn/crnn_resnet34_ch.yaml
as the initial configuration file and modify the train.dataset
and eval.dataset
fields in it.
...
train:
...
dataset:
type: RecDataset # File reading method. Here we use the `Common Dataset` format
dataset_root: dir/to/data/ # Root directory of the data
data_dir: training/ # Training dataset directory. It will be concatenated with `dataset_root` to form a complete path.
label_file: gt_training.txt # Path of the training label. It will be concatenated with `dataset_root` to form a complete path.
...
eval:
dataset:
type: RecDataset # File reading method. Here we use the `Common Dataset` format
dataset_root: dir/to/data/ # Root directory of the data
data_dir: validation/ # Validation dataset directory. It will be concatenated with `dataset_root` to form a complete path.
label_file: gt_validation.txt # Path of the validation label. It will be concatenated with `dataset_root` to form a complete path.
...
And also modify the corresponding dictionary location to the the Chinese dictionary path.
...
common:
character_dict_path: &character_dict_path mindocr/utils/dict/ch_dict.txt
...
If the network needs to output spaces, it is necessary to modify the common.use_space_char
attribute and the common: num_classes
attribute as follows:
...
common:
num_classes: &num_classes 6625 # The number must be equal to the number of characters in the dictionary plus the number of spaces plus 1.
use_space_char: &use_space_char True # Output `space` character additonaly
...
The user can add, delete, or modify characters within the dictionary as needed. It is important to note that characters must be separated by newline characters \n
, and it is necessary to avoid having duplicate characters in the same dictionary. Additionally, the user must also modify the common: num_classes
attribute in the configuration file to ensure that it is equal to the number of characters in the dictionary plus 1 (in the case of a seq2seq model, it is equal to the number of characters in the dictionary plus 2).
When all datasets and configuration files are ready, users can start training models with their own data. As each model has different training methods, users can refer to the corresponding model introduction documentation for the Model Training and Model Evaluation sections. Here, we will only use CRNN as an example.
Users can use the pre-trained models that we provide as a starting point for training. Pre-trained models can often improve the convergence speed and even accuracy of the model. Taking the Chinese model as an example, the url for the pre-trained model that we provide is https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c.ckpt. Users only need to add model.pretrained
with the corresponding url in the configuration file as follows:
...
model:
type: rec
transform: null
backbone:
name: rec_resnet34
pretrained: False
neck:
name: RNNEncoder
hidden_size: 64
head:
name: CTCHead
out_channels: *num_classes
pretrained: https://download.mindspore.cn/toolkits/mindocr/crnn/crnn_resnet34_ch-7a342e3c.ckpt
...
If users encounter network issues, they can try downloading the pre-trained model to their local machine in advance, and then change model.pretrained
to the local path as follows:
...
model:
type: rec
transform: null
backbone:
name: rec_resnet34
pretrained: False
neck:
name: RNNEncoder
hidden_size: 64
head:
name: CTCHead
out_channels: *num_classes
pretrained: /local_path_to_the_ckpt/crnn_resnet34_ch-7a342e3c.ckpt
...
If users do not need to use the pre-trained model, they can simply delete model.pretrained
.
In the case of a large amount of data, we recommend that users use distributed training. For distributed training across multiple Ascend 910 devices or GPU devices, please modify the configuration parameter system.distribute
to True, for example:
# To perform distributed training on 4 GPU/Ascend devices
mpirun -n 4 python tools/train.py --config configs/rec/crnn/crnn_resnet34_ch.yaml
If you want to train or fine-tune the model on a smaller dataset without distributed training, please modify the configuration parameter system.distribute
to False
and run:
# Training on single CPU/GPU/Ascend devices
python tools/train.py --config configs/rec/crnn/crnn_resnet34_ch.yaml
The training results (including checkpoint, performance of each epoch, and curve graph) will be saved in the directory configured by the train.ckpt_save_dir
parameter in the YAML configuration file, which is set to ./tmp_rec
by default.
If users expect to load the optimizer, learning rate, and other information of the model while starting or continue training, they can add model.resume
to the corresponding local model path in the configuration file as follows, and start training:
...
model:
type: rec
transform: null
backbone:
name: rec_resnet34
pretrained: False
neck:
name: RNNEncoder
hidden_size: 64
head:
name: CTCHead
out_channels: *num_classes
resume: /local_path_to_the_ckpt/model.ckpt
...
Some models (including CRNN, RARE, SVTR) support mixed precision training to accelerate training speed. Users can try setting the system.amp_level
in the configuration file to O2
to start mixed precision training, as shown in the following example:
system:
mode: 0
distribute: True
amp_level: O2 # Mixed precision training
amp_level_infer: O2
seed: 42
log_interval: 100
val_while_train: True
drop_overflow_update: False
ckpt_max_keep: 5
...
To disable mixed precision training, change system.amp_level
to O0
.
To evaluate the accuracy of a trained model, users can use tools/eval.py
. Please set the ckpt_load_path
parameter in the eval
section of the configuration file to the file path of the model checkpoint, and set system.distribute
to False, as shown below:
system:
distribute: False # During evaluation stage, set to False
...
eval:
ckpt_load_path: /local_path_to_the_ckpt/model.ckpt
and run
python tools/eval.py --config configs/rec/crnn/crnn_resnet34_ch.yaml
You will get a model evaluation result similar to the following:
2023-06-16 03:41:20,237:INFO:Performance: {'acc': 0.821939, 'norm_edit_distance': 0.917264}
The number corresponding to acc
is the accuracy of the model.
Users can quickly obtain the inference results of the model by using the inference script. First, place the images in the same folder, and then execute:
python tools/infer/text/predict_rec.py --image_dir {dir_to_your_image_data} --rec_algorithm CRNN_CH --draw_img_save_dir inference_results
The results will be stored in draw_img_save_dir/rec_results.txt
. Here are some examples:
cert_id.png
doc_cn3.png
You will get inference results similar to the following:
cert_id.png 公民身份号码44052419
doc_cn3.png 马拉松选手不会为短暂的领先感到满意,而是永远在奔跑。