English | 简体中文

Introduction

Converting PaddleOCR to PyTorch.

This repository aims to:

  • learn PaddleOCR
  • run Paddle-trained models in PyTorch
  • provide a guide for converting models from Paddle to PyTorch (Paddle2PyTorch)
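As an illustration of what a Paddle-to-PyTorch conversion typically involves, here is a minimal sketch, not this repository's actual converter. It assumes two common differences between the frameworks: Paddle batch-norm layers store running statistics as `_mean` and `_variance` (PyTorch uses `running_mean` and `running_var`), and Paddle `Linear` weights are laid out as `[in_features, out_features]` (PyTorch uses `[out_features, in_features]`). The function names and the `linear_weight_names` argument are hypothetical, and plain nested lists stand in for tensors.

```python
# Hypothetical helpers for illustration; real checkpoints may need
# additional, model-specific renaming rules.

def paddle_name_to_torch(name):
    """Map a Paddle parameter name to its usual PyTorch counterpart."""
    # Paddle batch norm: `_mean` / `_variance`
    # PyTorch batch norm: `running_mean` / `running_var`
    if name.endswith("._mean"):
        return name[: -len("._mean")] + ".running_mean"
    if name.endswith("._variance"):
        return name[: -len("._variance")] + ".running_var"
    return name


def transpose_2d(matrix):
    """Transpose a 2-D nested list (stand-in for a tensor transpose)."""
    return [list(column) for column in zip(*matrix)]


def convert_state_dict(paddle_state, linear_weight_names):
    """Rename parameters and transpose the listed Linear weights.

    `linear_weight_names` is an assumption of this sketch: the caller
    states which entries are fully connected weights.
    """
    torch_state = {}
    for name, value in paddle_state.items():
        if name in linear_weight_names:
            # [in_features, out_features] -> [out_features, in_features]
            value = transpose_2d(value)
        torch_state[paddle_name_to_torch(name)] = value
    return torch_state
```

With real checkpoints the same idea applies to arrays loaded from the Paddle parameter file, and the result is passed to the PyTorch model's `load_state_dict`.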

Notice

PytorchOCR models are converted from PaddleOCRv2.0.

Recent updates

  • 2024.02.20 PP-OCRv4, with support for the mobile and server versions
    • PP-OCRv4-mobile: At comparable speed, performance improves by 4.5% on Chinese scenes and by 10% on English scenes compared with PP-OCRv3, and the average recognition accuracy of the 80-language multilingual model increases by more than 8%.
    • PP-OCRv4-server: The most accurate OCR model released so far. Detection accuracy increases by 4.9% on Chinese and English scenes, and recognition accuracy increases by 2%.
  • 2023.04.16 Handwritten Mathematical Expression Recognition CAN
  • 2023.04.07 Image Super-Resolution Text Telescope
  • 2022.10.17 Text Recognition: ViTSTR
  • 2022.10.07 Text Detection: DB++
  • 2022.07.24 text detection algorithms (FCENET)
  • 2022.07.16 text recognition algorithms (SVTR)
  • 2022.06.19 text recognition algorithms (SAR)
  • 2022.05.29 PP-OCRv3: At comparable speed, performance improves by a further 5% on Chinese scenes and by 11% on English scenes compared with PP-OCRv2, and the average recognition accuracy of the 80-language multilingual models improves by more than 5%.
  • 2022.05.14 PP-OCRv3 text detection model
  • 2022.04.17 1 text recognition algorithm (NRTR)
  • 2022.03.20 1 text detection algorithm (PSENet)
  • 2021.09.11 PP-OCRv2. On CPU, the inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
  • 2021.06.01 update SRN
  • 2021.04.25 update AAAI 2021 end-to-end algorithm PGNet
  • 2021.04.24 update RARE
  • 2021.04.12 update STARNET
  • 2021.04.08 update DB, SAST, EAST, ROSETTA, CRNN
  • 2021.04.03 add 25+ multilingual recognition models (see the models list), including English, Chinese, German, French, Japanese, Spanish, Portuguese, Russian, and Arabic. Models for more languages will continue to be added (see the Develop Plan).
  • 2021.01.10 upload Chinese and English general OCR models.

Features

  • PTOCR series of high-quality pre-trained models, comparable to commercial products
    • Ultra lightweight PP-OCR series models: detection + direction classifier + recognition
    • Ultra lightweight ptocr_mobile series models
    • General ptocr_server series models
    • Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
    • Support multi-language recognition: Korean, Japanese, German, French, etc.

Model List (updating)

PyTorch models on BaiduPan: https://pan.baidu.com/s/1r1DELT8BlgxeOP2RqREJEg (code: 6clx)

PaddleOCR models on BaiduPan: https://pan.baidu.com/s/1getAprT2l_JqwhjwML0g9g (code: lmv7)

For more models, including multilingual models, please refer to the PTOCR series.

Tutorials

TODO

  • Add implementations of cutting-edge algorithms: Text Detection DRRG, Text Recognition RFL
  • Text Recognition: ABINet, VisionLAN, SPIN, RobustScanner
  • Table Recognition: TableMaster
  • PP-Structurev2, with fully upgraded functions and performance, adapted to Chinese scenes, with new support for Layout Recovery and a one-line command to convert PDF to Word
  • Layout Analysis optimization: model storage reduced by 95%, while speed increased by 11 times, and the average CPU time-cost is only 41ms
  • Table Recognition optimization: 3 optimization strategies are designed, and the model accuracy is improved by 6% under comparable time consumption
  • Key Information Extraction optimization: a visual-independent model structure is designed, the accuracy of semantic entity recognition is increased by 2.8%, and the accuracy of relation extraction is increased by 9.1%
  • text recognition algorithms (SEED)
  • key information extraction algorithm (SDMGR)
  • 3 DocVQA algorithms (LayoutLM, LayoutLMv2, LayoutXLM)
  • PP-Structure, a new structured document analysis toolkit that supports layout analysis and table recognition (one-click export of chart images to Excel files)

PP-OCR Pipeline

[1] PP-OCR is a practical ultra-lightweight OCR system. It consists mainly of three parts: DB text detection, detection frame correction, and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects, including backbone network selection and adjustment, prediction head design, data augmentation, learning rate transformation strategy, regularization parameter selection, pre-training model use, and automatic model tailoring and quantization, to optimize and slim down the models of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
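The three-part structure described above can be sketched as a simple function that chains the stages. This is a minimal illustration, not this repository's API: `detect`, `needs_rotation`, and `recognize` are hypothetical callables standing in for the DB detector, the direction classifier, and the CRNN recognizer. Boxes are simplified to axis-aligned rectangles, and the image is any sequence of rows.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Simplification: (x0, y0, x1, y1). Real detectors usually return quadrilaterals.
Box = Tuple[int, int, int, int]


@dataclass
class OCRResult:
    box: Box
    text: str


def rotate180(crop: Sequence) -> list:
    """Rotate a crop by 180 degrees (reverse the rows, then each row)."""
    return [row[::-1] for row in reversed(crop)]


def run_pipeline(
    image: Sequence,
    detect: Callable[[Sequence], List[Box]],
    needs_rotation: Callable[[Sequence], bool],
    recognize: Callable[[Sequence], str],
) -> List[OCRResult]:
    """Detection -> direction correction -> recognition for one image."""
    results = []
    for box in detect(image):
        x0, y0, x1, y1 = box
        # Cut the detected region out of the image.
        crop = [row[x0:x1] for row in image[y0:y1]]
        # Correct upside-down text before recognition.
        if needs_rotation(crop):
            crop = rotate180(crop)
        results.append(OCRResult(box=box, text=recognize(crop)))
    return results
```

Passing the stages as callables keeps the sketch independent of any particular model, so each part can be swapped between mobile and server variants without changing the control flow.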

[2] Building on PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data augmentation strategy. The recognition model adopts the LCNet lightweight backbone network, the U-DML knowledge distillation strategy, and an enhanced CTC loss function (as shown in the red box above), which further improves inference speed and prediction quality. For more details, please refer to the technical report of PP-OCRv2.
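Recognition models trained with a CTC loss output one score vector per time step, and a common way to turn that output into text is greedy CTC decoding. The sketch below shows the general technique only and is not taken from this repository. It assumes that class index 0 is the CTC blank and that class index `i` maps to `charset[i - 1]`; both conventions, and the function name, are assumptions of this example.

```python
BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(frame_scores, charset):
    """Greedy CTC decoding.

    frame_scores: one list of class scores per time step.
    charset: characters for class indices 1..len(charset).
    """
    # 1. Pick the best-scoring class at every time step.
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]

    # 2. Collapse consecutive repeats, then drop blanks.
    chars = []
    previous = None
    for index in best:
        if index != previous and index != BLANK:
            chars.append(charset[index - 1])
        previous = index
    return "".join(chars)
```

A blank between two identical classes keeps both characters, which is how CTC represents doubled letters such as the "aa" in a frame sequence `a a _ a`.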

Visualization

  • Chinese OCR model
  • English OCR model
  • Multilingual OCR model

References