
Add paddlemodel #46

Open · wants to merge 7 commits into base: main
60 changes: 57 additions & 3 deletions README.md
@@ -39,13 +39,53 @@ pip install layoutparser[ocr]

**For Windows Users:** Please read [installation.md](installation.md) for details about installing Detectron2.

## Recent Updates

2021.6.8: Added a new layout detection model (PaddleDetection) and a new OCR model (PaddleOCR). We benchmarked the Detectron2 and PaddleDetection models on the PubLayNet and TableBank datasets; the results are as follows:

PubLayNet Dataset:

| Model | mAP | CPU time cost | GPU time cost |
| :-------------: | :---: | :-----------: | :-----------: |
| Detectron2 | 88.98 | 16545.5ms | 209.5ms |
| PaddleDetection | 93.6 | 1713.7ms | 66.6ms |

TableBank Dataset:

| Model | mAP | CPU time cost | GPU time cost |
| :-------------: | :---: | :-----------: | :-----------: |
| Detectron2 | 91.26 | 7623.2ms | 104.2ms |
| PaddleDetection | 96.2 | 1968.4ms | 65.1ms |

**Environment:**

**CPU:** Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 24 cores

**GPU:** a single NVIDIA Tesla P40

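The numbers above refer to single-image inference. A minimal sketch of how such a timing comparison could be reproduced (the averaging loop, warm-up run, and file name are our own illustration, not the benchmark script behind the tables):

```python
import time

import cv2
import layoutparser as lp

image = cv2.imread("example_page.png")  # any test page image (hypothetical file)
image = image[..., ::-1]                # OpenCV loads BGR; the models expect RGB

model = lp.PaddleDetectionLayoutModel(
    "lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config"
)

model.detect(image)  # warm-up run, excluded from timing

n_runs = 10
start = time.perf_counter()
for _ in range(n_runs):
    model.detect(image)
elapsed_ms = (time.perf_counter() - start) / n_runs * 1000
print(f"Average inference time: {elapsed_ms:.1f}ms")
```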
You can also find detailed installation instructions in [installation.md](installation.md), but generally it's just a matter of `pip install`-ing a few libraries:

```bash
# Install PaddlePaddle
# CUDA10.1
python -m pip install paddlepaddle-gpu==2.1.0.post101 -f https://paddlepaddle.org.cn/whl/mkl/stable.html
# CPU
python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple

# Install the paddle ocr components when necessary
pip install layoutparser[paddleocr]
```

For other PaddlePaddle CUDA versions or installation environments, please refer to the [PaddlePaddle Quick Installation document](https://www.paddlepaddle.org.cn/install/quick).
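After installing, you can verify that PaddlePaddle works on your machine; `paddle.utils.run_check()` is PaddlePaddle's built-in self-check (running it here is our suggestion, not an official step):

```python
import paddle

# Runs a small program to verify the install (and CUDA support, if present)
paddle.utils.run_check()
print(paddle.__version__)  # e.g., 2.1.0
```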

## Quick Start

We provide a series of examples to help you start using the layout parser library:

1. [Table OCR and Results Parsing](https://github.com/Layout-Parser/layout-parser/blob/master/examples/OCR%20Tables%20and%20Parse%20the%20Output.ipynb): `layoutparser` can be used to conveniently OCR documents and convert the output into structured data (a minimal sketch of this workflow appears after this list).

2. [Deep Layout Parsing Example](https://github.com/Layout-Parser/layout-parser/blob/master/examples/Deep%20Layout%20Parsing.ipynb): with the help of deep learning, `layoutparser` supports the analysis of very complex documents and can process the hierarchical structure in the layouts.
3. [Deep Layout Parsing using Paddle](examples/Deep%20Layout%20Parsing%20using%20Paddle.ipynb): the same analysis of complex documents and their hierarchical layout structure, using Paddle models.
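As a taste of the workflow in the first example, a detected layout can be fed region by region into an OCR agent. The following is a minimal sketch, assuming Tesseract is installed and `paper.png` (a hypothetical file) is a page image; the padding values and model choice are illustrative:

```python
import cv2
import layoutparser as lp

image = cv2.imread("paper.png")
image = image[..., ::-1]  # OpenCV loads BGR; layoutparser expects RGB

model = lp.Detectron2LayoutModel("lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config")
layout = model.detect(image)

ocr_agent = lp.TesseractAgent(languages="eng")
for block in layout:
    # Pad each region slightly before cropping so characters aren't cut off
    segment = block.pad(left=5, right=5, top=5, bottom=5).crop_image(image)
    block.set(text=ocr_agent.detect(segment), inplace=True)

for block in layout:
    print(block.type, block.text[:60])
```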


## DL Assisted Layout Prediction Example
@@ -54,7 +94,7 @@

*The images shown in the figure above are: a screenshot of [this paper](https://arxiv.org/abs/2004.08686), an image from the [PRIMA Layout Analysis Dataset](https://www.primaresearch.org/dataset/), a screenshot of the [WSJ website](http://wsj.com), and an image from the [HJDataset](https://dell-research-harvard.github.io/HJDataset/).*

With only 4 lines of code in `layoutparser`, you can unlock the information from complex documents that existing tools could not provide. You can either choose a deep learning model from the [ModelZoo](docs/notes/modelzoo.md), or load a model that you trained on your own, and use the following code to predict the layout as well as visualize it:

```python
>>> import layoutparser as lp
>>> model = lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config')
>>> layout = model.detect(image) # You need to load the image somewhere else, e.g., image = cv2.imread(...)
>>> lp.draw_box(image, layout,) # With extra configurations
```
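The returned `layout` behaves like a list of blocks and can be filtered or post-processed. A small sketch (the type names assume the PubLayNet `label_map`; `is_in` is layoutparser's containment check):

```python
>>> text_blocks = lp.Layout([b for b in layout if b.type == "Text"])
>>> figure_blocks = lp.Layout([b for b in layout if b.type == "Figure"])
>>> # Drop text blocks that are actually captions nested inside figures
>>> text_blocks = lp.Layout([b for b in text_blocks
...                          if not any(b.is_in(f) for f in figure_blocks)])
```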

Using a PaddleDetection model:

```python
>>> import layoutparser as lp
>>> model = lp.PaddleDetectionLayoutModel('lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config')
>>> layout = model.detect(image) # You need to load the image somewhere else, e.g., image = cv2.imread(...)
>>> lp.draw_box(image, layout,) # With extra configurations
```

If you want to train a PaddleDetection model yourself, please refer to [Train PaddleDetection model](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/GETTING_STARTED.md).

If you want to learn more about PaddleOCR, please refer to [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) and [PaddleOCR inference](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_ch/whl.md).
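Pending a fuller tutorial, here is a minimal sketch of running PaddleOCR directly through the `paddleocr` package (this is `paddleocr`'s own API rather than a `layoutparser` wrapper; the file name and parameters are illustrative):

```python
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")  # downloads models on first run
result = ocr.ocr("paper_page.png", cls=True)
for box, (text, confidence) in result:
    print(f"{confidence:.2f}  {text}")
```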

> **Review comment:** Can you add some tutorial on the OCR model?
>
> **Reply (Contributor Author):** Okay, I'll add later~

## Contributing

We encourage you to contribute to Layout Parser! Please check out the [Contributing guidelines](.github/CONTRIBUTING.md) for guidelines about how to proceed. Join us!
@@ -78,4 +131,5 @@ If you find `layoutparser` helpful to your work, please consider citing our tool
journal={arXiv preprint arXiv:2103.15348},
year={2021}
}
```

4 changes: 3 additions & 1 deletion dev-requirements.txt
@@ -10,4 +10,6 @@ sphinx_rtd_theme
google-cloud-vision==1
pytesseract
pycocotools
git+https://github.com/facebookresearch/[email protected]#egg=detectron2
paddlepaddle==2.1.0
paddleocr>=2.0.1
1 change: 0 additions & 1 deletion docs/notes/installation.md

This file was deleted.

63 changes: 63 additions & 0 deletions docs/notes/installation.md
@@ -0,0 +1,63 @@
# Installation

## Install Python

Layout Parser is a Python package that requires Python >= 3.6. If you do not have Python installed on your computer, you might want to follow [the official instructions](https://www.python.org/downloads/) to download and install the appropriate version of Python.

## Install the Layout Parser main library

Installing the Layout Parser library is very straightforward: you just need to run the following command:

```bash
pip3 install -U layoutparser
```

## [Optional] Install Detectron2 for Using Layout Models

### For Mac OS and Linux Users

If you would like to use deep learning models for layout detection, you also need to install Detectron2 on your computer. This could be done by running the following command:

```bash
pip3 install 'git+https://github.com/facebookresearch/[email protected]#egg=detectron2'
```

This might take some time as the command will *compile* the library. If you want a Detectron2 version with GPU support, or if you encounter issues during the installation, please refer to the official Detectron2 [installation instructions](https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md) for detailed information.
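Once the command finishes, a quick sanity check (our suggestion, not part of the official steps) is to confirm that the library and the layout-model class import cleanly:

```python
import detectron2
import layoutparser as lp

print(detectron2.__version__)    # should print 0.4 for the pinned install above
print(lp.Detectron2LayoutModel)  # the Detectron2-backed layout model is available
```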

### For Windows users

As reported by many users, the installation of Detectron2 can be rather tricky on Windows platforms. In our extensive tests, we found that it is nearly impossible to provide a one-line installation command for Windows users. As a workaround, for now we list the possible challenges for installing Detectron2 on Windows, and attach helpful resources for solving them. We are also investigating ways to use the pre-trained models without installing Detectron2. If you have any suggestions or ideas, please feel free to [submit an issue](https://github.com/Layout-Parser/layout-parser/issues) in our repo.

1. Challenges for installing `pycocotools`
- You can find detailed instructions on [this post](https://changhsinlee.com/pycocotools/) from Chang Hsin Lee.
- Another solution is to try installing `pycocotools-windows`; see https://github.com/cocodataset/cocoapi/issues/415.
2. Challenges for installing `Detectron2`
- [@ivanpp](https://github.com/ivanpp) curates a detailed description for installing `Detectron2` on Windows: [Detectron2 walkthrough (Windows)](https://ivanpp.cc/detectron2-walkthrough-windows/#step3installdetectron2)
- `Detectron2` maintainers claim that they won't provide official support for Windows (see [1](https://github.com/facebookresearch/detectron2/issues/9#issuecomment-540974288) and [2](https://detectron2.readthedocs.io/en/latest/tutorials/install.html)), but Detectron2 is continuously built on Windows with CircleCI (see [3](https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md#common-installation-issues)). Hopefully this situation will be improved in the future.


## [Optional] Install OCR utils

Layout Parser also comes with support for OCR functions. In order to use them, you need to install the OCR utils via:

```bash
pip3 install -U layoutparser[ocr]
```

Additionally, if you want to use the Tesseract-OCR engine, you also need to install it on your computer. Please check the
[official documentation](https://tesseract-ocr.github.io/tessdoc/Installation.html) for detailed installation instructions.
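A quick way to confirm that both the `pytesseract` bindings and the Tesseract binary are visible to Python (again, our suggestion rather than an official step):

```python
import pytesseract

# Raises TesseractNotFoundError if the tesseract binary is not on your PATH
print(pytesseract.get_tesseract_version())
```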

## Known issues

<details><summary>Error: instantiating `lp.GCVAgent.with_credential` returns module 'google.cloud.vision' has no attribute 'types'. </summary>
<p>

In this case, you have a newer version of the google-cloud-vision package. Please consider downgrading the API using:

```bash
pip install layoutparser[ocr]
```
</p>
</details>
64 changes: 50 additions & 14 deletions docs/notes/modelzoo.md
@@ -2,7 +2,7 @@

We provide a spectrum of pre-trained models on different datasets.

## Example Usage with Detectron2:

```python
import layoutparser as lp
model = lp.Detectron2LayoutModel(
    config_path="lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config", # In model catalog
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}, # In model `label_map`
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8] # Optional
)
model.detect(image)
```

## Example Usage with PaddleDetection:

```python
import layoutparser as lp
model = lp.PaddleDetectionLayoutModel(
    config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config", # In model catalog
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}, # In model `label_map`
    threshold=0.5 # Optional
)
model.detect(image)
```
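Both snippets assume `image` has already been loaded as an RGB array, for example (the file name is illustrative):

```python
import cv2

image = cv2.imread("paper_page.png")
image = image[..., ::-1]  # OpenCV loads BGR; the models expect RGB
```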

## Model Catalog

| Dataset | Model | Config Path | Eval Result (mAP) |
| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| [HJDataset](https://dell-research-harvard.github.io/HJDataset/) | [faster_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/j4yseny2u0hn22r/config.yml?dl=1) | lp://HJDataset/faster_rcnn_R_50_FPN_3x/config | |
| [HJDataset](https://dell-research-harvard.github.io/HJDataset/) | [mask_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/4jmr3xanmxmjcf8/config.yml?dl=1) | lp://HJDataset/mask_rcnn_R_50_FPN_3x/config | |
| [HJDataset](https://dell-research-harvard.github.io/HJDataset/) | [retinanet_R_50_FPN_3x](https://www.dropbox.com/s/z8a8ywozuyc5c2x/config.yml?dl=1) | lp://HJDataset/retinanet_R_50_FPN_3x/config | |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | [faster_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/f3b12qc4hc0yh4m/config.yml?dl=1) | lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config | |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | [mask_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/u9wbsfwz4y0ziki/config.yml?dl=1) | lp://PubLayNet/mask_rcnn_R_50_FPN_3x/config | |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | [mask_rcnn_X_101_32x8d_FPN_3x](https://www.dropbox.com/s/nau5ut6zgthunil/config.yaml?dl=1) | lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config | 88.98 [eval.csv](https://www.dropbox.com/s/15ytg3fzmc6l59x/eval.csv?dl=0) |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | [ppyolov2_r50vd_dcn_365e_publaynet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config | 93.6 [eval.csv](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/eval_publaynet.csv) |
| [PrimaLayout](https://www.primaresearch.org/dataset/) | [mask_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/yc92x97k50abynt/config.yaml?dl=1) | lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config | 69.35 [eval.csv](https://www.dropbox.com/s/9uuql57uedvb9mo/eval.csv?dl=0) |
| [NewspaperNavigator](https://news-navigator.labs.loc.gov/) | [faster_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/wnido8pk4oubyzr/config.yml?dl=1) | lp://NewspaperNavigator/faster_rcnn_R_50_FPN_3x/config | |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) | [faster_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/7cqle02do7ah7k4/config.yaml?dl=1) | lp://TableBank/faster_rcnn_R_50_FPN_3x/config | 89.78 [eval.csv](https://www.dropbox.com/s/1uwnz58hxf96iw2/eval.csv?dl=0) |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) | [faster_rcnn_R_101_FPN_3x](https://www.dropbox.com/s/h63n6nv51kfl923/config.yaml?dl=1) | lp://TableBank/faster_rcnn_R_101_FPN_3x/config | 91.26 [eval.csv](https://www.dropbox.com/s/e1kq8thkj2id1li/eval.csv?dl=0) |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) | [ppyolov2_r50vd_dcn_365e_tableBank_word](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config | 96.2 [eval.csv](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/eval_tablebank.csv) |

* For PubLayNet models, we suggest using the `mask_rcnn_X_101_32x8d_FPN_3x` model as it is trained on the whole training set, while the others are only trained on the validation set (whose size is only around 1/50 of the training set). You could expect a ~15% AP improvement from the `mask_rcnn_X_101_32x8d_FPN_3x` model.
* A comparison of the time cost of **Detectron2** and **PaddleDetection** (the `ppyolov2_*` models in the table above):

PubLayNet Dataset:

| Framework | Model | mAP | CPU time cost | GPU time cost |
| :-------------: | :--------------------------: | :--: | :-----------: | :-----------: |
| Detectron2 | mask_rcnn_X_101_32x8d_FPN_3x | 89.0 | 16545.5ms | 209.5ms |
| PaddleDetection | ppyolov2_r50vd_dcn_365e | 93.6 | 1713.7ms | 66.6ms |


TableBank Dataset:

| Framework | Model | mAP | CPU time cost | GPU time cost |
| :-------------: | :----------------------: | :--: | :-----------: | :-----------: |
| Detectron2 | faster_rcnn_R_101_FPN_3x | 91.3 | 7623.2ms | 104.2ms |
| PaddleDetection | ppyolov2_r50vd_dcn_365e | 96.2 | 1968.4ms | 65.1ms |

**Environment:**

**CPU:** Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 24 cores

**GPU:** a single NVIDIA Tesla P40

## Model `label_map`

@@ -39,4 +75,4 @@

| Dataset | Label Map |
| ------- | --------- |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | `{0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}` |
| [PrimaLayout](https://www.primaresearch.org/dataset/) | `{1:"TextRegion", 2:"ImageRegion", 3:"TableRegion", 4:"MathsRegion", 5:"SeparatorRegion", 6:"OtherRegion"}` |
| [NewspaperNavigator](https://news-navigator.labs.loc.gov/) | `{0: "Photograph", 1: "Illustration", 2: "Map", 3: "Comics/Cartoon", 4: "Editorial Cartoon", 5: "Headline", 6: "Advertisement"}` |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) | `{0: "Table"}` |
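These dictionaries are what the `label_map` argument in the usage examples above expects; passing one replaces the raw class ids in the detection output with readable type names. A minimal sketch using the PrimaLayout map (config path taken from the catalog above; `image` must be loaded beforehand):

```python
import layoutparser as lp

model = lp.Detectron2LayoutModel(
    config_path="lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config",
    label_map={1: "TextRegion", 2: "ImageRegion", 3: "TableRegion",
               4: "MathsRegion", 5: "SeparatorRegion", 6: "OtherRegion"},
)
layout = model.detect(image)
print({block.type for block in layout})  # e.g., {"TextRegion", "TableRegion"}
```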