# PyTorch Implementation of [Pixel-LINK](https://arxiv.org/pdf/1801.01315.pdf)

## Abstract

We aim to detect all kinds of text in the wild. The technique used for text detection is based on the paper PixelLink: Detecting Scene Text via Instance Segmentation (https://arxiv.org/abs/1801.01315) by Deng et al. Text instances in scene images often lie very close to each other and are hard to distinguish using semantic segmentation, so instance segmentation is needed.

The approach consists of two key steps:
a) linking pixels that belong to the same text instance (the segmentation step),
b) extracting text bounding boxes from the resulting links.

Two kinds of predictions are made at every pixel in the image:
a) text/non-text prediction,
b) link prediction.

This sets PixelLink apart from earlier text-detection methods. Before PixelLink, the state-of-the-art approaches made two kinds of predictions: a) text/non-text prediction, b) location regression. In PixelLink, both the text and link predictions are made in a single pass, requiring far fewer training iterations and less training data.

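
To make the two per-pixel predictions concrete, here is a minimal PyTorch sketch of the prediction heads. The layer names, channel counts, and feature-map sizes are illustrative assumptions, not the actual code in `src/model/`.

```python
import torch
import torch.nn as nn


class PixelLinkHead(nn.Module):
    """Hypothetical per-pixel prediction heads for PixelLink-style detection."""

    def __init__(self, in_channels=256):
        super().__init__()
        # a) text/non-text prediction: 2 channels per pixel (text vs. background)
        self.text_pred = nn.Conv2d(in_channels, 2, kernel_size=1)
        # b) link prediction: 8 neighbours x 2 channels (positive/negative link) per pixel
        self.link_pred = nn.Conv2d(in_channels, 16, kernel_size=1)

    def forward(self, feature_map):
        return self.text_pred(feature_map), self.link_pred(feature_map)


# Example: backbone features for a 512x512 image downsampled by a factor of 4
features = torch.randn(1, 256, 128, 128)
text_scores, link_scores = PixelLinkHead()(features)
print(text_scores.shape, link_scores.shape)  # (1, 2, 128, 128) (1, 16, 128, 128)
```
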

## Demo ([YouTube link](https://www.youtube.com/watch?v=3d3J0kH3u6c))

## Results (ToDo)

## Dependencies

All code dependencies are listed in requirements.txt.<br/>
Run `pip install -r requirements.txt` to install them.

## Code Structure
```bash
.
├── Coding_Guidelines.md
├── configs
│   ├── config.yaml
│   ├── dataset.yaml
│   └── text_config.yaml
├── Dockerfile
├── Errors_got.txt
├── ideas.txt
├── LICENSE
├── main.py
├── Misc
├── README.md
├── requirements.txt
├── sonar-project.properties
├── src
│   ├── Dlmodel
│   │   ├── Dlmodel.py
│   │   ├── __pycache__ (cache folder generated by Python)
│   │   ├── TestOneImageD.py
│   │   ├── TestOneImageRD.py
│   │   ├── TestOneImageR.py
│   │   ├── TestRD.py
│   │   ├── TrainTestD.py
│   │   └── TrainTestR.py
│   ├── helper
│   │   ├── logger.py
│   │   ├── profiler.py
│   │   ├── __pycache__ (cache folder generated by Python)
│   │   ├── read_yaml.py
│   │   └── utils.py
│   ├── loader
│   │   ├── art.py
│   │   ├── dete_loader.py
│   │   ├── generic_dataloader.py
│   │   ├── mnist.py
│   │   ├── __pycache__ (cache folder generated by Python)
│   │   ├── reco_loader.py
│   │   ├── scale_two.py
│   │   └── square.py
│   ├── model
│   │   ├── crnn.py
│   │   ├── densenet.py
│   │   ├── generic_model.py
│   │   ├── model_loader.py
│   │   ├── __pycache__ (cache folder generated by Python)
│   │   ├── resnet_own.py
│   │   └── trial.py (CRNN, under development)
│   ├── pipeline_manager.py (controls the flow of the repository)
│   ├── prepare_metadata (preprocessing steps performed before training/testing)
│   │   ├── meta_artificial.py (prepare metadata for the artificial dataset)
│   │   ├── meta_coco.py (prepare metadata for the COCO V2 dataset)
│   │   ├── meta_ic13.py (prepare metadata for the IC13 dataset)
│   │   ├── meta_ic15.py (prepare metadata for the IC15 dataset)
│   │   ├── meta_own.py (prepare metadata for the OWN dataset)
│   │   ├── meta_synth.py (prepare metadata for the SynthText dataset)
│   │   ├── prepare_metadata.py
│   │   └── __pycache__ (cache folder generated by Python)
│   └── __pycache__ (cache folder generated by Python)
└── text.sublime-workspace
```

## Instructions to run the code

### Setting up the dataset

1. In the configs/dataset.yaml file, add your dataset under the field *metadata* in the following format (see the example configuration after this list):
    1. <Name of the dataset>
        1. dir: <Path-to-Dataset-Folder>
        2. image: <Path-to-Dataset-Folder>/Images
        3. label: <Path-to-Dataset-Folder>/Labels
        4. meta: <Path-to-Dataset-Folder>/Meta
        5. contour_length_thresh_min: <Contours with length less than this are excluded from training and testing>
        6. contour_area_thresh_min: <Contours with area less than this are excluded from training and testing>
        7. segmentation_thresh: <Confidence value above which a pixel is classified as positive>
        8. link_thresh: <Confidence value above which a link is classified as positive>
        9. cal_avg: <If True, pad with the average of the image; otherwise pad with zeros>
        10. split_type: <Percentage of images randomly picked from the dataset for training; the remainder is used for validation>

2. Put all your images in the *<Path-to-Dataset-Folder>/Images* folder.

3. Create labels in the following format (see the example script after this list):
    1. Contours = list of all bounding boxes (dtype=np.float32, shape=[4, 1, 2]: four co-ordinates with two dimensions each)
    2. Text = list of strings containing the text corresponding to every contour
    3. The label file for every image is named <image-name.extension-of-image.pkl> and is a pickle dump of the list [Contours, Text].

4. Save all the labels for the images in the folder *<Path-to-Dataset-Folder>/Labels*.

5. Create the folder *<Path-to-Dataset-Folder>/Meta*.

6. In the configs/dataset.yaml file, put your dataset name in the fields *dataset_train* and *dataset_test*.

7. Run `python main.py prepare_metadata`.

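
For reference, a hypothetical configs/dataset.yaml entry following the format of step 1. The dataset name, paths, and threshold values below are placeholders, and the placement of *dataset_train*/*dataset_test* at the top level is an assumption:

```yaml
metadata:
  MyDataset:
    dir: /data/MyDataset
    image: /data/MyDataset/Images
    label: /data/MyDataset/Labels
    meta: /data/MyDataset/Meta
    contour_length_thresh_min: 10    # placeholder value
    contour_area_thresh_min: 100     # placeholder value
    segmentation_thresh: 0.5
    link_thresh: 0.5
    cal_avg: True
    split_type: 90                   # 90% training, 10% validation
dataset_train: MyDataset
dataset_test: MyDataset
```
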
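
Similarly, a minimal sketch of writing one label file in the format of step 3. The image name, box coordinates, and transcription are made up; only the [Contours, Text] pickle layout comes from the steps above:

```python
import pickle

import numpy as np

# One 4-point bounding box per text instance: dtype float32, shape [4, 1, 2]
contours = [
    np.array([[[10, 20]], [[110, 20]], [[110, 60]], [[10, 60]]], dtype=np.float32),
]
# One transcription string per contour, in the same order
text = ["hello"]

# For an image named img_001.jpg, the label file is Labels/img_001.jpg.pkl
with open("img_001.jpg.pkl", "wb") as f:
    pickle.dump([contours, text], f)
```
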

### Training your own model (Detection)

1. configs/config.yaml contains all the hyper-parameters for training the detection model.
2. Once your dataset and config file are in place, run the command `python main.py train_d`.

### Testing your own model (Detection)

1. In configs/config.yaml, change the value of the field "check" under "PreTrained_model" to True.
2. Configure the path of the model in the field "PreTrained_Model" (see the example snippet after this list).
3. Once your dataset and config file are in place, run the command `python main.py test_d`.

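
As an illustration, the relevant part of configs/config.yaml might look like the snippet below. Only the "PreTrained_model" field and its "check" flag are named in the steps above; the exact nesting and the key used for the model path are assumptions:

```yaml
PreTrained_model:
  check: True                      # load an existing checkpoint instead of training from scratch
  path: /path/to/checkpoint.pth    # assumed key name for the model path
```
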

### Generate visual results on a single image

1. In configs/config.yaml, change the value of the field "check" under "PreTrained_model" to True.
2. Configure the path of the model in the field "PreTrained_Model".
3. Run the command `python main.py test_one_d -p <path-to-test-image> -o <path-to-output-folder>`

### Generate visual results on an entire folder

1. In configs/config.yaml, change the value of the field "check" under "PreTrained_model" to True.
2. Configure the path of the model in the field "PreTrained_Model".
3. Run the command `python main.py test_entire_folder_d -p <path-to-test-folder> -o <path-to-output-folder>`

## Pre-trained model (ToDo: the model cannot be stored on GitHub; a download link will be provided)

## Additional details and discussion (ToDo)

## References

* Deng, Dan, et al. "PixelLink: Detecting Scene Text via Instance Segmentation." Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
* Karatzas, Dimosthenis, et al. "ICDAR 2015 Competition on Robust Reading." 13th International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2015.
* Gupta, Ankush, Andrea Vedaldi, and Andrew Zisserman. "Synthetic Data for Text Localisation in Natural Images." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
* Ren, Mengye, and Richard S. Zemel. "End-to-End Instance Segmentation with Recurrent Attention." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
* Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision, 2015.

This repository is currently under active development. Please raise issues and we will address them as soon as possible.