
Commit d6646b4

Mayank committed "merging"
2 parents: 11517e2 + 3e1b113

File tree

5 files changed: +545 -1 lines changed

README.md

Lines changed: 155 additions & 1 deletion
# PyTorch Implementation of [Pixel-LINK](https://arxiv.org/pdf/1801.01315.pdf)

## Abstract: problem statement and solution approach

We attempt to detect all kinds of text in the wild. The technique used for text detection is based on the paper PixelLink: Detecting Scene Text via Instance Segmentation (https://arxiv.org/abs/1801.01315) by Deng et al. Text instances in scene images often lie very close to each other, which makes them hard to separate with semantic segmentation alone, so instance segmentation is needed.

The approach consists of two key steps:

a) Linking of pixels belonging to the same text instance (segmentation step),
b) Extraction of text bounding boxes from the predicted links.

Two kinds of predictions are made at every pixel of the image:

a) Text/non-text prediction,
b) Link prediction.

This sets PixelLink apart from earlier text-detection methodologies. Before PixelLink, the state-of-the-art approaches made two kinds of predictions: a) text/non-text prediction and b) location regression. In PixelLink, both per-pixel predictions are made in one go, requiring far fewer training iterations and less training data.
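To make the two per-pixel predictions concrete, here is a small sketch of how such an output tensor can be interpreted. It assumes the 18-channel layout used in src/model/u_net_resnet_50_encoder.py when link prediction is enabled (2 text/non-text channels plus 2 channels for each of the 8 neighbour directions); the channel ordering, image size, and thresholds below are illustrative assumptions, not values taken from the configs.

```python
import torch

# Illustrative assumption: a batch of 4 images at 512 x 512 and a network that
# predicts 18 channels per pixel: 2 for text/non-text and 2 x 8 for the links
# to the 8 neighbouring pixels (matching self.classes = 18 when config['link'] is set).
batch, height, width = 4, 512, 512
logits = torch.randn(batch, 18, height, width)

# Channels 0-1: text/non-text scores; softmax over the pair gives a per-pixel
# probability of the pixel being text.
text_prob = torch.softmax(logits[:, 0:2], dim=1)[:, 1]          # (4, 512, 512)

# Channels 2-17: one (negative, positive) score pair per neighbour direction.
link_logits = logits[:, 2:].reshape(batch, 8, 2, height, width)
link_prob = torch.softmax(link_logits, dim=2)[:, :, 1]          # (4, 8, 512, 512)

# Pixels and links above their confidence thresholds (segmentation_thresh and
# link_thresh in configs/dataset.yaml) are kept; connected positive pixels
# joined by positive links form one text instance.
text_mask = text_prob > 0.7   # illustrative threshold
link_mask = link_prob > 0.5   # illustrative threshold
```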
## Demo ([YouTube Link](https://www.youtube.com/watch?v=3d3J0kH3u6c))

## Results (ToDo: numerical results will be reported in a table, visual results will be displayed here)

## Code dependencies

All code dependencies are listed in requirements.txt.<br/>
Run `pip install -r requirements.txt` to install them.
## Code Structure

```bash
.
├── Coding_Guidelines.md
├── configs
│   ├── config.yaml
│   ├── dataset.yaml
│   └── text_config.yaml
├── Dockerfile
├── Errors_got.txt
├── ideas.txt
├── LICENSE
├── main.py
├── Misc
├── README.md
├── requirements.txt
├── sonar-project.properties
├── src
│   ├── Dlmodel
│   │   ├── Dlmodel.py
│   │   ├── __pycache__ (Cache folder generated by Python)
│   │   ├── TestOneImageD.py
│   │   ├── TestOneImageRD.py
│   │   ├── TestOneImageR.py
│   │   ├── TestRD.py
│   │   ├── TrainTestD.py
│   │   └── TrainTestR.py
│   ├── helper
│   │   ├── logger.py
│   │   ├── profiler.py
│   │   ├── __pycache__ (Cache folder generated by Python)
│   │   ├── read_yaml.py
│   │   └── utils.py
│   ├── loader
│   │   ├── art.py
│   │   ├── dete_loader.py
│   │   ├── generic_dataloader.py
│   │   ├── mnist.py
│   │   ├── __pycache__
│   │   ├── reco_loader.py
│   │   ├── scale_two.py
│   │   └── square.py
│   ├── model
│   │   ├── crnn.py
│   │   ├── densenet.py
│   │   ├── generic_model.py
│   │   ├── model_loader.py
│   │   ├── __pycache__ (Cache folder generated by Python)
│   │   ├── resnet_own.py
│   │   └── trial.py (Under-development CRNN)
│   ├── pipeline_manager.py (Controls the flow of the repository)
│   ├── prepare_metadata (Preprocessing steps to be performed before training/testing)
│   │   ├── meta_artificial.py (Prepare metadata for the artificial dataset)
│   │   ├── meta_coco.py (Prepare metadata for the COCO V2 dataset)
│   │   ├── meta_ic13.py (Prepare metadata for the IC13 dataset)
│   │   ├── meta_ic15.py (Prepare metadata for the IC15 dataset)
│   │   ├── meta_own.py (Prepare metadata for the OWN dataset)
│   │   ├── meta_synth.py (Prepare metadata for the SynthText dataset)
│   │   ├── prepare_metadata.py
│   │   └── __pycache__ (Cache folder generated by Python)
│   └── __pycache__
└── text.sublime-workspace
```
## Instructions to run the code

### Setting up the dataset

1. In the configs/dataset.yaml file, add your dataset in the following format under the field metadata:

   1. <Name of the dataset>
      1. dir: <Path-to-Dataset-Folder>
      2. image: <Path-to-Dataset-Folder>/Images
      3. label: <Path-to-Dataset-Folder>/Labels
      4. meta: <Path-to-Dataset-Folder>/Meta
      5. contour_length_thresh_min: <Contours with length less than this are excluded from training and testing>
      6. contour_area_thresh_min: <Contours with area less than this are excluded from training and testing>
      7. segmentation_thresh: <Confidence value above which a pixel is classified as positive>
      8. link_thresh: <Confidence value above which a link is classified as positive>
      9. cal_avg: <If True, pad with the average of the image; otherwise pad with zeros>
      10. split_type: <Percentage of images randomly picked from the dataset for training; the remainder is used for validation>

2. Put all your images in the *<Path-to-Dataset-Folder>/Images* folder.

3. Create labels in the following format (see the sketch after this list):
   1. Contours = list of all bounding boxes (dtype=np.float32, shape=[4, 1, 2]; four corner points with two coordinates each)
   2. Text = list of strings containing the text corresponding to every contour
   3. The label for every image is named <image-name.extension-of-image.pkl> and is a pickle dump of the list [Contours, Text]

4. Save all the labels for the images in the folder *<Path-to-Dataset-Folder>/Labels*.

5. Create the folder *<Path-to-Dataset-Folder>/Meta*.

6. In the configs/dataset.yaml file, put your dataset name in the fields *dataset_train* and *dataset_test*.

7. Run `python main.py prepare_metadata`.
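As a concrete illustration of step 3 above, the following sketch writes one label file in the expected format. The image name, contour coordinates, and text strings are made-up placeholders; only the `[Contours, Text]` layout and the `<image-name.extension-of-image>.pkl` naming follow the steps above.

```python
import os
import pickle

import numpy as np

# Made-up example: an image "sample.jpg" containing two words.
contours = [
    np.array([[[10, 20]], [[110, 20]], [[110, 60]], [[10, 60]]], dtype=np.float32),    # shape (4, 1, 2)
    np.array([[[150, 25]], [[260, 25]], [[260, 70]], [[150, 70]]], dtype=np.float32),
]
texts = ["HELLO", "WORLD"]  # one string per contour

# Labels live in <Path-to-Dataset-Folder>/Labels; "Labels" is used here as a stand-in.
os.makedirs("Labels", exist_ok=True)

# The label file is named <image-name.extension-of-image>.pkl and contains
# a pickle dump of the list [Contours, Text].
with open(os.path.join("Labels", "sample.jpg.pkl"), "wb") as f:
    pickle.dump([contours, texts], f)
```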
### Training your own model (Detection)

1. configs/config.yaml contains all the hyper-parameters for training the detection model.
2. Once your dataset and config file are in place, run `python main.py train_d`.

### Testing your own model (Detection)

1. In configs/config.yaml, under the field "PreTrained_model", change the value of the field "check" to True.
2. Configure the path of the model in the field "PreTrained_model".
3. Once your dataset and config file are in place, run `python main.py test_d`.
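For reference, the pre-trained-model switch described in steps 1-2 corresponds to a nested field of configs/config.yaml; parsed through src/helper/read_yaml.py it would look roughly like the dictionary below. Only the "check" key is confirmed by the steps above; the "path" key name is a hypothetical placeholder for wherever the checkpoint path is configured.

```python
# Hypothetical shape of the "PreTrained_model" section of configs/config.yaml
# once loaded into a Python dict. "check" is described in the steps above;
# the "path" key name and value are illustrative assumptions only.
config = {
    "PreTrained_model": {
        "check": True,             # True -> load an existing checkpoint instead of training from scratch
        "path": "Misc/model.pth",  # illustrative checkpoint location
    }
}
```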
### Generate Visual Results on a single image

1. In configs/config.yaml, under the field "PreTrained_model", change the value of the field "check" to True.
2. Configure the path of the model in the field "PreTrained_model".
3. Run the command `python main.py test_one_d -p <path-to-test-image> -o <path-to-output-folder>`.

### Generate Visual Results on an entire folder

1. In configs/config.yaml, under the field "PreTrained_model", change the value of the field "check" to True.
2. Configure the path of the model in the field "PreTrained_model".
3. Run the command `python main.py test_entire_folder_d -p <path-to-test-folder> -o <path-to-output-folder>`.

## Pre-trained model download link (ToDo: models that cannot be provided on GitHub will be stored elsewhere, with a download link here)

## Additional details, discussions, etc. (ToDo)

## References

* Deng, Dan, et al. "PixelLink: Detecting scene text via instance segmentation." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
* Karatzas, Dimosthenis, et al. "ICDAR 2015 competition on robust reading." 2015 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2015.
* Gupta, Ankush, Andrea Vedaldi, and Andrew Zisserman. "Synthetic Data for Text Localisation in Natural Images." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. (VGG SynthText in the Wild)
* Ren, Mengye, and Richard S. Zemel. "End-to-end instance segmentation with recurrent attention." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
* Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.

This repository is currently under active development. Please raise issues and we will address them as soon as possible.

src/model/u_net_resnet_50_encoder.py

Lines changed: 135 additions & 0 deletions
import torchvision
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random

from .u_net_resnet_50_parts import *
from .generic_model import model
from src.helper.logger import Logger

log = Logger()


class UNetWithResnet50Encoder(model):

    def __init__(self, config, mode, profiler):

        super().__init__()

        self.config = config

        # Fixed class weights are only needed when hard negative mining is disabled.
        if not self.config['hard_negative_mining']:
            self.class_weight = torch.FloatTensor(self.config['class_weight']).cuda()

        self.seed()
        self.profiler = profiler
        self.DEPTH = 5  # supported values: 3, 4, 5

        if config['lossf'] == 'CEL':
            # 18 classes = 2 (text/non-text) + 16 (8 neighbour links, two scores each).
            if self.config['link']:
                self.classes = 18
            else:
                self.classes = 2

        self.loss_name = self.config['lossf']

        self.channel_depth = [64, 256, 512, 1024, 2048, 1024, 512, 256, 64, self.classes]

        profiler(self.define_architecture, profiler_type="once")

        self.prev_lr = config['lr'][1]

        if mode == 'train':

            if self.config['optimizer'] == 'Adam':
                log.info('Using Adam optimizer')
                self.opt = optim.Adam(self.parameters(), lr=config['lr'][1], weight_decay=config['weight_decay'])

            elif self.config['optimizer'] == 'SGD':
                log.info('Using SGD optimizer')
                self.opt = optim.SGD(self.parameters(), lr=config['lr'][1], momentum=config['momentum'], weight_decay=config['weight_decay'])

        if config['lossf'] == 'CEL':
            log.info('Using CEL')
            self.lossf = nn.CrossEntropyLoss(reduction='none')

    def define_architecture(self):
        # Encoder: an (optionally pre-trained) ResNet-50; decoder: U-Net style up blocks.

        resnet = torchvision.models.resnet.resnet50(pretrained=self.config['PreTrained_net'])

        self.input_block = nn.Sequential(*list(resnet.children()))[:3]
        self.input_pool = list(resnet.children())[3]

        down_blocks = []
        for bottleneck in list(resnet.children()):
            if isinstance(bottleneck, nn.Sequential):
                down_blocks.append(bottleneck)
        self.down_blocks = nn.ModuleList(down_blocks[0:self.DEPTH + 1])
        del down_blocks

        self.bridge = Bridge(self.channel_depth[self.DEPTH - 1], self.channel_depth[9 - self.DEPTH])

        up_blocks = []
        up_blocks.append(UpBlockForUNetWithResNet50(2048, 1024))
        up_blocks.append(UpBlockForUNetWithResNet50(1024, 512))
        up_blocks.append(UpBlockForUNetWithResNet50(512, 256))
        up_blocks.append(UpBlockForUNetWithResNet50(in_channels=128 + 64, out_channels=128,
                                                    up_conv_in_channels=256, up_conv_out_channels=128))
        up_blocks.append(UpBlockForUNetWithResNet50(in_channels=64 + 3, out_channels=64,
                                                    up_conv_in_channels=128, up_conv_out_channels=64))
        self.up_blocks = nn.ModuleList(up_blocks[5 - self.DEPTH:])
        del up_blocks

        self.out = nn.Conv2d(64, self.classes, kernel_size=1, stride=1)
        self.output = nn.Softmax(dim=1)

    def forward(self, x_big, with_output_feature_map=False):
        # x_big is a list of input batches; each batch is passed through the
        # encoder-decoder, with skip connections cached in pre_pools.

        output_list = []
        if with_output_feature_map:
            output_feature_map_list = []

        for no, x in enumerate(x_big):

            pre_pools = dict()
            pre_pools["layer_0"] = x
            x = self.input_block(x)
            pre_pools["layer_1"] = x
            x = self.input_pool(x)
            x = self.down_blocks[0](x)
            pre_pools["layer_2"] = x

            for i, block in enumerate(self.down_blocks[1:], 3):
                x = block(x)

                if i == self.DEPTH:
                    break
                pre_pools["layer_" + str(i)] = x

            x = self.bridge(x)

            for i, block in enumerate(self.up_blocks, 1):
                key = "layer_" + str(self.DEPTH - i)
                x = block(x, pre_pools[key])

            output_feature_map = x
            x = self.out(x)
            del pre_pools

            if with_output_feature_map:
                output_list.append(x)
                output_feature_map_list.append(output_feature_map)
            else:
                output_list.append(x)

        if with_output_feature_map:
            return output_list, output_feature_map_list
        else:
            return output_list

    def __name__(self):
        # Returns the string 'ResNet_UNet_forward_pass' when called.
        return 'ResNet_UNet_forward_pass'
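To help with wiring this model into a custom script, the dictionary below lists the configuration keys that UNetWithResnet50Encoder.__init__ reads. Every key appears in the constructor above, but the values are illustrative assumptions rather than the repository's actual defaults in configs/config.yaml, and the structure of 'lr' in particular may differ (the code only indexes config['lr'][1]).

# Minimal sketch of the config dictionary consumed by UNetWithResnet50Encoder.__init__.
# Keys are taken from the constructor above; values are illustrative assumptions only.
example_config = {
    'hard_negative_mining': True,   # if False, class_weight is used instead (and moved to the GPU)
    'class_weight': [1.0, 2.0],     # only read when hard_negative_mining is False
    'lossf': 'CEL',                 # cross-entropy loss
    'link': True,                   # True -> 18 output classes (2 text + 16 link), False -> 2
    'lr': {1: 1e-4},                # the code reads config['lr'][1]; actual structure may differ
    'weight_decay': 5e-4,
    'optimizer': 'Adam',            # 'Adam' or 'SGD'
    'momentum': 0.9,                # used only by SGD
    'PreTrained_net': True,         # load ImageNet-pretrained ResNet-50 weights
}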

src/model/u_net_resnet_50_parts.py

Lines changed: 79 additions & 0 deletions
import torchvision
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """
    Helper module that consists of Conv -> BN -> ReLU.
    """

    def __init__(self, in_channels, out_channels, padding=1, kernel_size=3, stride=1, with_nonlinearity=True):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, padding=padding, kernel_size=kernel_size, stride=stride)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()
        self.with_nonlinearity = with_nonlinearity

    def forward(self, x):
        # Forward propagation: convolution, batch norm, optional ReLU.
        x = self.conv(x)
        x = self.bn(x)
        if self.with_nonlinearity:
            x = self.relu(x)
        return x


class Bridge(nn.Module):
    """
    The middle layer of the U-Net, which just consists of two ConvBlocks.
    """

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.bridge = nn.Sequential(
            ConvBlock(in_channels, out_channels),
            ConvBlock(out_channels, out_channels)
        )

    def forward(self, x):
        return self.bridge(x)


class UpBlockForUNetWithResNet50(nn.Module):
    """
    Up block that encapsulates one up-sampling step, which consists of Upsample -> ConvBlock -> ConvBlock.
    """

    def __init__(self, in_channels, out_channels, up_conv_in_channels=None, up_conv_out_channels=None,
                 upsampling_method="conv_transpose"):
        super().__init__()

        if up_conv_in_channels is None:
            up_conv_in_channels = in_channels
        if up_conv_out_channels is None:
            up_conv_out_channels = out_channels

        if upsampling_method == "conv_transpose":
            self.upsample = nn.ConvTranspose2d(up_conv_in_channels, up_conv_out_channels, kernel_size=2, stride=2)
        elif upsampling_method == "bilinear":
            self.upsample = nn.Sequential(
                nn.Upsample(mode='bilinear', scale_factor=2),
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1)
            )
        self.conv_block_1 = ConvBlock(in_channels, out_channels)
        self.conv_block_2 = ConvBlock(out_channels, out_channels)

    def forward(self, up_x, down_x):
        """
        :param up_x: output of the previous up block
        :param down_x: output of the corresponding down block (skip connection)
        :return: upsampled feature map
        """
        # print('befcat', up_x.shape, down_x.shape)
        x = self.upsample(up_x)
        x = torch.cat([x, down_x], 1)
        # print('cat', x.shape)
        x = self.conv_block_1(x)
        x = self.conv_block_2(x)
        return x
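A quick sanity check of UpBlockForUNetWithResNet50 above; it assumes only PyTorch is installed and that the repository root is on PYTHONPATH. The tensor sizes mirror the first decoder stage created in u_net_resnet_50_encoder.py (2048 -> 1024 channels), but the random inputs are placeholders.

import torch

from src.model.u_net_resnet_50_parts import UpBlockForUNetWithResNet50

# Mirror the first decoder stage from u_net_resnet_50_encoder.py:
# a 2048-channel bridge output and a 1024-channel skip connection.
block = UpBlockForUNetWithResNet50(in_channels=2048, out_channels=1024)

up_x = torch.randn(1, 2048, 8, 8)      # output of the bridge (placeholder data)
down_x = torch.randn(1, 1024, 16, 16)  # matching encoder feature map (placeholder data)

out = block(up_x, down_x)
print(out.shape)  # torch.Size([1, 1024, 16, 16])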
