
Commit d6646b4

Mayank committed "merging"
2 parents: 11517e2 + 3e1b113

File tree

5 files changed: +545 -1 lines changed

README.md

Lines changed: 155 additions & 1 deletion
# PyTorch Implementation of [Pixel-LINK](https://arxiv.org/pdf/1801.01315.pdf)

## Abstract: problem statement and solution approach

We attempt to detect all kinds of text in the wild. The technique used for text detection is based on the paper PixelLink: Detecting Scene Text via Instance Segmentation (https://arxiv.org/abs/1801.01315) by Deng et al. Text instances in scene images often lie very close to each other, which makes them hard to separate with semantic segmentation alone, so instance segmentation is needed.

The approach consists of two key steps:

a) Linking of pixels belonging to the same text instance (segmentation step),
b) Extraction of text bounding boxes from the predicted links.

Two kinds of predictions are made at every pixel of the image:

a) Text/non-text prediction,
b) Link prediction.

This sets PixelLink apart from earlier text-detection methodologies. Before PixelLink, the state-of-the-art approaches made two kinds of predictions: a) text/non-text prediction and b) location regression. In PixelLink, both per-pixel predictions are made in one go, requiring far fewer training iterations and less training data.
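To make the two per-pixel predictions concrete, here is a small sketch of how such an output tensor can be interpreted. It assumes the 18-channel layout used in src/model/u_net_resnet_50_encoder.py when link prediction is enabled (2 text/non-text channels plus 2 channels for each of the 8 neighbour directions); the channel ordering, image size, and thresholds below are illustrative assumptions, not values taken from the configs.

```python
import torch

# Illustrative assumption: a batch of 4 images at 512 x 512 and a network that
# predicts 18 channels per pixel: 2 for text/non-text and 2 x 8 for the links
# to the 8 neighbouring pixels (matching self.classes = 18 when config['link'] is set).
batch, height, width = 4, 512, 512
logits = torch.randn(batch, 18, height, width)

# Channels 0-1: text/non-text scores; softmax over the pair gives a per-pixel
# probability of the pixel being text.
text_prob = torch.softmax(logits[:, 0:2], dim=1)[:, 1]          # (4, 512, 512)

# Channels 2-17: one (negative, positive) score pair per neighbour direction.
link_logits = logits[:, 2:].reshape(batch, 8, 2, height, width)
link_prob = torch.softmax(link_logits, dim=2)[:, :, 1]          # (4, 8, 512, 512)

# Pixels and links above their confidence thresholds (segmentation_thresh and
# link_thresh in configs/dataset.yaml) are kept; connected positive pixels
# joined by positive links form one text instance.
text_mask = text_prob > 0.7   # illustrative threshold
link_mask = link_prob > 0.5   # illustrative threshold
```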
## Demo ([YouTube Link](https://www.youtube.com/watch?v=3d3J0kH3u6c))

## Results (ToDo: numerical results will be reported in a table, visual results will be displayed here)

## Code dependencies

All code dependencies are listed in requirements.txt.<br/>
Run `pip install -r requirements.txt` to install them.
## Code Structure

```bash
.
├── Coding_Guidelines.md
├── configs
│   ├── config.yaml
│   ├── dataset.yaml
│   └── text_config.yaml
├── Dockerfile
├── Errors_got.txt
├── ideas.txt
├── LICENSE
├── main.py
├── Misc
├── README.md
├── requirements.txt
├── sonar-project.properties
├── src
│   ├── Dlmodel
│   │   ├── Dlmodel.py
│   │   ├── __pycache__ (Cache folder generated by Python)
│   │   ├── TestOneImageD.py
│   │   ├── TestOneImageRD.py
│   │   ├── TestOneImageR.py
│   │   ├── TestRD.py
│   │   ├── TrainTestD.py
│   │   └── TrainTestR.py
│   ├── helper
│   │   ├── logger.py
│   │   ├── profiler.py
│   │   ├── __pycache__ (Cache folder generated by Python)
│   │   ├── read_yaml.py
│   │   └── utils.py
│   ├── loader
│   │   ├── art.py
│   │   ├── dete_loader.py
│   │   ├── generic_dataloader.py
│   │   ├── mnist.py
│   │   ├── __pycache__
│   │   ├── reco_loader.py
│   │   ├── scale_two.py
│   │   └── square.py
│   ├── model
│   │   ├── crnn.py
│   │   ├── densenet.py
│   │   ├── generic_model.py
│   │   ├── model_loader.py
│   │   ├── __pycache__ (Cache folder generated by Python)
│   │   ├── resnet_own.py
│   │   └── trial.py (Under-development CRNN)
│   ├── pipeline_manager.py (Controls the flow of the repository)
│   ├── prepare_metadata (Preprocessing steps to be performed before training/testing)
│   │   ├── meta_artificial.py (Prepare metadata for the artificial dataset)
│   │   ├── meta_coco.py (Prepare metadata for the COCO V2 dataset)
│   │   ├── meta_ic13.py (Prepare metadata for the IC13 dataset)
│   │   ├── meta_ic15.py (Prepare metadata for the IC15 dataset)
│   │   ├── meta_own.py (Prepare metadata for the OWN dataset)
│   │   ├── meta_synth.py (Prepare metadata for the SynthText dataset)
│   │   ├── prepare_metadata.py
│   │   └── __pycache__ (Cache folder generated by Python)
│   └── __pycache__
└── text.sublime-workspace
```
## Instructions to run the code

### Setting up the dataset

1. In the configs/dataset.yaml file, add your dataset in the following format under the field metadata:

   1. <Name of the dataset>
      1. dir: <Path-to-Dataset-Folder>
      2. image: <Path-to-Dataset-Folder>/Images
      3. label: <Path-to-Dataset-Folder>/Labels
      4. meta: <Path-to-Dataset-Folder>/Meta
      5. contour_length_thresh_min: <Contours with length less than this are excluded from training and testing>
      6. contour_area_thresh_min: <Contours with area less than this are excluded from training and testing>
      7. segmentation_thresh: <Confidence value above which a pixel is classified as positive>
      8. link_thresh: <Confidence value above which a link is classified as positive>
      9. cal_avg: <If True, pad with the average of the image; otherwise pad with zeros>
      10. split_type: <Percentage of images randomly picked from the dataset for training; the remainder is used for validation>

2. Put all your images in the *<Path-to-Dataset-Folder>/Images* folder.

3. Create labels in the following format (see the sketch after this list):
   1. Contours = list of all bounding boxes (dtype=np.float32, shape=[4, 1, 2]; four corner points with two coordinates each)
   2. Text = list of strings containing the text corresponding to every contour
   3. The label for every image is named <image-name.extension-of-image.pkl> and is a pickle dump of the list [Contours, Text]

4. Save all the labels for the images in the folder *<Path-to-Dataset-Folder>/Labels*.

5. Create the folder *<Path-to-Dataset-Folder>/Meta*.

6. In the configs/dataset.yaml file, put your dataset name in the fields *dataset_train* and *dataset_test*.

7. Run `python main.py prepare_metadata`.
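As a concrete illustration of step 3 above, the following sketch writes one label file in the expected format. The image name, contour coordinates, and text strings are made-up placeholders; only the `[Contours, Text]` layout and the `<image-name.extension-of-image>.pkl` naming follow the steps above.

```python
import os
import pickle

import numpy as np

# Made-up example: an image "sample.jpg" containing two words.
contours = [
    np.array([[[10, 20]], [[110, 20]], [[110, 60]], [[10, 60]]], dtype=np.float32),    # shape (4, 1, 2)
    np.array([[[150, 25]], [[260, 25]], [[260, 70]], [[150, 70]]], dtype=np.float32),
]
texts = ["HELLO", "WORLD"]  # one string per contour

# Labels live in <Path-to-Dataset-Folder>/Labels; "Labels" is used here as a stand-in.
os.makedirs("Labels", exist_ok=True)

# The label file is named <image-name.extension-of-image>.pkl and contains
# a pickle dump of the list [Contours, Text].
with open(os.path.join("Labels", "sample.jpg.pkl"), "wb") as f:
    pickle.dump([contours, texts], f)
```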
### Training your own model (Detection)

1. configs/config.yaml contains all the hyper-parameters for training the detection model.
2. Once your dataset and config file are in place, run `python main.py train_d`.

### Testing your own model (Detection)

1. In configs/config.yaml, under the field "PreTrained_model", change the value of the field "check" to True.
2. Configure the path of the model in the field "PreTrained_model".
3. Once your dataset and config file are in place, run `python main.py test_d`.
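For reference, the pre-trained-model switch described in steps 1-2 corresponds to a nested field of configs/config.yaml; parsed through src/helper/read_yaml.py it would look roughly like the dictionary below. Only the "check" key is confirmed by the steps above; the "path" key name is a hypothetical placeholder for wherever the checkpoint path is configured.

```python
# Hypothetical shape of the "PreTrained_model" section of configs/config.yaml
# once loaded into a Python dict. "check" is described in the steps above;
# the "path" key name and value are illustrative assumptions only.
config = {
    "PreTrained_model": {
        "check": True,             # True -> load an existing checkpoint instead of training from scratch
        "path": "Misc/model.pth",  # illustrative checkpoint location
    }
}
```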
### Generate Visual Results on a single image

1. In configs/config.yaml, under the field "PreTrained_model", change the value of the field "check" to True.
2. Configure the path of the model in the field "PreTrained_model".
3. Run the command `python main.py test_one_d -p <path-to-test-image> -o <path-to-output-folder>`.

### Generate Visual Results on an entire folder

1. In configs/config.yaml, under the field "PreTrained_model", change the value of the field "check" to True.
2. Configure the path of the model in the field "PreTrained_model".
3. Run the command `python main.py test_entire_folder_d -p <path-to-test-folder> -o <path-to-output-folder>`.

## Pre-trained model download link (ToDo: models that cannot be provided on GitHub will be stored elsewhere, with a download link here)

## Additional details, discussions, etc. (ToDo)

## References

* Deng, Dan, et al. "PixelLink: Detecting scene text via instance segmentation." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
* Karatzas, Dimosthenis, et al. "ICDAR 2015 competition on robust reading." 2015 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2015.
* Gupta, Ankush, Andrea Vedaldi, and Andrew Zisserman. "Synthetic Data for Text Localisation in Natural Images." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. (VGG SynthText in the Wild)
* Ren, Mengye, and Richard S. Zemel. "End-to-end instance segmentation with recurrent attention." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
* Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.

This repository is currently under active development. Please raise issues and we will address them as soon as possible.

src/model/u_net_resnet_50_encoder.py

Lines changed: 135 additions & 0 deletions
import torchvision
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random

from .u_net_resnet_50_parts import *
from .generic_model import model
from src.helper.logger import Logger

log = Logger()


class UNetWithResnet50Encoder(model):

    def __init__(self, config, mode, profiler):

        super().__init__()

        self.config = config

        # Fixed class weights are only needed when hard negative mining is disabled.
        if not self.config['hard_negative_mining']:
            self.class_weight = torch.FloatTensor(self.config['class_weight']).cuda()

        self.seed()
        self.profiler = profiler
        self.DEPTH = 5  # supported values: 3, 4, 5

        if config['lossf'] == 'CEL':
            # 18 classes = 2 (text/non-text) + 16 (8 neighbour links, two scores each).
            if self.config['link']:
                self.classes = 18
            else:
                self.classes = 2

        self.loss_name = self.config['lossf']

        self.channel_depth = [64, 256, 512, 1024, 2048, 1024, 512, 256, 64, self.classes]

        profiler(self.define_architecture, profiler_type="once")

        self.prev_lr = config['lr'][1]

        if mode == 'train':

            if self.config['optimizer'] == 'Adam':
                log.info('Using Adam optimizer')
                self.opt = optim.Adam(self.parameters(), lr=config['lr'][1], weight_decay=config['weight_decay'])

            elif self.config['optimizer'] == 'SGD':
                log.info('Using SGD optimizer')
                self.opt = optim.SGD(self.parameters(), lr=config['lr'][1], momentum=config['momentum'], weight_decay=config['weight_decay'])

        if config['lossf'] == 'CEL':
            log.info('Using CEL')
            self.lossf = nn.CrossEntropyLoss(reduction='none')

    def define_architecture(self):
        # Encoder: an (optionally pre-trained) ResNet-50; decoder: U-Net style up blocks.

        resnet = torchvision.models.resnet.resnet50(pretrained=self.config['PreTrained_net'])

        self.input_block = nn.Sequential(*list(resnet.children()))[:3]
        self.input_pool = list(resnet.children())[3]

        down_blocks = []
        for bottleneck in list(resnet.children()):
            if isinstance(bottleneck, nn.Sequential):
                down_blocks.append(bottleneck)
        self.down_blocks = nn.ModuleList(down_blocks[0:self.DEPTH + 1])
        del down_blocks

        self.bridge = Bridge(self.channel_depth[self.DEPTH - 1], self.channel_depth[9 - self.DEPTH])

        up_blocks = []
        up_blocks.append(UpBlockForUNetWithResNet50(2048, 1024))
        up_blocks.append(UpBlockForUNetWithResNet50(1024, 512))
        up_blocks.append(UpBlockForUNetWithResNet50(512, 256))
        up_blocks.append(UpBlockForUNetWithResNet50(in_channels=128 + 64, out_channels=128,
                                                    up_conv_in_channels=256, up_conv_out_channels=128))
        up_blocks.append(UpBlockForUNetWithResNet50(in_channels=64 + 3, out_channels=64,
                                                    up_conv_in_channels=128, up_conv_out_channels=64))
        self.up_blocks = nn.ModuleList(up_blocks[5 - self.DEPTH:])
        del up_blocks

        self.out = nn.Conv2d(64, self.classes, kernel_size=1, stride=1)
        self.output = nn.Softmax(dim=1)

    def forward(self, x_big, with_output_feature_map=False):
        # x_big is a list of input batches; each batch is passed through the
        # encoder-decoder, with skip connections cached in pre_pools.

        output_list = []
        if with_output_feature_map:
            output_feature_map_list = []

        for no, x in enumerate(x_big):

            pre_pools = dict()
            pre_pools["layer_0"] = x
            x = self.input_block(x)
            pre_pools["layer_1"] = x
            x = self.input_pool(x)
            x = self.down_blocks[0](x)
            pre_pools["layer_2"] = x

            for i, block in enumerate(self.down_blocks[1:], 3):
                x = block(x)

                if i == self.DEPTH:
                    break
                pre_pools["layer_" + str(i)] = x

            x = self.bridge(x)

            for i, block in enumerate(self.up_blocks, 1):
                key = "layer_" + str(self.DEPTH - i)
                x = block(x, pre_pools[key])

            output_feature_map = x
            x = self.out(x)
            del pre_pools

            if with_output_feature_map:
                output_list.append(x)
                output_feature_map_list.append(output_feature_map)
            else:
                output_list.append(x)

        if with_output_feature_map:
            return output_list, output_feature_map_list
        else:
            return output_list

    def __name__(self):
        # Returns the string 'ResNet_UNet_forward_pass' when called.
        return 'ResNet_UNet_forward_pass'
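To help with wiring this model into a custom script, the dictionary below lists the configuration keys that UNetWithResnet50Encoder.__init__ reads. Every key appears in the constructor above, but the values are illustrative assumptions rather than the repository's actual defaults in configs/config.yaml, and the structure of 'lr' in particular may differ (the code only indexes config['lr'][1]).

# Minimal sketch of the config dictionary consumed by UNetWithResnet50Encoder.__init__.
# Keys are taken from the constructor above; values are illustrative assumptions only.
example_config = {
    'hard_negative_mining': True,   # if False, class_weight is used instead (and moved to the GPU)
    'class_weight': [1.0, 2.0],     # only read when hard_negative_mining is False
    'lossf': 'CEL',                 # cross-entropy loss
    'link': True,                   # True -> 18 output classes (2 text + 16 link), False -> 2
    'lr': {1: 1e-4},                # the code reads config['lr'][1]; actual structure may differ
    'weight_decay': 5e-4,
    'optimizer': 'Adam',            # 'Adam' or 'SGD'
    'momentum': 0.9,                # used only by SGD
    'PreTrained_net': True,         # load ImageNet-pretrained ResNet-50 weights
}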

src/model/u_net_resnet_50_parts.py

Lines changed: 79 additions & 0 deletions
import torchvision
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """
    Helper module that consists of Conv -> BN -> ReLU.
    """

    def __init__(self, in_channels, out_channels, padding=1, kernel_size=3, stride=1, with_nonlinearity=True):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, padding=padding, kernel_size=kernel_size, stride=stride)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()
        self.with_nonlinearity = with_nonlinearity

    def forward(self, x):
        # Forward propagation: convolution, batch norm, optional ReLU.
        x = self.conv(x)
        x = self.bn(x)
        if self.with_nonlinearity:
            x = self.relu(x)
        return x


class Bridge(nn.Module):
    """
    The middle layer of the U-Net, which just consists of two ConvBlocks.
    """

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.bridge = nn.Sequential(
            ConvBlock(in_channels, out_channels),
            ConvBlock(out_channels, out_channels)
        )

    def forward(self, x):
        return self.bridge(x)


class UpBlockForUNetWithResNet50(nn.Module):
    """
    Up block that encapsulates one up-sampling step, which consists of Upsample -> ConvBlock -> ConvBlock.
    """

    def __init__(self, in_channels, out_channels, up_conv_in_channels=None, up_conv_out_channels=None,
                 upsampling_method="conv_transpose"):
        super().__init__()

        if up_conv_in_channels is None:
            up_conv_in_channels = in_channels
        if up_conv_out_channels is None:
            up_conv_out_channels = out_channels

        if upsampling_method == "conv_transpose":
            self.upsample = nn.ConvTranspose2d(up_conv_in_channels, up_conv_out_channels, kernel_size=2, stride=2)
        elif upsampling_method == "bilinear":
            self.upsample = nn.Sequential(
                nn.Upsample(mode='bilinear', scale_factor=2),
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1)
            )
        self.conv_block_1 = ConvBlock(in_channels, out_channels)
        self.conv_block_2 = ConvBlock(out_channels, out_channels)

    def forward(self, up_x, down_x):
        """
        :param up_x: output of the previous up block
        :param down_x: output of the corresponding down block (skip connection)
        :return: upsampled feature map
        """
        # print('befcat', up_x.shape, down_x.shape)
        x = self.upsample(up_x)
        x = torch.cat([x, down_x], 1)
        # print('cat', x.shape)
        x = self.conv_block_1(x)
        x = self.conv_block_2(x)
        return x
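A quick sanity check of UpBlockForUNetWithResNet50 above; it assumes only PyTorch is installed and that the repository root is on PYTHONPATH. The tensor sizes mirror the first decoder stage created in u_net_resnet_50_encoder.py (2048 -> 1024 channels), but the random inputs are placeholders.

import torch

from src.model.u_net_resnet_50_parts import UpBlockForUNetWithResNet50

# Mirror the first decoder stage from u_net_resnet_50_encoder.py:
# a 2048-channel bridge output and a 1024-channel skip connection.
block = UpBlockForUNetWithResNet50(in_channels=2048, out_channels=1024)

up_x = torch.randn(1, 2048, 8, 8)      # output of the bridge (placeholder data)
down_x = torch.randn(1, 1024, 16, 16)  # matching encoder feature map (placeholder data)

out = block(up_x, down_x)
print(out.shape)  # torch.Size([1, 1024, 16, 16])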
