Here is an example of how I am using this in the actual project.

```python
from predict import predict
from crack_detection_util.image_util import MyImage


def main():
    folder = "./predict"
    mi = MyImage(model_name='maskrcnn', chunk_size=512, folder=folder)
    mi.clear_cache()
    mi.batch_chunk_images()
    mi.batch_predict_images(predict=predict)
    mi.batch_merge_images()


if __name__ == '__main__':
    main()
```

The path structure is as follows:

```
.root/
    predict.py -- the predict function
    batch_predict.py -- this file
    /predict
        /input -- put the images to be predicted in this folder
        /cache -- this folder is generated automatically
        /output -- this folder is generated automatically; get the outputs here
        ... -- other dependencies for predict
```
Here is an example of how I am using this in the actual project.

```python
from crack_detection_util.evaluate_util import Evaluator

data = "example/evaluation"
dataset = "PCL7"

ev = Evaluator(input_folder=data, dataset=dataset)
ev.calculate_evaluation_metrics()
ev.merge_csv_files()
```

The path structure is as follows:

```
.root/
    example.ipynb -- this file
    /example
        /evaluation
            /PCL7 -- name of the dataset; put all mask files here following these rules
                /true -- put all ground-truth mask images (*.jpg) here
                /prediction -- put all prediction mask images (*.jpg) here, divided by prediction model
                    /UNet
                    /NestedUNet
                    ...
            /... -- putting more than one dataset here is OK
            PCL7_scores.csv -- generated automatically; evaluation metrics for a single dataset and all prediction models
            scores.csv -- generated automatically; merged evaluation metrics for all datasets and all prediction models
```
ChunkAndMerge is a class to chunk and merge images.

The `chunk` function splits an image into blocks of size `chunk_size` (default: 512), naming them from the top-left to the bottom-right. When the image size is not evenly divisible by `chunk_size`, the image is enlarged to the nearest multiple and the blank areas are filled with black pixels. An `info.json` file is additionally created to record the original image size.

The `merge` function reads `info.json` to merge the chunked images back into a full one. If no modifications are made to the chunks, the output image is identical to the original.
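A minimal sketch of the padding and chunking logic described above (function names and the numpy-based approach are illustrative, not the library's actual API):

```python
import numpy as np


def pad_to_multiple(image: np.ndarray, chunk_size: int = 512) -> np.ndarray:
    """Pad an H x W (x C) image with black pixels so both sides become multiples of chunk_size."""
    h, w = image.shape[:2]
    new_h = -(-h // chunk_size) * chunk_size  # ceiling division
    new_w = -(-w // chunk_size) * chunk_size
    padded = np.zeros((new_h, new_w) + image.shape[2:], dtype=image.dtype)
    padded[:h, :w] = image  # original content stays in the top-left corner
    return padded


def chunk(image: np.ndarray, chunk_size: int = 512):
    """Yield (row, col, block) tuples from the top-left to the bottom-right."""
    padded = pad_to_multiple(image, chunk_size)
    for r in range(0, padded.shape[0], chunk_size):
        for c in range(0, padded.shape[1], chunk_size):
            yield r // chunk_size, c // chunk_size, padded[r:r + chunk_size, c:c + chunk_size]
```

Merging would then place each block back at `(row * chunk_size, col * chunk_size)` and crop to the original size stored in `info.json`.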
MyImage is a class to batch-process images, covering format conversion and the predict workflow.

The `convert` function converts the images in the input folder to the specified format and saves them in the `output/timestamp-convert` folder. This function can also be replaced by many existing tools, such as ReNamer.

`batch_chunk_images`, `batch_predict_images`, and `batch_merge_images` are executed following the chunk-predict-merge process, where the `predict` method needs to be supplied based on the actual model. This is an extremely simple and rough approach, and often results in incomplete alignment of crack structures across chunk boundaries. You can use strategies such as overlapping chunks to make the predicted results structurally more accurate.

During batch processing, some intermediate files are generated, which I call the cache. They are kept by default after a single run for easy inspection; you can delete them with the `clear_cache` function.
Evaluator is a class to evaluate semantic segmentation models by common metrics.

`calculate_evaluation_metrics` calculates metrics such as Mean Absolute Error (MAE), mean Intersection over Union (mIoU), mean Average Precision (mAP), Recall, and F-measure.

You need to provide one ground-truth mask set and several prediction mask sets for a single group of evaluation. When the correspondence between images is correct, each model's predictions yield a set of metrics, which are stored in a CSV file.

I added a function `evaluate_all_datasets` to handle multiple groups (each group is independent of the others); the final results are horizontally concatenated and merged for quick viewing.
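The horizontal merge can be sketched roughly like this with pandas (the function name and in-memory table layout are illustrative; the library actually works on the generated CSV files):

```python
import pandas as pd


def merge_score_tables(tables: dict) -> pd.DataFrame:
    """Horizontally concatenate per-dataset score tables.

    Each value is a DataFrame indexed by model name with one column per metric;
    the dict keys become the top level of a column MultiIndex, so every
    (dataset, metric) pair is visible side by side.
    """
    return pd.concat(tables, axis=1)
```

For example, merging score tables for two datasets yields columns like `("PCL7", "mIoU")` and `("Other", "mIoU")`, with one row per model.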
IMPORTANT

- All metrics here are macro scores. That is, the function calculates each image's scores and then averages them over all images. This method is simple and efficient when all images share the same size and the prediction result of a single image matters more than the dataset-level result.
- A pixel-based tolerance coefficient `dilate_radius` is introduced: it dilates the entire ground-truth binary mask to a larger range. The default `dilate_radius` of 1 expands each pixel to a 3 x 3 neighborhood. This improvement aims to tolerate predictions that are essentially correct but offset by a pixel. If you want the results to be more pixel-accurate, set the value to 0.
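The tolerance dilation can be sketched in plain numpy as follows (the real implementation presumably uses OpenCV or scipy; the function name here is illustrative). A radius of `r` corresponds to a square `(2r + 1) x (2r + 1)` structuring element:

```python
import numpy as np


def dilate_mask(mask: np.ndarray, dilate_radius: int = 1) -> np.ndarray:
    """Binary-dilate a 0/1 mask with a square (2r+1) x (2r+1) structuring element."""
    if dilate_radius == 0:
        return mask.copy()  # radius 0 means strict pixel accuracy
    h, w = mask.shape
    padded = np.pad(mask, dilate_radius)
    out = np.zeros_like(mask)
    size = 2 * dilate_radius + 1
    # OR together all shifted copies of the mask within the window
    for dy in range(size):
        for dx in range(size):
            out |= padded[dy:dy + h, dx:dx + w]
    return out
```

A true pixel then counts as matched if the prediction lands anywhere inside its dilated neighborhood.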
First, calculate the confusion matrix between the predicted mask and the ground-truth mask. Then, calculate the metrics of a single image from it.
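As a rough sketch (not the library's actual code), the per-image metrics can be derived from the binary confusion matrix like this, assuming 0/1 masks of the same shape:

```python
import numpy as np


def single_image_metrics(pred: np.ndarray, true: np.ndarray) -> dict:
    """Compute common per-image metrics from the binary confusion matrix."""
    pred = pred.astype(bool)
    true = true.astype(bool)
    tp = np.sum(pred & true)    # predicted crack, actually crack
    fp = np.sum(pred & ~true)   # predicted crack, actually background
    fn = np.sum(~pred & true)   # missed crack
    eps = 1e-12                 # avoid division by zero on empty masks
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return {
        "MAE": float(np.mean(pred != true)),  # mean absolute error on 0/1 masks
        "IoU": tp / (tp + fp + fn + eps),
        "Recall": recall,
        "F-measure": 2 * precision * recall / (precision + recall + eps),
    }
```

The macro scores are then the plain averages of these per-image values over the whole dataset.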
This part of the algorithm is proposed in "CrackTree: Automatic crack detection from pavement images".

This section uses the original image and the marked cracks, and randomly selects some cracks for smooth filling to achieve a data-augmentation effect.

The processing steps for a single image are as follows:

- Count the number of cracks n by finding contours in the binary mask.
- Calculate the per-channel median RGB as the background color. (Obviously a 3-D median cannot be computed directly.)
- Randomly choose cracks to remove.
- Dilate the cracks by a 2-pixel radius to obtain a more inclusive crack mask.
- Fill the cracks with the median RGB.
- Fix the contours using the `cv2.INPAINT_TELEA` algorithm.
This simple algorithm works best when the cracks differ clearly from the background, and when few cracks are connected together or lie extremely close to each other.
About the Telea inpainting algorithm
The Telea inpainting algorithm is an image inpainting method developed by Alexandru Telea in 2004. Inpainting is the process of reconstructing lost or deteriorated parts of images and videos. The Telea method is based on the Fast Marching Method (FMM) and is considered a priority-based, or process-based, method.
The algorithm works by propagating information from the boundary of the missing region (the region to be inpainted) into the region itself, following the direction of minimal change (the isophotes). The information propagated is the pixel intensity, which is obtained by an edge-preserving interpolation from the known (non-missing) part of the image.
The Telea method is relatively fast and produces visually plausible results, making it suitable for interactive applications. However, like all inpainting methods, it may produce artifacts or unrealistic results if the missing region is too large or the surrounding texture is too complex.