A vectorized implementation of the image binarization algorithm of Su et al. (2010) using NumPy. Example images from the DIBCO2009 dataset are provided in the `dibco2009` directory. The dataset consists of scans of handwritten and printed documents.
If the color channels are neatly orthogonal, it is possible to split an image into its color channels and binarize each channel individually. This is not recommended for the provided DIBCO2009 dataset.
```shell
./main.py split $IMAGE_FILE
```
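The following sketch shows per-channel processing in NumPy, assuming the image is already loaded as an `H x W x C` array. `split_channels` and `binarize_channels` are hypothetical helpers, not functions from `main.py`. The fixed-threshold binarizer in the usage section is only a stand-in for the Su et al. method.

```python
from typing import Callable, List

import numpy as np


def split_channels(image: np.ndarray) -> List[np.ndarray]:
    """Return one 2-D plane per color channel of an H x W x C image."""
    if image.ndim != 3:
        raise ValueError("expected an H x W x C array")
    return [image[..., c] for c in range(image.shape[-1])]


def binarize_channels(
    image: np.ndarray, binarize: Callable[[np.ndarray], np.ndarray]
) -> List[np.ndarray]:
    """Apply the same 2-D binarization function to every channel independently."""
    return [binarize(plane) for plane in split_channels(image)]


if __name__ == "__main__":
    rgb = np.zeros((4, 4, 3), dtype=np.uint8)
    rgb[:2, :, 0] = 255  # signal only in the red channel, top half

    def fixed_threshold(plane: np.ndarray) -> np.ndarray:
        # Stand-in binarizer (global threshold), NOT the Su et al. algorithm.
        return np.where(plane > 127, 255, 0).astype(np.uint8)

    for index, result in enumerate(binarize_channels(rgb, fixed_threshold)):
        print(index, int((result == 255).sum()))
```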
Binarize an image using the Su et al. (2010) algorithm. The resulting file will be stored next to the input file.
```shell
./main.py binarize dibco2009/DIBC02009_Test_images-handwritten/dibco_img0004.tif
```
It is also possible to evaluate the binarization algorithm on either the handwritten or the printed portion of the provided DIBCO2009 dataset. The command below reports the F1 score and the PSNR (peak signal-to-noise ratio) on the handwritten documents:
```shell
./main.py evaluate dibco2009/DIBC02009_Test_images-handwritten
```
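The two metrics can be sketched as follows on boolean text masks, using the definitions common in the DIBCO competitions. F1 is the harmonic mean of pixel precision and recall for text pixels. PSNR is `10 * log10(C**2 / MSE)` with `C = 1` for binary images, so MSE is simply the fraction of mismatched pixels. This may differ in detail from what `./main.py evaluate` computes. The helper names and the "text is black (0)" convention in `to_text_mask` are assumptions.

```python
import numpy as np


def to_text_mask(image: np.ndarray) -> np.ndarray:
    """Assumed convention: text pixels are 0 (black), background is non-zero."""
    return np.asarray(image) == 0


def f_measure(pred_text: np.ndarray, gt_text: np.ndarray) -> float:
    """Pixel-level F1 score of the predicted text mask against the ground truth."""
    pred = np.asarray(pred_text, dtype=bool)
    gt = np.asarray(gt_text, dtype=bool)
    tp = np.count_nonzero(pred & gt)
    fp = np.count_nonzero(pred & ~gt)
    fn = np.count_nonzero(~pred & gt)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def psnr(pred_text: np.ndarray, gt_text: np.ndarray) -> float:
    """PSNR in dB for binary images (C = 1): MSE is the fraction of differing pixels."""
    mse = np.mean(np.asarray(pred_text, dtype=bool) != np.asarray(gt_text, dtype=bool))
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(1.0 / mse))
```

Multiplying `f_measure` by 100 gives the percentage form used in the comparison table.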
Again, the parameters of the algorithm can be changed through command-line options (see `./main.py evaluate -h`).
The table below compares this implementation, run with its default parameters, against the values reported in Su et al. (2010):

| Algorithm | F1 (%) | PSNR (dB) |
|---|---|---|
| Su et al. (2010) | 89.93 | 19.94 |
| This implementation | 86.01 | 18.37 |