Skip to content

[IWCP 24] MONSTERMASH: Multidirectional, Overlapping, Nested, Spiral Text Extraction for Recognition Models of Arabic-Script Handwriting

Notifications You must be signed in to change notification settings

taineleau/MONSTERMASH

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

MONSTERMASH

Multidirectional, Overlapping, Nested, Spiral Text Extraction for Recognition Models of Arabic-Script Handwriting

Examples of test images.

The data is collected from 10th centuries to 20th. Writing sytle included but not limited to Naskh (varied style), Nasta'liq, Maghribi, Sudani, Ta'liq and Bihari.

Time period count

Language count

For example, the follow image:

Examples of test images.

{
  "pid": 26,
 "title": "Hartford Seminary Arabic MSS 199/Jawāhiral-nuṣūṣ fī ḥall kalimāt al-Fuṣūṣ",	
"time": "18th century",
"language": "Arabic",
"style: "Naskh",
"note": "Complex title page with multiple ownersip notes etc and other annotations",
"polygons": [(x0, y0), (x1, y1), ..., ]

}

Usage

The dataset is under data/. We use labelme to annotate the dataset.


Three baseline models

kraken
  1. Install https://github.com/mittagessen/kraken, and we use 4.3 in our project.

  2. Download the model:

wget https://github.com/OpenITI/arabic_script_ocr_models/blob/main/ms_mellon_print_layout.mlmodel
  1. Run script at models/kraken_pipe.ipynb for inference code.
TESTR

Install instruction can be found: https://github.com/mlpc-ucsd/TESTR

doc-ufcn

Install instruction can be found: https://gitlab.teklia.com/dla/doc-ufcn (you may need to register first to visit their gitlab).

Checkpoints is available on huggingface: https://huggingface.co/Teklia/doc-ufcn-generic-historical-line

Citaion

@inproceedings{chen2024monstermash,
  title={{MONSTERMASH}: Multidirectional, Overlapping, Nested, Spiral Text Extraction for Recognition Models of Arabic-Script Handwriting},
  author={Chen, Danlu and Murel, Jacob and Taimoor Shahid and Xiang Zhang and Jonathan Parkes Allen and Taylor Berg-Kirkpatrick and and David A.Smith},
  booktitle={International Conference on Document Analysis and Recognition, IWCP workshop},
  year={2024},
  organization={Springer}
}

About

[IWCP 24] MONSTERMASH: Multidirectional, Overlapping, Nested, Spiral Text Extraction for Recognition Models of Arabic-Script Handwriting

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published