


Naosekpam/IIITG-MLRIT2022-Scene-Text


IIITG-MLRIT2022

IIITG-MLRIT2022 is a multi-label language identification dataset of scene-text images in regional Indian languages. It covers five languages (Hindi, Bengali, Malayalam, Kannada, and English), and each image contains two scripts, which is what makes the task multi-lingual. In addition to horizontal text, the dataset includes curved, perspective-distorted, and multi-oriented text. This diversity is achieved by applying image transformations such as affine, arc, and perspective distortion at different angles. The images come from multiple sources: mobile camera captures, existing datasets, and the web.
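Because each image carries two scripts out of five languages, a label is naturally a multi-hot vector rather than a single class index. The sketch below shows one way to encode and decode such labels. The language order in `LANGUAGES` and the helper names `encode_labels` and `decode_labels` are assumptions made for illustration; they are not taken from the dataset's annotation files.

```python
# Assumed (hypothetical) label order; the dataset's own annotation order may differ.
LANGUAGES = ("Hindi", "Bengali", "Malayalam", "Kannada", "English")


def encode_labels(present):
    """Turn a collection of language names into a multi-hot list over LANGUAGES."""
    present = set(present)
    unknown = present - set(LANGUAGES)
    if unknown:
        raise ValueError(f"Unknown language(s): {sorted(unknown)}")
    return [1 if lang in present else 0 for lang in LANGUAGES]


def decode_labels(vector):
    """Turn a multi-hot list back into language names, in LANGUAGES order."""
    if len(vector) != len(LANGUAGES):
        raise ValueError("Label vector length does not match the number of languages")
    return [lang for lang, flag in zip(LANGUAGES, vector) if flag]


# An image containing Hindi and English text sets two positions to 1.
label = encode_labels(["Hindi", "English"])
languages = decode_labels(label)
```

With this layout a classifier would typically use one independent output per language instead of a single softmax over five classes.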

Fig. 1: Sample examples from IIITG-MLRIT2022
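The description mentions affine transformations applied at different angles. As a rough illustration of what that means geometrically, the sketch below rotates the corner points of a text bounding box with a 2x3 affine matrix. It is not the authors' augmentation pipeline: the function names `rotation_matrix` and `apply_affine`, the box coordinates, and the angle are all hypothetical, and real image warping would normally be done with an image library.

```python
import math


def rotation_matrix(angle_deg, center=(0.0, 0.0)):
    """Return a 2x3 affine matrix that rotates points by angle_deg about center."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    # Translation terms keep the chosen center fixed under the rotation.
    return (
        (cos_t, -sin_t, cx - cos_t * cx + sin_t * cy),
        (sin_t, cos_t, cy - sin_t * cx - cos_t * cy),
    )


def apply_affine(points, matrix):
    """Apply a 2x3 affine matrix to a list of (x, y) points."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Hypothetical horizontal text box (corner points) rotated by 30 degrees
# about its own center, giving a multi-oriented version of the same box.
box = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
rotated_box = apply_affine(box, rotation_matrix(30, center=(2.0, 1.0)))
```

Whether a positive angle looks clockwise or counter-clockwise depends on the axis convention; in image coordinates, where y grows downward, it appears clockwise.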

Please cite the following papers if the code, or any part of it, is used:

@article{naosekpam2023multi,
  title={Multi-label Indian scene text language identification},
  author={Naosekpam, Veronica and Sahu, Nilkanta},
  journal={Intelligent Systems and Applications in Computer Vision},
  year={2023},
  publisher={CRC Press}
}

or

Naosekpam, Veronica, et al. "EMBiL: An English-Manipuri Bi-lingual Benchmark for Scene Text Detection and Language Identification." International Conference on Computer Analysis of Images and Patterns. Cham: Springer Nature Switzerland, 2023.
