IIITG-MLRIT2022

A multi-label language identification dataset based on regional Indian languages. It covers five languages (Hindi, Bengali, Malayalam, Kannada, and English), with two scripts present in each image, which is what makes the task multi-label. In addition to horizontal text, the dataset includes curved, perspective-distorted, and multi-oriented text. This diversity is achieved by applying image transformations such as affine, arc, and perspective distortion at different angles. The images come from multiple sources: mobile camera captures, existing datasets, and the web.
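Because each image carries more than one language label, a common way to represent the target is a multi-hot vector with one slot per language. The sketch below illustrates that idea only. The label order in `LANGUAGES` and the helper names `encode_labels` and `decode_labels` are assumptions made for illustration and are not taken from the dataset files, which may use a different format.

```python
# Assumed label order, for illustration only; the dataset's own
# annotation files may order or encode the languages differently.
LANGUAGES = ["Hindi", "Bengali", "Malayalam", "Kannada", "English"]


def encode_labels(present):
    """Return a multi-hot list with 1 for each language present in an image."""
    unknown = set(present) - set(LANGUAGES)
    if unknown:
        raise ValueError(f"Unknown language(s): {sorted(unknown)}")
    return [1 if lang in present else 0 for lang in LANGUAGES]


def decode_labels(vector, threshold=0.5):
    """Map per-language scores (or 0/1 flags) back to language names."""
    if len(vector) != len(LANGUAGES):
        raise ValueError("vector length must match the number of languages")
    return [lang for lang, score in zip(LANGUAGES, vector) if score >= threshold]


if __name__ == "__main__":
    # An image containing Hindi and English text.
    print(encode_labels(["Hindi", "English"]))         # [1, 0, 0, 0, 1]
    # Hypothetical per-language scores from a classifier.
    print(decode_labels([0.9, 0.1, 0.2, 0.05, 0.7]))   # ['Hindi', 'English']
```

With this representation, a classifier can score each language independently instead of choosing a single class per image.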

Fig. 1: Sample images from IIITG-MLRIT2022

Please cite the following papers if the code, or any part of it, is used:

@article{naosekpam2023multi,
  title={Multi-label Indian scene text language identification},
  author={Naosekpam, Veronica and Sahu, Nilkanta},
  journal={Intelligent Systems and Applications in Computer Vision},
  year={2023},
  publisher={CRC Press}
}

or

Naosekpam, Veronica, et al. "EMBiL: An English-Manipuri Bi-lingual Benchmark for Scene Text Detection and Language Identification." International Conference on Computer Analysis of Images and Patterns. Cham: Springer Nature Switzerland, 2023.

Repository contents:

- Data
- README.md
- dataset11 (1).jpg
- real_test.csv
- real_train.csv
