DS201 / Deep Learning

About

This is a college course project on the diacritic restoration problem in Vietnamese text, using deep-learning-based models.
Techniques applied:

Word tokenization

Models: RNN-LSTM, GRU, and a bidirectional RNN approach

Metrics: BLEU score and accuracy
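To make the task and the accuracy metric concrete, here is a minimal sketch in Python. It is not taken from the notebooks in this repo: the function names `strip_diacritics` and `word_accuracy` are hypothetical, and the word-level, position-by-position definition of accuracy is an assumption about how the metric could be computed. The first function shows how diacritic-free input can be derived from ordinary Vietnamese text, and the second compares a restored sentence against its reference.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritics (hypothetical helper, not from the repo)."""
    # 'đ' and 'Đ' have no canonical decomposition, so map them by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category 'Mn'), then recompose.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


def word_accuracy(predicted: str, reference: str) -> float:
    """Fraction of reference words restored exactly (assumed definition)."""
    # Normalize both sides so composed and decomposed forms compare equal.
    pred_words = unicodedata.normalize("NFC", predicted).split()
    ref_words = unicodedata.normalize("NFC", reference).split()
    if not ref_words:
        return 0.0
    # Compare word by word at the same position; zip() ignores extra
    # words on either side, and those count as errors via the divisor.
    matches = sum(p == r for p, r in zip(pred_words, ref_words))
    return matches / len(ref_words)


if __name__ == "__main__":
    reference = "tôi đi học"
    print(strip_diacritics(reference))
    print(word_accuracy("tôi di học", reference))
```

Because restoration keeps the number of words unchanged, a position-wise comparison is enough here; a model that inserts or drops words would need an alignment step first.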

Data source

A Large-scale Vietnamese News Text Classification Corpus
This dataset was used in the following paper:

A Comparative Study on Vietnamese Text Classification Methods. Cong Duy Vu Hoang, Dien Dinh, Le Nguyen Nguyen, Quoc Hung Ngo. In Proceedings of IEEE International Conference on Research, Innovation and Vision for the Future (RIVF 2007) (long), 2007.

Data preprocessing

The source data is split into single sentences, giving a dataset of 500,000 data points (one sentence per data point).
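The sentence-splitting step could look roughly like the sketch below. This is an assumption for illustration only: the actual rule used in `make_data.ipynb` is not described here, and the helper name `split_sentences` and the punctuation-based regular expression are hypothetical.

```python
import re
from typing import List

# Split after '.', '!' or '?' when followed by whitespace (assumed rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(document: str) -> List[str]:
    """Split a news article into single sentences (hypothetical helper)."""
    parts = _SENTENCE_END.split(document.strip())
    # Discard empty fragments left over from stray whitespace.
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    article = "Hôm nay trời đẹp. Tôi đi học! Bạn thì sao?"
    for sentence in split_sentences(article):
        print(sentence)
```

A rule this simple will also split on abbreviations and decimal numbers written with a period, so a real pipeline would likely need extra handling for those cases.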

Code

Feature extraction, model training, and the related steps in this repo are implemented in Google Colab.
All code is organized in name.ipynb files.

Report

References

All references are cited in the report file.

For citation

@INPROCEEDINGS{9530818,
  author={Tran, Quang-Linh and Lam, Gia-Huy and Duong, Van-Binh and Do, Trong-Hop},
  booktitle={2021 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT)},
  title={A Study on Diacritic Restoration Problem in Vietnamese Text using Deep Learning based Models},
  year={2021},  volume={},  number={},  pages={306-310},  doi={10.1109/COMNETSAT53002.2021.9530818}
}

