This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Data Processing Toolkit in GluonNLP

We provide a set of data processing tools through the nlp_process command-line tool.

Clean and Tokenize a Parallel Corpus

To clean and tokenize a parallel corpus, use

nlp_process clean_tok_para_corpus --help
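For readers unfamiliar with the idea, here is a minimal Python sketch of what cleaning a parallel corpus usually involves: dropping pairs with an empty side, overly long sentences, or a lopsided length ratio. It is an illustration only, not the implementation behind nlp_process. The function name `clean_parallel_pairs`, the whitespace tokenization, and the thresholds `max_len` and `max_ratio` are all assumptions made for this example; the real options are the ones listed by `--help`.

```python
def clean_parallel_pairs(src_lines, tgt_lines, max_len=250, max_ratio=1.5):
    """Hypothetical helper (not part of nlp_process): filter sentence pairs."""
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        # Assumption: plain whitespace tokenization stands in for a real tokenizer.
        src_tok = src.split()
        tgt_tok = tgt.split()
        # Drop pairs where either side is empty.
        if not src_tok or not tgt_tok:
            continue
        # Drop pairs where either side is too long.
        if len(src_tok) > max_len or len(tgt_tok) > max_len:
            continue
        # Drop pairs whose lengths differ too much to be a plausible translation.
        ratio = max(len(src_tok), len(tgt_tok)) / min(len(src_tok), len(tgt_tok))
        if ratio > max_ratio:
            continue
        # Re-join tokens so whitespace is normalized in the output.
        kept.append((" ".join(src_tok), " ".join(tgt_tok)))
    return kept


pairs = clean_parallel_pairs(
    ["Hello   world .", "", "A very long sentence with many many words here"],
    ["Hallo Welt .", "Leer", "Kurz"],
)
print(pairs)
```

Only the first pair survives here: the second has an empty source side and the third fails the length-ratio check.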

Learn a subword model

To learn a subword tokenizer, use

nlp_process learn_subword --help
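As background, the sketch below shows how one common subword method, byte-pair encoding (BPE), learns its merge rules from a word list. This is an assumption chosen for illustration: the source does not say which subword algorithms learn_subword supports, and the names `learn_bpe_merges` and `merge_pair` are hypothetical helpers, not part of the toolkit.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words, num_merges):
    """Hypothetical helper: learn up to `num_merges` BPE merge rules."""
    # Each word starts as a tuple of characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole vocabulary.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so the result is deterministic.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Apply the new merge to every word before the next round.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


merges = learn_bpe_merges(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
```

Each round greedily merges the most frequent adjacent pair, so frequent character sequences such as "low" end up as single units.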

Apply the learned subword model

To apply the learned subword tokenizer, use

nlp_process apply_subword --help
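To make the idea of applying a learned subword model concrete, here is a small Python sketch that segments a word with a list of BPE merge rules. It assumes a BPE-style model, which the source does not confirm for apply_subword, and both `apply_bpe` and the hard-coded `merges` list are hypothetical values for this example only.

```python
def apply_bpe(word, merges):
    """Hypothetical helper: segment `word` by replaying merges in learned order."""
    symbols = list(word)
    for a, b in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Join the two symbols wherever this merge rule matches.
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Assumed merge rules, in the order they were learned.
merges = [("o", "w"), ("l", "ow")]
pieces = apply_bpe("lowest", merges)
print(pieces)
```

Replaying the merges in order matters: the rule that joins "l" and "ow" can only fire after "o" and "w" have been merged.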