This repository has been archived by the owner on Jan 15, 2024. It is now read-only.
v0.4.1
Highlights
Models
- Language Model
- The Large Scale Word Language Model as introduced by Jozefowicz, Rafal, et al. “Exploring the limits of language modeling”. arXiv preprint arXiv:1602.02410 (2016) achieved test PPL 43.62 on GBW dataset (#179 #270 #277 #278 #286 #294)
- The NT-ASGD based Language Model as introduced by Merity, S., et al. “Regularizing and optimizing LSTM language models”. ICLR 2018 achieved test PPL 65.62 on WikiText-2 dataset (#170)
- Document Classification
- The Classification Model as introduced by Joulin, Armand, et al. “Bag of tricks for efficient text classification” achieved validation accuracy 98 on Yelp review dataset (#258 #297)
- Question Answering
New Tutorials
- Machine Translation
- The Google NMT as introduced by Wu, Yonghui, et al. “Google's neural machine translation system: Bridging the gap between human and machine translation”. arXiv preprint arXiv:1609.08144 (2016) is introduced as part of the gluonnlp tutorial (#261)
- The Transformer based Machine Translation by Vaswani, Ashish, et al. “Attention is all you need.” Advances in Neural Information Processing Systems. 2017 is introduced as part of the gluonnlp tutorial (#279)
- Sentence Embedding
- The Structured Self-attentive Sentence Embedding as introduced by Z. Lin, M. Feng, C. Santos, M. Yu, B. Xiang, B. Zhou, Y. Bengio, “A Structured Self-attentive Sentence Embedding”. ICLR 2017 is introduced as part of the gluonnlp tutorial (#366)
New Datasets
- Word Embedding
- Wikipedia (#218)
- Fil9 dataset (#363)
- FastText crawl-300d-2M-subword (#336), wiki-news-300d-1M-subword (#368), cc.en.300 (#373)
API updates
- Added dataloader that allows multi-shard sampling (#237 #280 #285)
- Simplified DataStream, added DatasetStream, refactored and extended PrefetchingStream (#235)
- Unified BPTT batchify for dataset and stream (#246)
- Added symbolic beam search (#233)
- Added SequenceSampler (#272)
- Refactored Transform APIs (#282)
- Reorganized index of the repo and model zoo page (#357)
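For readers unfamiliar with the BPTT batchify mentioned in the API updates above, the idea can be sketched in plain Python. This is an illustration only, not the gluonnlp implementation: the function name `bptt_batchify_sketch` and its parameters are hypothetical, and the real API may differ in layout, padding, and return types. The sketch assumes a flat list of token ids, splits it into `batch_size` parallel columns, and cuts those columns into chunks of at most `seq_len` time steps, with targets equal to the inputs shifted by one step.

```python
from typing import List, Sequence, Tuple

Batch = Tuple[List[List[int]], List[List[int]]]


def bptt_batchify_sketch(
    tokens: Sequence[int], seq_len: int, batch_size: int
) -> List[Batch]:
    """Split a flat token stream into (data, target) chunks for truncated BPTT.

    Hypothetical helper for illustration; not the gluonnlp API.
    Each chunk is time-major: a list of up to seq_len rows of batch_size ids.
    """
    # Drop the tail so the stream divides evenly into batch_size columns.
    col_len = len(tokens) // batch_size
    columns = [tokens[i * col_len:(i + 1) * col_len] for i in range(batch_size)]

    batches: List[Batch] = []
    # Targets are the inputs shifted by one step, so stop one step early.
    for start in range(0, col_len - 1, seq_len):
        end = min(start + seq_len, col_len - 1)
        data = [[col[t] for col in columns] for t in range(start, end)]
        target = [[col[t + 1] for col in columns] for t in range(start, end)]
        batches.append((data, target))
    return batches


if __name__ == "__main__":
    for data, target in bptt_batchify_sketch(list(range(10)), seq_len=2, batch_size=2):
        print(data, target)
```

Unifying this logic for datasets and streams means the same chunking can be applied whether the tokens come from an in-memory corpus or are read lazily from shards.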
Fixes & Small Changes
- Fixed module name in batchify.py example (#239)
- Improved imports structure (#248)
- Added test for nmt scripts (#234)
- Sped up batchify.Pad (#249)
- Fixed LanguageModelDataset.bptt_batchify (#243)
- Fixed weight drop and added tests (#268)
- Fixed relative links that PyPI doesn't handle (#293)
- Updated notebook build logic (#309)
- Added community link (#313)
- Enabled running tests in parallel (#317)
- Enabled word embedding scripts tests (#321)