v0.2.0
Features
GluonNLP provides its users with easy access to:
- State-of-the-art models
- Pre-trained word embeddings
- Many public datasets for different tasks
- Examples accessible to users who are new to a task
- Reproducible training scripts
Models
Gluon NLP Toolkit supplies model definitions for common NLP tasks. These can be
adapted to the user's requirements or taken as a blueprint for new developments.
All of these are implemented using Gluon Blocks,
allowing easy reuse as plug-and-play neural network building blocks.
- Language Models
- Attention Cells
- Beam Search
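As a rough illustration of what a beam search component does, the plain-Python sketch below keeps the `beam_size` best partial sequences at every step. It does not use the toolkit's own Beam Search block; `beam_search`, `toy_model`, and the probability table are hypothetical names and values made up for this example.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (token sequence, total log-probability)


def beam_search(
    step_fn: Callable[[Tuple[str, ...]], Dict[str, float]],
    bos: str,
    eos: str,
    beam_size: int = 2,
    max_len: int = 5,
) -> List[Hypothesis]:
    """Return hypotheses sorted by total log-probability, best first.

    step_fn is an assumed interface: it maps a prefix to a dict of
    next-token probabilities.
    """
    beams: List[Hypothesis] = [((bos,), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for seq, score in beams:
            for token, prob in step_fn(seq).items():
                candidates.append((seq + (token,), score + math.log(prob)))
        # Keep only the beam_size best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that hit max_len without emitting eos
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Made-up bigram probabilities, conditioned only on the last token.
    table = {
        "<s>": {"the": 0.6, "a": 0.4},
        "the": {"cat": 0.5, "dog": 0.4, "</s>": 0.1},
        "a": {"cat": 0.9, "</s>": 0.1},
        "cat": {"</s>": 1.0},
        "dog": {"</s>": 1.0},
    }
    return table[prefix[-1]]


results = beam_search(toy_model, bos="<s>", eos="</s>", beam_size=2, max_len=5)
best_sequence, best_score = results[0]
```

In this toy setup a greedy decoder would commit to "the" first (probability 0.6) and end with a sequence of probability 0.30, while a beam of size 2 also keeps "a" alive and finds the sequence with probability 0.36.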
Data
Gluon NLP Toolkit provides tools for building efficient data pipelines for NLP
tasks by defining a Dataset class interface and utilities for transforming datasets.
Several datasets are included by default and will be automatically downloaded
when used.
- Language modeling with WikiText
  - WikiText is a popular language modeling dataset from Salesforce. It is a
    collection of over 100 million tokens extracted from the set of verified
    Good and Featured articles on Wikipedia.
- Sentiment Analysis with IMDB
  - IMDB is a popular dataset for binary sentiment classification. It
    provides a set of 25,000 highly polar movie reviews for training, 25,000 for
    testing, and additional unlabeled data.
- CoNLL datasets
  - These datasets include data for the shared tasks, such as part-of-speech
    (POS) tagging, chunking, named entity recognition (NER), semantic role
    labeling (SRL), etc.
  - We provide built-in support for CoNLL 2000–2002 and 2004, as well as the
    Universal Dependencies dataset, which is used in the 2017 and 2018
    competitions.
- Word embedding evaluation datasets
  - There are a number of commonly used datasets for intrinsic evaluation of
    word embeddings. We provide common datasets for the similarity and
    analogy evaluation tasks.
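To make the two intrinsic evaluation tasks concrete, here is a minimal plain-Python sketch. Similarity evaluation is commonly scored by rank-correlating cosine similarities with human ratings, and analogy evaluation by checking whether the vector offset `b - a + c` lands nearest the expected word. The embeddings, the human ratings, and all function names below are made up for illustration and are not the toolkit's API.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def spearman(xs: List[float], ys: List[float]) -> float:
    """Spearman rank correlation; this simple form assumes no tied values."""
    def ranks(values: List[float]) -> List[int]:
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, idx in enumerate(order):
            result[idx] = rank
        return result

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def solve_analogy(emb: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' with the vector offset b - a + c."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Toy 4-dimensional embeddings (made-up values).
toy_embeddings: Dict[str, Vector] = {
    "king": [1.0, 1.0, 0.0, 0.0],
    "queen": [1.0, 0.0, 1.0, 0.0],
    "man": [0.0, 1.0, 0.0, 0.0],
    "woman": [0.0, 0.0, 1.0, 0.0],
    "apple": [0.0, 0.0, 0.0, 1.0],
}

# Similarity task: (word1, word2, made-up human rating).
similarity_pairs = [("king", "queen", 8.5), ("king", "man", 6.0), ("king", "apple", 1.0)]
model_scores = [cosine(toy_embeddings[a], toy_embeddings[b]) for a, b, _ in similarity_pairs]
human_scores = [score for _, _, score in similarity_pairs]
rho = spearman(model_scores, human_scores)

# Analogy task: man is to woman as king is to ?
predicted = solve_analogy(toy_embeddings, "man", "woman", "king")
```

Here `rho` measures how well the model's similarity ranking agrees with the human ranking, and `predicted` is compared against the expected answer to compute analogy accuracy over a full question set.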
Gluon NLP further ships with common data transformation functions for datasets,
dataset samplers that determine how to iterate through datasets, and
functions to generate data batches.
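The pieces named above (a dataset interface, a transformation, a sampler, and a batch-generating function) can be sketched in plain Python as follows. This is only an illustration of how such components fit together, not the toolkit's implementation; `ListDataset`, `bucket_batches`, `pad_batch`, and the toy vocabulary are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Sequence


class ListDataset:
    """Hypothetical minimal dataset: sized, indexable, and transformable."""

    def __init__(self, samples: Iterable[Any]) -> None:
        self._samples = list(samples)

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, idx: int) -> Any:
        return self._samples[idx]

    def transform(self, fn: Callable[[Any], Any]) -> "ListDataset":
        # Eagerly apply fn to every sample and return a new dataset.
        return ListDataset(fn(sample) for sample in self._samples)


def bucket_batches(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Sampler sketch: group indices of similar length to limit padding."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def pad_batch(seqs: Sequence[Sequence[int]], pad_value: int = 0) -> List[List[int]]:
    """Batch function sketch: right-pad sequences to the longest in the batch."""
    max_len = max(len(s) for s in seqs)
    return [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]


# Toy vocabulary and raw text samples (made up for this example).
vocab = {"<pad>": 0, "<unk>": 1, "the": 2, "movie": 3, "was": 4, "great": 5, "bad": 6}
raw = ListDataset([
    "the movie was great",
    "bad",
    "great movie",
    "the movie was bad the movie",
])

# Transformation: whitespace tokenization followed by vocabulary lookup.
encoded = raw.transform(
    lambda text: [vocab.get(tok, vocab["<unk>"]) for tok in text.split()]
)

# Sampler decides which samples go together; pad_batch builds each batch.
lengths = [len(encoded[i]) for i in range(len(encoded))]
batches = [
    pad_batch([encoded[i] for i in indices], pad_value=vocab["<pad>"])
    for indices in bucket_batches(lengths, batch_size=2)
]
```

Grouping by length means the one-token and two-token samples share a batch that needs a single pad token, instead of being padded up to the six-token sample. A practical sampler would typically also shuffle the resulting batches between epochs.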
Other features
Examples and scripts
- Word Embedding Evaluation
- Beam Search Generator
- Word language modeling
- Sentiment Analysis through Fine-tuning, with Bucketing
- Machine Translation