NGram Word Predictor

Quick Links:

Description:

This web application takes your text input and predicts the most likely next word. The model is built from n-grams. Text data from Twitter, blogs, and news sources was extracted and combined into a corpus. After the corpus was cleaned, it was tokenized to extract the most frequent n-grams, with n ranging from 1 to 4.
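As a rough illustration of the tokenize-and-count step, here is a minimal Python sketch. It is not the app's own code (the project is written in R), and the function names `tokenize` and `count_ngrams`, the regular expression, and the toy corpus are all assumptions made for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens.

    Assumption: this stands in for the project's cleaning step, which is
    not described in detail in this README.
    """
    return re.findall(r"[a-z']+", text.lower())

def count_ngrams(tokens, max_n=4):
    """Return a dict mapping n (1..max_n) to a Counter of n-word tuples."""
    counts = {}
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        counts[n] = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts

# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat sat on the sofa"
counts = count_ngrams(tokenize(corpus))

# Most frequent unigrams in the toy corpus.
top_unigrams = counts[1].most_common(3)
```

Keeping one `Counter` per value of n mirrors the separate unigram, bigram, trigram, and quadgram tables the model relies on.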

The model first counts how many words have been typed. If there are at least 3 words, it looks up the most likely next word in the quadgram data. If there are fewer than 3 words, or if the preceding words are unfamiliar to the model, it falls back to the trigram data, and then to the bigram data in the same way. If the model does not recognize any of the input, it returns one of the most likely words at random.
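The fallback order described above can be sketched in Python as follows. This is an illustrative sketch, not the app's R implementation: the `predict_next` function, the layout of `tables` (context length mapped to a lookup of context to next word), and the toy entries are all hypothetical.

```python
import random

def predict_next(text, tables, top_unigrams, rng=random):
    """Predict the next word by backing off from longer to shorter contexts.

    tables: hypothetical structure mapping context length k (3, 2, 1) to a
            dict of {context tuple: most likely next word}. k = 3 plays the
            role of the quadgram data, k = 2 the trigram, k = 1 the bigram.
    top_unigrams: list of frequent words used as the last resort.
    """
    words = text.lower().split()
    for k in (3, 2, 1):
        if len(words) >= k:
            context = tuple(words[-k:])
            if context in tables[k]:
                return tables[k][context]
    # Nothing matched: return one of the most frequent words at random.
    return rng.choice(top_unigrams)

# Toy lookup tables, purely illustrative.
tables = {
    3: {("thanks", "for", "the"): "follow"},
    2: {("new", "york"): "city"},
    1: {("right",): "now"},
}
top_unigrams = ["said", "will", "one"]

quad_hit = predict_next("Thanks for the", tables, top_unigrams)
tri_hit = predict_next("I love New York", tables, top_unigrams)
bi_hit = predict_next("right", tables, top_unigrams)
fallback = predict_next("zzz", tables, top_unigrams)
```

In this sketch each context stores a single best next word. A fuller version could store several candidates with their counts and return the top few.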

Some Results from Analysis

Based on the analysis performed on the dataset, we obtain the following results:

In the word cloud, words drawn larger, such as 'said', 'will', and 'one', appear more frequently in the dataset, while smaller words like 'night', 'team', and 'show' appear less frequently.

For the bigrams, phrases such as 'last year', 'New York', and 'right now' were among those most used in the dataset.

Finally, for the trigrams, the top 3 by frequency include 'New York City', 'President Barack Obama', and 'let us know'.

Web App in Action

Some screenshots of the n-gram word predictor in action can be seen below.

