I find languages quite interesting, and I have spent some time learning foreign languages. When starting a new language, I have found that an effective way to build vocabulary is to study the top N most frequent words in the target language. For example, you might start with the top 100 words, then move on to 300, then 500, and so on.
There are websites that provide this information, but I thought it would be fun to collect some data myself. This project uses the Twitter API, via the Python library Tweepy, to collect tweets from selected Twitter accounts and extract the most common words from those tweets.
The process is split into three separate IPython notebooks:
- Tweet Collection
  - We collect the tweets using the Tweepy library (a collection sketch follows this list)
- Data Cleaning
  - Tweets contain characters, symbols, and emojis that we do not want for the analysis
  - For example, 'RT', '#', '@', links, etc.
  - In this notebook, we remove all the unnecessary characters from the data (a cleaning sketch follows this list)
- Common Words
  - We create a Word Cloud for each language
  - We also create a DataFrame of the document-term matrix
  - To get the most common words, we sort the data by word frequency in descending order (a document-term matrix sketch follows this list)
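Here is a minimal sketch of the collection step, assuming Tweepy v3-style authentication. The credential placeholders and account handles are illustrative, not the ones used in the project:

```python
import tweepy

# Placeholder credentials -- substitute your own Twitter developer keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Hypothetical accounts; the project uses its own selection per language.
accounts = ["some_spanish_account", "some_french_account"]

tweets = []
for screen_name in accounts:
    # user_timeline returns recent tweets; tweet_mode="extended" avoids
    # the 140-character truncation on the text field.
    for status in api.user_timeline(screen_name=screen_name,
                                    count=200, tweet_mode="extended"):
        tweets.append(status.full_text)
```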
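The cleaning step can be sketched with regular expressions that strip the unwanted tokens mentioned above; the exact patterns in the project's notebook may differ:

```python
import re

def clean_tweet(text):
    """Strip retweet markers, mentions, hashtags, links, and symbols/emojis."""
    text = re.sub(r"\bRT\b", "", text)        # retweet marker
    text = re.sub(r"@\w+", "", text)          # @mentions
    text = re.sub(r"#\w+", "", text)          # hashtags
    text = re.sub(r"https?://\S+", "", text)  # links
    text = re.sub(r"[^\w\s]", "", text)       # punctuation, symbols, emojis
    return re.sub(r"\s+", " ", text).strip().lower()
```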
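And a sketch of the common-words step: building the document-term matrix with scikit-learn's CountVectorizer, summing and sorting the word counts, and rendering a word cloud. The sample tweets and file name are illustrative:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from wordcloud import WordCloud

# cleaned_tweets: list of cleaned tweet strings for one language (sample data)
cleaned_tweets = ["hola como estas", "hola que tal", "buenos dias"]

# Document-term matrix: one row per tweet, one column per word.
cv = CountVectorizer()
dtm = cv.fit_transform(cleaned_tweets)
# get_feature_names_out() requires scikit-learn >= 1.0; older versions
# use get_feature_names().
dtm_df = pd.DataFrame(dtm.toarray(), columns=cv.get_feature_names_out())

# Sum each column and sort in descending order for the most common words.
word_freq = dtm_df.sum().sort_values(ascending=False)
print(word_freq.head(10))

# Render a Word Cloud from the frequency table.
wc = WordCloud(width=800, height=400).generate_from_frequencies(word_freq.to_dict())
wc.to_file("wordcloud.png")
```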
After we collect the data in the IPython notebooks, we create a web application using the Streamlit library.
In the web app:
- The user chooses a language and the number of words
- A Word Cloud and a table with their selection of words are displayed
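A minimal sketch of the Streamlit front end, assuming the notebooks save one word-frequency file per language; the file path and language list here are hypothetical:

```python
import pandas as pd
import streamlit as st
from wordcloud import WordCloud

st.title("Most Common Words by Language")

# User selections: language and number of words.
language = st.selectbox("Language", ["Spanish", "French", "German"])
n_words = st.slider("Number of words", min_value=100, max_value=500, step=100)

# Load the pre-computed word frequencies for the chosen language
# (hypothetical file layout produced by the notebooks).
freq = pd.read_csv(f"data/{language.lower()}_frequencies.csv",
                   index_col=0).squeeze("columns")
top_words = freq.head(n_words)

# Word Cloud for the selection.
wc = WordCloud(width=800, height=400).generate_from_frequencies(top_words.to_dict())
st.image(wc.to_array())

# Table of the selected words with their counts.
st.dataframe(top_words.rename("count"))
```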
I'm interested in the output of words for some of the languages. I hypothesize that the accounts I collected tweets from are not a diverse representation of the languages. I am working on selecting different accounts for the data collection to test this hypothesis.