Predicting English quote tags from Franco-Arabic or English text.
- Started by scraping all quotes from Goodreads: 82460 quotes with 27 labels, where each label has 2945 quotes.
- Built a full preprocessing pipeline for cleaning the data.
- Performed exploratory data analysis (EDA) on the words that appear under each tag, with word clouds for visualization, plus feature engineering to capture the length of each quote and the number of words in it.
- Showed the most frequent n-grams (one- and two-word phrases) that appear in each tag.
- Computed how frequently each tag appears in the data and restricted the tag set to the top 20 most frequent tags.
- Modeled the task with ML models for multi-class classification and with a DL model based on RoBERTa.
- Shared the data on Kaggle.
- Created several notebooks, including:
  - EDA | Feature Engineering For Multi-Class.
  - Multi-label tags classification.
  - Modeling with the pretrained model RoBERTa for a multi-class classification problem: predicting whether a quote is love, motivation, or wisdom.
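
The cleaning, feature-engineering, n-gram, and top-tag steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual notebook code: the function names (`clean_quote`, `ngrams`, `top_ngrams_per_tag`, `keep_top_tags`, `quote_features`) and the toy rows are hypothetical, and the real pipeline may clean text differently.

```python
import re
from collections import Counter


def clean_quote(text):
    """Lowercase a quote, drop non-letter characters, and return its tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z'\s]", " ", text)  # keep letters, apostrophes, whitespace
    return text.split()


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams_per_tag(rows, n=1, k=3):
    """For each tag, return its k most frequent n-grams with their counts."""
    counts = {}
    for quote, tag in rows:
        counts.setdefault(tag, Counter()).update(ngrams(clean_quote(quote), n))
    return {tag: counter.most_common(k) for tag, counter in counts.items()}


def keep_top_tags(rows, k=20):
    """Keep only the rows whose tag is among the k most frequent tags."""
    tag_counts = Counter(tag for _, tag in rows)
    keep = {tag for tag, _ in tag_counts.most_common(k)}
    return [(quote, tag) for quote, tag in rows if tag in keep]


def quote_features(quote):
    """Simple length features: character count and whitespace-split word count."""
    return {"n_chars": len(quote), "n_words": len(quote.split())}


# Hypothetical toy rows (quote, tag); these are not taken from the dataset.
rows = [
    ("Love grows when love is shared.", "love"),
    ("Wisdom begins with a quiet mind.", "wisdom"),
    ("Keep going, keep trying.", "motivation"),
    ("Love waits.", "love"),
]

print(top_ngrams_per_tag(rows, n=1, k=1))
print(keep_top_tags(rows, k=1))
print(quote_features("Love waits."))
```

On the full dataset, `keep_top_tags(rows, k=20)` would correspond to the restriction to the top 20 tags, and calling `top_ngrams_per_tag` with `n=1` and `n=2` would give the one- and two-word phrase counts per tag.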
Check out the YouTube videos showing the whole project here.
The deployed web app is live here.