Qualitative Data Analysis and Text Mining in Python containing NLP project and others

pawelp0499/text-mining

Qualitative Data Analysis and Text Mining Classes

Welcome to the Qualitative Data Analysis and Text Mining (Analiza danych jakościowych i Text Mining) classes repo 👋

The main branch contains an NLP project that analyzes English Premier League tweets about top clubs (word cloud, tokens, documents, and visualizations) and classifies them using the following classifiers:

  • Multinomial Logistic Regression
  • Decision Tree
  • Random Forest
  • Gradient Boosting
  • MLP
  • Bagging
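As a rough illustration of how several of these classifiers could be compared on tweet text, here is a minimal sketch using scikit-learn. The tweets, labels, and hyperparameters are invented for this example and are not taken from the project; the TF-IDF features are also an assumption, since the project's actual preprocessing is not described here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: invented tweets with the club each one is about.
tweets = [
    "Arsenal looked sharp at the Emirates today",
    "What a goal from the Arsenal captain",
    "Liverpool pressing was relentless at Anfield",
    "Anfield was loud as Liverpool won again",
    "Chelsea struggled at Stamford Bridge",
    "Stamford Bridge saw Chelsea score late",
]
labels = ["arsenal", "arsenal", "liverpool", "liverpool", "chelsea", "chelsea"]

# Three of the listed classifier families; settings are illustrative only.
classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

predictions = {}
for name, clf in classifiers.items():
    # Each pipeline turns raw text into TF-IDF vectors, then fits the classifier.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(tweets, labels)
    predictions[name] = model.predict(["Late winner for Liverpool at Anfield"])[0]

print(predictions)
```

Wrapping the vectorizer and classifier in one pipeline keeps the comparison fair, because every classifier sees features built the same way. A real evaluation would also need a held-out test set, which this toy data is too small to provide.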

Other branches:

🔸 'lab1' Branch - regex (Regular expression operations)

🔸 'lab2' Branch - cleaning text with regex (continued), removing stop words, stemming and lemmatization with the nltk library

🔸 'lab3' Branch - WordCloud

🔸 'lab4' Branch - tokenization and vectorization of text with the scikit-learn library, operations on numpy arrays, visualizations with matplotlib

🔸 'lab5' Branch - text classification with decision tree, random forest, SVM, AdaBoost, Bagging

🔸 'entity_matching' Branch - distance and similarity calculations: Euclidean similarity, cosine distance, cosine similarity

🔸 'project' Branch - merged into the main branch
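For readers unfamiliar with the measures named for the 'entity_matching' branch, the following is a minimal sketch in plain Python. The function names are hypothetical and not taken from the branch. The definition of Euclidean similarity as 1 / (1 + distance) is one common convention and an assumption here, since the branch's own definition is not stated.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_similarity(a, b):
    """Assumed convention: maps distance 0 to 1.0 and large distances toward 0."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Complement of cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


u = [1.0, 2.0]
v = [2.0, 4.0]
print(euclidean_distance(u, v))
print(euclidean_similarity(u, v))
print(cosine_similarity(u, v))
print(cosine_distance(u, v))
```

Here `v` points in the same direction as `u`, so the cosine similarity is close to 1 even though the Euclidean distance is not zero. That difference is why cosine measures are often preferred for comparing text vectors of different lengths.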
