Welcome to the Qualitative Data Analysis and Text Mining (Analiza danych jakościowych i Text Mining) class repo 👋
The main branch contains an NLP project that analyzes tweets about top English Premier League clubs (word clouds, tokens, documents and visualizations) and classifies them using the following classifiers:
- Multinomial Logistic Regression
- Decision Tree
- Random Forest
- Gradient Boosting
- Bagging
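
A minimal sketch of how such a classifier comparison could look with scikit-learn. The tweets, labels and the TF-IDF pipeline below are assumptions made up for illustration; they are not the project's dataset or its actual code.

```python
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: tweet text -> club label (not the project's dataset).
tweets = [
    "Arsenal score again at the Emirates",
    "Great win for Arsenal and the Gunners fans",
    "Liverpool press high at Anfield",
    "Anfield sings as Liverpool take the lead",
    "Chelsea defend deep at Stamford Bridge",
    "Stamford Bridge celebrates a Chelsea goal",
]
labels = ["arsenal", "arsenal", "liverpool", "liverpool", "chelsea", "chelsea"]

# One estimator per classifier named in the list above.
# With the default lbfgs solver, recent scikit-learn versions fit a
# multinomial model for multiclass targets.
classifiers = {
    "Multinomial Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
}

predictions = {}
for name, clf in classifiers.items():
    # Vectorize the raw text and train the classifier in a single pipeline.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(tweets, labels)
    predictions[name] = model.predict(["Liverpool fans travel to Anfield"])[0]
    print(f"{name}: {predictions[name]}")
```

On real data, the models would be compared on a held-out test split rather than on a single example tweet.
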
Other branches:
🔸 'lab1' Branch - regex (Regular expression operations)
🔸 'lab2' Branch - cleaning text with regex (continued), removing stop words, stemming and lemmatization with the nltk library
🔸 'lab3' Branch - WordCloud
🔸 'lab4' Branch - tokenization and vectorization of text with the scikit-learn library, operations on numpy arrays, visualizations with matplotlib
🔸 'lab5' Branch - text classification with decision tree, random forest, SVM, AdaBoost, Bagging
🔸 'entity_matching' Branch - distance and similarity calculations: Euclidean similarity, cosine distance, cosine similarity
🔸 'project' Branch - merged into the main branch
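
The distance and similarity measures listed for the 'entity_matching' branch can be sketched in plain Python. This is an illustrative example under stated assumptions, not the branch's code: the helper names are hypothetical, the word-count vectorization is a simplification, and "Euclidean similarity" is taken here to mean `1 / (1 + distance)`, which is only one common convention.

```python
import math


def to_count_vectors(text_a, text_b):
    """Turn two strings into word-count vectors over a shared vocabulary."""
    tokens_a, tokens_b = text_a.lower().split(), text_b.lower().split()
    vocabulary = sorted(set(tokens_a) | set(tokens_b))
    return (
        [tokens_a.count(word) for word in vocabulary],
        [tokens_b.count(word) for word in vocabulary],
    )


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_similarity(a, b):
    """Assumed convention: map distance d in [0, inf) to 1 / (1 + d) in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_distance(a, b):
    """Cosine distance defined as 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical pair of entity names to compare.
vec_a, vec_b = to_count_vectors("Manchester United", "Manchester City")
print("Euclidean distance:  ", euclidean_distance(vec_a, vec_b))
print("Euclidean similarity:", euclidean_similarity(vec_a, vec_b))
print("Cosine similarity:   ", cosine_similarity(vec_a, vec_b))
print("Cosine distance:     ", cosine_distance(vec_a, vec_b))
```

The two names share one of three vocabulary words, so the cosine similarity comes out at roughly 0.5 and the Euclidean distance at roughly 1.41.
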