mwpnava/Data-Science-Projects

Data Science Projects

This repository contains data science projects I completed for academic and self-learning purposes. They are presented as Jupyter Notebooks with accompanying datasets (CSV files).

Content:

  • Machine Learning

    • Topics Extraction from Conference Speeches: The objective of this project is to propose a way to efficiently extract key insights and topics from a group of conference speeches. I implemented an unsupervised learning model using the K-means algorithm, created word cloud visualizations of the most frequent and relevant words in each cluster, and finally used a topic modeling algorithm, Latent Dirichlet Allocation (LDA), to discover the underlying topics within the clusters.

    • Flight Delays Predictions: This project consists of two parts: analysis of flight and weather datasets, and flight delay prediction. One dataset, retrieved from Kaggle, contains a year's worth of US flight delay information; the other was gathered by web scraping a weather site. My team and I implemented a model to predict weather-induced airline delays using the Random Forest algorithm and built a streamlit application to provide an interactive user interface.

    • Principal Components Analysis with numpy: In this project, I apply PCA to a dataset without using any of the popular machine learning libraries such as scikit-learn and statsmodels. The goal is to gain a deeper understanding of the PCA fundamentals using only functions from the numpy library.

    • Shopper Segmentation (Unsupervised Learning): The objective of this project is to segment shoppers from a given dataset. Three unsupervised machine learning algorithms are used: K-Means, Agglomerative clustering and DBSCAN. At the end of the notebook, the models are evaluated and compared using metrics such as ARS (Adjusted Rand Score), NMI (Normalized Mutual Information) and Average Score.

    • Online News Popularity Prediction (Supervised Learning): The objective of this project is to predict the popularity of articles published by the Mashable website. The machine learning algorithms used were Random Forest, Support Vector Classification and K-Nearest Neighbors (KNN).

    • Predictions of Admissions to Master's Degree (Supervised Learning): This project uses a linear regression algorithm to predict the chance of admission of foreign students to master's degree programs at American colleges.

Tools: Python 3, scikit-learn, pandas, numpy, matplotlib, streamlit, seaborn, nltk
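
As an illustration of the numpy-only approach described for the Principal Components Analysis project, here is a minimal sketch of PCA via eigendecomposition of the covariance matrix. It is not taken from the notebook: the function name `pca`, the synthetic data and the choice of `np.linalg.eigh` are assumptions made for this example.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Hypothetical helper written for this README, not the notebook's code.
    """
    # Center each feature at zero mean
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (columns are variables)
    cov = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices such as a covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; reorder to descending
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the leading eigenvectors as the principal axes
    components = eigvecs[:, :n_components]
    # Share of total variance captured by each kept component
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return X_centered @ components, components, explained_ratio


# Synthetic data (an assumption): the third feature is mostly redundant
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = 2 * X[:, 0] + 0.1 * X[:, 2]

projected, components, ratio = pca(X, n_components=2)
print(projected.shape, ratio)
```

The projected columns are uncorrelated by construction, and `ratio` reports how much of the total variance each retained component explains.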

  • Data Analytics, Visualization and miscellaneous

    • A/B Test Analysis - Email Campaign: In this A/B test analysis, I analyze the results of an email campaign experiment whose main objective is to influence customers to make a decision. I apply a test of means to verify whether the results of the campaign occurred by chance or because the email strategy is working as expected.

    • Women Legal Rights in the World: This is an analytical report on legal gender differentiation around the world. The analysis of data collected in 187 countries from 2009 to 2018 highlights inequities in laws and regulations.

    • Creating my own Dataset of Boston Apartments Leasing (Web Scraping): The goal of this project is to create my own dataset for future analysis. Data was extracted from the RentHop site and stored in a CSV file (apartments_leasing.csv).

Tools: Python 3, pandas, matplotlib, BeautifulSoup
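
To illustrate the kind of test of means mentioned for the A/B Test Analysis project, here is a minimal sketch of a two-sided, two-sample z-test using a large-sample normal approximation. It is not the notebook's code: the function name `two_sample_z_test`, the synthetic control and treatment values and the 0.05 threshold are all assumptions made for this example.

```python
import math

import numpy as np


def two_sample_z_test(control, treatment):
    """Two-sided z-test for a difference in means (large-sample approximation).

    Hypothetical helper written for this README, not the notebook's code.
    """
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    # Observed difference between the group means
    diff = treatment.mean() - control.mean()
    # Standard error of the difference, allowing unequal variances
    se = math.sqrt(
        control.var(ddof=1) / control.size
        + treatment.var(ddof=1) / treatment.size
    )
    z = diff / se
    # Two-sided p-value under the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, z, p_value


# Synthetic spend per customer (an assumption, not the campaign data)
rng = np.random.default_rng(42)
control = rng.normal(loc=50.0, scale=10.0, size=2000)
treatment = rng.normal(loc=52.0, scale=10.0, size=2000)

diff, z, p_value = two_sample_z_test(control, treatment)
alpha = 0.05  # assumed significance level
print(f"difference={diff:.2f}, z={z:.2f}, significant={p_value < alpha}")
```

A small p-value suggests the observed difference is unlikely to be due to chance alone. For small samples, a t-test would be a more appropriate choice than this normal approximation.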

Author

Wendy Navarrete
