Hello!
Welcome to my Portfolio. My name is Daniel Chang and I am excited to have you here!
In this repository, you will find several folders and sub-folders containing projects that demonstrate skills including, but not limited to, data gathering with APIs, data wrangling, data exploration, statistical analysis, data visualization, and presentation. Most of my projects are written in Python in Jupyter Notebook. The packages I use most are Pandas, NumPy, Seaborn, and Matplotlib.
In the following sections of this README, I will provide a short summary of my data journey and brief descriptions of the projects in each folder. Let’s get started!
I am currently interning as a Market Consultant at Stiddle, where my primary job is to write content for the company’s website, email marketing campaigns, and social media campaigns. I have had the opportunity to practice my data-related skills by gathering and analyzing data from various online sources to suggest relevant topics to write about and potential industries and locations to target in our social media campaigns. I am now taking on a more analytical role in the company, where I am analyzing the eCommerce market.
My data journey began in my last year at the University of California, Santa Barbara, where I received my bachelor’s degree in History. I became fascinated with data after learning how raw data was used by the United States to turn the tide of the war at the Battle of Midway in favor of the Allies. Before this, I had taken quantitatively focused courses such as statistics, linear algebra, and R. After exploring how data is used in industries such as marketing, healthcare, and finance, I officially began my data education with Udacity’s Programming for Data Science with Python Nanodegree Program. The most recent topics I’ve learned, from their Data Scientist Nanodegree Program, are how to build a machine learning pipeline and how to make recommendation engines.
For this project, I am mainly interested in exploring and analyzing the offensive stats and characteristics of different NBA teams based on Finals ranking, a new column I will create that contains 4 values: Champion, Runner-Up, Knocked Out, and Never Qualified. This project contains 3 notebooks that describe the steps I’ve taken. The skills I’ve focused on are data gathering via web scraping, data cleaning, and data exploration (analysis and visualization).
For this project, I've decided to analyze various financial YouTube channels that I've been following; most of the channels talk about the stock market. If you watch YouTube, you often hear content creators asking you to like, comment, and subscribe to help promote their videos. While these actions might not necessarily help, it is worth exploring the data for factors that may support this claim. That is exactly what I do in this notebook, along with identifying trends among YouTube channels with a particular niche or topic.
The skills that I’ve focused on in this project are data gathering via the YouTube API, data cleaning, and data exploration (analysis and visualization). Additionally, there is a question that I’ve asked and answered using natural language processing skills such as tokenizing sentences.
In this project, I am interested in understanding the crime rate in London, England. My goal is to work through this notebook to understand violent crime rates and when violent crimes are likely to occur throughout the year. This dataset contains all crimes (non-violent and violent) committed between 2008 and 2016. However, the nature of each crime (violent or non-violent) is not specified in this dataset, so we will need to deal with that during the preprocessing phase. We will also need to specify the months in which daylight saving time is in effect.
This project contains a total of 2 notebooks. The skills that I have chosen to focus on are hypothesis testing and statistics, as well as regression data modeling.
This was done as a project for Data Science 6306 at Southern Methodist University. We want to analyze 2 datasets (totalbeer.csv and totalbreweries.csv) to gain actionable insights to present to the CEO and CFO of Budweiser. Budweiser has hired us to answer some questions. We will perform Exploratory Data Analysis to answer the questions that they are interested in, all of which are listed down below. Afterward, we want to use the KNN classifier to see if we can predict what group a beer belongs to based on its IBU and/or ABV.
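As a rough illustration of the KNN idea mentioned above, here is a minimal sketch in plain Python. It is not the project's code: the measurements and the group labels ("IPA", "Ale") are made up, and `knn_predict` is a hypothetical helper name.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of ((ibu, abv), label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, made-up (IBU, ABV %) values; these are not rows from totalbeer.csv.
train = [
    ((65.0, 6.8), "IPA"),
    ((70.0, 7.2), "IPA"),
    ((60.0, 6.5), "IPA"),
    ((20.0, 4.8), "Ale"),
    ((25.0, 5.0), "Ale"),
    ((18.0, 4.5), "Ale"),
]

prediction = knn_predict(train, (62.0, 6.6), k=3)
```

Because IBU spans a much wider numeric range than ABV, IBU dominates the raw Euclidean distance in this sketch; in practice the features are usually standardized before fitting a KNN model.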
In this Udacity Project, I'll be analyzing, modeling, and visualizing the datasets from Airbnb to provide a clear connection between my business questions and how the data answers them. I am interested in using the 2016-17 Seattle and Boston Airbnb datasets to answer the following questions:
- What are the important amenities of these listings? Compare the two cities.
- Is it possible to predict the price with 8 features? If yes, compare the 2 cities.
- How does the price in each city change each month? Be sure to compare the 2 cities.
- How does the total number of listings change each month? Be sure to compare the 2 cities.
Medium Article Link: https://medium.com/@mr.dcny/a-study-of-airbnb-listings-seattle-boston-ff3a69646edf
In this Udacity Project, I'll be analyzing the interactions that users have with articles on the IBM Watson Studio platform, and making recommendations to them about new articles that they might be interested in. The project contains the following tasks:
- Exploratory Data Analysis: This part is for data exploration.
- Rank Based Recommendations: Here, I begin by finding the most popular articles based on the most interactions. These articles are the ones that we might recommend to new users.
- User-User Based Collaborative Filtering: In order to give better recommendations to the users of IBM's platform, I examine users that are similar in terms of the items they have interacted with. These items could then be recommended to similar users.
- Matrix Factorization: For the final step, I created a machine learning approach to building recommendations. Using the user-item interactions, I built out a matrix decomposition which helps me in predicting new articles an individual might interact with.
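The rank-based and user-user steps in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the `interactions` pairs are made up, `top_articles` and `user_user_recs` are hypothetical helper names, and similarity is simply the number of shared articles, which is one of several reasonable choices.

```python
from collections import Counter

# Hypothetical (user_id, article_id) interaction pairs, made up for illustration.
interactions = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"), (2, "d"),
    (3, "a"), (3, "e"),
    (4, "b"),
]


def top_articles(interactions, n):
    """Rank-based: return the n articles with the most interactions."""
    counts = Counter(article for _, article in interactions)
    return [article for article, _ in counts.most_common(n)]


def user_items(interactions):
    """Map each user to the set of articles they have interacted with."""
    seen = {}
    for user, article in interactions:
        seen.setdefault(user, set()).add(article)
    return seen


def user_user_recs(interactions, user, n):
    """Recommend up to n unseen articles, taken from the most similar users first."""
    seen = user_items(interactions)
    mine = seen[user]
    # Rank the other users by how many articles they share with this user.
    neighbors = sorted(
        (u for u in seen if u != user),
        key=lambda u: len(seen[u] & mine),
        reverse=True,
    )
    recs = []
    for other in neighbors:
        for article in sorted(seen[other] - mine):
            if article not in recs:
                recs.append(article)
            if len(recs) == n:
                return recs
    return recs


popular = top_articles(interactions, 2)
for_user_1 = user_user_recs(interactions, 1, 2)
```

A new user with no history would get `popular`, while a user with some history would get articles read by their most similar neighbors.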
In this Udacity project, I will create a machine learning/NLP pipeline and build a model to classify messages that are sent during disasters. There are 36 pre-defined categories, including Aid Related, Medical Help, and Search And Rescue. By classifying these messages, we can route them to the appropriate disaster relief agency. This project also includes a web app.
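A text-classification pipeline like this usually begins by turning each raw message into tokens. The sketch below shows only that first step in plain Python; it is an assumption about the general approach, not the project's code. The stop-word list, the example message, and the `tokenize` and `bag_of_words` names are all hypothetical.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; real pipelines usually use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "we"}


def tokenize(message):
    """Lowercase the message, keep alphanumeric runs, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    return [word for word in words if word not in STOP_WORDS]


def bag_of_words(message):
    """Count how often each token occurs; a classifier can use these counts as features."""
    return Counter(tokenize(message))


tokens = tokenize("We need water and medical help in the north!")
```

The resulting token counts are the kind of features a multi-category classifier could be trained on.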
This folder consists of multiple sub-folders of projects that I’ve completed with Udacity’s Data Analyst program. There are a total of 4 folders where the primary focus is to practice the Exploratory Data Analysis (EDA) process, which includes gathering and wrangling data, then analyzing and visualizing it. Here is a list of the projects with a short explanation of each:
- Movie Database Analysis: In this project, the skills that I’ve demonstrated are: assessing and cleaning data, analyzing data and visualizing data. The goal of this project is to answer the questions that I asked at the beginning.
- Analyze Web Page A/B Test: For this project, I will be working to understand the results of an A/B test run by an e-commerce website. The goal is to help the company understand if they should implement the new page, keep the old page, or perhaps run the experiment longer to make their decision. In this project, I’ve demonstrated my skills in Probability, A/B Testing and Logistic Regression.
- Twitter Data Wrangling & Analysis: In this project, I gathered data from the archives of a Twitter account called WeRateDogs, which rates people’s dogs in its tweets and adds a humorous comment to each rating. My primary goal for this project is to practice the data analysis process, which includes gathering data using the Twitter API, downloading datasets programmatically with the requests and BeautifulSoup libraries, cleaning and analyzing the data, and visualizing it.
- Loans Exploratory Data Analysis: In this project, I am interested in conducting the full exploratory data analysis process on a dataset that was provided to me by Prosper Loan via Udacity. Along with assessing and cleaning the dataset, I demonstrated my skills in creating different kinds of univariate, bivariate, and multivariate visualizations to study the different variables and their relationships with one another. Afterward, I used my analysis and visuals to build a slideshow presentation as if I were presenting it to stakeholders.
- SQL Movie Rental Analysis: In this project, I will be querying the Sakila DVD Rental database, which holds information about a company that rents DVDs. I am doing this to gain an understanding of the customer base and to answer the questions asked at the beginning of the project. While the goal of this project is to investigate the database and create visuals answering the questions listed above, this project is also an opportunity to showcase what I've learned as part of the Nanodegree program. Some skills I would like to draw attention to are my ability to join many tables, create window functions, create Common Table Expressions (CTE) and perform calculations with the help of logical operators.
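To illustrate the SQL techniques named in the last item (a join, a Common Table Expression, and a window function), here is a small sketch that runs against an in-memory SQLite database from Python. The two tables and their rows are made up and far simpler than the real Sakila schema, and the query is only an example of the pattern, not one from the project. Window functions require SQLite 3.25 or newer, which is assumed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified tables and rows; not the actual Sakila schema or data.
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payment (customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO payment VALUES
        (1, 5.0), (1, 7.0), (2, 3.0), (3, 9.0), (3, 1.0), (3, 4.0);
""")

query = """
    WITH totals AS (                      -- CTE: total spend per customer
        SELECT c.name, SUM(p.amount) AS total
        FROM customer AS c
        JOIN payment AS p ON p.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
    )
    SELECT name,
           total,
           RANK() OVER (ORDER BY total DESC) AS spend_rank  -- window function
    FROM totals
    ORDER BY spend_rank
"""

rows = conn.execute(query).fetchall()
conn.close()
```

Each row in `rows` holds a customer name, their total spend, and their rank by spend, with the highest spender first.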