Hongyan-Wang/DataScienceProject

DataScienceProject

This repository is a collection of data science practice projects.

01 - yfinance_and_webscraping

Analysing Historical Stock/Revenue Data and Building a Dashboard

This project is from the IBM Data Science Professional Certificate - Python Project for Data Science

1. Extracting data using a Python Library

  • Using yfinance to Extract Stock Info
  • Using yfinance to Extract Historical Share Price Data
  • Using yfinance to Extract Historical Dividends Data

2. Extracting data using Beautiful Soup

  • Downloading the Webpage Using Requests Library
  • Parsing Webpage HTML Using BeautifulSoup
  • Extracting Data and Building DataFrame
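The three scraping steps listed above could look roughly like the sketch below. It is only an illustration, not the notebook's actual code: the helper names `download_html` and `table_to_dataframe` are hypothetical, and the HTML snippet and its revenue figures are made up so the example runs without network access.

```python
import pandas as pd
from bs4 import BeautifulSoup


def download_html(url: str) -> str:
    """Step 1: download a webpage with the requests library (needs network access)."""
    import requests  # imported here so the parsing helper works without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def table_to_dataframe(html: str) -> pd.DataFrame:
    """Steps 2 and 3: parse the HTML and turn the first <table> into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found in the HTML")

    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)

    header, *body = rows  # first row is treated as the column names
    return pd.DataFrame(body, columns=header)


# Made-up sample page standing in for a real revenue table.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Revenue</th></tr>
  <tr><td>2021</td><td>$100</td></tr>
  <tr><td>2020</td><td>$80</td></tr>
</table>
"""

revenue = table_to_dataframe(SAMPLE_HTML)
# Strip the currency symbol so the column can be used numerically.
revenue["Revenue"] = revenue["Revenue"].str.replace("$", "", regex=False).astype(float)
```

For a live page, `table_to_dataframe(download_html(url))` would chain the two helpers; real pages often need extra cleaning (thousands separators, empty rows) that this sketch leaves out.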

Using the yfinance Library to extract stock data

Using the Ticker class, we can create an object that gives us access to methods for extracting data.

AAPL - Apple Inc

GOOG - Google

MSFT - Microsoft

AMZN - Amazon.com, Inc.
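A minimal sketch of how a Ticker object might be used for the four symbols above. The helper names `fetch_stock_data` and `closing_prices` and the `"1y"` period are assumptions made for illustration; `Ticker.info`, `Ticker.history()` and `Ticker.dividends` are the yfinance accessors for stock info, historical prices and dividends.

```python
import pandas as pd

# Ticker symbols listed above.
TICKERS = ["AAPL", "GOOG", "MSFT", "AMZN"]


def fetch_stock_data(symbol: str, period: str = "1y") -> dict:
    """Download info, price history and dividends for one ticker (needs network access)."""
    import yfinance as yf  # imported here so closing_prices() works without it

    ticker = yf.Ticker(symbol)
    return {
        "info": ticker.info,                       # dict of company metadata
        "history": ticker.history(period=period),  # DataFrame with Open/High/Low/Close/Volume
        "dividends": ticker.dividends,             # Series of dividend payments
    }


def closing_prices(histories: dict) -> pd.DataFrame:
    """Combine the Close column of several history frames into one table."""
    return pd.DataFrame({symbol: df["Close"] for symbol, df in histories.items()})


# Example usage (requires network access, so it is left commented out):
# histories = {s: fetch_stock_data(s)["history"] for s in TICKERS}
# print(closing_prices(histories).tail())
```

Keeping one closing-price column per ticker in a single DataFrame makes the later comparisons (moving averages, returns, correlation) one-liners in pandas.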

3. Analysis

3.1. Check the stock price over time

3.2. Analyse the moving averages of the various stocks

3.3. Compute the average daily return of each stock

3.4. Measure the correlation between the closing prices of different stocks

3.5. Estimate how much value is at risk when investing in a particular stock
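The analysis steps above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the notebook's code: the prices are a synthetic random walk rather than real market data, the window lengths (10, 20, 50 days) are arbitrary, and value at risk is estimated with the simple historical (empirical quantile) method at a 95% confidence level.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices (random walk), NOT real market data.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2023-01-02", periods=250)
close = pd.DataFrame(
    {
        symbol: 100 * np.exp(np.cumsum(rng.normal(0.0005, 0.02, len(dates))))
        for symbol in ["AAPL", "GOOG", "MSFT", "AMZN"]
    },
    index=dates,
)

# 3.2 Moving averages over several (assumed) window lengths.
moving_avg = {window: close.rolling(window).mean() for window in (10, 20, 50)}

# 3.3 Daily percentage returns and their average per stock.
daily_returns = close.pct_change().dropna()
mean_daily_return = daily_returns.mean()

# 3.4 Pearson correlation between the closing prices of the stocks.
price_corr = close.corr()

# 3.5 Historical value at risk: the loss exceeded on only 5% of days,
# expressed as a positive fraction of the amount invested.
value_at_risk_95 = -daily_returns.quantile(0.05)
```

`close.plot()` would cover step 3.1 when matplotlib is available. Correlating `daily_returns` instead of `close` is a common alternative, because price levels that merely trend in the same direction can look strongly correlated.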

02 - Prediction_of_house_price

The data set contains house sale prices for King County, which includes Seattle. It covers homes sold between May 2014 and May 2015.

The original data comes from Kaggle: https://www.kaggle.com/datasets/harlfoxem/housesalesprediction?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-wwwcourseraorg-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDA0101ENSkillsNetwork20235326-2022-01-01

This project is based on the Coursera course Data Analysis with Python by IBM.

Conclusions

  1. Three prediction techniques are studied and compared: Polynomial Regression, Ridge Regression, and Random Forest.

  2. The analysis found that the Random Forest Regression model performed better than the Ridge Regression and Polynomial Regression models.

  3. Using the Folium library, the location of each house is plotted on a map together with its price. This shows that location is a very important factor in determining the price of a house.
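The model comparison described in conclusions 1 and 2 could be set up along these lines with scikit-learn. This is a hedged sketch: the data is synthetic rather than the King County data set, the hyperparameters (`degree=2`, `alpha=0.1`, `n_estimators=100`) are assumptions, and R² on a held-out split is just one possible metric, so the ranking it produces need not match the one reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic features and target standing in for the house data (NOT real prices).
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 3))
y = 200 + 300 * X[:, 0] ** 2 + 100 * X[:, 1] * X[:, 2] + rng.normal(0, 10, 500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three model families named in the conclusions; settings are assumptions.
models = {
    "polynomial": make_pipeline(
        StandardScaler(), PolynomialFeatures(degree=2), LinearRegression()
    ),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=0.1)),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R^2 = {score:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair; cross-validation would give a more stable estimate at the cost of extra training time.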

03 - Fraud Detection in Online Transactions

This is the Kaggle project for the IEEE Computational Intelligence Society (IEEE-CIS) Fraud Detection competition:

Addison Howard, Bernadette Bouchon-Meunier, IEEE CIS, inversion, John Lei, Lynn@Vesta, Marcus2010, Prof. Hussein Abbass. (2019). IEEE-CIS Fraud Detection. Kaggle. https://kaggle.com/competitions/ieee-fraud-detection

Methodology
