thuynh323/IBM-Machine-Learning

IBM-Machine-Learning

Details

Topics covered

  1. A Brief History of Modern AI and its Applications
  2. Retrieving Data, Exploratory Data Analysis, and Feature Engineering
  3. Inferential Statistics and Hypothesis Testing

Project requirements

...spend some time finding a data set that you are really passionate about. This can be a data set similar to the data you have available at work or data you have always wanted to analyze. For some people this will be sports data sets, while some other folks prefer to focus on data from a datathon or data for good.

Once you have selected a data set, you will produce the deliverables listed below and submit them to one of your peers for review. Treat this exercise as an opportunity to produce an analysis that highlights your analytical skills for a senior audience, for example, the Chief Data Officer or the Head of Analytics at your company. Sections required in your report:

  • Brief description of the data set and a summary of its attributes.
  • Initial plan for data exploration.
  • Actions taken for data cleaning and feature engineering.
  • Key findings and insights that synthesize the results of Exploratory Data Analysis in an insightful and actionable manner.
  • Formulation of at least three hypotheses about this data.
  • A formal significance test for one of the hypotheses, with a discussion of the results.
  • Suggestions for next steps in analyzing this data.
  • A paragraph that summarizes the quality of this data set and a request for additional data if needed.
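
As a hedged illustration of the significance-test deliverable, the sketch below runs a two-sided permutation test for a difference in means using only the Python standard library. The function name `permutation_test`, the two sample groups, and the 0.05 threshold are assumptions made up for this example; they are not taken from the course material or from the submitted report, and a t-test or another test may suit a given data set better.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Null hypothesis: both groups come from the same distribution,
    so the group labels are exchangeable.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements for two groups (illustrative values only).
group_a = [12.1, 11.8, 13.0, 12.6, 12.9, 13.4, 12.2, 12.7]
group_b = [10.9, 11.2, 10.5, 11.6, 11.0, 10.8, 11.4, 11.1]

observed, p_value = permutation_test(group_a, group_b)
alpha = 0.05  # assumed significance level
decision = "reject" if p_value < alpha else "fail to reject"
print(f"difference in means = {observed:.3f}, p = {p_value:.4f}: {decision} the null")
```

A permutation test is used here because it makes no normality assumption, which keeps the sketch short; the report itself should state the null and alternative hypotheses and justify whichever test it uses.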

Submission

Details

Topics covered

  1. Introduction to Supervised Machine Learning and Linear Regression
  2. Data Splits and Cross Validation
  3. Regression with Regularization Techniques: Ridge, LASSO, and Elastic Net

Project requirements

...spend some time finding a data set that you are really passionate about. This can be a data set similar to the data you have available at work or data you have always wanted to analyze. For some people this will be sports data sets, while some other folks prefer to focus on data from a datathon or data for good.

Once you have selected a data set, you will produce the deliverables listed below and submit them to one of your peers for review. Treat this exercise as an opportunity to produce an analysis that highlights your analytical skills for a senior audience, for example, the Chief Data Officer or the Head of Analytics at your company. Sections required in your report:

  • Main objective of the analysis that specifies whether your model will be focused on prediction or interpretation.
  • Brief description of the data set you chose and a summary of its attributes.
  • Brief summary of data exploration and actions taken for data cleaning and feature engineering.
  • Summary of training at least three linear regression models, which should be variations that cover a simple linear regression as a baseline, a model with added polynomial effects, and a regularized regression. Preferably, all models use the same training and test splits, or the same cross-validation method.
  • A paragraph explaining which of your regressions you recommend as a final model that best fits your needs in terms of accuracy and explainability.
  • Summary of key findings and insights, which walks your reader through the main drivers of your model and the insights derived from your linear regression model.
  • Suggestions for next steps in analyzing this data, which may include revisiting this model and adding specific data features to achieve a better explanation or a better prediction.
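
The requirement to compare three regression variants on the same splits can be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions, not the approach used in the submitted project: the synthetic data, the helper names (`kfold_indices`, `poly_features`, `fit_ridge`, `cv_rmse`), the polynomial degree, and the ridge strength `alpha=1.0` are all hypothetical. Ridge stands in for "a regularized regression"; LASSO or Elastic Net would be equally valid choices.

```python
import numpy as np


def kfold_indices(n_samples, n_splits=5, seed=0):
    """Shuffle row indices once and cut them into n_splits test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_splits)


def poly_features(x, degree):
    """Columns x^1 .. x^degree for a single numeric feature."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])


def fit_ridge(X, y, alpha=0.0):
    """Closed-form ridge fit; alpha=0 reduces to ordinary least squares."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(X, coef):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def cv_rmse(X, y, folds, alpha=0.0):
    """Mean RMSE over the given folds, so every model sees the same splits."""
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef = fit_ridge(X[train_idx], y[train_idx], alpha)
        resid = y[test_idx] - predict(X[test_idx], coef)
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))


# Synthetic data with a quadratic effect (illustrative only).
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=200)
y = 1.0 + 2.0 * x + 0.5 * x ** 2 + rng.normal(scale=1.0, size=200)

folds = kfold_indices(len(x))  # one set of folds reused by all three models
results = {
    "linear baseline": cv_rmse(poly_features(x, 1), y, folds),
    "polynomial (degree 2)": cv_rmse(poly_features(x, 2), y, folds),
    "polynomial + ridge": cv_rmse(poly_features(x, 2), y, folds, alpha=1.0),
}
for name, rmse in results.items():
    print(f"{name}: CV RMSE = {rmse:.3f}")
```

Building the folds once and passing them to every model keeps the comparison fair. Feature scaling, which usually precedes regularization, is omitted here for brevity.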

Submission