Exploratory data analysis of the white wines data set to understand its chemical properties, using the ggplot2 library of R

sanjeevai/Explore_and_Summarise_Data

White Wines Dataset Exploration

Project Overview

This is the second project of term 2 of Udacity's Data Analyst Nanodegree program. In this project, I used R and applied exploratory data analysis techniques to explore relationships ranging from one variable to multiple variables, and to examine the data set for distributions, outliers, and anomalies.

What do you need to install?

In order to complete the project, you will need to install R. You can download and install R from the Comprehensive R Archive Network (CRAN).

After installing R, you will need to download and install RStudio. Choose the appropriate installation for your operating system.
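Once R is installed, the packages used for the analysis need to be available as well. The following is a minimal sketch of one way to check for them: `missing_packages` and `install_if_missing` are hypothetical helper names, and ggplot2 is the only package named in the project description, so any others would be assumptions.

```r
# Return the subset of `pkgs` that is not installed in the current R library.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install only the packages that are missing (needs a CRAN mirror and network access).
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Usage from the R or RStudio console:
# install_if_missing("ggplot2")
```

Wrapping the install in a function keeps the check free of side effects until it is called explicitly.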

Why this Project?

Exploratory Data Analysis (EDA) is the numerical and graphical examination of data characteristics and relationships before formal, rigorous statistical analyses are applied.

EDA can lead to insights, which may raise further questions and eventually lead to predictive models. It is also an important “line of defense” against bad data and an opportunity to notice that your assumptions or intuitions about a data set are violated.

What have I learned?

After completing the project, I have:

  • Understood how to examine the distribution of a variable and check for anomalies and outliers

  • Learned how to quantify and visualize individual variables within a data set by using appropriate plots such as scatter plots, histograms, bar charts, and box plots

  • Explored variables to identify the most important variables and relationships within a data set before building predictive models, calculated correlations, and investigated conditional means

  • Learned powerful methods and visualizations for examining relationships among multiple variables, such as reshaping data frames and using aesthetics like color and shape to uncover more information
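The plot types listed above can be sketched with ggplot2 roughly as follows. This is an illustration only: the data frame is simulated with random values rather than the real wine measurements, and the column names `alcohol`, `density`, and `quality` are assumed here for the example.

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in data: random values, NOT the real wine measurements.
wines <- data.frame(
  alcohol = runif(200, min = 8, max = 14),
  density = rnorm(200, mean = 0.994, sd = 0.003),
  quality = sample(3:9, 200, replace = TRUE)
)

# One variable: histogram showing the shape of a distribution
p_hist <- ggplot(wines, aes(x = alcohol)) +
  geom_histogram(binwidth = 0.5)

# Two variables: box plots of alcohol for each quality score,
# which also draw points beyond the whiskers as potential outliers
p_box <- ggplot(wines, aes(x = factor(quality), y = alcohol)) +
  geom_boxplot()

# Three variables: scatter plot with quality mapped to the color aesthetic
p_scatter <- ggplot(wines, aes(x = alcohol, y = density,
                               color = factor(quality))) +
  geom_point(alpha = 0.5)
```

Printing any of the objects (for example `print(p_hist)`) renders the plot. Converting `quality` with `factor()` treats the score as a category, which gives one box or one discrete color per score.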

Data

I performed exploratory data analysis on the white wine quality data set. This tidy data set contains 4,898 white wines with 11 variables quantifying the chemical properties of each wine. At least 3 wine experts rated the quality of each wine, providing a rating between 0 (very bad) and 10 (very excellent).
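A quick sanity check after loading the data can confirm that it matches this description. The sketch below is hedged: the file name `wineQualityWhites.csv` is a hypothetical placeholder, `describe_wines` is an illustrative helper, and the rating column is assumed to be named `quality`.

```r
# Hypothetical file name; replace with the actual path to the CSV file.
wine_path <- "wineQualityWhites.csv"

# Summarise the basic shape of a wine data frame.
# Assumes a column named `quality` holds the expert rating.
describe_wines <- function(df) {
  list(
    n_wines = nrow(df),
    n_columns = ncol(df),
    quality_in_range = all(df$quality >= 0 & df$quality <= 10),
    missing_values = sum(is.na(df))
  )
}

# Only attempt to read the file if it is present in the working directory.
if (file.exists(wine_path)) {
  wines <- read.csv(wine_path)
  str(wines)
  print(describe_wines(wines))
}
```

With the real file, `n_wines` would be expected to equal 4,898, and the column count should cover the 11 chemical variables plus the quality rating (and any row-index column the file may carry).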

Guiding Question

Which chemical properties influence the quality of white wines?
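One possible first pass at this question is to rank the variables by their correlation with quality and then inspect conditional means. The sketch below uses base R; `rank_by_correlation` and `mean_by_quality` are illustrative helper names, and the `toy` data frame is made up for the example, not taken from the wine data. Correlation only suggests an association, so this is a starting point rather than evidence of influence.

```r
# Correlation of every numeric column with the target, strongest first.
rank_by_correlation <- function(df, target = "quality") {
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  predictors <- setdiff(numeric_cols, target)
  r <- vapply(predictors, function(v) cor(df[[v]], df[[target]]), numeric(1))
  r[order(abs(r), decreasing = TRUE)]
}

# Conditional mean of one variable at each value of the target.
mean_by_quality <- function(df, variable, target = "quality") {
  tapply(df[[variable]], df[[target]], mean)
}

# Tiny made-up example, NOT real wine measurements.
toy <- data.frame(
  alcohol = c(9, 10, 11, 12, 13, 14),
  density = c(0.999, 0.998, 0.996, 0.995, 0.993, 0.991),
  pH      = c(3.2, 3.1, 3.3, 3.0, 3.2, 3.1),
  quality = c(5, 5, 6, 6, 7, 7)
)

print(rank_by_correlation(toy))
print(mean_by_quality(toy, "alcohol"))
```

Sorting by absolute value keeps strong negative relationships near the top alongside strong positive ones.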
