Practice 21 #60

Merged
merged 3 commits into from
Jul 24, 2024
76 changes: 30 additions & 46 deletions Practices/Practice21_Basic_Stats_I_Averages.ipynb
@@ -4,7 +4,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For this practice, let's use the Boston dataset."
"# Practice: Basic Statistics I: Averages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this practice, let's use the California dataset."
]
},
{
@@ -27,8 +34,8 @@
},
"outputs": [],
"source": [
"# Import the load_boston method \n",
"from sklearn.datasets import load_boston"
"# Import the fetch_california_housing method to load the California data later on\n",
"from sklearn.datasets import fetch_california_housing"
]
},
{
@@ -39,7 +46,7 @@
},
"outputs": [],
"source": [
"# Import pandas, so that we can work with the data frame version of the Boston data\n",
"# Import pandas, so that we can work with the data frame version of the California data\n",
"import pandas as pd"
]
},
@@ -51,8 +58,8 @@
},
"outputs": [],
"source": [
"# Load the Boston data\n",
"boston = load_boston()"
"# Load the California data\n",
"california = fetch_california_housing()"
]
},
{
@@ -61,21 +68,8 @@
"metadata": {},
"outputs": [],
"source": [
"# This will provide the characteristics for the Boston dataset\n",
"print(boston.DESCR)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Here, I'm including the prices of Boston's houses, which is boston['target'], as a column with the other \n",
"# features in the Boston dataset.\n",
"boston_data = np.concatenate((boston['data'], pd.DataFrame(boston['target'])), axis = 1)"
"# This will provide the characteristics for the California dataset\n",
"print(california.DESCR)"
]
},
{
@@ -86,9 +80,12 @@
},
"outputs": [],
"source": [
"# Convert the Boston data to a data frame format, so that it's easier to view and process\n",
"boston_df = pd.DataFrame(boston_data, columns = np.concatenate((boston['feature_names'], 'MEDV'), axis = None))\n",
"boston_df"
"# Convert the housing object to a data frame format, so that it's easier to view and process\n",
"california_df = pd.DataFrame(california['data'], columns = california['feature_names'])\n",
"# Here, I'm including the prices of California's houses, which is california['target'], \n",
"# as a column with the other features in the California dataset.\n",
"california_df['HouseValue'] = california['target']\n",
"california_df"
]
},
{
@@ -123,9 +120,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
"source": []
},
{
"cell_type": "markdown",
@@ -138,7 +133,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We will determine the average price for houses along the Charles River and that for houses NOT along the river."
"We will determine the average price for houses less than 20 years old and that for houses 20 years old or more."
]
},
{
@@ -149,22 +144,20 @@
},
"outputs": [],
"source": [
"# Use the query method to define a subset of boston_df that only include houses are along the river (CHAS = 1). "
"# Use the query method to define a subset of california_df that only includes houses less than 20 years old. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What do you notice about the CHAS column? "
"What do you notice about the HouseAge column? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
"source": []
},
{
"cell_type": "code",
@@ -174,14 +167,14 @@
},
"outputs": [],
"source": [
"# Now determine the average price for these houses. 'MEDV' is the column name for the prices. "
"# Now determine the average price for these houses. 'HouseValue' is the column name for the prices. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now try determining the average for houses NOT along the River."
"Now try determining the average for houses 20 years or older."
]
},
{
@@ -192,7 +185,7 @@
},
"outputs": [],
"source": [
"# Determine the average price for houses that are NOT along the Charles River (when CHAS = 0). "
"# Determine the average price for houses that are 20 years or older. \n"
]
},
{
@@ -201,15 +194,6 @@
"source": [
"Good work! You're becoming an expert in subsetting and determining averages on subsetted data. This will be integral for your capstone projects and future careers as data scientists! "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -228,7 +212,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
"version": "3.1.0"
}
},
"nbformat": 4,
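For reference, the two exercise cells the notebook leaves blank (subset with `query`, then average `HouseValue`) could be filled in roughly as below. This is only a sketch, not the notebook's solution: the rows are made-up stand-in values so the snippet runs without downloading the data, and the names `newer`, `older`, `newer_mean`, and `older_mean` are hypothetical. In the notebook itself, `california_df` would be the frame built from `fetch_california_housing()`.

```python
import pandas as pd

# Hypothetical stand-in rows (NOT real California housing data), using the
# same column names the notebook uses: 'HouseAge' and 'HouseValue'.
california_df = pd.DataFrame(
    {
        "HouseAge": [5.0, 12.0, 19.0, 20.0, 35.0, 52.0],
        "HouseValue": [1.5, 2.0, 2.5, 3.0, 1.0, 5.0],
    }
)

# Subset 1: houses less than 20 years old.
newer = california_df.query("HouseAge < 20")

# Subset 2: houses 20 years old or more (the complement of subset 1).
older = california_df.query("HouseAge >= 20")

# Average price within each subset.
newer_mean = newer["HouseValue"].mean()
older_mean = older["HouseValue"].mean()

print("Average HouseValue, under 20 years:", newer_mean)
print("Average HouseValue, 20 years or more:", older_mean)
```

Unlike the binary `CHAS` flag in the earlier Boston version, `HouseAge` is numeric, so the subset is defined by a threshold comparison rather than an equality test. Using `<` for one subset and `>=` for the other keeps every row in exactly one of the two groups.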