Commit

Lowered knn default radius, removed interpolation from seasonal graphs.
CyrusVorwald2 committed Jun 30, 2021
1 parent 896a4c0 commit 5b63f86
Showing 5 changed files with 24 additions and 17 deletions.
2 changes: 1 addition & 1 deletion data/daily_and_seasonal_weighted_temperature.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion data/missing_data.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion data/monthly_weighted_temperature.html

Large diffs are not rendered by default.

28 changes: 16 additions & 12 deletions main.py
@@ -2,6 +2,8 @@
import math
from typing import List, Tuple

import plotly.graph_objects as go

import pandas as pd

from definitions import LATITUDES_BY_AIRPORT, LONGITUDES_BY_AIRPORT, POPULATION_DATA_DF, \
@@ -31,7 +33,7 @@ def get_nearest_city_from_airport(airport: str) -> Tuple[str, str]:
return city, state


def knn(city: str, state: str, k: int = 3, radius: int = 1) -> List[Tuple[str, str]]:
def knn(city: str, state: str, k: int = 3, radius: int = .22) -> List[Tuple[str, str]]:
"""
Gets the nearest k cities in POPULATION_DATA_DF from a city within the bounds of radius. Skips any cities in
VISITED so that they are not double counted.
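The new default radius of .22 lines up with the readme's change from "approximately 69 miles" to "approximately 15 miles" if the radius is read as degrees of latitude/longitude. That reading is an inference from the readme, not something the diff states. A minimal sketch of the conversion, assuming roughly 69 miles per degree of latitude; the helper name `degrees_to_miles` is hypothetical and not part of main.py:

```python
import math
from typing import Tuple

# Assumption: one degree of latitude spans roughly 69 miles.
MILES_PER_DEGREE_LATITUDE = 69.0


def degrees_to_miles(radius_degrees: float, latitude_degrees: float = 0.0) -> Tuple[float, float]:
    """Return the approximate (north-south, east-west) extent in miles of a radius given in degrees.

    The east-west extent shrinks with the cosine of the latitude,
    so a box defined in degrees is narrower in miles the further north it is.
    """
    north_south = radius_degrees * MILES_PER_DEGREE_LATITUDE
    east_west = north_south * math.cos(math.radians(latitude_degrees))
    return north_south, east_west


old_ns, _ = degrees_to_miles(1.0)   # old default radius
new_ns, _ = degrees_to_miles(0.22)  # new default radius
print(round(old_ns, 1), round(new_ns, 1))
```

The sketch types the radius as a float; the diff keeps the `int` annotation on `radius` even though the new default `.22` is not an integer.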
@@ -148,17 +150,19 @@ def main():
'datetime64[ns, UTC]')
population_weighted_min_temperatures_ts.index = population_weighted_min_temperatures_ts.index.astype(
'datetime64[ns, UTC]')
p_21 = pd.concat([population_weighted_mean_temperatures_ts,
population_weighted_mean_temperatures_ts.resample('3M').mean(),
population_weighted_max_temperatures_ts.resample('3M').max(),
population_weighted_min_temperatures_ts.resample('3M').min()], axis=1)

p_21.columns = ['Temperature', 'Seasonal Average', 'Seasonal Max', 'Seasonal Min']
p_21['Seasonal Average'].interpolate(method='time', inplace=True)
p_21['Seasonal Max'].interpolate(method='time', inplace=True)
p_21['Seasonal Min'].interpolate(method='time', inplace=True)

fig = p_21.plot(title="US Population Weighted Daily and Seasonal Temperature")
seasonal_mean = population_weighted_mean_temperatures_ts.resample('3M').mean()
seasonal_max = population_weighted_max_temperatures_ts.resample('3M').max()
seasonal_min = population_weighted_min_temperatures_ts.resample('3M').min()

data = [go.Scatter(x=population_weighted_mean_temperatures_ts.index,
y=population_weighted_mean_temperatures_ts.values, name='Temperature'),
go.Scatter(x=seasonal_mean.index,
y=seasonal_mean.values, name='Seasonal Average'),
go.Scatter(x=seasonal_max.index,
y=seasonal_max.values, name='Seasonal Max'),
go.Scatter(x=seasonal_min.index,
y=seasonal_min.values, name='Seasonal Min')]
fig = go.Figure(data=data)
fig.show()
fig.write_html(DAILY_SEASONAL_WEIGHTED_TEMPERATURE_GRAPH_PATH)

7 changes: 5 additions & 2 deletions readme.txt
@@ -11,8 +11,11 @@ Washington and Wash DC/Dulles both refer to Washington DC.
We are assuming the population of each city is a good weight to represent the average temperature in the US.
For this exercise, I loop over each airport, interpolate missing data, find the closest city to the airport,
take a simple coefficient of that city's population divided by the total population, and then do the same for k=3
neighboring cities within approximately 69 miles. This is pretty arbitrary and only represents about 50% of the total
US population. To account for more of the population, the temperature data would need to cover more land.
neighboring cities within approximately 15 miles. This is pretty arbitrary and only represents about 40% of the total
US population. To account for more of the population, the temperature data would need to cover more land. With the
given dataset, there is a tradeoff between how much of the US population is represented, and accuracy of the representation.
Weather is highly variable and can differ greatly over a small area, so the further away the city is from a station,
the less accurate the representation will be.
In practice, population-weighted daily temperature time series are calculated on a more regional basis
(see https://www.eia.gov/outlooks/steo/special/pdf/2012_sp_04.pdf).
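
The weighting described in this readme text can be illustrated with a small sketch. It assumes each city's coefficient is its population divided by the combined population of the cities included in the calculation. The readme says only "the total population", so the choice of denominator here is an assumption. The function name and the sample figures are hypothetical and do not come from main.py or the dataset:

```python
from typing import Dict


def population_weighted_mean(temperatures: Dict[str, float], populations: Dict[str, int]) -> float:
    """Average the temperatures, weighting each city by its share of the covered population."""
    total_population = sum(populations[city] for city in temperatures)
    if total_population == 0:
        raise ValueError("total population must be positive")
    # Each city's coefficient is its population divided by the total.
    return sum(
        temperatures[city] * populations[city] / total_population
        for city in temperatures
    )


# Hypothetical values for illustration only.
sample_temperatures = {"City A": 10.0, "City B": 20.0}
sample_populations = {"City A": 1, "City B": 3}
print(population_weighted_mean(sample_temperatures, sample_populations))
```

Because the coefficients sum to one, the result always lies between the lowest and highest input temperature, and cities left out of the radius contribute nothing, which is the coverage tradeoff the readme describes.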
