Commit
only retain pandas error calculation method
prakaa committed Oct 19, 2023
1 parent 4b980eb commit 3c1941a
Showing 13 changed files with 30 additions and 265 deletions.
2 changes: 1 addition & 1 deletion docs/source/_static/p5min_error_2021_ahead_time_hists.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/source/_static/p5min_error_2021_tod_percentile.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/source/_static/pd_error_2021_ahead_samples.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/source/_static/pd_error_2021_da_dists.html

Large diffs are not rendered by default.

126 changes: 5 additions & 121 deletions docs/source/examples/p5min_demand_forecast_error_2021.ipynb
@@ -34,7 +34,6 @@
"# standard libraries\n",
"from datetime import datetime, timedelta\n",
"from pathlib import Path\n",
"import multiprocessing as mp\n",
"\n",
"# NEM data libraries\n",
"# NEMOSIS for actual demand data\n",
@@ -187,131 +186,16 @@
"The {term}`actual run time` of 5MPD is approximately 5 minutes before the nominal {term}`run time`. We will adjust for this when calculating forecast ahead times. See the note in {ref}`this section <quick_start:core concepts and information for users>`.\n",
"```\n",
"\n",
"We provide two methods below:\n",
"\n",
"1. A **simpler** implementation that uses handy functionalities from both `xarray` and `pandas`. This implementation is a quick and simple way to compute demand forecast error for a couple of forecasted intervals. Though we provide a way to compute error over a longer period (e.g. a year), you should use the next method to compute error unless RAM/memory is a limiting factor (though it should be noted that whilst using `multiprocessing` with this method will speed things up, it will consume more memory).\n",
"\n",
"2. A **vectorised**, pure-`pandas` implementation. This implementation requires more lines of `pandas` code, but is much faster and preferable to the first implementation if you are computing error across a longer period (e.g. a year). However, as data for the entire period is loaded into memory, adapt the length of the period you select to your machine specifications (e.g. a year's worth of forecast data consumed ~15GB on the test machine)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `xarray` + `pandas` implementation (simpler code)\n",
"\n",
"The code below uses functionalities offered by `NEMOSIS`, `NEMSEER` and `xarray` to simplify coding effort. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def calculate_p5min_demand_forecast_error_simpler(forecasted_time: str) -> pd.DataFrame:\n",
" \"\"\"\n",
" Calculates P5MIN demand forecast error (Actual - Forecast) for all forecasts\n",
" that are run for a given forecasted_time.\n",
"\n",
" Args:\n",
" forecasted_time: Datetime string in the form YYYY/mm/dd HH:MM:SS\n",
" Returns:\n",
" pandas DataFrame with forecast error in `TOTALDEMAND` columns, the ahead time\n",
" of the forecast run in `ahead_time`, and the forecasted time in\n",
" `forecasted_time`.\n",
" \"\"\"\n",
" # necessary for datetime indexing with pandas and xarray\n",
" time = str(forecasted_time).replace(\"-\", \"/\")\n",
" # get forecast data for forecasted_time\n",
" run_start, run_end = generate_runtimes(time, time, \"P5MIN\")\n",
" nemseer_data = compile_data(\n",
" run_start,\n",
" run_end,\n",
" time,\n",
" time,\n",
" \"P5MIN\",\n",
" \"REGIONSOLUTION\",\n",
" \"nemseer_cache/\",\n",
" data_format=\"xr\",\n",
" )\n",
" demand_forecasts = nemseer_data[\"REGIONSOLUTION\"][\"TOTALDEMAND\"]\n",
" # get actual demand data for forecasted_time\n",
" # nemosis start time must precede end of interval of interest by 5 minutes\n",
" nemosis_start = (\n",
" datetime.strptime(time, \"%Y/%m/%d %H:%M:%S\") - timedelta(minutes=5)\n",
" ).strftime(\"%Y/%m/%d %H:%M:%S\")\n",
" # compile data using nemosis, using cached parquet and filtering out interventions\n",
" nemosis_data = nemosis.dynamic_data_compiler(\n",
" nemosis_start,\n",
" time,\n",
" \"DISPATCHREGIONSUM\",\n",
" nemosis_cache,\n",
" filter_cols=[\"INTERVENTION\"],\n",
" filter_values=([0],),\n",
" fformat=\"parquet\",\n",
" )\n",
" # sum actual demand across regions\n",
" actual_demand = nemosis_data.groupby(\"SETTLEMENTDATE\")[\"TOTALDEMAND\"].sum()[time]\n",
" # sum forecast demand across regions\n",
" query_forecasts = demand_forecasts.sum(dim=\"REGIONID\").sel(forecasted_time=time)\n",
" # calculate error and return as a pandas DataFrame\n",
" error = (actual_demand - query_forecasts).to_dataframe()\n",
" # calculate number of minutes ahead, but adjust for nominal vs actual run time of P5MIN\n",
" error[\"ahead_time\"] = error[\"forecasted_time\"] - (\n",
" error.index - timedelta(minutes=5)\n",
" )\n",
" error = error.set_index(\"forecasted_time\")\n",
" return error"
"As data for the entire period is loaded into memory, adapt the length of the period you select to your machine specifications (e.g. a year's worth of forecast data consumed ~15GB on the test machine)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Computing error across 2021\n",
"\n",
"```{caution}\n",
"While this code demonstrates how you could use the `pandas` + `xarray` implementation to compute error across a year, we only provide this as an example. We recommend you use the vectorised implementation if your system memory permits.\n",
"```\n",
"\n",
"Because we haven't optimised our code, it will take a while to calculate forecast error across a year.\n",
"### Forecast error calculation functions\n",
"\n",
"To speed up computation, we will use Python's [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html) module. In this example, we use 10 simultaneous processes.\n",
"\n",
"`tqdm` provides us with a progress bar that shows us how many iterations are being completed in a second, as well as the progress over all intervals in the year or interest.\n",
"\n",
"Results DataFrames are added to a list as processes finish computation. Once they've finished, we can then concatenate these DataFrames to get a forecast error DataFrame"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remove-output"
]
},
"outputs": [],
"source": [
"times = pd.date_range(analysis_start, analysis_end, freq=\"5T\")\n",
"with mp.Pool(10) as p:\n",
" results = list(\n",
" tqdm(\n",
" p.imap(calculate_p5min_demand_forecast_error_simpler, times),\n",
" total=len(times),\n",
" )\n",
" )\n",
"forecast_error = pd.concat(results, axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### pure-`pandas` implementation (vectorised code)\n",
"\n",
"The code below uses functionalities offered by `NEMOSIS`, `NEMSEER` and `pandas` to quickly calculate demand forecast error across a longer period."
"The code below uses functionalities offered by `NEMOSIS`, `NEMSEER` and `pandas` to calculate demand forecast error."
]
},
{
@@ -693,7 +577,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -726,7 +610,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {
"tags": [
"remove-cell"
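
The retained `pandas` approach can be illustrated with a minimal sketch. This is not the notebook's code: the function name, the column names `run_time` and `actual`, and the sample values are assumptions made for illustration. It computes error as Actual - Forecast and shifts the nominal run time back by 5 minutes when deriving ahead times, as the notebook text describes.

```python
import pandas as pd


def calculate_forecast_error(
    forecasts: pd.DataFrame, actuals: pd.DataFrame
) -> pd.DataFrame:
    """Vectorised (Actual - Forecast) demand error with forecast ahead times.

    Assumed (hypothetical) layouts:
      forecasts: columns run_time, forecasted_time, TOTALDEMAND
      actuals:   columns SETTLEMENTDATE, TOTALDEMAND
    """
    # align actual demand with every forecast made for the same interval
    actuals = actuals.rename(
        columns={"SETTLEMENTDATE": "forecasted_time", "TOTALDEMAND": "actual"}
    )
    merged = forecasts.merge(actuals, on="forecasted_time", how="inner")
    # error is Actual - Forecast, computed for all rows at once
    merged["error"] = merged["actual"] - merged["TOTALDEMAND"]
    # the actual run time is roughly 5 minutes before the nominal run time
    merged["ahead_time"] = merged["forecasted_time"] - (
        merged["run_time"] - pd.Timedelta(minutes=5)
    )
    return merged[["forecasted_time", "ahead_time", "error"]]


# made-up sample: two forecast runs for one forecasted interval
forecasts = pd.DataFrame(
    {
        "run_time": pd.to_datetime(["2021-01-01 00:20:00", "2021-01-01 00:25:00"]),
        "forecasted_time": pd.to_datetime(["2021-01-01 00:30:00"] * 2),
        "TOTALDEMAND": [100.0, 110.0],
    }
)
actuals = pd.DataFrame(
    {
        "SETTLEMENTDATE": pd.to_datetime(["2021-01-01 00:30:00"]),
        "TOTALDEMAND": [105.0],
    }
)
result = calculate_forecast_error(forecasts, actuals)
print(result)
```

Merging once and operating on whole columns avoids a per-interval loop, which is why this style scales to long periods at the cost of holding the full period in memory.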
