only retain pandas error calculation method

UNSW-CEEM · Oct 19, 2023 · 3c1941a
1 parent 4b980eb

Showing 13 changed files with 30 additions and 265 deletions.
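This commit keeps only the vectorised, pure-`pandas` error calculation. The sketch below is a rough illustration of what "vectorised" means here. It computes demand forecast error (Actual - Forecast) for every forecast run in a single `groupby`/`merge` pass, rather than looping over forecasted intervals as the removed `xarray` + `pandas` function did. It is not the notebook's code. The function name `vectorised_demand_forecast_error`, the `run_time` column and the toy numbers are hypothetical, and the column layout only loosely mirrors the `NEMSEER` `REGIONSOLUTION` and `NEMOSIS` `DISPATCHREGIONSUM` tables. The 5-minute shift follows the notebook's note that the actual run time of 5MPD is about 5 minutes before its nominal run time.

```python
from datetime import timedelta

import pandas as pd

# Per the notebook's note: 5MPD actually runs ~5 minutes before its nominal run time
RUN_TIME_OFFSET = timedelta(minutes=5)


def vectorised_demand_forecast_error(
    forecasts: pd.DataFrame, actuals: pd.DataFrame
) -> pd.DataFrame:
    """Hypothetical sketch: NEM-wide error (actual - forecast) for all runs at once.

    Assumed columns (illustrative only):
      forecasts: run_time, forecasted_time, REGIONID, TOTALDEMAND
      actuals:   SETTLEMENTDATE, REGIONID, TOTALDEMAND
    """
    # Sum regional forecasts into one NEM-wide forecast per (run, forecasted interval)
    nem_forecast = forecasts.groupby(
        ["run_time", "forecasted_time"], as_index=False
    )["TOTALDEMAND"].sum()
    # Sum regional actuals into one NEM-wide actual per settlement interval
    nem_actual = (
        actuals.groupby("SETTLEMENTDATE", as_index=False)["TOTALDEMAND"]
        .sum()
        .rename(columns={"SETTLEMENTDATE": "forecasted_time", "TOTALDEMAND": "actual"})
    )
    # One join matches every forecast to the actual demand of the interval it targets
    merged = nem_forecast.merge(nem_actual, on="forecasted_time", how="inner")
    merged["error"] = merged["actual"] - merged["TOTALDEMAND"]
    # Ahead time measured from the actual run time (nominal run time - 5 minutes)
    merged["ahead_time"] = merged["forecasted_time"] - (
        merged["run_time"] - RUN_TIME_OFFSET
    )
    return merged[["run_time", "forecasted_time", "ahead_time", "error"]]


# Toy usage: two regions, two forecast runs targeting the same interval
t = pd.Timestamp("2021-01-01 00:30")
forecasts = pd.DataFrame(
    {
        "run_time": [t - pd.Timedelta(minutes=10)] * 2 + [t - pd.Timedelta(minutes=5)] * 2,
        "forecasted_time": [t] * 4,
        "REGIONID": ["NSW1", "VIC1"] * 2,
        "TOTALDEMAND": [7000.0, 5000.0, 7050.0, 4980.0],
    }
)
actuals = pd.DataFrame(
    {"SETTLEMENTDATE": [t, t], "REGIONID": ["NSW1", "VIC1"], "TOTALDEMAND": [7100.0, 4950.0]}
)
result = vectorised_demand_forecast_error(forecasts, actuals)
print(result)
```

The join and the subtraction operate on whole columns, so all the data for the chosen period has to sit in memory at once. This is the trade-off behind the notebook's advice to adapt the analysis period to the machine's RAM.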
diff --git a/docs/source/_static/p5min_error_2021_ahead_time_hists.html b/docs/source/_static/p5min_error_2021_ahead_time_hists.html
diff --git a/docs/source/_static/p5min_error_2021_ahead_time_percentile.html b/docs/source/_static/p5min_error_2021_ahead_time_percentile.html
diff --git a/docs/source/_static/p5min_error_2021_tod_percentile.html b/docs/source/_static/p5min_error_2021_tod_percentile.html
diff --git a/docs/source/_static/pd_error_2021_ahead_samples.html b/docs/source/_static/pd_error_2021_ahead_samples.html
diff --git a/docs/source/_static/pd_error_2021_da_dists.html b/docs/source/_static/pd_error_2021_da_dists.html
diff --git a/docs/source/_static/pd_error_NEM_2021_ahead_time_percentile.html b/docs/source/_static/pd_error_NEM_2021_ahead_time_percentile.html
diff --git a/docs/source/_static/pd_error_NSW1_2021_aheadtime_percentile.html b/docs/source/_static/pd_error_NSW1_2021_aheadtime_percentile.html
diff --git a/docs/source/_static/pd_error_QLD1_2021_aheadtime_percentile.html b/docs/source/_static/pd_error_QLD1_2021_aheadtime_percentile.html
diff --git a/docs/source/_static/pd_error_SA1_2021_aheadtime_percentile.html b/docs/source/_static/pd_error_SA1_2021_aheadtime_percentile.html
diff --git a/docs/source/_static/pd_error_TAS1_2021_aheadtime_percentile.html b/docs/source/_static/pd_error_TAS1_2021_aheadtime_percentile.html
diff --git a/docs/source/_static/pd_error_VIC1_2021_aheadtime_percentile.html b/docs/source/_static/pd_error_VIC1_2021_aheadtime_percentile.html
diff --git a/docs/source/examples/p5min_demand_forecast_error_2021.ipynb b/docs/source/examples/p5min_demand_forecast_error_2021.ipynb
@@ -34,7 +34,6 @@
     "# standard libraries\n",
     "from datetime import datetime, timedelta\n",
     "from pathlib import Path\n",
-    "import multiprocessing as mp\n",
     "\n",
     "# NEM data libraries\n",
     "# NEMOSIS for actual demand data\n",
@@ -187,131 +186,16 @@
     "The {term}`actual run time` of 5MPD is approximately 5 minutes before the nominal {term}`run time`. We will adjust for this in this when calculating forecast ahead times. See the note in {ref}`this section <quick_start:core concepts and information for users>`.\n",
     "```\n",
     "\n",
-    "We provide two methods below:\n",
-    "\n",
-    "1. A **simpler** implementation that uses handy functionalities from both `xarray` and `pandas`. This implementation is a quick and simple way to compute demand forecast error for a couple of forecasted intervals. Though we provide a way to compute error over a longer period (e.g. a year), you should use the next method to compute error unless RAM/memory is a limiting factor (though it should be noted that whilst using `multiprocessing` with this method will speed things up, it will consume more memory).\n",
-    "\n",
-    "2. A **vectorised**, pure-`pandas` implementation. This implementation requires more lines of `pandas` code, but is much faster and preferable to the first implementation if you are computing error across a longer period (e.g. a year). However, as data for the entire period is loaded into memory, adapt the length of the period you select to your machine specifications (e.g. a year's worth of forecast data consumed ~15GB on the test machine)."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### `xarray` + `pandas` implementation (simpler code)\n",
-    "\n",
-    "The code below uses functionalities offered by `NEMOSIS`, `NEMSEER` and `xarray` to simplify coding effort. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def calculate_p5min_demand_forecast_error_simpler(forecasted_time: str) -> pd.DataFrame:\n",
-    "    \"\"\"\n",
-    "    Calculates P5MIN demand forecast error (Actual - Forecast) for all forecasts\n",
-    "    that are run for a given forecasted_time.\n",
-    "\n",
-    "    Args:\n",
-    "        forecasted_time: Datetime string in the form YYYY/mm/dd HH:MM:SS\n",
-    "    Returns:\n",
-    "        pandas DataFrame with forecast error in `TOTALDEMAND` columns, the ahead time\n",
-    "        of the forecast run in `ahead_time`, and the forecasted time in\n",
-    "        `forecasted_time`.\n",
-    "    \"\"\"\n",
-    "    # necessary for datetime indexing with pandas and xarray\n",
-    "    time = str(forecasted_time).replace(\"-\", \"/\")\n",
-    "    # get forecast data for forecasted_time\n",
-    "    run_start, run_end = generate_runtimes(time, time, \"P5MIN\")\n",
-    "    nemseer_data = compile_data(\n",
-    "        run_start,\n",
-    "        run_end,\n",
-    "        time,\n",
-    "        time,\n",
-    "        \"P5MIN\",\n",
-    "        \"REGIONSOLUTION\",\n",
-    "        \"nemseer_cache/\",\n",
-    "        data_format=\"xr\",\n",
-    "    )\n",
-    "    demand_forecasts = nemseer_data[\"REGIONSOLUTION\"][\"TOTALDEMAND\"]\n",
-    "    # get actual demand data for forecasted_time\n",
-    "    # nemosis start time must precede end of interval of interest by 5 minutes\n",
-    "    nemosis_start = (\n",
-    "        datetime.strptime(time, \"%Y/%m/%d %H:%M:%S\") - timedelta(minutes=5)\n",
-    "    ).strftime(\"%Y/%m/%d %H:%M:%S\")\n",
-    "    # compile data using nemosis, using cached parquet and filtering out interventions\n",
-    "    nemosis_data = nemosis.dynamic_data_compiler(\n",
-    "        nemosis_start,\n",
-    "        time,\n",
-    "        \"DISPATCHREGIONSUM\",\n",
-    "        nemosis_cache,\n",
-    "        filter_cols=[\"INTERVENTION\"],\n",
-    "        filter_values=([0],),\n",
-    "        fformat=\"parquet\",\n",
-    "    )\n",
-    "    # sum actual demand across regions\n",
-    "    actual_demand = nemosis_data.groupby(\"SETTLEMENTDATE\")[\"TOTALDEMAND\"].sum()[time]\n",
-    "    # sum forecast demand across regions\n",
-    "    query_forecasts = demand_forecasts.sum(dim=\"REGIONID\").sel(forecasted_time=time)\n",
-    "    # calculate error and return as a pandas DataFrame\n",
-    "    error = (actual_demand - query_forecasts).to_dataframe()\n",
-    "    # calculate number of minutes ahead, but adjust for nominal vs actual run time of P5MIN\n",
-    "    error[\"ahead_time\"] = error[\"forecasted_time\"] - (\n",
-    "        error.index - timedelta(minutes=5)\n",
-    "    )\n",
-    "    error = error.set_index(\"forecasted_time\")\n",
-    "    return error"
+    "As data for the entire period is loaded into memory, adapt the length of the period you select to your machine specifications (e.g. a year's worth of forecast data consumed ~15GB on the test machine)."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Computing error across 2021\n",
-    "\n",
-    "```{caution}\n",
-    "While this code demonstrates how you could use the `pandas` + `xarray` implementation to compute error across a year, we only provide this as an example. We recommend you use the vectorised implementation if your system memory permits.\n",
-    "```\n",
-    "\n",
-    "Because we haven't optimised our code, it will take a while to calculate forecast error across a year.\n",
+    "### Forecast error calculation functions\n",
     "\n",
-    "To speed up computation, we will use Python's [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html) module. In this example, we use 10 simultaneous processes.\n",
-    "\n",
-    "`tqdm` provides us with a progress bar that shows us how many iterations are being completed in a second, as well as the progress over all intervals in the year or interest.\n",
-    "\n",
-    "Results DataFrames are added to a list as processes finish computation. Once they've finished, we can then concatenate these DataFrames to get a forecast error DataFrame"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "tags": [
-     "remove-output"
-    ]
-   },
-   "outputs": [],
-   "source": [
-    "times = pd.date_range(analysis_start, analysis_end, freq=\"5T\")\n",
-    "with mp.Pool(10) as p:\n",
-    "    results = list(\n",
-    "        tqdm(\n",
-    "            p.imap(calculate_p5min_demand_forecast_error_simpler, times),\n",
-    "            total=len(times),\n",
-    "        )\n",
-    "    )\n",
-    "forecast_error = pd.concat(results, axis=0)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### pure-`pandas` implementation (vectorised code)\n",
-    "\n",
-    "The code below uses functionalities offered by `NEMOSIS`, `NEMSEER` and `pandas` to quickly calculate demand forecast error across a longer period."
+    "The code below uses functionalities offered by `NEMOSIS`, `NEMSEER` and `pandas` to calculate demand forecast error."
    ]
   },
   {
@@ -693,7 +577,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -726,7 +610,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": null,
    "metadata": {
     "tags": [
      "remove-cell"