Update documentation

RTIInternational · Nov 15, 2024 · 46451cd · 46451cd
commit 46451cd
Show file tree

Hide file tree

Showing 280 changed files with 98,660 additions and 0 deletions.
diff --git a/.buildinfo b/.buildinfo
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: 1f703edd5345e855b20148247af3983d
+tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/.nojekyll b/.nojekyll
diff --git a/_downloads/06afd3ada6c423a261dd6c86e5478975/grouping_and_filtering.ipynb b/_downloads/06afd3ada6c423a261dd6c86e5478975/grouping_and_filtering.ipynb
@@ -0,0 +1,336 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Grouping and Filtering"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once the data has been joined into a single table, we can start to group and filter the data based on the table attributes,\n",
+    "and calculate metrics for specific subsets of the data.  This is the explorative power of TEEHR, and allows us to\n",
+    "better understand model performance. For example, if the joined table contained several model simulations (\"configurations\")\n",
+    "we could group the ``configuration_name`` field to calculate performance metrics for each model configuration.\n",
+    "\n",
+    "We could then include filters to further narrow the population subset such as only considering first order stream locations or\n",
+    "locations below a certain mean slope value. This allows us to gain more insight into the model performance through specific\n",
+    "quantitative analysis."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The grouping and filtering capabilities in TEEHR provide the ability to explore models across\n",
+    "different subsets of the data, allowing us to better understand where and why the model performs well or poorly."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We'll look at an example to help illustrate the grouping and filtering concepts."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://github.com/RTIInternational/teehr/blob/304-update-the-grouping-and-joining-docs-for-v040/docs/images/tutorials/grouping_filtering/grouping_example_table.png?raw=true\">"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Consider this joined timeseries table containing:\n",
+    "\n",
+    "* 2 USGS locations\n",
+    "* 3 Model configurations\n",
+    "* 4 Daily timesteps spanning two months\n",
+    "* 1 Location attribute (q95_cms)\n",
+    "* 1 User-defined attribute (month)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "vscode": {
+     "languageId": "plaintext"
+    }
+   },
+   "source": [
+    "When calculating metrics in TEEHR, we can use the data in this table to calculate metrics over specific subsets or\n",
+    "populations of the data. For example, we could calculate the relative bias for each model configuration for each month."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Grouping\n",
+    "--------\n",
+    "\n",
+    "Let's use this table of joined timeseries values to demonstrate how grouping selected fields affects the results.\n",
+    "\n",
+    "First, we'll calculate the relative bias for each model configuration at each location:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's use this table of joined timeseries values to demonstrate how grouping selected fields affects the results.\n",
+    "\n",
+    "First, we'll calculate the relative bias for each model configuration at each location:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://raw.githubusercontent.com/RTIInternational/teehr/refs/heads/304-update-the-grouping-and-joining-docs-for-v040/docs/images/tutorials/grouping_filtering/grouping_example_1.png\" width=850 height=500>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can demonstrate how this calculation is performed in TEEHR using sample data. First, we'll set up a local directory that will contain our Evaluation, then we'll clone a subset of an existing Evaluation from s3."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "hide-output"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "import shutil\n",
+    "\n",
+    "import teehr\n",
+    "\n",
+    "# Define the directory where the Evaluation will be created\n",
+    "test_eval_dir = Path(Path().home(), \"temp\", \"grouping_tutorial\")\n",
+    "shutil.rmtree(test_eval_dir, ignore_errors=True)\n",
+    "\n",
+    "# Create an Evaluation object and create the directory\n",
+    "ev = teehr.Evaluation(dir_path=test_eval_dir, create_dir=True)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# List the evaluations in the S3 bucket\n",
+    "ev.list_s3_evaluations()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "hide-output"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "ev.clone_from_s3(\n",
+    "    evaluation_name=\"p1_camels_daily_streamflow\",\n",
+    "    primary_location_ids=[\"usgs-01013500\", \"usgs-01022500\"],\n",
+    "    start_date=\"1990-10-30 00:00\",\n",
+    "    end_date=\"1990-11-02 23:00\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "hide-output"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "from teehr import Metrics as m\n",
+    "\n",
+    "metrics_df = ev.metrics.query(\n",
+    "    group_by=[\"primary_location_id\", \"configuration_name\"],\n",
+    "    include_metrics=[\n",
+    "        m.RelativeBias(),\n",
+    "    ]\n",
+    ").to_pandas()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "metrics_df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that if you wanted to include a field in the query result, it must be included in the ``group_by`` list\n",
+    "even if it's not necessary for the grouping operation!\n",
+    "\n",
+    "For example, if we wanted to include ``q95`` in the query result, we would need to include it in the\n",
+    "``group_by`` list:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://github.com/RTIInternational/teehr/blob/304-update-the-grouping-and-joining-docs-for-v040/docs/images/tutorials/grouping_filtering/grouping_example_2.png?raw=true\" width=850>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "hide-output"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "# Adding q95_cms to the group_by list to include it in the results.\n",
+    "metrics_df = ev.metrics.query(\n",
+    "    group_by=[\"primary_location_id\", \"configuration_name\", \"q95\"],\n",
+    "    include_metrics=[\n",
+    "        m.RelativeBias(),\n",
+    "    ]\n",
+    ").to_pandas()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "metrics_df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Filtering\n",
+    "---------\n",
+    "\n",
+    "Next, we'll add filtering to further narrow the population for our metric calculations. Let's say we only\n",
+    "want to consider ``NWM v3.0`` and ``Marrmot`` model configurations:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img src=\"https://github.com/RTIInternational/teehr/blob/304-update-the-grouping-and-joining-docs-for-v040/docs/images/tutorials/grouping_filtering/grouping_example_3.png?raw=true\" width=850>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We need to specify a filter in the ``query`` method to only include the desired model configurations:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "hide-output"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "# Adding a filter to further limit the population for metrics calculations.\n",
+    "metrics_df = ev.metrics.query(\n",
+    "    group_by=[\"primary_location_id\", \"configuration_name\", \"q95\"],\n",
+    "    include_metrics=[\n",
+    "        m.RelativeBias(),\n",
+    "    ],\n",
+    "    filters = [\n",
+    "        {\n",
+    "            \"column\": \"configuration_name\",\n",
+    "            \"operator\": \"in\",\n",
+    "            \"value\": [\"nwm30_retro\", \"marrmot_37_hbv_obj1\"]\n",
+    "        }\n",
+    "    ]\n",
+    ").to_pandas()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "metrics_df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Summary\n",
+    "-------\n",
+    "\n",
+    "Grouping and filtering are powerful tools in TEEHR that allow us to explore the data in more detail and calculate metrics\n",
+    "for specific subsets of the data.\n",
+    "\n",
+    "See the User Guide for more in-depth examples using the code base."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ev.spark.stop()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}