This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Unknown committed on Nov 15, 2024 (0 parents)
commit 46451cd
Showing 280 changed files with 98,660 additions and 0 deletions.
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 1f703edd5345e855b20148247af3983d
tags: 645f666f9bcd5a90fca523b33c5a78b7
Empty file.
_downloads/06afd3ada6c423a261dd6c86e5478975/grouping_and_filtering.ipynb
336 changes: 336 additions & 0 deletions
@@ -0,0 +1,336 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Grouping and Filtering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the data has been joined into a single table, we can start to group and filter it based on the table attributes\n",
"and calculate metrics for specific subsets of the data. This is where TEEHR's exploratory power lies, and it helps us\n",
"better understand model performance. For example, if the joined table contained several model simulations (\"configurations\"),\n",
"we could group by the ``configuration_name`` field to calculate performance metrics for each model configuration.\n",
"\n",
"We could then add filters to narrow the population further, for example to consider only first-order stream locations or\n",
"locations below a certain mean slope. This gives us more targeted, quantitative insight into model performance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The grouping and filtering capabilities in TEEHR make it possible to explore model performance across\n",
"different subsets of the data, helping us understand where and why a model performs well or poorly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll look at an example to help illustrate the grouping and filtering concepts."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://github.com/RTIInternational/teehr/blob/304-update-the-grouping-and-joining-docs-for-v040/docs/images/tutorials/grouping_filtering/grouping_example_table.png?raw=true\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consider this joined timeseries table containing:\n",
"\n",
"* 2 USGS locations\n",
"* 3 model configurations\n",
"* 4 daily timesteps spanning two months\n",
"* 1 location attribute (q95_cms)\n",
"* 1 user-defined attribute (month)"
]
},
{
"cell_type": "markdown",
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"source": [
"When calculating metrics in TEEHR, we can use the fields in this table to define the subsets, or populations,\n",
"of the data over which metrics are calculated. For example, we could calculate the relative bias for each model configuration for each month."
]
},
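{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea concrete, here is a minimal sketch in plain pandas (not the TEEHR API) of grouping by configuration and month.\n",
"It uses a hypothetical toy table with made-up values, assumes column names that mirror TEEHR's joined timeseries table,\n",
"derives a user-defined ``month`` field from ``value_time``, and assumes relative bias is defined as\n",
"``sum(secondary - primary) / sum(primary)``. TEEHR's own implementation may differ in detail."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy joined table; values are made up for illustration only\n",
"toy_df = pd.DataFrame({\n",
"    \"configuration_name\": [\"model_a\"] * 4 + [\"model_b\"] * 4,\n",
"    \"value_time\": pd.to_datetime([\"1990-10-30\", \"1990-10-31\", \"1990-11-01\", \"1990-11-02\"] * 2),\n",
"    \"primary_value\": [10.0, 12.0, 8.0, 6.0] * 2,\n",
"    \"secondary_value\": [11.0, 13.0, 7.0, 5.0, 9.0, 12.0, 10.0, 7.0],\n",
"})\n",
"\n",
"# User-defined attribute: the calendar month of each timestep\n",
"toy_df[\"month\"] = toy_df[\"value_time\"].dt.month\n",
"\n",
"# Sum the errors and the observations within each (configuration, month) group\n",
"toy_df[\"diff\"] = toy_df[\"secondary_value\"] - toy_df[\"primary_value\"]\n",
"sums = toy_df.groupby([\"configuration_name\", \"month\"])[[\"diff\", \"primary_value\"]].sum()\n",
"\n",
"# Assumed relative bias definition: sum(secondary - primary) / sum(primary)\n",
"(sums[\"diff\"] / sums[\"primary_value\"]).rename(\"relative_bias\").reset_index()"
]
},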
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Grouping\n",
"--------\n",
"\n",
"Let's use this table of joined timeseries values to demonstrate how the choice of grouping fields affects the results.\n",
"\n",
"First, we'll calculate the relative bias for each model configuration at each location:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://raw.githubusercontent.com/RTIInternational/teehr/refs/heads/304-update-the-grouping-and-joining-docs-for-v040/docs/images/tutorials/grouping_filtering/grouping_example_1.png\" width=850 height=500>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can demonstrate how this calculation is performed in TEEHR using sample data. First, we'll set up a local directory that will contain our Evaluation; then we'll clone a subset of an existing Evaluation from S3."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"from pathlib import Path\n",
"import shutil\n",
"\n",
"import teehr\n",
"\n",
"# Define the directory where the Evaluation will be created\n",
"# (any existing contents are deleted so the tutorial starts fresh)\n",
"test_eval_dir = Path(Path.home(), \"temp\", \"grouping_tutorial\")\n",
"shutil.rmtree(test_eval_dir, ignore_errors=True)\n",
"\n",
"# Create an Evaluation object and create the directory\n",
"ev = teehr.Evaluation(dir_path=test_eval_dir, create_dir=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List the evaluations in the S3 bucket\n",
"ev.list_s3_evaluations()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# Clone two locations and four daily timesteps (Oct 30 to Nov 2, 1990)\n",
"# from the p1_camels_daily_streamflow Evaluation in S3\n",
"ev.clone_from_s3(\n",
"    evaluation_name=\"p1_camels_daily_streamflow\",\n",
"    primary_location_ids=[\"usgs-01013500\", \"usgs-01022500\"],\n",
"    start_date=\"1990-10-30 00:00\",\n",
"    end_date=\"1990-11-02 23:00\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"from teehr import Metrics as m\n",
"\n",
"# Calculate relative bias for each unique (location, configuration) combination\n",
"metrics_df = ev.metrics.query(\n",
"    group_by=[\"primary_location_id\", \"configuration_name\"],\n",
"    include_metrics=[\n",
"        m.RelativeBias(),\n",
"    ]\n",
").to_pandas()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"metrics_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that any field you want to appear in the query result must be included in the ``group_by`` list,\n",
"even if it is not needed for the grouping itself.\n",
"\n",
"For example, to include ``q95`` in the query result, we add it to the ``group_by`` list. Because ``q95`` is a\n",
"location attribute with a single value per location, adding it does not change the groups:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://github.com/RTIInternational/teehr/blob/304-update-the-grouping-and-joining-docs-for-v040/docs/images/tutorials/grouping_filtering/grouping_example_2.png?raw=true\" width=850>"
]
},
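{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal pandas sketch (hypothetical toy data with made-up values, not the TEEHR API) of why this works: because\n",
"``q95`` has exactly one value per location, grouping by location and configuration produces the same groups with or\n",
"without ``q95``; including it simply carries the attribute into the output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data; q95 has one (made-up) value per location\n",
"toy_df = pd.DataFrame({\n",
"    \"primary_location_id\": [\"usgs-A\", \"usgs-A\", \"usgs-B\", \"usgs-B\"],\n",
"    \"configuration_name\": [\"model_a\", \"model_b\", \"model_a\", \"model_b\"],\n",
"    \"q95\": [120.0, 120.0, 45.0, 45.0],\n",
"    \"primary_value\": [10.0, 10.0, 5.0, 5.0],\n",
"    \"secondary_value\": [12.0, 9.0, 6.0, 4.0],\n",
"})\n",
"\n",
"toy_df[\"diff\"] = toy_df[\"secondary_value\"] - toy_df[\"primary_value\"]\n",
"without_q95 = toy_df.groupby([\"primary_location_id\", \"configuration_name\"])[\"diff\"].sum()\n",
"with_q95 = toy_df.groupby([\"primary_location_id\", \"configuration_name\", \"q95\"])[\"diff\"].sum()\n",
"\n",
"# Same number of groups either way; q95 only adds a column to the result\n",
"assert len(without_q95) == len(with_q95)\n",
"with_q95.reset_index()"
]
},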
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# Adding q95 to the group_by list to include it in the results.\n",
"metrics_df = ev.metrics.query(\n",
"    group_by=[\"primary_location_id\", \"configuration_name\", \"q95\"],\n",
"    include_metrics=[\n",
"        m.RelativeBias(),\n",
"    ]\n",
").to_pandas()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"metrics_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Filtering\n",
"---------\n",
"\n",
"Next, we'll add filtering to further narrow the population for our metric calculations. Suppose we only\n",
"want to consider the ``NWM v3.0`` and ``Marrmot`` model configurations:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"https://github.com/RTIInternational/teehr/blob/304-update-the-grouping-and-joining-docs-for-v040/docs/images/tutorials/grouping_filtering/grouping_example_3.png?raw=true\" width=850>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To do this, we pass a filter to the ``query`` method so that only the desired model configurations are included:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# Adding a filter to further limit the population for metrics calculations.\n",
"metrics_df = ev.metrics.query(\n",
"    group_by=[\"primary_location_id\", \"configuration_name\", \"q95\"],\n",
"    include_metrics=[\n",
"        m.RelativeBias(),\n",
"    ],\n",
"    # Keep only rows whose configuration_name is one of the listed values\n",
"    filters=[\n",
"        {\n",
"            \"column\": \"configuration_name\",\n",
"            \"operator\": \"in\",\n",
"            \"value\": [\"nwm30_retro\", \"marrmot_37_hbv_obj1\"]\n",
"        }\n",
"    ]\n",
").to_pandas()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"metrics_df"
]
},
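{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, a minimal pandas sketch (hypothetical toy data with made-up values, not the TEEHR API) of the same idea:\n",
"an ``in`` filter keeps only the rows whose ``configuration_name`` appears in the list, and the grouping and metric\n",
"calculation then run on that reduced population. The relative bias definition, ``sum(secondary - primary) / sum(primary)``,\n",
"is again an assumption."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with one configuration (\"other_model\") that the filter excludes\n",
"toy_df = pd.DataFrame({\n",
"    \"primary_location_id\": [\"usgs-A\"] * 3 + [\"usgs-B\"] * 3,\n",
"    \"configuration_name\": [\"nwm30_retro\", \"marrmot_37_hbv_obj1\", \"other_model\"] * 2,\n",
"    \"primary_value\": [10.0, 10.0, 10.0, 5.0, 5.0, 5.0],\n",
"    \"secondary_value\": [12.0, 9.0, 20.0, 6.0, 4.0, 1.0],\n",
"})\n",
"\n",
"# Filter first: keep rows whose configuration_name is in the list\n",
"keep = [\"nwm30_retro\", \"marrmot_37_hbv_obj1\"]\n",
"filtered = toy_df[toy_df[\"configuration_name\"].isin(keep)].copy()\n",
"\n",
"# Then group and compute the assumed relative bias for each population\n",
"filtered[\"diff\"] = filtered[\"secondary_value\"] - filtered[\"primary_value\"]\n",
"sums = filtered.groupby([\"primary_location_id\", \"configuration_name\"])[[\"diff\", \"primary_value\"]].sum()\n",
"(sums[\"diff\"] / sums[\"primary_value\"]).rename(\"relative_bias\").reset_index()"
]
},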
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Summary\n",
"-------\n",
"\n",
"Grouping and filtering are powerful tools in TEEHR that allow us to explore the data in more detail and calculate metrics\n",
"for specific subsets of the data.\n",
"\n",
"See the User Guide for more in-depth examples using the code base."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Stop the Spark session used by the Evaluation\n",
"ev.spark.stop()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}