* initial TeehrDataset class layout
* initial TEEHR Dataset functionality
* initial TEEHRDataset functionality
* removed test code
* removed comment
* adds first api and web app that basically works
* fixes attr add, ads new test data
* remove some frontend code that was holding on
* make a couple of small changes to dataclass
* adds a few small bug fixes in new and existing code
* updates web api/app to use new dataclass
* fixes geometry join
* adds gitignore for web work
* small refactor, update gitignore, fix bug in get_metrics again
* adds no qa
* homepage draft
* separate TEEHRDatasetAPI and TEEHRDatasetDB classes
* pydantic v2, add geometry to queries, re-org
* adding timeseries queries, fastapi endpoints
* tests and cleanup
* add filters
* add operators endpoint
* add timepicker
* flex filters
* update vscode settings
* fix bug in get metrics query
* make scripot work with new patterns
* add pydantic>2 to req.
* 81-integrate poetry (#83)
  * initial poetry integration
  * integrating poetry, upgrading pangeo, python3.11
  * poetry.lock
  * revert back to python3.10
  * readme update
  * minor edit
  * uncommenting dockerfile section after GTS fix
  * adds a .dockerignore
  ---------
  Co-authored-by: Sam Lamont <[email protected]>
  Co-authored-by: Matt Denno <[email protected]>
* add v0.3.0beta to teehr-hub
* update build action
* v0.3.0b geometry issues (#91)
  * fixing include_geometry validation
  * version bump
  ---------
  Co-authored-by: Sam Lamont <[email protected]>
* hack fix for build process
* 88 comments on v030b dataset (#96)
  * Updated doc strings for teehr dataset class
  * Docstring updates, time series query deduplication
  * additional comment
  * typo
  * increment beta version
  * small update to get_timeseries() and get_timeseries_chars()
  * didn't quite get it fixed with last commit
  * timeseries_name docstring, profile_query update
  ---------
  Co-authored-by: Sam Lamont <[email protected]>
  Co-authored-by: Matt Denno <[email protected]>
* update teehr-hub
* fix pydantic 2 issues
* remove test db from repo
* update test to use temp db
* updates release docs, info changelog.md
* update teehr-hub config
---------
Co-authored-by: Sam Lamont <[email protected]>
Co-authored-by: Manuel Alvarado <[email protected]>
Co-authored-by: samlamont <[email protected]>
1 parent 5618138, commit 92b1f46. Showing 77 changed files with 15,489 additions and 907 deletions.
@@ -0,0 +1,13 @@
.github
.ipynb_checkpoints
.pytest_cache
.vscode
dashboards
dist
docs
examples
frontend
playground
study_template
teehr-hub
tests
@@ -129,4 +129,4 @@ dmypy.json
.pyre/

# Tests output
temp/
temp/
@@ -0,0 +1,221 @@
""" | ||
This script provides and example of how to create a TEEHR datyabase | ||
and insert joined timeseries, append attributes, and add user | ||
defined fields. | ||
""" | ||
from pathlib import Path | ||
from teehr.database.teehr_dataset import TEEHRDatasetDB | ||
import time | ||
import datetime | ||
|
||
|
||
TEST_STUDY_DIR = Path("/home/matt/temp/huc1802_retro") | ||
PRIMARY_FILEPATH = Path(TEST_STUDY_DIR, "timeseries", "usgs.parquet") | ||
SECONDARY_FILEPATH = Path(TEST_STUDY_DIR, "timeseries", "nwm2*.parquet") | ||
CROSSWALK_FILEPATH = Path(TEST_STUDY_DIR, "geo", "usgs_nwm2*_crosswalk.parquet") # noqa | ||
ATTRIBUTES_FILEPATH = Path(TEST_STUDY_DIR, "geo", "usgs_attr_*.parquet") | ||
GEOMETRY_FILEPATH = Path(TEST_STUDY_DIR, "geo", "usgs_geometry.parquet") | ||
DATABASE_FILEPATH = Path(TEST_STUDY_DIR, "huc1802_retro.db") | ||
|
||
# Test data | ||
# TEST_STUDY_DIR = Path("tests/data/test_study") | ||
# PRIMARY_FILEPATH = Path(TEST_STUDY_DIR, "timeseries", "test_short_obs.parquet") # noqa | ||
# SECONDARY_FILEPATH = Path(TEST_STUDY_DIR, "timeseries", "test_short_fcast.parquet") # noqa | ||
# CROSSWALK_FILEPATH = Path(TEST_STUDY_DIR, "geo", "crosswalk.parquet") | ||
# ATTRIBUTES_FILEPATH = Path(TEST_STUDY_DIR, "geo", "test_attr2.parquet") | ||
# GEOMETRY_FILEPATH = Path(TEST_STUDY_DIR, "geo", "gages.parquet") | ||
# DATABASE_FILEPATH = Path(TEST_STUDY_DIR, "temp_test.db") | ||
|
||
|
||
def describe_inputs(): | ||
tds = TEEHRDatasetDB(DATABASE_FILEPATH) | ||
|
||
# Check the parquet files and report some stats to the user (WIP) | ||
df = tds.describe_inputs( | ||
primary_filepath=PRIMARY_FILEPATH, | ||
secondary_filepath=SECONDARY_FILEPATH | ||
) | ||
|
||
print(df) | ||
|
||
|
||
def create_db_add_timeseries(): | ||
|
||
tds = TEEHRDatasetDB(DATABASE_FILEPATH) | ||
|
||
# Perform the join and insert into duckdb database | ||
# NOTE: Right now this will re-join and overwrite | ||
print("Creating joined table") | ||
tds.insert_joined_timeseries( | ||
primary_filepath=PRIMARY_FILEPATH, | ||
secondary_filepath=SECONDARY_FILEPATH, | ||
crosswalk_filepath=CROSSWALK_FILEPATH | ||
) | ||
tds.insert_geometry(geometry_filepath=GEOMETRY_FILEPATH) | ||
|
||
|
||
def add_attributes(): | ||
tds = TEEHRDatasetDB(DATABASE_FILEPATH) | ||
|
||
# Join (one or more?) table(s) of attributes to the timeseries table | ||
print("Adding attributes") | ||
tds.join_attributes(ATTRIBUTES_FILEPATH) | ||
|
||
|
||
def add_fields(): | ||
|
||
tds = TEEHRDatasetDB(DATABASE_FILEPATH) | ||
|
||
# Calculate and add a field based on some user-defined function (UDF). | ||
def test_user_function(arg1: float, arg2: str) -> float: | ||
"""Function arguments are fields in joined_timeseries, and | ||
should have the same data type. | ||
Note: In the data model, attribute values are always str type""" | ||
return float(arg1) / float(arg2) | ||
|
||
parameter_names = ["primary_value", "upstream_area_km2"] | ||
new_field_name = "primary_normalized_discharge" | ||
new_field_type = "FLOAT" | ||
tds.calculate_field(new_field_name=new_field_name, | ||
new_field_type=new_field_type, | ||
parameter_names=parameter_names, | ||
user_defined_function=test_user_function) | ||
|
||
# Calculate and add a field based on some user-defined function (UDF). | ||
def add_month_field(arg1: datetime.datetime) -> int: | ||
"""Function arguments are fields in joined_timeseries, and | ||
should have the same data type. | ||
Note: In the data model, attribute values are always str type""" | ||
return arg1.month | ||
|
||
parameter_names = ["value_time"] | ||
new_field_name = "month" | ||
new_field_type = "INTEGER" | ||
tds.calculate_field(new_field_name=new_field_name, | ||
new_field_type=new_field_type, | ||
parameter_names=parameter_names, | ||
user_defined_function=add_month_field) | ||
|
||
# Calculate and add a field based on some user-defined function (UDF). | ||
def exceed_2yr_recurrence(arg1: float, arg2: float) -> bool: | ||
"""Function arguments are fields in joined_timeseries, and | ||
should have the same data type. | ||
Note: In the data model, attribute values are always str type""" | ||
return float(arg1) > float(arg2) | ||
|
||
parameter_names = ["primary_value", "retro_2yr_recurrence_flow_cms"] | ||
new_field_name = "exceed_2yr_recurrence" | ||
new_field_type = "BOOLEAN" | ||
tds.calculate_field(new_field_name=new_field_name, | ||
new_field_type=new_field_type, | ||
parameter_names=parameter_names, | ||
user_defined_function=exceed_2yr_recurrence) | ||
pass | ||
|
||
|
||
def run_metrics_query(): | ||
|
||
tds = TEEHRDatasetDB(DATABASE_FILEPATH) | ||
# schema_df = tds.get_joined_timeseries_schema() | ||
# print(schema_df[["column_name", "column_type"]]) | ||
|
||
# Get metrics | ||
group_by = ["primary_location_id", "configuration"] | ||
order_by = ["primary_location_id"] | ||
include_metrics = ["mean_error", "bias"] | ||
filters = [ | ||
# { | ||
# "column": "primary_location_id", | ||
# "operator": "=", | ||
# "value": "usgs-11337080" | ||
# }, | ||
# { | ||
# "column": "month", | ||
# "operator": "=", | ||
# "value": 1 | ||
# }, | ||
# { | ||
# "column": "upstream_area_km2", | ||
# "operator": ">", | ||
# "value": 1000 | ||
# }, | ||
# { | ||
# "column": "exceed_2yr_recurrence", | ||
# "operator": "=", | ||
# "value": True | ||
# } | ||
] | ||
|
||
t1 = time.time() | ||
df1 = tds.get_metrics( | ||
group_by=group_by, | ||
order_by=order_by, | ||
filters=filters, | ||
include_metrics=include_metrics, | ||
include_geometry=True, | ||
# return_query=True | ||
) | ||
print(df1) | ||
print(f"Database query: {(time.time() - t1):.2f} secs") | ||
|
||
pass | ||
|
||
|
||
def describe_database(): | ||
tds = TEEHRDatasetDB(DATABASE_FILEPATH) | ||
df = tds.get_joined_timeseries_schema() | ||
print(df) | ||
|
||
|
||
def run_raw_query(): | ||
|
||
tds = TEEHRDatasetDB(DATABASE_FILEPATH) | ||
query = """ | ||
WITH joined as ( | ||
SELECT | ||
* | ||
FROM joined_timeseries | ||
) | ||
, metrics AS ( | ||
SELECT | ||
joined.primary_location_id,joined.configuration | ||
, sum(primary_value - secondary_value)/count(*) as bias | ||
, sum(absolute_difference)/count(*) as mean_error | ||
FROM | ||
joined | ||
GROUP BY | ||
joined.primary_location_id,joined.configuration | ||
) | ||
SELECT | ||
metrics.* | ||
,gf.geometry as geometry | ||
FROM metrics | ||
JOIN geometry gf | ||
on primary_location_id = gf.id | ||
ORDER BY | ||
metrics.primary_location_id | ||
; | ||
;""" | ||
# query = f""" | ||
# COPY ( | ||
# SELECT * FROM joined_timeseries | ||
# ) | ||
# TO '{str(Path(TEST_STUDY_DIR, "huc1802_retro.parquet"))}' ( | ||
# FORMAT 'parquet', COMPRESSION 'ZSTD', ROW_GROUP_SIZE 100000 | ||
# ) | ||
# ;""" | ||
df = tds.query(query, format="df") | ||
print(df) | ||
|
||
|
||
if __name__ == "__main__": | ||
# create_db_add_timeseries() | ||
# describe_inputs() | ||
# describe_database() | ||
# add_attributes() | ||
# describe_database() | ||
# add_fields() | ||
# describe_database() | ||
# run_metrics_query() | ||
# run_raw_query() | ||
pass |
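The `bias` and `mean_error` aggregations in the raw SQL above are simple per-group averages of the signed and absolute differences between primary and secondary values. A minimal sketch of the same arithmetic in plain Python (illustrative only, with made-up sample values; not part of the TEEHR API):

```python
# Sketch of the aggregations used in the raw query above:
#   bias       = sum(primary - secondary) / n
#   mean_error = sum(|primary - secondary|) / n
# Sample values are hypothetical.
primary_values = [10.0, 12.0, 8.0, 11.0]
secondary_values = [9.0, 13.0, 7.5, 10.0]

n = len(primary_values)
diffs = [p - s for p, s in zip(primary_values, secondary_values)]
bias = sum(diffs) / n                        # signed errors can cancel
mean_error = sum(abs(d) for d in diffs) / n  # absolute errors cannot

print(f"bias={bias:.3f}, mean_error={mean_error:.3f}")
# → bias=0.375, mean_error=0.875
```

Note that `bias` can be near zero even when individual errors are large, since positive and negative differences cancel; `mean_error` does not have that property.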
@@ -0,0 +1,20 @@
module.exports = {
  root: true,
  env: { browser: true, es2020: true },
  extends: [
    'eslint:recommended',
    'plugin:react/recommended',
    'plugin:react/jsx-runtime',
    'plugin:react-hooks/recommended',
  ],
  ignorePatterns: ['dist', '.eslintrc.cjs'],
  parserOptions: { ecmaVersion: 'latest', sourceType: 'module' },
  settings: { react: { version: '18.2' } },
  plugins: ['react-refresh'],
  rules: {
    'react-refresh/only-export-components': [
      'warn',
      { allowConstantExport: true },
    ],
  },
}
@@ -1,2 +1,24 @@
*
!.gitignore
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
dist
dist-ssr
*.local

# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
@@ -0,0 +1,8 @@
# React + Vite

This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.

Currently, two official plugins are available:

- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react/README.md) uses [Babel](https://babeljs.io/) for Fast Refresh
- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh
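Choosing between the two plugins happens in the project's `vite.config.js`. As a hedged sketch based on the standard Vite React template (this file is not shown in the commit's diff), the Babel-based option typically looks like:

```javascript
// vite.config.js — minimal sketch assuming the standard Vite React template.
// Swap the import to '@vitejs/plugin-react-swc' to use the SWC-based plugin.
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

export default defineConfig({
  // react() enables Fast Refresh and JSX transformation during dev and build.
  plugins: [react()],
})
```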