V0.3.0beta (#85)
* initial TeehrDataset class layout

* initial TEEHR Dataset functionality

* initial TEEHRDataset functionality

* removed test code

* removed comment

* adds first api and web app that basically works

* fixes attr add, adds new test data

* remove some frontend code that was holding on

* make a couple of small changes to dataclass

* adds a few small bug fixes in new and existing code

* updates web api/app to use new dataclass

* fixes geometry join

* adds gitignore for web work

* small refactor, update gitignore,
fix bug in get_metrics again

* adds no qa

* homepage draft

* separate TEEHRDatasetAPI and TEEHRDatasetDB classes

* pydantic v2, add geometry to queries, re-org

* adding timeseries queries, fastapi endpoints

* tests and cleanup

* add filters

* add operators endpoint

* add timepicker

* flex filters

* update vscode settings

* fix bug in get metrics query

* make script work with new patterns

* add pydantic>2 to req.

* 81-integrate poetry (#83)

* initial poetry integration

* integrating poetry, upgrading pangeo, python3.11

* poetry.lock

* revert back to python3.10

* readme update

* minor edit

* uncommenting dockerfile section after GTS fix

* adds a .dockerignore

---------

Co-authored-by: Sam Lamont <[email protected]>
Co-authored-by: Matt Denno <[email protected]>

* add v0.3.0beta to teehr-hub

* update build action

* v0.3.0b geometry issues (#91)

* fixing include_geometry validation

* version bump

---------

Co-authored-by: Sam Lamont <[email protected]>

* hack fix for build process

* 88 comments on v030b dataset (#96)

* Updated doc strings for teehr dataset class

* Docstring updates, time series query deduplication

* additional comment

* typo

* increment beta version

* small update to get_timeseries() and
get_timeseries_chars()

* didn't quite get it fixed with last commit

* timeseries_name docstring, profile_query update

---------

Co-authored-by: Sam Lamont <[email protected]>
Co-authored-by: Matt Denno <[email protected]>

* update teehr-hub

* fix pydantic 2 issues

* remove test db from repo

* update test to use temp db

* updates release docs, info changelog.md

* update teehr-hub config

---------

Co-authored-by: Sam Lamont <[email protected]>
Co-authored-by: Manuel Alvarado <[email protected]>
Co-authored-by: samlamont <[email protected]>
4 people authored Dec 8, 2023
1 parent 5618138 commit 92b1f46
Showing 77 changed files with 15,489 additions and 907 deletions.
13 changes: 13 additions & 0 deletions .dockerignore
@@ -0,0 +1,13 @@
.github
.ipynb_checkpoints
.pytest_cache
.vscode
dashboards
dist
docs
examples
frontend
playground
study_template
teehr-hub
tests
1 change: 0 additions & 1 deletion .github/workflows/docker-publish.yml
@@ -54,7 +54,6 @@ jobs:
with:
cosign-release: 'v2.1.1' # optional


# Workaround: https://github.com/docker/build-push-action/issues/461
- name: Setup Docker buildx
uses: docker/setup-buildx-action@79abd3f86f79a9d68a23c75a09a9a85889262adf
2 changes: 1 addition & 1 deletion .gitignore
@@ -129,4 +129,4 @@ dmypy.json
.pyre/

# Tests output
temp/
temp/
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,19 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.3.0] - 2023-12-08

### Added
* Adds a dataclass and database that allows preprocessing of joined timeseries and attributes as well as the addition of user defined functions.
* Adds an initial web service API that serves out `timeseries` and `metrics` along with some other supporting data.
* Adds an initial interactive web application using the web service API.

### Changed
* Switches to poetry to manage Python venv
* Upgrades to Pydantic 2+
* Upgrades to Pangeo image `pangeo/pangeo-notebook:2023.09.11`


## [0.2.9] - 2023-12-08

### Added
5 changes: 3 additions & 2 deletions Dockerfile
@@ -13,13 +13,14 @@ RUN TEEHR_VERSION=$(cat /teehr/version.txt) && \

# Install TEEHR in the Pangeo Image
# https://hub.docker.com/r/pangeo/pangeo-notebook/tags
FROM pangeo/pangeo-notebook:2023.07.05
# Subsequent images use python=3.11
FROM pangeo/pangeo-notebook:2023.09.11

USER root
ENV DEBIAN_FRONTEND=noninteractive
ENV PATH ${NB_PYTHON_PREFIX}/bin:$PATH

# Needed for apt-key to work
# Needed for apt-key to work -- Is this part needed?
RUN apt-get update -qq --yes > /dev/null && \
apt-get install --yes -qq gnupg2 > /dev/null && \
rm -rf /var/lib/apt/lists/*
27 changes: 15 additions & 12 deletions README.md
@@ -14,29 +14,32 @@ assess their skill and performance.
NOTE: THIS PROJECT IS UNDER DEVELOPMENT - EXPECT TO FIND BROKEN AND INCOMPLETE CODE.

## How to Install TEEHR
Install with from source

Install poetry
```bash
$ pipx install poetry
```
Install from source
```bash
# Create and activate python environment, requires python >= 3.10
$ python3 -m venv .venv
$ source .venv/bin/activate
$ python3 -m pip install --upgrade pip

# Build and install from source
$ python3 -m pip install --upgrade build
$ python -m build
$ python -m pip install dist/teehr-0.2.9.tar.gz
$ poetry shell

# Install from source
$ poetry install
```

Install from GitHub
```bash
# Using pip
$ pip install 'teehr @ git+https://github.com/RTIInternational/teehr@[BRANCH_TAG]'

# Using poetry
$ poetry add git+https://github.com/RTIInternational/teehr.git#[BRANCH_TAG]
```

Use Docker
```bash
$ docker build -t teehr:v0.2.9 .
$ docker run -it --rm --volume $HOME:$HOME -p 8888:8888 teehr:v0.2.9 jupyter lab --ip 0.0.0.0 $HOME
$ docker build -t teehr:v0.3.0 .
$ docker run -it --rm --volume $HOME:$HOME -p 8888:8888 teehr:v0.3.0 jupyter lab --ip 0.0.0.0 $HOME
```

## Examples
1 change: 1 addition & 0 deletions docs/release_process.md
@@ -4,6 +4,7 @@ This document describes the release process which has some manual steps to compl
Create branch with the following updated to the new version (find and replace version number):
- `version.txt`
- `README.md`
- `pyproject.toml`

Update the `CHANGELOG.md` to reflect the changes included in the release.

221 changes: 221 additions & 0 deletions examples/loading/create_database.py
@@ -0,0 +1,221 @@
"""
This script provides an example of how to create a TEEHR database
and insert joined timeseries, append attributes, and add user
defined fields.
"""
from pathlib import Path
from teehr.database.teehr_dataset import TEEHRDatasetDB
import time
import datetime


TEST_STUDY_DIR = Path("/home/matt/temp/huc1802_retro")
PRIMARY_FILEPATH = Path(TEST_STUDY_DIR, "timeseries", "usgs.parquet")
SECONDARY_FILEPATH = Path(TEST_STUDY_DIR, "timeseries", "nwm2*.parquet")
CROSSWALK_FILEPATH = Path(TEST_STUDY_DIR, "geo", "usgs_nwm2*_crosswalk.parquet") # noqa
ATTRIBUTES_FILEPATH = Path(TEST_STUDY_DIR, "geo", "usgs_attr_*.parquet")
GEOMETRY_FILEPATH = Path(TEST_STUDY_DIR, "geo", "usgs_geometry.parquet")
DATABASE_FILEPATH = Path(TEST_STUDY_DIR, "huc1802_retro.db")

# Test data
# TEST_STUDY_DIR = Path("tests/data/test_study")
# PRIMARY_FILEPATH = Path(TEST_STUDY_DIR, "timeseries", "test_short_obs.parquet") # noqa
# SECONDARY_FILEPATH = Path(TEST_STUDY_DIR, "timeseries", "test_short_fcast.parquet") # noqa
# CROSSWALK_FILEPATH = Path(TEST_STUDY_DIR, "geo", "crosswalk.parquet")
# ATTRIBUTES_FILEPATH = Path(TEST_STUDY_DIR, "geo", "test_attr2.parquet")
# GEOMETRY_FILEPATH = Path(TEST_STUDY_DIR, "geo", "gages.parquet")
# DATABASE_FILEPATH = Path(TEST_STUDY_DIR, "temp_test.db")


def describe_inputs():
    tds = TEEHRDatasetDB(DATABASE_FILEPATH)

    # Check the parquet files and report some stats to the user (WIP)
    df = tds.describe_inputs(
        primary_filepath=PRIMARY_FILEPATH,
        secondary_filepath=SECONDARY_FILEPATH
    )

    print(df)


def create_db_add_timeseries():

    tds = TEEHRDatasetDB(DATABASE_FILEPATH)

    # Perform the join and insert into duckdb database
    # NOTE: Right now this will re-join and overwrite
    print("Creating joined table")
    tds.insert_joined_timeseries(
        primary_filepath=PRIMARY_FILEPATH,
        secondary_filepath=SECONDARY_FILEPATH,
        crosswalk_filepath=CROSSWALK_FILEPATH
    )
    tds.insert_geometry(geometry_filepath=GEOMETRY_FILEPATH)


def add_attributes():
    tds = TEEHRDatasetDB(DATABASE_FILEPATH)

    # Join (one or more?) table(s) of attributes to the timeseries table
    print("Adding attributes")
    tds.join_attributes(ATTRIBUTES_FILEPATH)


def add_fields():

    tds = TEEHRDatasetDB(DATABASE_FILEPATH)

    # Calculate and add a field based on some user-defined function (UDF).
    def test_user_function(arg1: float, arg2: str) -> float:
        """Function arguments are fields in joined_timeseries, and
        should have the same data type.
        Note: In the data model, attribute values are always str type"""
        return float(arg1) / float(arg2)

    parameter_names = ["primary_value", "upstream_area_km2"]
    new_field_name = "primary_normalized_discharge"
    new_field_type = "FLOAT"
    tds.calculate_field(new_field_name=new_field_name,
                        new_field_type=new_field_type,
                        parameter_names=parameter_names,
                        user_defined_function=test_user_function)

    # Calculate and add a field based on some user-defined function (UDF).
    def add_month_field(arg1: datetime.datetime) -> int:
        """Function arguments are fields in joined_timeseries, and
        should have the same data type.
        Note: In the data model, attribute values are always str type"""
        return arg1.month

    parameter_names = ["value_time"]
    new_field_name = "month"
    new_field_type = "INTEGER"
    tds.calculate_field(new_field_name=new_field_name,
                        new_field_type=new_field_type,
                        parameter_names=parameter_names,
                        user_defined_function=add_month_field)

    # Calculate and add a field based on some user-defined function (UDF).
    def exceed_2yr_recurrence(arg1: float, arg2: float) -> bool:
        """Function arguments are fields in joined_timeseries, and
        should have the same data type.
        Note: In the data model, attribute values are always str type"""
        return float(arg1) > float(arg2)

    parameter_names = ["primary_value", "retro_2yr_recurrence_flow_cms"]
    new_field_name = "exceed_2yr_recurrence"
    new_field_type = "BOOLEAN"
    tds.calculate_field(new_field_name=new_field_name,
                        new_field_type=new_field_type,
                        parameter_names=parameter_names,
                        user_defined_function=exceed_2yr_recurrence)


def run_metrics_query():

    tds = TEEHRDatasetDB(DATABASE_FILEPATH)
    # schema_df = tds.get_joined_timeseries_schema()
    # print(schema_df[["column_name", "column_type"]])

    # Get metrics
    group_by = ["primary_location_id", "configuration"]
    order_by = ["primary_location_id"]
    include_metrics = ["mean_error", "bias"]
    filters = [
        # {
        #     "column": "primary_location_id",
        #     "operator": "=",
        #     "value": "usgs-11337080"
        # },
        # {
        #     "column": "month",
        #     "operator": "=",
        #     "value": 1
        # },
        # {
        #     "column": "upstream_area_km2",
        #     "operator": ">",
        #     "value": 1000
        # },
        # {
        #     "column": "exceed_2yr_recurrence",
        #     "operator": "=",
        #     "value": True
        # }
    ]

    t1 = time.time()
    df1 = tds.get_metrics(
        group_by=group_by,
        order_by=order_by,
        filters=filters,
        include_metrics=include_metrics,
        include_geometry=True,
        # return_query=True
    )
    print(df1)
    print(f"Database query: {(time.time() - t1):.2f} secs")


def describe_database():
    tds = TEEHRDatasetDB(DATABASE_FILEPATH)
    df = tds.get_joined_timeseries_schema()
    print(df)


def run_raw_query():

    tds = TEEHRDatasetDB(DATABASE_FILEPATH)
    query = """
        WITH joined as (
            SELECT
                *
            FROM joined_timeseries
        )
        , metrics AS (
            SELECT
                joined.primary_location_id,joined.configuration
                , sum(primary_value - secondary_value)/count(*) as bias
                , sum(absolute_difference)/count(*) as mean_error
            FROM
                joined
            GROUP BY
                joined.primary_location_id,joined.configuration
        )
        SELECT
            metrics.*
            ,gf.geometry as geometry
        FROM metrics
        JOIN geometry gf
            on primary_location_id = gf.id
        ORDER BY
            metrics.primary_location_id
    ;"""
    # query = f"""
    #     COPY (
    #         SELECT * FROM joined_timeseries
    #     )
    #     TO '{str(Path(TEST_STUDY_DIR, "huc1802_retro.parquet"))}' (
    #         FORMAT 'parquet', COMPRESSION 'ZSTD', ROW_GROUP_SIZE 100000
    #     )
    # ;"""
    df = tds.query(query, format="df")
    print(df)


if __name__ == "__main__":
    # create_db_add_timeseries()
    # describe_inputs()
    # describe_database()
    # add_attributes()
    # describe_database()
    # add_fields()
    # describe_database()
    # run_metrics_query()
    # run_raw_query()
    pass
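
The raw query in `run_raw_query()` groups rows by `primary_location_id` and `configuration`, then averages the signed and absolute differences. As a rough illustration of that arithmetic only, here is a minimal pure-Python sketch. It is not TEEHR's implementation: the sample rows and the `compute_metrics` helper are hypothetical, and it assumes `absolute_difference` equals `abs(primary_value - secondary_value)`, which the script above does not confirm.

```python
from collections import defaultdict

# Hypothetical rows standing in for the joined_timeseries table.
rows = [
    {"primary_location_id": "gage-A", "configuration": "sim-1",
     "primary_value": 10.0, "secondary_value": 8.0},
    {"primary_location_id": "gage-A", "configuration": "sim-1",
     "primary_value": 12.0, "secondary_value": 15.0},
    {"primary_location_id": "gage-B", "configuration": "sim-1",
     "primary_value": 5.0, "secondary_value": 5.5},
]


def compute_metrics(rows):
    """Group by (primary_location_id, configuration) and average the
    signed and absolute differences, mirroring the SQL aggregates."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["primary_location_id"], row["configuration"])
        groups[key].append(row["primary_value"] - row["secondary_value"])

    metrics = {}
    for key in sorted(groups):  # ORDER BY primary_location_id
        diffs = groups[key]
        count = len(diffs)
        metrics[key] = {
            # sum(primary_value - secondary_value) / count(*)
            "bias": sum(diffs) / count,
            # sum(absolute_difference) / count(*), under the assumption above
            "mean_error": sum(abs(d) for d in diffs) / count,
        }
    return metrics


if __name__ == "__main__":
    for key, values in compute_metrics(rows).items():
        print(key, values)
```

Doing the aggregation in the database, as the script does, avoids pulling every joined row into Python; the sketch is only meant to make the two formulas easy to check by hand.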
20 changes: 20 additions & 0 deletions frontend/teehr/.eslintrc.cjs
@@ -0,0 +1,20 @@
module.exports = {
  root: true,
  env: { browser: true, es2020: true },
  extends: [
    'eslint:recommended',
    'plugin:react/recommended',
    'plugin:react/jsx-runtime',
    'plugin:react-hooks/recommended',
  ],
  ignorePatterns: ['dist', '.eslintrc.cjs'],
  parserOptions: { ecmaVersion: 'latest', sourceType: 'module' },
  settings: { react: { version: '18.2' } },
  plugins: ['react-refresh'],
  rules: {
    'react-refresh/only-export-components': [
      'warn',
      { allowConstantExport: true },
    ],
  },
}
26 changes: 24 additions & 2 deletions frontend/teehr/.gitignore
@@ -1,2 +1,24 @@
*
!.gitignore
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
dist
dist-ssr
*.local

# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
8 changes: 8 additions & 0 deletions frontend/teehr/README.md
@@ -0,0 +1,8 @@
# React + Vite

This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.

Currently, two official plugins are available:

- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react/README.md) uses [Babel](https://babeljs.io/) for Fast Refresh
- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh