Merge in code to run and score retrospective evaluation #102

Merged on Nov 8, 2024 (153 commits)
153 commits
fa4065e
Postprocess now saves posterior samples as parquet
dylanhmorris Oct 31, 2024
6be6f4e
Merge branch 'main' into dhm-run-scoringutils
dylanhmorris Oct 31, 2024
4a63342
Add scoringutils 2.0 as dep
dylanhmorris Oct 31, 2024
eadc55e
Go back to using latest PMF for now while debug
dylanhmorris Oct 31, 2024
0492642
Checkpoint forecast scoring
dylanhmorris Oct 31, 2024
b05cf30
Working score for single forecast
dylanhmorris Oct 31, 2024
93fb9e4
Rename loop shell scripts, add optional scoring to forecast_state.py
dylanhmorris Oct 31, 2024
559b4b5
extra deps
SamuelBrand1 Oct 31, 2024
44cfea9
Fix incomplete change in variable names
dylanhmorris Oct 31, 2024
2c5ada8
Add baseline to score forecast
dylanhmorris Oct 31, 2024
012c571
Add missing join column
dylanhmorris Oct 31, 2024
6b3f264
add cdc flat forecast
SamuelBrand1 Oct 31, 2024
df6fd5c
Add model to forecast unit
dylanhmorris Oct 31, 2024
1ef8b76
Fix bug
dylanhmorris Oct 31, 2024
495c91c
Add quantile scoring
dylanhmorris Nov 1, 2024
9582f08
Merge CDC baseline
dylanhmorris Nov 1, 2024
afe7fe6
Fix accidental deletion
dylanhmorris Nov 1, 2024
185ae7d
Add remotes for cmu packages
dylanhmorris Nov 1, 2024
92606f4
Preinstall CMU packages
dylanhmorris Nov 1, 2024
b8e6b4b
Add CDC baseline to scoring
dylanhmorris Nov 1, 2024
0de7af4
Cache installation of pyrenew_hew itself in container
dylanhmorris Nov 1, 2024
5546430
Revert containerfile for now
dylanhmorris Nov 1, 2024
55cc99f
add cdc baseline proportion forecast, other cleanup
dylanhmorris Nov 1, 2024
7808e75
Path fixes in scoring
dylanhmorris Nov 1, 2024
75e8b18
Working scoring
dylanhmorris Nov 1, 2024
930a16e
Style cleanup
dylanhmorris Nov 1, 2024
863be5e
Add eval data saving script
dylanhmorris Nov 1, 2024
6c0eed5
Add in DB's eval data pull
dylanhmorris Nov 1, 2024
00aeadd
Only pull up to end of forecast horizon for efficiency
dylanhmorris Nov 1, 2024
c36e08f
Fix missing arg
dylanhmorris Nov 1, 2024
cfa47a8
Remove state_level_report_date from process_state_level_data
dylanhmorris Nov 1, 2024
deb109c
Do not score nowcasts by default
dylanhmorris Nov 1, 2024
8a94099
Clarify horizons
dylanhmorris Nov 1, 2024
f812287
Update setup_job to reflect changes
dylanhmorris Nov 1, 2024
b3d081c
Fix baseline bug
dylanhmorris Nov 1, 2024
bcba858
Clearer location excludes
dylanhmorris Nov 1, 2024
0aa9259
Fix exclusion typo in setup_job
dylanhmorris Nov 1, 2024
d702d87
Create postprocess_scoring.R
SamuelBrand1 Nov 1, 2024
731a026
ignore .rds
SamuelBrand1 Nov 1, 2024
2423dc7
add table summary
SamuelBrand1 Nov 1, 2024
f29c275
tidy postprocessing
SamuelBrand1 Nov 1, 2024
cafce07
add plot of scores against week
SamuelBrand1 Nov 1, 2024
24cbe81
make relative WIS score
SamuelBrand1 Nov 1, 2024
608a263
remove commented line
SamuelBrand1 Nov 1, 2024
3c10c0a
Add score table collation
dylanhmorris Nov 1, 2024
59da66b
Tweaks and bug fixes to collation
dylanhmorris Nov 1, 2024
4d60127
Top level function can save
dylanhmorris Nov 1, 2024
db50639
Merge branch 'dhm-collate-scores' into dhm-run-scoringutils
dylanhmorris Nov 1, 2024
37efbed
Merge remote-tracking branch 'origin/scoring-plots' into dhm-collate-…
dylanhmorris Nov 1, 2024
4f19c96
Preserve metrics when collating
dylanhmorris Nov 1, 2024
cbc5277
Merge branch 'dhm-collate-scores' into dhm-run-scoringutils
dylanhmorris Nov 1, 2024
9cc3609
Some more plots
dylanhmorris Nov 1, 2024
f4c4091
Working figures and tables from collated scores
dylanhmorris Nov 1, 2024
5c04591
Revised forecast plots (#99)
damonbayer Nov 1, 2024
936e237
Vertical rel wis plot, switch flag default
dylanhmorris Nov 1, 2024
6dfafc4
Typo fix and update argv extraction
dylanhmorris Nov 1, 2024
de2ce80
Helper for postprocessing job
dylanhmorris Nov 2, 2024
e89742d
Add new dep for postprocess to hewr, modify loop_postprocess.sh so it…
dylanhmorris Nov 2, 2024
1f6176d
Update forecast_state.py to reflect new postprocess_state_forecast.R
dylanhmorris Nov 2, 2024
67e1f88
Construct container image name programmatically in setup_job.py
dylanhmorris Nov 2, 2024
c3d0fa1
Generalize setup_job.py
dylanhmorris Nov 2, 2024
2e73673
Begin working on plot collation
dylanhmorris Nov 2, 2024
1d496fb
Add plot collation script
dylanhmorris Nov 2, 2024
149264f
Tweaks to collation
dylanhmorris Nov 2, 2024
f51697c
Add logging to collate_plots
dylanhmorris Nov 2, 2024
da6833d
Update score table collation to be pathogen-agnostic
dylanhmorris Nov 2, 2024
849c858
make score table collation less verbose
dylanhmorris Nov 2, 2024
26c3512
Add missing space to warning message
dylanhmorris Nov 2, 2024
708caad
Further cleanup and clarification in collate_score_tables
dylanhmorris Nov 2, 2024
e74f068
Further improve messages to user in collate_score_table.R
dylanhmorris Nov 2, 2024
c798a28
CLI for postprocess_scoring
dylanhmorris Nov 3, 2024
58827b9
fix function name typo
dylanhmorris Nov 4, 2024
09cc971
fix plotting bug
damonbayer Nov 4, 2024
e6c202f
Separate prod and eval setup
dylanhmorris Nov 4, 2024
9e2e814
Overall script description
dylanhmorris Nov 4, 2024
b444c04
Clean up score postprocess
dylanhmorris Nov 4, 2024
b6f00ef
Add national to prep data
dylanhmorris Nov 4, 2024
e0e0224
Fix typo (capital L in pl.col)
dylanhmorris Nov 4, 2024
dffac09
Checks for national aggregation, national pop
dylanhmorris Nov 4, 2024
205138c
add needed get_state_pop_df() call in eval data fetch
dylanhmorris Nov 4, 2024
e3ea58f
Add script for creating hubverse table
dylanhmorris Nov 4, 2024
aefde81
Qualify argparser namespace
dylanhmorris Nov 4, 2024
e7ea87c
Missing quotation marks
dylanhmorris Nov 4, 2024
eabd0a6
Fix a bunch of typo-induced bugs
dylanhmorris Nov 4, 2024
52bb233
Use forecasttools epiweek to date func
dylanhmorris Nov 4, 2024
0a73838
Update nssp_demo/utils.py
dylanhmorris Nov 4, 2024
74f56e7
Update nssp_demo/forecast_state.py
dylanhmorris Nov 4, 2024
67c4c40
Update nssp_demo/postprocess_state_forecast.R
dylanhmorris Nov 4, 2024
c765d2e
Update nssp_demo/score_forecast.R
dylanhmorris Nov 4, 2024
440b5fd
Update nssp_demo/score_forecast.R
dylanhmorris Nov 4, 2024
5f702b8
Update nssp_demo/score_forecast.R
dylanhmorris Nov 4, 2024
2a387c9
Update nssp_demo/score_forecast.R
dylanhmorris Nov 4, 2024
85cbe71
Update nssp_demo/score_forecast.R
dylanhmorris Nov 4, 2024
f7e2bc0
Update nssp_demo/score_forecast.R
dylanhmorris Nov 4, 2024
5c38e88
Update nssp_demo/score_forecast.R
dylanhmorris Nov 4, 2024
6e1148f
Use the map pattern for quiet load in score_forecast.R
dylanhmorris Nov 5, 2024
4ba28c1
Score file save path in collate_score_tables, which also acts as the …
dylanhmorris Nov 5, 2024
7765588
switch to readr::read_rrds
dylanhmorris Nov 5, 2024
d61f36f
Move loop scripts to an iteration_helpers subdir
dylanhmorris Nov 5, 2024
3f41e35
switch to using CRAN scoringutils 2.0
dylanhmorris Nov 5, 2024
ca93bf2
More readable directory filter
dylanhmorris Nov 5, 2024
24ce19e
Eval data type in save eval data
dylanhmorris Nov 5, 2024
bd1ced3
Remove score_nowcast switch
dylanhmorris Nov 5, 2024
52f0033
Fix write_rds function name
dylanhmorris Nov 5, 2024
29a71f3
Fix bug in hubverse table
dylanhmorris Nov 5, 2024
6e00244
Update nssp_demo/postprocess_scoring.R
dylanhmorris Nov 5, 2024
aeb1d08
Update nssp_demo/postprocess_scoring.R
dylanhmorris Nov 5, 2024
28e56f1
Apply suggestions from code review
dylanhmorris Nov 5, 2024
dd8d97c
Update nssp_demo/batch/setup_eval_job.py
dylanhmorris Nov 5, 2024
3ae0c27
Update nssp_demo/collate_score_tables.R
dylanhmorris Nov 5, 2024
0d33133
Autostyle files
dylanhmorris Nov 5, 2024
da6e75a
More hubverse table bug fixes
dylanhmorris Nov 5, 2024
d4f1ee7
Fix disease name switch
dylanhmorris Nov 5, 2024
6d01a15
Deal with fact that forecasttools wants the report date to be the epi…
dylanhmorris Nov 5, 2024
65d4c89
Update nssp_demo/batch/setup_eval_job.py
dylanhmorris Nov 5, 2024
4421534
Make excluded locations command line args in batch job setup scripts
dylanhmorris Nov 5, 2024
5f77e25
Delete postprocess job
dylanhmorris Nov 5, 2024
b12954c
replace nested loop with itertools product
dylanhmorris Nov 5, 2024
a55d1d2
Better argument acceptance for main of setup_prod_job
dylanhmorris Nov 5, 2024
49c3afa
write_rds in score_forecast.R
dylanhmorris Nov 5, 2024
af76cfd
Change pdf merge and save function name
dylanhmorris Nov 5, 2024
1748b42
Simplify subdir search
dylanhmorris Nov 5, 2024
1f4ba6d
Move process_dir() outside of main, document it
dylanhmorris Nov 5, 2024
dc06736
Clean up argparsing in collate_plots
dylanhmorris Nov 5, 2024
6dc7ce1
Log plots
dylanhmorris Nov 6, 2024
d44bae9
underscore to hyphen in excluded-locations flags
dylanhmorris Nov 6, 2024
1eae400
Bug fixes for collate_plots.py
dylanhmorris Nov 6, 2024
87fe067
Add make_observed_data_table.py
dylanhmorris Nov 6, 2024
a2043ae
Add single forecast plot collation
dylanhmorris Nov 6, 2024
3bf8f00
Add missing disease argument
dylanhmorris Nov 6, 2024
8329194
Logic parentheses
dylanhmorris Nov 6, 2024
33dc831
Merge branch 'main' into dhm-run-scoringutils
dylanhmorris Nov 8, 2024
4dd9ef8
Add parsing of dir names to hewr and use for score table collation
dylanhmorris Nov 8, 2024
927bfbd
Pull in forecast scoring updates from other branch
dylanhmorris Nov 8, 2024
df8b1ca
Anonymous function and removal of intermediate variable in collate_sc…
dylanhmorris Nov 8, 2024
ed2b204
Specify extensions separately in score_forecast
dylanhmorris Nov 8, 2024
9b0b2e9
train_data_path and inference_train_data_path --> inference_data_path…
dylanhmorris Nov 8, 2024
5539f6e
Clarify comment about PNGs
dylanhmorris Nov 8, 2024
96f7c77
use fs rather than stringr to check extensions
dylanhmorris Nov 8, 2024
f1cda8f
Use purrr::walk2 to save parquets
dylanhmorris Nov 8, 2024
156c4ec
Coord trans instead of replacing scale
dylanhmorris Nov 8, 2024
30b0313
Update nssp_demo/batch/setup_eval_job.py
dylanhmorris Nov 8, 2024
39394c8
Update nssp_demo/batch/setup_prod_job.py
dylanhmorris Nov 8, 2024
ee1fd98
Use itertools product for setup_prod_job.py
dylanhmorris Nov 8, 2024
d9f0819
Add explanatory comment
dylanhmorris Nov 8, 2024
5ce3003
Demystify magic number
dylanhmorris Nov 8, 2024
dd077a2
Update nssp_demo/collate_score_tables.R
dylanhmorris Nov 8, 2024
44519e1
one more saveRDS-->write_rds
dylanhmorris Nov 8, 2024
26b7c0d
Use tribble
dylanhmorris Nov 8, 2024
3683385
Fix variable name
dylanhmorris Nov 8, 2024
0c29f2f
Fix variable name clash
dylanhmorris Nov 8, 2024
c3f1407
Transform y, not x
dylanhmorris Nov 8, 2024
b28ffda
Try a different approach to y transforms
dylanhmorris Nov 8, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -9,6 +9,7 @@
 *.bin
 *.xls
 *.xlsx
+*.rds

 # Documents
 *.doc
8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
@@ -2,7 +2,7 @@ repos:
   #####
   # Basic file cleanliness
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.6.0
+    rev: v5.0.0
     hooks:
       - id: check-added-large-files
       - id: check-yaml
@@ -13,7 +13,7 @@ repos:
   #####
   # Python
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.6.8
+    rev: v0.7.1
     hooks:
       # Sort imports
       - id: ruff
@@ -26,13 +26,13 @@ repos:
   #####
   # R
   - repo: https://github.com/lorenzwalthert/precommit
-    rev: v0.4.3
+    rev: v0.4.3.9001
     hooks:
       - id: style-files
       - id: lintr
   # Secrets
   - repo: https://github.com/Yelp/detect-secrets
-    rev: v1.4.0
+    rev: v1.5.0
     hooks:
       - id: detect-secrets
         args: ["--baseline", ".secrets.baseline"]
3 changes: 2 additions & 1 deletion Containerfile
@@ -12,8 +12,9 @@ WORKDIR pyrenew-hew
COPY .ContainerBuildRprofile .Rprofile

RUN Rscript -e "install.packages('pak')"
RUN Rscript -e "pak::pkg_install('cmu-delphi/epiprocess@main')"
RUN Rscript -e "pak::pkg_install('cmu-delphi/epipredict@main')"
RUN Rscript -e "pak::local_install('hewr')"

COPY . .

RUN pip install --root-user-action=ignore .
4 changes: 3 additions & 1 deletion hewr/DESCRIPTION
@@ -9,14 +9,15 @@ License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
     license
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
 Imports:
     argparser,
     arrow,
     cowplot,
     dplyr,
     fable,
     feasts,
+    forcats,
     fs,
     ggplot2,
     glue,
@@ -25,6 +26,7 @@ Imports:
     purrr,
     readr,
     scales,
+    scoringutils (>= 2.0.0),
     stringr,
     tibble,
     tidybayes,
63 changes: 63 additions & 0 deletions hewr/R/parse_path.R
@@ -0,0 +1,63 @@
disease_map_lower <- list(
  "covid-19" = "COVID-19",
  "influenza" = "Influenza"
)

#' Parse the name of a model batch directory
#' (i.e. a directory representing a single
#' report date and disease pair, but potentially
#' with fits for multiple locations), returning
#' a named list of quantities of interest.
#'
#' @param model_batch_dir_name Name of the model batch
#' directory (not the full path to it, just the directory
#' base name) to parse.
#' @return A list of quantities: `disease`, `report_date`,
#' `first_training_date`, and `last_training_date`.
#' @export
parse_model_batch_dir <- function(model_batch_dir_name) {
  pattern <- "(.+)_r_(.+)_f_(.+)_t_(.+)"

  matches <- stringr::str_match(
    model_batch_dir_name,
    pattern
  )

  if (is.na(matches[1])) {
    stop(
      "Invalid format for model batch directory name; ",
      "could not parse. Expected ",
      "'<disease>_r_<report_date>_f_<first_training_date>_t_",
      "<last_training_date>'."
    )
  }

  return(list(
    disease = disease_map_lower[[matches[2]]],
    report_date = lubridate::ymd(matches[3]),
    first_training_date = lubridate::ymd(matches[4]),
    last_training_date = lubridate::ymd(matches[5])
  ))
}

#' Parse path to a model run directory
#' (i.e. a directory representing a run for a
#' particular location, disease, and reference
#' date) and extract key quantities of interest.
#'
#' @param model_run_dir_path Path to parse.
#' @return A list of parsed attributes:
#' `location`, `disease`, `report_date`,
#' `first_training_date`, and `last_training_date`.
#'
#' @export
parse_model_run_dir <- function(model_run_dir_path) {
  batch_dir <- fs::path_dir(model_run_dir_path) |>
    fs::path_file()
  location <- fs::path_file(model_run_dir_path)

  return(c(
    list(location = location),
    parse_model_batch_dir(batch_dir)
  ))
}
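
For readers following along from the Python side of the pipeline, here is a rough sketch of the same directory-name convention that `parse_model_batch_dir()` expects. It is a hypothetical Python analogue and not part of this PR: the function name is reused only for illustration, the example directory name is made up, and `date.fromisoformat` is stricter than `lubridate::ymd`, so this assumes dates are written as `YYYY-MM-DD`.

```python
import re
from datetime import date

# Same lower-case-to-display-name mapping as disease_map_lower in parse_path.R.
DISEASE_MAP_LOWER = {"covid-19": "COVID-19", "influenza": "Influenza"}

# Same pattern as the R code: <disease>_r_<report>_f_<first>_t_<last>.
PATTERN = re.compile(r"(.+)_r_(.+)_f_(.+)_t_(.+)")


def parse_model_batch_dir(model_batch_dir_name: str) -> dict:
    """Hypothetical Python analogue of hewr::parse_model_batch_dir()."""
    match = PATTERN.search(model_batch_dir_name)
    if match is None:
        raise ValueError(
            "Invalid format for model batch directory name; expected "
            "'<disease>_r_<report_date>_f_<first_training_date>_t_"
            "<last_training_date>'."
        )
    disease, report, first, last = match.groups()
    return {
        # Like the R list lookup, an unknown disease key yields a null value.
        "disease": DISEASE_MAP_LOWER.get(disease),
        # Assumption: ISO dates; lubridate::ymd accepts more formats.
        "report_date": date.fromisoformat(report),
        "first_training_date": date.fromisoformat(first),
        "last_training_date": date.fromisoformat(last),
    }


# Made-up directory name, used only to illustrate the convention.
example = parse_model_batch_dir(
    "covid-19_r_2024-10-31_f_2024-07-16_t_2024-10-28"
)
```

`parse_model_run_dir()` then only needs the last two path components: the final one is the location and its parent is the batch directory name parsed above.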
28 changes: 0 additions & 28 deletions nssp_demo/all_post_process.sh

This file was deleted.

219 changes: 219 additions & 0 deletions nssp_demo/batch/setup_eval_job.py
@@ -0,0 +1,219 @@
"""
Set up a multi-location, multi-date,
potentially multi-disease end to end
retrospective evaluation run for pyrenew-hew
on Azure Batch.
"""

import argparse
import datetime
import itertools

import polars as pl
from azure.batch import models
from azuretools.auth import EnvCredentialHandler
from azuretools.client import get_batch_service_client
from azuretools.job import create_job_if_not_exists
from azuretools.task import get_container_settings, get_task_config


def main(
    job_id: str,
    pool_id: str,
    diseases: str,
    container_image_name: str = "pyrenew-hew",
    container_image_version: str = "latest",
    excluded_locations: list[str] = [
        "AS",
        "GU",
        "MO",
        "MP",
        "PR",
        "UM",
        "VI",
        "WY",
    ],
) -> None:
    """
    job_id
        Name for the Batch job.

    pool_id
        Azure Batch pool on which to run the job.

    diseases
        Name(s) of disease(s) to run as part of the job,
        as a whitespace-separated string. Supported
        values are 'COVID-19' and 'Influenza'.

    container_image_name
        Name of the container to use for the job.
        This container should exist within the Azure
        Container Registry account associated with
        the job. Default 'pyrenew-hew'.
        The container registry account name and endpoint
        will be obtained from local environment variables
        via a :class:`azuretools.auth.EnvCredentialHandler`.

    container_image_version
        Version of the container to use. Default 'latest'.

    excluded_locations
        List of two letter USPS location abbreviations to
        exclude from the job. Defaults to locations for which
        we typically do not have available NSSP ED visit data:
        ``["AS", "GU", "MO", "MP", "PR", "UM", "VI", "WY"]``.

    Returns
    -------
    None
    """
    supported_diseases = ["COVID-19", "Influenza"]

    disease_list = diseases.split()
    invalid_diseases = set(disease_list) - set(supported_diseases)
    if invalid_diseases:
        raise ValueError(
            f"Unsupported diseases: {', '.join(invalid_diseases)}; "
            f"supported diseases are: {', '.join(supported_diseases)}"
        )

    creds = EnvCredentialHandler()
    client = get_batch_service_client(creds)
    job = models.JobAddParameter(
        id=job_id,
        pool_info=models.PoolInformation(pool_id=pool_id),
    )
    create_job_if_not_exists(client, job, verbose=True)

    container_image = (
        f"{creds.azure_container_registry_account}."
        f"{creds.azure_container_registry_domain}/"
        f"{container_image_name}:{container_image_version}"
    )
    container_settings = get_container_settings(
        container_image,
        working_directory="containerImageDefault",
        mount_pairs=[
            {
                "source": "nssp-etl",
                "target": "/pyrenew-hew/nssp_demo/nssp-etl",
            },
            {
                "source": "nssp-archival-vintages",
                "target": "/pyrenew-hew/nssp_demo/nssp-archival-vintages",
            },
            {
                "source": "prod-param-estimates",
                "target": "/pyrenew-hew/nssp_demo/params",
            },
            {
                "source": "pyrenew-test-output",
                "target": "/pyrenew-hew/nssp_demo/private_data",
            },
        ],
    )

    base_call = (
        "/bin/bash -c '"
        "python nssp_demo/forecast_state.py "
        "--disease {disease} "
        "--state {state} "
        "--n-training-days 365 "
        "--n-warmup 1000 "
        "--n-samples 500 "
        "--facility-level-nssp-data-dir nssp_demo/nssp-etl/gold "
        "--state-level-nssp-data-dir "
        "nssp_demo/nssp-archival-vintages/gold "
        "--param-data-dir nssp_demo/params "
        "--output-data-dir nssp_demo/private_data "
        "--report-date {report_date:%Y-%m-%d} "
        "--exclude-last-n-days 2 "
        "--score "
        "--eval-data-path "
        "nssp_demo/nssp-archival-vintages/latest_comprehensive.parquet"
        "'"
    )

    locations = pl.read_csv(
        "https://www2.census.gov/geo/docs/reference/state.txt", separator="|"
    )

    all_locations = (
        locations.filter(~pl.col("STUSAB").is_in(excluded_locations))
        .get_column("STUSAB")
        .to_list()
    )

    report_dates = [
        datetime.date(2023, 10, 11) + datetime.timedelta(weeks=x)
        for x in range(30)
    ]

    for disease, report_date, loc in itertools.product(
        disease_list, report_dates, all_locations
    ):
        task = get_task_config(
            f"{job_id}-{loc}-{disease}-{report_date}",
            base_call=base_call.format(
                state=loc,
                disease=disease,
                report_date=report_date,
            ),
            container_settings=container_settings,
        )
        client.task.add(job_id, task)

    return None


parser = argparse.ArgumentParser()

parser.add_argument("job_id", type=str, help="Name for the Azure batch job")
parser.add_argument(
    "pool_id",
    type=str,
    help=("Name of the Azure batch pool on which to run the job"),
)
parser.add_argument(
    "diseases",
    type=str,
    help=(
        "Name(s) of disease(s) to run as part of the job, "
        "as a whitespace-separated string. Supported "
        "values are 'COVID-19' and 'Influenza'."
    ),
)

parser.add_argument(
    "--container-image-name",
    type=str,
    help="Name of the container to use for the job.",
    default="pyrenew-hew",
)

parser.add_argument(
    "--container-image-version",
    type=str,
    help="Version of the container to use for the job.",
    default="latest",
)

parser.add_argument(
    "--excluded-locations",
    type=str,
    help=(
        "Two-letter USPS location abbreviations to "
        "exclude from the job, as a whitespace-separated "
        "string. Defaults to a set of locations for which "
        "we typically do not have available NSSP ED visit "
        "data: 'AS GU MO MP PR UM VI WY'."
    ),
    default="AS GU MO MP PR UM VI WY",
)


if __name__ == "__main__":
    args = parser.parse_args()
    args.excluded_locations = args.excluded_locations.split()
    main(**vars(args))
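
To make the size and naming of the task grid in `setup_eval_job.py` easier to reason about without touching Azure, the enumeration step can be sketched on its own. This is a minimal illustration and not code from the PR: `enumerate_tasks` is a hypothetical helper, the job name and the two locations are made up, and only the `--report-date` fragment of `base_call` is reproduced.

```python
import datetime
import itertools


def enumerate_tasks(
    job_id: str,
    disease_list: list[str],
    locations: list[str],
    first_report_date: datetime.date,
    n_weeks: int,
) -> list[tuple[str, str]]:
    """Hypothetical helper: list (task_id, report-date flag) pairs."""
    # Weekly report dates, as in the script's list comprehension.
    report_dates = [
        first_report_date + datetime.timedelta(weeks=x)
        for x in range(n_weeks)
    ]
    tasks = []
    # Same iteration order as the script: disease, then date, then location.
    for disease, report_date, loc in itertools.product(
        disease_list, report_dates, locations
    ):
        task_id = f"{job_id}-{loc}-{disease}-{report_date}"
        # str.format applies the strftime-style spec to the date object.
        flag = "--report-date {report_date:%Y-%m-%d}".format(
            report_date=report_date
        )
        tasks.append((task_id, flag))
    return tasks


# Made-up job name and locations; start date and week count match the script.
tasks = enumerate_tasks(
    "demo-job",
    ["COVID-19", "Influenza"],
    ["CA", "NY"],
    datetime.date(2023, 10, 11),
    30,
)
```

With two diseases, two locations, and 30 weekly report dates this yields 120 tasks, from `demo-job-CA-COVID-19-2023-10-11` through `demo-job-NY-Influenza-2024-05-01`. The real job multiplies the same 30 dates by every non-excluded location in the Census state list.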