diff --git a/DESCRIPTION b/DESCRIPTION index d3d100a9..49197843 100755 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -45,17 +45,21 @@ Imports: Suggests: covidcast, epidatr, + epipredict, ggplot2, knitr, outbreaks, + plotly, rmarkdown, testthat (>= 3.1.5), waldo (>= 0.3.1), - withr + withr, + yaml VignetteBuilder: knitr Remotes: cmu-delphi/epidatr, + cmu-delphi/epipredict, reconverse/outbreaks, glmgen/genlasso Config/testthat/edition: 3 diff --git a/vignettes/backtesting.Rmd b/vignettes/backtesting.Rmd new file mode 100644 index 00000000..7bafadb2 --- /dev/null +++ b/vignettes/backtesting.Rmd @@ -0,0 +1,388 @@ +--- +title: Backtesting models +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Backtesting models} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +### Backtesting overview, and why to use `epi_archive`s + +The `epiprocess` package provides a few functions that can be used for +backtesting forecasts and other models: `epi_slide()` (on `epi_df`s), and +`epix_as_of()` and `epix_slide()` (on `epi_archive`s); however, we recommend +always using `epi_archive`s, because: + +* `epix_as_of()` and `epix_slide()` faithfully reproduce latency in data + reporting, while `epi_slide()` does not; and +* `epix_as_of()` and `epix_slide()` faithfully reproduce the version of each + available observation that would have been available in real time, while + `epi_slide()` does not. + +Using `epi_slide()` or a similar approach instead will typically produce +**overly optimistic** retrospective forecast evaluations, and might miss some +run-time errors that would have occurred in real time due to irregular latency +in data reporting. Thus, it is important to use a versioning-faithful +"pseudoprospective" approach like `epix_as_of()` and `epix_slide()` for +backtesting evaluations whenever possible. + +Additionally, `epix_as_of()` and `epix_slide()` can be used to build and +evaluate version-aware models that factor in historical data revisions to try to +improve their accuracy. + +### Smaller reproducibility issues that may remain + +Even using version-aware approaches based on `epi_archive`s, there may still be +some smaller reproducibility issues based on the data set/provider: + +* If the `version` tags are not date+times, it is unclear what the input to the + forecaster would have looked like at a particular date+time. For example, if + `version`s are `Date`s, it can be unclear which (if any) updates in version + `2023-01-01` were available at 8 a.m. Eastern Time on `2023-01-01`. +* Depending on the data source, the `version` tags may not reflect the exact + time that the data actually was available to you. For example, an upstream + data provider might have some different meaning for its `version` column, such + as when it first became available to them, rather than when they made it + available to others. There are also technical details that can come into play: + sometimes, data is hosted on multiple database or web server "replicas" that + need some time to mirror newly-published data, so the effective publication + time can be somewhat fuzzy. +* Even though we generally think of available version data as fixed in stone, + this isn't necessarily the case. 
For example, an upstream version data + provider may publish some initial form of version `2023-01-01` data early in + the morning of `2023-01-01`, but discover that it contains an error, and issue + a "hotfix" later in the day that overwrites the `2023-01-01` version data with + a correction (but still calling it the exact same version, `2023-01-01`); in + this situation, modeling based on the early-morning version wouldn't be + reproducible if backtesting at a later date. + +### Pseudoprospective backtesting with `epi_archive`s + +Pseudoprospective backtesting can be performed with `epix_as_of()` or +`epix_slide()`. We'll demonstrate using `epix_slide()` to perform +pseudoprospective forecasting on COVID-19 incident case rate data from JHU-CSSE +for a few US states. This data stream had very low reporting latency, and in +most instances defined incident cases in a way that made revisions theoretically +impossible (the difference between cumulative reporting between consecutive +reports), but nevertheless, there were some exceptions such as large revisions. + +FIXME discussion here. CLI if used + +```{r} +library(dplyr) +library(tidyr) +library(purrr) +library(ggplot2) +library(epidatr) +library(epiprocess) +library(epipredict) + + + + + + + + + + +# For reproducibility, we'll work with data reporting through some fixed issue +# we've already observed. For better reproducibility, that issue should be at +# least two days ago (from today), to incorporate any "hotfixes" to that data +# and allow up to a day for database replication delays: +hhs_analysis_issue = as.Date("2024-02-01") - 2L +# For speed & politness, we have enabled epidatr caching. +cache_info()[c("max_size", "max_age")] +# Fetch issue data using `epidatr`: +hhs_issue_data = + pub_covidcast( + source = "hhs", signals = "confirmed_admissions_covid_1d", + geo_type = "state", time_type = "day", + # all issues through the analysis issue + issues = epirange(12340101, hhs_analysis_issue) + ) +# ^ Data license: Public Domain US Government. Accessed via COVIDcast Epidata API. + +# It looks like more regular reporting started around version 2021-09-21, with +# 2021-09-21 being a little different as well. It's currently not easy to filter +# diff-based issue data correctly based on the version, so let's just filter out +# some this reporting based on time values that we expect would be included. We +# might also consider one of the mandatory reporting start dates. We're working +# on ways to filter both version and time values more easily from an archive +# object directly. 
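+# (How we could check that, a sketch -- count newly-reported rows per version
+# in a compactified archive; the same pattern is used in the latency/revision
+# stats vignette:)
+# hhs_issue_data %>%
+#   select(geo_value, time_value, version = issue, admissions = value) %>%
+#   as_epi_archive(compactify = TRUE) %>%
+#   .$DT %>%
+#   `[`(, .N, by = version) %>%
+#   print(topn = 20L)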
+# hhs_start_time_value = as.Date("2021-09-21") # flu, approx +hhs_start_time_value = as.Date("2020-12-01") # covid, approx, though start of 2021 might also have some reporting quirks +hhs_archive = hhs_issue_data %>% + filter(time_value >= hhs_start_time_value) %>% + select(geo_value, time_value, version = issue, admissions = value) %>% + as_epi_archive(compactify = TRUE) + + +flusurv_analysis_issue <- as.Date("2019-08-01") %>% + MMWRweek::MMWRweek() %>% + {.$MMWRyear * 100L + .$MMWRweek} + +flusurv_issue_data <- + pub_flusurv( + locations = "network_all", + issues = epirange(123401, flusurv_analysis_issue) + ) + +flusurv_archive <- flusurv_issue_data %>% + select(geo_value = location, + time_value = epiweek, + ## version = release_date, + version = issue, + starts_with("rate_")) %>% + as_epi_archive(compactify = TRUE) + +flusurv_issue_data %>% + count(lag2 = release_date - issue) + +flusurv_issue_data %>% + count(release_wday = as.POSIXlt(release_date)$wday) + +as.POSIXlt(min(flusurv_issue_data$release_date))$wday + +# XXX surprises regarding release_date ranges (not actually turning off during off and early season); investigating: + +# any update: +tibble(date = flusurv_archive$DT$version %>% unique() %>% sort(), + datediff = c(NA, diff(date))) %>% + print(n=10000L) + +flusurv_archive$DT[, .N, by = version] %>% + ggplot(aes(version, N)) %>% + `+`(geom_point()) + +# adding for first time: +unique(flusurv_archive$DT, by = c("geo_value", "time_value")) %>% + .$version %>% unique() %>% sort() %>% + {tibble(date = ., datediff = c(NA, diff(date)))} %>% + print(n=10000L) + +# okay, this now makes sense + + + + + + + + + + + + + + + + + +# Mondays +## forecast_dates <- seq(as.Date("2020-11-02"), as.Date("2021-11-30"), by = "6 weeks") + + + +## archive <- hhs_archive +archive <- flusurv_archive + +forecast_dates <- seq(min(archive$DT$version) + 120L, archive$versions_end, + by = "6 weeks") + +horizons <- c(0, 7, 14, 21, 28) # relative to forecast_date +## horizons <- 1 + c(0, 7, 14, 21, 28) # relative to forecast_date + +example_forecaster <- function(snapshot_edf, forecast_date) { + if (snapshot_edf %>% + distinct(time_value) %>% + filter(time_value >= forecast_date - 40L) %>% + nrow() < 3L) { + return (tibble::tibble()) + } + shared_reporting_latency <- as.integer(forecast_date - max(snapshot_edf$time_value)) + ## aheads <- horizons + shared_reporting_latency # relative to max time_value + # FIXME shifting technique instead + horizons %>% + map(function(horizon) { + snapshot_edf %>% + ## flatline_forecaster( + ## "case_rate_7d_av", + ## flatline_args_list( + ## ahead = ahead, + ## quantile_levels = c(0.1, 0.5, 0.9), + ## forecast_date = forecast_date + ## )) %>% + arx_forecaster( + ## outcome = "case_rate_7d_av", # via JHU-CSSE + ## predictors = "percent_cli", # from doctor-visits (claims) + ## predictors = c("case_rate_7d_av", "percent_cli"), # cli from doctor-visits (claims) + ## outcome = "admissions", + outcome = "rate_overall", + ## predictors = c("case_rate_7d_av"), + ## predictors = "admissions", + predictors = "rate_overall", + args_list = arx_args_list( + # FIXME this is incomplete; latency often varies signficantly by covariate and can't be ignored, so we also need lag adjustment. 
+ ## ahead = shared_reporting_latency + horizon, + ahead = horizon, + quantile_levels = c(0.1, 0.5, 0.9)#, + ## forecast_date = forecast_date, + ## target_date = forecast_date + horizon + )) %>% + .$predictions %>% + mutate(forecast_date = forecast_date, + target_date = forecast_date + horizon) + }) %>% + bind_rows() + ## list() +} + +pseudoprospective_forecasts <- + ## archive_cases_dv_subset %>% + archive %>% + epix_slide( + ref_time_values = forecast_dates, + before = 365000L, # 1000-year time window --> don't filter out any `time_value`s + ~ example_forecaster(.x, .ref_time_value), + names_sep = NULL + ## as_list_col = TRUE + ) %>% + select(-time_value) + +snapshots <- + ## archive_cases_dv_subset %>% + archive %>% + epix_slide( + ref_time_values = forecast_dates, + before = 365000L, # 1000-year time window --> don't filter out any `time_value`s + ~ .x, + as_list_col = TRUE + ) %>% + rename(forecast_date = time_value) %>% + unnest(slide_value) + +latest_edf <- archive %>% epix_as_of(.$versions_end) + +unfaithful_forecasts <- latest_edf %>% + # pretend we get observations about today, on today, with no revisions + mutate(version = time_value) %>% + as_epi_archive(versions_end = max(forecast_dates)) %>% + epix_slide( + # pretend version releases are on forecast dates + ref_time_values = forecast_dates, + before = 365000L, # 1000-year time window --> don't filter out any `time_value`s + ~ example_forecaster(.x, .ref_time_value), + names_sep = NULL + ) %>% + select(-time_value) + +## plot_geo_values <- c("mi", "id", "ca", "ga", "pa") +## plot_geo_values <- c("ca") +plot_geo_values <- c("network_all") + +# Plot them, on top of latest COVID-19 case rates +plt <- + bind_rows(list( + pseudoprospective = pseudoprospective_forecasts, + unfaithful = unfaithful_forecasts + ), .id = "backtester") %>% + filter(geo_value %in% plot_geo_values) %>% + pivot_quantiles_wider(.pred_distn) %>% + ggplot(aes(x = target_date)) + + ## facet_wrap(vars(geo_value)) + + facet_wrap(vars(geo_value), scales = "free_y") + + geom_ribbon(aes(group = interaction(backtester, forecast_date), + fill = backtester, + ymin = `0.1`, ymax = `0.9`), + alpha = 0.4) + + geom_line(aes(x = time_value, y = value, + ## colour = signal, + ## colour = forecast_date, + ## colour = interaction(signal, forecast_date), + colour = as.integer(forecast_date - time_value), + ## alpha = as.integer(forecast_date - time_value), + ## linetype = signal, + group = interaction(signal, forecast_date)), + data = + snapshots %>% + filter(geo_value %in% plot_geo_values) %>% + ## pivot_longer(c(percent_cli, case_rate_7d_av), + ## pivot_longer(c(admissions), + pivot_longer(c(rate_overall), + names_to = "signal") %>% + left_join( + latest_edf %>% + ## drop_na(c(percent_cli, case_rate_7d_av)) %>% + ## drop_na(c(admissions)) %>% + drop_na(c(rate_overall)) %>% + summarize( + ## case_rate_7d_av = 1, + ## across(c(percent_cli, case_rate_7d_av), ~ mean(case_rate_7d_av) / mean(.x)) + ## across(c(admissions), ~ mean(admissions) / mean(.x)) + across(c(rate_overall), ~ mean(rate_overall) / mean(.x)) + ) %>% + ## pivot_longer(c(percent_cli, case_rate_7d_av), + ## pivot_longer(c(admissions), + pivot_longer(c(rate_overall), + names_to = "signal", + values_to = "scale_factor"), + by = "signal" + ) %>% + mutate(value = value * scale_factor) %>% + ## mutate(forecast_date = as.factor(forecast_date)) %>% + {.} + ) + + ## geom_line(aes(y = case_rate_7d_av), + ## geom_line(aes(y = admissions), + geom_line(aes(y = rate_overall), + data = latest_edf %>% + filter(geo_value %in% 
plot_geo_values) %>% + rename(target_date = time_value)) + + geom_vline(aes(xintercept = forecast_date), + data = function(df) distinct(df, forecast_date), + linetype = 2, alpha = 0.5) + + ## scale_colour_brewer(palette = "Set2") + + ## scale_fill_brewer(palette = "Spectral") + + ## scale_fill_brewer(palette = "Set3") + + ## scale_colour_viridis_d(direction = -1) + + ## scale_colour_viridis_c(option = "H", direction = -1) + + scale_colour_continuous(type = "viridis", ) + + scale_x_date(minor_breaks = "month", date_labels = "%b %y") + + labs(x = "Date", y = "Reported COVID-19 case rates") # + +## theme(legend.position = "none") +## + +## geom_line(data = x_latest, aes(x = time_value, y = case_rate_7d_av), +## inherit.aes = FALSE, color = "gray50") + +## geom_line(aes(y = fc_point)) + geom_point(aes(y = fc_point), size = 0.5) + +## geom_vline(aes(xintercept = time_value), linetype = 2, alpha = 0.5) + +## facet_grid(vars(geo_value), vars(as_of), scales = "free") + +## scale_x_date(minor_breaks = "month", date_labels = "%b %y") + +## labs(x = "Date", y = "Reported COVID-19 case rates") + +## theme(legend.position = "none") +plotly::ggplotly(plt) + +## archive_cases_dv_subset %>% +## epix_slide(ref_time_values = .$DT$version %>% unique() %>% `[`(as.POSIXlt(.)$wday == 1L), +## before = 14L, ~ slice_max(.x, time_value, by = geo_value), as_list_col = TRUE) %>% +## rename(version = time_value) %>% +## unnest(slide_value) %>% +## left_join( +## archive_cases_dv_subset %>% epix_as_of(.$versions_end), +## by = c("geo_value", "time_value") +## ) %>% +## mutate(reldiff = abs(case_rate_7d_av.x - case_rate_7d_av.y)/(10 + abs(case_rate_7d_av.y))) %>% +## mutate(version_lag = as.integer(version - time_value)) %>% +## ## filter(version_lag %in% c(7L, 14L)) %>% +## ## slice_max(reldiff, n = 10L, by = version_lag) %>% +## slice_max(reldiff, n = 10L) %>% +## select(version_lag, geo_value, version, case_rate_7d_av.x, case_rate_7d_av.y, reldiff) %>% +## print(n = 100L) + +``` + +## Attribution +This document contains data that is a modified part of the [COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University](https://github.com/CSSEGISandData/COVID-19) as [republished in the COVIDcast Epidata API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html). This data set is licensed under the terms of the [Creative Commons Attribution 4.0 International license](https://creativecommons.org/licenses/by/4.0/) by the Johns Hopkins University on behalf of its Center for Systems Science in Engineering. Copyright Johns Hopkins University 2020. + +The `archive_cases_dv_subset` object also contains `percent_cli` data is a modified part of the [COVIDcast Epidata API Doctor Visits data](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/doctor-visits.html). This dataset is licensed under the terms of the [Creative Commons Attribution 4.0 International license](https://creativecommons.org/licenses/by/4.0/). Copyright Delphi Research Group at Carnegie Mellon University 2020. 
diff --git a/vignettes/forecasting-intro.Rmd b/vignettes/forecasting-intro.Rmd new file mode 100644 index 00000000..ea5c3e2a --- /dev/null +++ b/vignettes/forecasting-intro.Rmd @@ -0,0 +1,208 @@ +```{r} +# if (Sys.getenv("GITHUB_PAT") == "") { +# warning("If you have trouble with `renv` restoration, please set GITHUB_PAT in a `.Renviron` file (see `help(.Rprofile)` for the format and possible locations) and make sure it's .gitignore'd / not shared.") +# } +# # Initial setup to get `renv::restore()` to fetch the desired versions for +# # this demo; doesn't need to be re-run as it's recorded in the `.Rprofile` and +# # `renv.lock` files and `renv` directory. Instead, to initialize or +# # synchronize these dependencies with the lockfile, just `renv::restore()`. +# # Between these updates, the `.Rprofile` will take care of actually using the +# # `renv` to source the dependencies. +# +# install.packages("renv") +# renv::init() +# renv::install("tidyverse") +# renv::install("MMWRweek") +# renv::install("reconverse/outbreaks") +# renv::install("cmu-delphi/epidatr") +# renv::install("cmu-delphi/epiprocess@953717fd2ad6317d130a8cabeaf3d6040aebb59a") +# renv::install("cmu-delphi/epipredict") +# renv::snapshot() # for this to record dependencies, we may need to actually +# # use them below or put them in a DESCRIPTION file +``` + +Fetch data: +```{r} +library(tidyverse) +library(magrittr) +library(MMWRweek) +library(epidatr) +library(epiprocess) +library(epipredict) + +# flusurv_analysis_issue = MMWRweek(as.Date("2023-04-05") - 7L) %>% +# {.$MMWRyear*100L + .$MMWRweek} +# flusurv_issue_data_path = "flusurv_issue_data.rds" +# if (!file.exists(flusurv_issue_data_path)) { +# flusurv_issue_data = +# flusurv("network_all", +# epiweeks=epirange(123401, 345601), +# issues=epirange(123401, flusurv_analysis_issue)) %>% +# fetch_tbl() +# saveRDS(flusurv_issue_data, flusurv_issue_data_path) +# writeLines("......", "LICENSE_flusurv_data.txt") +# } else { +# flusurv_issue_data = readRDS(flusurv_issue_data) +# } + +ga_hhs_analysis_issue = as.Date("2023-04-05") - 2L +ga_hhs_issue_data_path = "ga_hhs_issue_data.rds" +ga_hhs_issue_data = + pub_covidcast( + "hhs", "confirmed_admissions_influenza_1d", + geo_type = "state", time_type = "day", + # TODO explain * in comments + geo_values = "ga", time_values = "*", + issues = epirange(12340101, ga_hhs_analysis_issue) + ) +``` + +Convert to `epi_archive` format: +```{r} +ga_hhs_archive = ga_hhs_issue_data %>% + select(geo_value, time_value, version = issue, admissions = value) %>% + as_epi_archive(compactify = TRUE) +``` + +Experiment with AR forecaster on latest data (as of the analysis issue): +```{r} +ga_hhs_latest_raw = ga_hhs_archive %>% + epix_as_of(.$versions_end) +ga_hhs_latest_raw +``` +(Note that this is in the `epi_df` format.) + +We forgot about the period in 2020 with a lot of NAs and a few low values (in a +few states) in the HHS Protect data. We could adjust `time_values` above to +something like `epirange(20201201, 34560101)` and regenerate the RDS to avoid. +For now, let's just try filtering them out from the data objects as needed. Some +NA filtering is automatic in epipredict; this is just an extra precaution that +some atypical low values don't somehow make their way into the training set. 
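+A quick way to eyeball where that problematic early period ends before picking
+a cutoff (a rough sketch; it only assumes the `ga_hhs_latest_raw` object from
+above and dplyr):
+```{r}
+ga_hhs_latest_raw %>%
+  mutate(month = format(time_value, "%Y-%m")) %>%
+  group_by(month) %>%
+  summarize(n_na = sum(is.na(admissions)),
+            n_zero = sum(admissions == 0, na.rm = TRUE),
+            n = n()) %>%
+  head(12)
+```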
+ +```{r} +hhs_start_time_value = as.Date("2020-12-01") + +ga_hhs_latest_processed = ga_hhs_latest_raw %>% + filter(time_value >= hhs_start_time_value) + +ga_hhs_latest_processed %>% + ggplot(aes(time_value, admissions)) %>% + `+`(geom_line()) +``` + +Using a canned forecaster: +```{r} +fc_info = arx_forecaster( + ga_hhs_latest_processed, + outcome = "admissions", + predictors = c("admissions"), + args_list = arx_args_list( + ahead = 14L, + forecast_date = attr(edf, "metadata")[["as_of"]] + # Use defaults for everything else. + ) +) +fc_info +``` +TODO consider vignette for covid forecasts daily, another parallel for flu weekly, details on deadlines, etc. + +Extracting the forecast: +```{r} +fc_info %>% + extract2("predictions") %>% + mutate(.pred_distn = nested_quantiles(.pred_distn)) %>% + unnest(.pred_distn) +``` +TODO spell out arx_forecaster model a little more + +Pseudo-prospective version-unaware forecasting: +```{r} +forecast_dates = ga_hhs_analysis_issue - c(14L, 7L, 0L) +ga_hhs_archive %>% + epix_slide( + ref_time_values = forecast_dates, + before = 365000, # train on all past data + as_list_col = TRUE, + function(edf, gk, v) { + edf %>% + filter(time_value >= hhs_start_time_value) %>% + arx_forecaster( + outcome = "admissions", + predictors = c("admissions"), + args_list = arx_args_list( + ahead = 14L, + forecast_date = attr(edf, "metadata")[["as_of"]] + # Use defaults for everything else. + ) + ) %>% + extract2("predictions") %>% + mutate(.pred_distn = nested_quantiles(.pred_distn)) %>% + unnest(.pred_distn) + } + ) %>% + select(-time_value) %>% + unnest(slide_value) +``` +XXX arx_forecaster default to the 23 taus? + +Version-aware forecasting: +```{r} +forecast_date = ga_hhs_analysis_issue + +forecast_date_archive = ga_hhs_archive %>% epix_as_of(forecast_date, all_versions = TRUE) +forecast_date_edf = forecast_date_archive %>% epix_as_of(.$versions_end) +forecast_date_latency = as.integer(forecast_date - max(forecast_date_edf$time_value)) + +training_and_testing_issues_start = max(min(forecast_date_archive$DT$version), + hhs_start_time_value + 14L) +training_and_testing_issues = rev(seq(forecast_date, training_and_testing_issues_start, -7)) + +version_aware_covariates = + ga_hhs_archive %>% + epix_slide( + ref_time_values = training_and_testing_issues, + before = forecast_date_latency + 14L, # include just enough to get the lags we want + names_sep = NULL, + function(edf, gk, ref_time_value) { + edf %>% + filter(time_value >= hhs_start_time_value) %>% + complete(geo_value, time_value = full_seq(c(time_value, ref_time_value), period=1L)) %>% + group_by(geo_value) %>% + mutate(x1 = lag(admissions, forecast_date_latency + 0L), + x2 = lag(admissions, forecast_date_latency + 7L), + x3 = lag(admissions, forecast_date_latency + 14L)) %>% + ungroup() %>% + filter(time_value == ref_time_value) %>% + select(geo_value, x1, x2, x3) + } + ) + +fc = full_join(version_aware_covariates, forecast_date_edf, by = c("geo_value", "time_value")) %>% + as_epi_df(as_of = forecast_date) %>% + na.omit() %>% + arx_forecaster( + outcome = "admissions", + predictors = c("x1","x2","x3"), + args_list = arx_args_list( + lags = 0L, + ahead = 14L, + forecast_date = forecast_date, + # Use defaults for everything else. + ) + ) + +fc$epi_workflow %>% extract_fit_engine() # need to re-impl no-intercept model to get clear picture +fc %>% autoplot() +``` + + + + + + + +# other notes +https://cmu-delphi.github.io/epipredict/articles/sliding.html +- consider warnings with merge locf. 
especially if more meaningful amounts like 5 days +- explain [[1]] / explain `y` object. port over the better names from derived vignette +- k_week_ahead -> k_day_ahead diff --git a/vignettes/latency_revision_stats.Rmd b/vignettes/latency_revision_stats.Rmd new file mode 100644 index 00000000..f40bd660 --- /dev/null +++ b/vignettes/latency_revision_stats.Rmd @@ -0,0 +1,403 @@ +--- +title: Calculating latency and revision statistics +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Calculating latency and revision statistics} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +# FIXME the `dup` latency definitions are buggy + +# Load data + +```{r} +library(epidatr) +library(epiprocess) +library(dplyr) +library(tidyr) +library(ggplot2) +library(plotly) + +# For reproducibility, we'll work with data reporting through some fixed issue +# we've already observed. For better reproducibility, that issue should be at +# least two days ago (from today), to incorporate any "hotfixes" to that data +# and allow up to a day for database replication delays: +hhs_influenza_hosp_analysis_issue = as.Date("2023-08-11") - 2L + +# We'll query version data for this data set from the Delphi Epidata API using +# epidatr, caching the result so that we can re-run this chunk repeatedly a bit +# faster, while offline, and without placing undue load on the API servers. +hhs_influenza_hosp_issue_data_path = "hhs_influenza_hosp_issue_data.rds" +if (!file.exists(hhs_influenza_hosp_issue_data_path)) { + cat("Fetching.\n") + hhs_influenza_hosp_issue_data = + pub_covidcast( + source = "hhs", signals = "confirmed_admissions_influenza_1d", + geo_type = "state", time_type = "day", + geo_values = "*", + # all `time_value`s: + time_values = "*", + # all issues through the analysis issue + issues = epirange(12340101, hhs_influenza_hosp_analysis_issue) + ) + saveRDS(hhs_influenza_hosp_issue_data, hhs_influenza_hosp_issue_data_path) + writeLines(c("License: Public Domain US Government", + "Accessed via COVIDcast Epidata API"), + "LICENSE_hhs_influenza_hosp_issue_data.txt") +} else { + cat("Using cache.\n") + hhs_influenza_hosp_issue_data = readRDS(hhs_influenza_hosp_issue_data_path) +} + +# Use the compactification feature to see when values changed in the data set, +# not just re-reporting of the same values. (The Delphi Epidata API already +# compactifies much of the issue data, but not all of it.) +hhs_influenza_hosp_issue_data %>% + select(geo_value, time_value, version = issue, admissions = value) %>% + as_epi_archive(compactify = TRUE) %>% + .$DT %>% + `[`(, .N, by = version) %>% + print(topn = 20L) + +# It looks like more regular reporting started around version 2021-09-21, with +# 2021-09-21 being a little different as well. It's currently not easy to filter +# diff-based issue data correctly based on the version, so let's just filter out +# some this reporting based on time values that we expect would be included. We +# might also consider one of the mandatory reporting start dates. We're working +# on ways to filter both version and time values more easily from an archive +# object directly. 
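+# (As a rough cross-check of that start date, a sketch: look at issue counts in
+# a window around the proposed cutoff; counts should be small and irregular
+# before 2021-09-21 and larger and steadier afterwards:)
+# hhs_influenza_hosp_issue_data %>%
+#   filter(abs(as.integer(issue - as.Date("2021-09-21"))) <= 7) %>%
+#   count(issue)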
+hhs_influenza_hosp_start_time_value = as.Date("2021-09-21") +hhs_influenza_hosp_archive = hhs_influenza_hosp_issue_data %>% + filter(time_value >= hhs_influenza_hosp_start_time_value) %>% + select(geo_value, time_value, version = issue, admissions = value) %>% + mutate(geo_value = factor(geo_value, epidatasets::state_census %>% arrange(pop) %>% pull(abbr))) %>% + as_epi_archive(compactify = TRUE) +``` + +# Latency statistics + +There are several notions of latency, some of which include: + +- "Nominal latency": the difference between the max `time_value` available for + some location, as of some version +- "Nonmissing latency": also considers explicitly-recorded `NA`s and any skipped + `time_value`s as latent +- "Nonmissing nonzero latency": considers explicitly-recorded `NA`s, skipped + `time_value`s, and zeros as latent +- "Nonmissing nonduplicate latency": considers `NA`s, skipped `time_value`s, and + duplications of the most recent non-`NA` value to be latent +- "Nonmissing nonzero nonduplicate latency": considers `NA`s, skipped + `time_value`s, zeros, and duplications of the most recent non-`NA`, nonzero + value to be latent + +These definitions are nonexhaustive, and knowledge about underlying reporting +mechanisms in a data sources are can help inform what an appropriate definition +of latency is. In epidemiological surveillance data sets, there will always be +some degree of nominal latency. In the current data set, some repeated values +also reflect data latency, and there may be other types of latency as well. +We'll plot all the above definitions of latency to investigate. + +We'll calculate all of the above definitions of latency using `epix_slide()`: +```{r latency-stats, cache=TRUE} +n_true_at_tail = function(x) { + match(FALSE, rev(x), nomatch = length(x) + 1L) - 1L +} + +last_elt_or_na = function(x) { + if (length(x) != 0L) { + x[[length(x)]] + } else { + vctrs::vec_cast(NA, x) + } +} + +cumsum_from_tail = function(x) { + rev(cumsum(rev(x))) +} + +latency_stats = function(window_data, group_key, ref_time_value) { + if (nrow(window_data) == 0L) { + # assume that this is from an early version that references a time range we + # excluded + return (tibble::tibble()) + } + window_data %>% + group_by(geo_value) %>% + complete(time_value = full_seq(time_value, period = 1L)) %>% + ungroup() %>% + mutate( + is_na = is.na(admissions), + is_na0 = is_na | admissions == 0L + ) %>% + group_by(geo_value) %>% + mutate( + is_na_same = is_na | + admissions == last_elt_or_na(admissions[!is_na]), + is_na0same = is_na0 | + admissions == last_elt_or_na(admissions[!is_na0]) + ) %>% + summarize( + nominal_latency = as.integer(ref_time_value - max(time_value)), + na_latency = n_true_at_tail(is_na) + nominal_latency, + na0_latency = n_true_at_tail(is_na0) + nominal_latency, + na_dup_latency = + # (don't count NAs before the first copy of the last non-na) + sum(cumsum(!is_na[cumsum_from_tail(!is_na_same) == 0L]) != 0L) - 1L + nominal_latency, + na0dup_latency = + # (don't count NAs and 0s before the first copy of the last non-na0) + sum(cumsum(!is_na0[cumsum_from_tail(!is_na0same) == 0L]) != 0L) - 1L + nominal_latency + ) +} + +hhs_influenza_hosp_latency_by_version_geo = + hhs_influenza_hosp_archive %>% + epix_slide( + latency_stats, + before = 27L, # don't calculate latency more than this far back + names_sep = NULL # don't add "slide_value_" colname prefixes + ) %>% + rename(version = time_value) +``` + +## Finer-grained visualizations of different notions of latency + +Nominal latency, where it 
is over 2 days: +```{r, out.width = "100%", out.height="60em"} +plt = + hhs_influenza_hosp_latency_by_version_geo %>% + mutate(nominal_latency = case_when( + nominal_latency <= 2 ~ NA, + TRUE ~ nominal_latency + )) %>% + ggplot(aes(version, geo_value, fill = nominal_latency)) + + geom_raster() + + scale_fill_viridis_c(na.value = "transparent") + + theme_bw() + + scale_x_date(date_labels = "%b %Y") + + xlab("Version") + + ylab("HHS Influenza Hospitalization Nominal Latency") +ggplotly(plt) +``` + +Nonmissing latency, where it is over 2 days: +```{r, out.width = "60em", out.height="60em"} +plt = + hhs_influenza_hosp_latency_by_version_geo %>% + mutate(na_latency = case_when( + na_latency <= 2 ~ NA, + TRUE ~ na_latency + )) %>% + ggplot(aes(version, geo_value, fill = na_latency)) + + geom_raster() + + scale_fill_viridis_c(na.value = "transparent") + + theme_bw() + + scale_x_date(date_labels = "%b %Y") + + xlab("Version") + + ylab("HHS Influenza Hospitalization Nonmissing Latency") +ggplotly(plt) +``` + +Nonmissing nonzero latency, where it is over 2 days: +```{r, out.width = "60em", out.height="60em"} +plt = + hhs_influenza_hosp_latency_by_version_geo %>% + mutate(na0_latency = case_when( + na0_latency <= 2 ~ NA, + TRUE ~ na0_latency + )) %>% + ggplot(aes(version, geo_value, fill = na0_latency)) + + geom_raster() + + scale_fill_viridis_c(na.value = "transparent") + + theme_bw() + + scale_x_date(date_labels = "%b %Y") + + xlab("Version") + + ylab("HHS Influenza Hospitalization Nonmissing Nonzero Latency") +ggplotly(plt) +``` + + +Nonmissing nonduplicate latency, where it is over **3** days: +```{r, out.width = "60em", out.height="60em"} +plt = + hhs_influenza_hosp_latency_by_version_geo %>% + mutate(na_dup_latency = case_when( + na_dup_latency <= 3 ~ NA, + TRUE ~ na_dup_latency + )) %>% + ggplot(aes(version, geo_value, fill = na_dup_latency)) + + geom_raster() + + scale_fill_viridis_c(na.value = "transparent") + + theme_bw() + + scale_x_date(date_labels = "%b %Y") + + xlab("Version") + + ylab("HHS Influenza Hospitalization Nonmissing Nonduplicate Latency") +ggplotly(plt) +``` + +Nonmissing nonzero nonduplicate latency, where it is over **3** days: +```{r, out.width = "60em", out.height="60em"} +plt = + hhs_influenza_hosp_latency_by_version_geo %>% + mutate(na0dup_latency = case_when( + na0dup_latency <= 3 ~ NA, + TRUE ~ na0dup_latency + )) %>% + ggplot(aes(version, geo_value, fill = na0dup_latency)) + + geom_raster() + + scale_fill_viridis_c(na.value = "transparent") + + theme_bw() + + scale_x_date(date_labels = "%b %Y") + + xlab("Version") + + ylab("HHS Influenza Hospitalization Nonmissing Nonzero Nonduplicate Latency") +ggplotly(plt) +``` + +## Plotting individual time series snapshots of interest + +There's a time span of high latency according to some definitions, e.g., for DE as of 2023-05-11; let's see what the time series looked like as of that version: +```{r, out.width = "100%"} +plt = + hhs_influenza_hosp_archive %>% + epix_as_of(as.Date("2023-05-11"), + # plot just a year of data: + min_time_value = as.Date("2022-05-12")) %>% + filter(geo_value == "de") %>% + complete(time_value = full_seq(time_value, period = 1L)) %>% + ggplot(aes(time_value, admissions)) + + geom_point() +ggplotly(plt) +``` +We can see at later time values that the number of admissions alternates between 0 most days and 1 other days. In some definitions of latency, we consider all but the first 1 in these sections to be "latent". 
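+To make the "duplicate" idea concrete, here is a toy check (a sketch) using the
+`n_true_at_tail()` helper defined in the latency-stats chunk. With a reporting
+tail that alternates between 0 and 1 like DE's, the nonmissing-nonzero
+definition only counts trailing 0s/NAs as extra latency beyond the nominal
+latency, while the nonduplicate definitions would additionally treat the
+repeated 1s as latent:
+```{r}
+x = c(4, 2, 1, 0, 1, 0, 1, 0)
+n_true_at_tail(is.na(x) | x == 0) # 1: only the final 0 counts as extra latency
+```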
+ +There's also some interesting patterns where latency appears to decrease in AS, then suddenly jump up; let's visualize some snapshots here: +```{r} +plt = + hhs_influenza_hosp_archive %>% + epix_slide( + ~ .x %>% filter(geo_value == "as") %>% rename(atv = time_value), + before = 27L, # don't calculate latency more than this far back + names_sep = NULL # don't add "slide_value_" colname prefixes + ) %>% + rename(version = time_value, time_value = atv) %>% + filter(version %in% as.Date(c("2023-03-20", "2023-04-02", "2023-04-03"))) %>% + ggplot(aes(time_value, admissions, colour = version)) + + geom_line() +ggplotly(plt) +``` + +Let's check some counts instead: +```{r} +hhs_influenza_hosp_archive %>% + epix_slide( + ~ .x %>% + filter(geo_value == "as") %>% + complete(time_value = full_seq(time_value, period = 1L)) %>% + rename(atv = time_value), + before = 27L, # don't calculate latency more than this far back + names_sep = NULL # don't add "slide_value_" colname prefixes + ) %>% + rename(version = time_value, time_value = atv) %>% + filter(version %in% as.Date(c("2023-03-20", "2023-04-02", "2023-04-03"))) %>% + count(version) +``` + +## Some summary statistics related to the nonmissing latency: + +```{r, out.width = "100%"} +plt = + hhs_influenza_hosp_latency_by_version_geo %>% + ggplot(aes(na_latency)) + + geom_histogram(binwidth = 1L) +ggplotly(plt) +``` + +Over time: +```{r, out.width = "100%"} +plt = + hhs_influenza_hosp_latency_by_version_geo %>% + group_by(version) %>% + summarize(mean_na_latency = mean(na_latency)) %>% + ggplot(aes(version, mean_na_latency)) + + geom_line() + + expand_limits(y=0) + + xlab("Version") + + ylab("Mean Nonmissing Latency Across Locations") +ggplotly(plt) +``` + +# Revision statistics + +How accurate is `k`-day-old data on average? 
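+The first summary below is a MAPE-style statistic (named `MAPE1` in the code),
+with 1 added to the denominator so that zero or very small admission counts
+don't blow up the ratio. Roughly, for each version lag $k$,
+$$\mathrm{MAPE1}(k) = 100 \cdot \operatorname{mean}_{\text{geo},\,t}
+\frac{\bigl|y_{t}^{\text{latest}} - y_{t}^{(t+k)}\bigr|}{y_{t}^{\text{latest}} + 1},$$
+where $y_{t}^{(v)}$ denotes the admissions value for time $t$ as reported in
+version $v$. The pipelines that follow repeat the same calculation with the
+median and other quantiles of these absolute relative errors.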
+```{r, fig.width = 7, fig.height = 5} +max_analysis_lag = 30L + +hhs_influenza_hosp_version_lag_data = + hhs_influenza_hosp_archive %>% + epix_slide( + ~ .x, + before = max_analysis_lag, + as_list_col = TRUE + ) %>% + rename(version = time_value) %>% + unnest(slide_value) %>% + mutate(version_lag = as.integer(version - time_value)) + +hhs_influenza_hosp_latest_snapshot = + hhs_influenza_hosp_archive %>% + epix_as_of(.$versions_end) + +plt = + left_join(hhs_influenza_hosp_version_lag_data, + hhs_influenza_hosp_latest_snapshot %>% + filter(time_value <= hhs_influenza_hosp_analysis_issue - max_analysis_lag * 2L), + by = c("geo_value", "time_value"), + suffix = c("_lagged", "_latest")) %>% + tidyr::drop_na(c(admissions_lagged, admissions_latest)) %>% + group_by(version_lag) %>% + summarize(MAPE1 = 100*mean(abs((admissions_latest - admissions_lagged)/(admissions_latest + 1)))) %>% + ggplot(aes(version_lag, MAPE1)) + + geom_line() +ggplotly(plt) + +left_join(hhs_influenza_hosp_version_lag_data, + hhs_influenza_hosp_latest_snapshot %>% + filter(time_value <= hhs_influenza_hosp_analysis_issue - max_analysis_lag * 2L), + by = c("geo_value", "time_value"), + suffix = c("_lagged", "_latest")) %>% + tidyr::drop_na(c(admissions_lagged, admissions_latest)) %>% + group_by(version_lag) %>% + summarize(MdAPE1 = 100*median(abs((admissions_latest - admissions_lagged)/(admissions_latest + 1)))) %>% + ggplot(aes(version_lag, MdAPE1)) + + geom_line() + +left_join(hhs_influenza_hosp_version_lag_data, + hhs_influenza_hosp_latest_snapshot %>% + filter(time_value <= hhs_influenza_hosp_analysis_issue - max_analysis_lag * 2L), + by = c("geo_value", "time_value"), + suffix = c("_lagged", "_latest")) %>% + tidyr::drop_na(c(admissions_lagged, admissions_latest)) %>% + group_by(version_lag) %>% + summarize(Q3APE1 = 100*quantile(abs((admissions_latest - admissions_lagged)/(admissions_latest + 1)), 0.75)) %>% + ggplot(aes(version_lag, Q3APE1)) + + geom_line() + +left_join(hhs_influenza_hosp_version_lag_data, + hhs_influenza_hosp_latest_snapshot %>% + filter(time_value <= hhs_influenza_hosp_analysis_issue - max_analysis_lag * 2L), + by = c("geo_value", "time_value"), + suffix = c("_lagged", "_latest")) %>% + tidyr::drop_na(c(admissions_lagged, admissions_latest)) %>% + group_by(version_lag) %>% + reframe( + tau = (1:9)/10, + QtauAPE1 = 100*quantile(abs((admissions_latest - admissions_lagged)/(admissions_latest + 1)), tau) + ) %>% + mutate(tau = as.factor(format(tau, digits = 3))) %>% + ggplot(aes(version_lag, QtauAPE1, colour = tau, group = tau, label = tau)) + + geom_line() + + scale_colour_discrete( + guide = guide_legend(reverse = TRUE, + override.aes = list(label = "")) + ) +``` diff --git a/vignettes/old_dogfooding_debug001.Rmd b/vignettes/old_dogfooding_debug001.Rmd new file mode 100644 index 00000000..dc53a3a7 --- /dev/null +++ b/vignettes/old_dogfooding_debug001.Rmd @@ -0,0 +1,299 @@ +--- +title: Dogfooding `epiprocess` with some data debugging +date: 2022-10-19 +output: html_document +--- + +TODO: compare how things look with slides on `dev` / in PRs + +This is an Rmd-ified version/retrospective of some data debugging for the influenza forecasts that went out yesterday. + +NOTE: Previously, I saw the primary use cases of `epi_archive` as performing + pseudoprospective forecasts and using version data to get more appropriate + training sets, but this might also rank up there. + +- TODO: evaluate how easy it is to do the former (vs. using `evalcast`). 
Consider + where the `weighted_interval_score` functions should live. Consider ways to + simplify querying&merging multiple data sources. +- Nat is working on tools for the latter. + +```{r} +library(epidatr) +library(epiprocess) # installed a local version with `ref_time_values` filter removed; issue #153 +library(data.table) +library(dplyr) +library(ggplot2) +``` + +Fetch hhs data: +```{r hhs_influenza_tbl, cache=TRUE} +hhs_influenza_tbl = covidcast( + data_source = "hhs", + signals = "confirmed_admissions_influenza_1d", + time_type = "day", + geo_type = "state", + time_values = epirange(12340101, 34560101), + geo_values = "*", + issues = epirange(12340101, 34560101) +) %>% fetch_tbl() + +hhs_influenza_archive = hhs_influenza_tbl %>% + rename(version = issue) %>% + as_epi_archive() +``` +Notes: + +- `epirange(12340101, 34560101)` pattern: this is a range to cover all dates. + + - MINOR: Would look better as "*". I think this is already an issue somewhere. + - MINOR: For reproducibility, 34560101 should probably be the date of the analysis or + the date before it. A couple awkward things here: + + - `epidatr` / the API let me query version data it didn't have, and + repeating this query a week from now will include additional data. This is + definitely a bug for `as_of` queries, but for these `issues` queries, it + might be debatable. + + - If using the date of the analysis, it's including version data which could + be overwritten later in the day, and the analysis might not match later + reproductions. + + - Currently we have a (noisy, poorly worded) warning when using the + `epi_archive` data for those dates. But the warning wouldn't actually + apply to later reproductions when the version data in question has been + finalized (barring exceptional circumstances). + - We plan to change settings to remove this warning by default, and make + it up to the user to change internal archive settings to give a warning + if they want it. + +- Why a separate `tbl` object? This is awkward and memory-inefficient. + + - MEDIUM: there's no `filter` or comparable function in `epi_archive`. It + feels awkward to have to clone the archive, and overwrite / copy-and-mutate + the data table inside (or to extract the data table, filter, and convert + back to an `epi_archive`). + - MEDIUM: R6 annoyances during development, and for users that save archives + and update epiprocess: archive objects include their R6 code; if I load a + new version of epiprocess it only affects the S3 wrappers, not the R6 + methods underneath. This leads to reloading the package appearing to have no + affect, and could lead to errors from incompatibilities between updated + wrappers and non-updated R6 methods. + +```{r} +ref_time_values = hhs_influenza_archive$DT[as.POSIXlt(version)$wday==1L,unique(version)] +``` + +- MAJOR: I ran this on a Monday. I wanted the `ref_time_values` to include that + Monday. But `epix_slide` throws it out. I ran on a patched version of + epiprocess. 
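+A minimal sketch of the reproducibility point above (the `analysis_issue` name
+and the specific date are just for illustration): pin the upper end of the
+`issues` range to a fixed analysis date instead of a far-future sentinel, so
+that re-running the query later fetches the same versions:
+```{r, eval=FALSE}
+analysis_issue = as.Date("2022-10-19") # the date of this analysis, or the day before
+# ... then, in the query above:
+#   issues = epirange(12340101, analysis_issue)
+```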
+ +```{r} +slide_result = + hhs_influenza_archive %>% + epix_slide(function(edf, grp) { + edf %>% + mutate(version_lag = attr(., "metadata")$as_of - .data$time_value) %>% + group_by(version_lag) %>% + summarize( + n_zero = sum(!is.na(.data$value) & .data$value == 0L), + n_na = sum(is.na(.data$value)), + n_nonzero_nonna = n() - .data$n_zero - .data$n_na, + n = n(), + .groups = "drop" + ) %>% + {tibble(stats = list(.))} + }, n = 60L, group_by=c(), ref_time_values = ref_time_values) +stats_on_specific_values = + slide_result %>% + group_by(time_value) %>% + slice_head(n = 1L) %>% + ungroup() %>% + unnest(slide_value_stats) %>% + group_by(version_lag) %>% + summarize(across(c(n_zero, n_na, n_nonzero_nonna, n), sum), + .groups = "drop") %>% + mutate(p_zero = .data$n_zero / .data$n, + p_na = .data$n_na / .data$n, + p_nonzero_nonna = .data$n_nonzero_nonna / .data$n) +print(stats_on_specific_values, n=100L) +``` + +- MAJOR: I struggled to get the computation result in an accepted format. I + don't have a record of what all I tried, but some of it was failing to have + success with a list-type result. Some was a stupid mistake forgetting to + remove the ` = ` when moving from failing to use that form to + providing an `f` function/formula. Some was due to the row count requirement + on the computation results, discussed next. +- MAJOR: computation result output requirements and slide output don't make + sense when we don't group by all non-version, non-time key columns. We require + the computation output to have either just one row, or one row per distinct + combination of the non-grouping, non-time, non-version key columns; however, + in the former case, we repeat that single row the latter number of times, but + don't attach those non-grouping key columns to the result. + + - In this case, we didn't want that repeating behavior, thus the grouped slice + operation. + - In fact, we wanted a different number of rows than the expectation, thus the + unnesting. +- MINOR: slide computation wasn't instantaneous. Thus I stored in a + `slide_result` object while I tried to figure out the rest of the operations + that would be required. (Plus retrying computing the slide result while trying + to figure out how to make things work.) +- MINOR: missing an archive filter function here too, to speed up + experimentation / help debugging. + + +```{r} +hhs_influenza_archive %>% + epix_as_of(as.Date("2022-10-17")) %>% + filter(.data$geo_value == "nv") %>% + select(time_value, value) %>% + print(n=2000L) +``` +- MINOR: missing an archive filter function here too. 
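+One workaround sketch for the missing archive filter verb, along the lines of
+the "extract the data table, filter, and convert back" option mentioned
+earlier (not run here, since it makes a full copy and isn't needed below):
+```{r, eval=FALSE}
+hhs_influenza_archive$DT %>%
+  filter(geo_value == "nv") %>%
+  as_epi_archive()
+```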
+ +```{r} +stats_on_deltas = + hhs_influenza_archive %>% + epix_slide(function(edf, grp) { + edf %>% + arrange(geo_value, time_value) %>% + mutate(no_change = + lag(.data$time_value) == .data$time_value - 1L & + lag(.data$value) == .data$value) %>% + group_by(version_lag = as.integer(attr(., "metadata")$as_of - .data$time_value)) %>% + summarize( + n_no_change = sum(!is.na(.data$no_change) & .data$no_change), + n_na_change = sum(is.na(.data$no_change)), + n = n(), + .groups = "drop" + ) %>% + {tibble(stats = list(.))} + }, n = 60L, group_by=c(), ref_time_values = ref_time_values) %>% + group_by(time_value) %>% + slice_head(n = 1L) %>% + ungroup() %>% + unnest(slide_value_stats) %>% + group_by(version_lag) %>% + summarize(across(starts_with("n"), sum), + .groups = "drop") %>% + mutate(across(starts_with("n_"), list(prop=~.x/n))) +print(stats_on_deltas, n=100L) +``` + +```{r} +hhs_influenza_tbl %>% + filter(geo_value == "nv") %>% + rename(version = issue) %>% + as_epi_archive() %>% + epix_slide(~ tibble(data=list(.x)), n = 100L, + # ref_time_values = seq(as.Date("2022-10-01"), as.Date("2022-10-17"), by="day")) %>% + ref_time_values = seq(as.Date("2021-01-01"), as.Date("2022-10-17"), by="day")) %>% + transmute(version=time_value, slide_value_data) %>% + unnest(slide_value_data) %>% + mutate(version_lag = as.integer(version - time_value)) %>% + # ggplot(aes(time_value, value, group=version, colour=version)) %>% + # `+`(geom_line()) %>% + # `+`(scale_colour_continuous(type = "viridis", trans = "date")) + # ggplot(aes(time_value, value)) %>% + # `+`(geom_line()) %>% + # `+`(scale_colour_continuous(type = "viridis")) %>% + # `+`(facet_wrap(~ version_lag)) + filter(time_value >= as.Date("2022-09-01")) %>% + ggplot(aes(version, value)) %>% + `+`(geom_line()) %>% + `+`(scale_colour_continuous(type = "viridis")) %>% + `+`(facet_wrap(~ time_value)) +``` + +- MEDIUM: the `ref_time_values` become the `time_value` column, but I was + outputting a (sensible) `time_value` column from the computation. Munging the column names is a hassle. +- MINOR: I don't like the `slide_value_`/any name prefix, and no prefix isn't even an option. +- MINOR: again, missing an archive filter function + +Here, I wanted "revision sequences" for some observations in NV. +```{r} +hhs_influenza_tbl %>% + filter(geo_value == "nv") %>% + filter(issue >= as.Date("2022-10-10")) %>% + select(issue, time_value, value, lag) %>% + arrange(time_value, issue) %>% + group_by(time_value) %>% + mutate(diff_value = c(NA, diff(value))) %>% + ungroup() +``` +- NOTE: this can't be done with an epi_archive's built-in functions. But I could + have just done `hhs_influenza_archive$DT` and used a data table; a separate + tibble wasn't required yet. 
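+For reference, a sketch of the same "revision sequence" computation done on the
+archive's data table directly, as the note suggests (not run):
+```{r, eval=FALSE}
+hhs_influenza_archive$DT[
+  geo_value == "nv" & version >= as.Date("2022-10-10"),
+  .(version, time_value, value)
+][order(time_value, version)][
+  , diff_value := c(NA_real_, diff(value)), by = time_value
+][]
+```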
+ +Try to get a sense of the scale of available data in a couple of locations: +```{r} +hhs_influenza_archive %>% + epix_as_of(.$versions_end) %>% + group_by(geo_value) %>% + summarize(tibble(levels = (0:20)/20, + quantiles = quantile(value, levels, na.rm=TRUE))) %>% + filter(geo_value %in% c("al", "ar")) %>% + print(n=50L) +``` + +Compute and plot 7-day sum data: +```{r} +hhs_influenza_archive %>% + epix_as_of(.$versions_end) %>% + filter(geo_value %in% c("al", "ar")) %>% + tidyr::complete(geo_value, time_value = tidyr::full_seq(time_value, 1L)) %>% + dplyr::group_by(geo_value) %>% + dplyr::mutate(value_7dsum = frollsum(value, 7L)) %>% + # ggplot(aes(time_value, value)) %>% + ggplot(aes(time_value, value_7dsum)) %>% + `+`(geom_line()) %>% + `+`(facet_wrap(~ geo_value)) +``` + +- MEDIUM: I had to calculate the 7d sum myself; while this is simple, it's maybe + a little tedious, and might be error-prone if there are missing rows. There's + no 7d sum in the API. In the future we don't be able to fetch a 7d average + issue data either, so that's out for use with `epi_archive`s. + +```{r} +# chng_influenza_tbl = covidcast( +# data_source = "chng", +# signals = "smoothed_adj_outpatient_flu", +# time_type = "day", +# geo_type = "state", +# time_values = epirange(12340101, 34560101), +# geo_values = "*", +# issues = epirange(12340101, 34560101) +# ) %>% fetch_tbl() + +# chng_influenza_archive = chng_influenza_tbl %>% +# rename(version = issue) %>% +# as_epi_archive() +``` + +```{r} +# chng_influenza_tbl %>% +# filter(geo_value == "nv") %>% +# rename(version = issue) %>% +# as_epi_archive() %>% +# epix_slide(~ tibble(data=list(.x)), n = 100L, +# ref_time_values = seq(as.Date("2022-05-01"), as.Date("2022-10-13"), by="day")) %>% +# # ref_time_values = seq(as.Date("2021-01-01"), as.Date("2022-10-17"), by="day")) %>% +# transmute(version=time_value, slide_value_data) %>% +# unnest(slide_value_data) %>% +# mutate(version_lag = as.integer(version - time_value)) %>% +# ggplot(aes(time_value, value, group=version, colour=version)) %>% +# `+`(geom_line()) %>% +# `+`(scale_colour_continuous(type = "viridis", trans = "date")) +# # ggplot(aes(time_value, value)) %>% +# # `+`(geom_line()) %>% +# # `+`(scale_colour_continuous(type = "viridis")) %>% +# # `+`(facet_wrap(~ version_lag)) +# # filter(time_value >= as.Date("2022-09-01")) %>% +# # ggplot(aes(version, value)) %>% +# # `+`(geom_line()) %>% +# # `+`(scale_colour_continuous(type = "viridis")) %>% +# # `+`(facet_wrap(~ time_value)) +``` +