Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional inference howto posts #57

Merged
merged 33 commits into from
Oct 3, 2024
Merged

Additional inference howto posts #57

merged 33 commits into from
Oct 3, 2024

Conversation

adamkucharski
Copy link
Member

  • Please check if the PR fulfills these requirements
  • I have read the CONTRIBUTING guidelines
  • A new item has been added to NEWS.md – not available in repo?
  • Tests for the changes have been added (for bug fixes / features) – NA as howto script
  • Docs have been added / updated (for bug fixes / features) – NA as howto script
  • Checks have been run locally and pass– NA as howto script
  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    This is a PR with 3 new howto guides:
  • MCMC inference of R and k from cluster sizes for MERS using epichains and MCMCpack: reproduction_number_cluster_size.qmd
  • Reconstruction of infection curves for COVID and Ebola using EpiNow2: estimate_infections.qmd (can also create PR to EpiNow2 to add this as a case study link on their page once merged here)
  • Inference of R for H1N1 pandemic influenza from serological data using finalsize and socialmixr: reproduction_number_serological_data.qmd

@adamkucharski adamkucharski requested a review from avallecam May 28, 2024 13:38
@adamkucharski adamkucharski changed the title Additional inference Additional inference howto posts May 28, 2024
@avallecam
Copy link
Member

Thanks for the contribution! I'll rebase this PR and force-push some commits to see how the checks react

@avallecam avallecam force-pushed the additional-inference branch from b9abb8f to 047ba67 Compare June 5, 2024 13:51
Copy link
Member

@avallecam avallecam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

The reproduction_number_cluster_size.qmd and reproduction_number_serological_data.qmd work fine! However, the estimate_infections.qmd is not running, which limited further review. This seems to be due to reading the URL for the UK COVID dashboard API, which may be blocking the building of the whole website.

Could we verify if this is still working for you? If this needs preliminary configurations to run, should we try an alternative dataset or reading approach?

Comment on lines 43 to 48
# Define URL for the UK COVID dashboard API
uk_dashboard_url <- "https://coronavirus.data.gov.uk/api/v1/data?filters=areaType=overview;areaName=United%2520Kingdom&structure=%7B%22areaType%22:%22areaType%22,%22areaName%22:%22areaName%22,%22areaCode%22:%22areaCode%22,%22date%22:%22date%22,%22newDailyNsoDeathsByDeathDate%22:%22newDailyNsoDeathsByDeathDate%22,%22cumDailyNsoDeathsByDeathDate%22:%22cumDailyNsoDeathsByDeathDate%22%7D&format=csv"

# Load data from the UK COVID dashboard API and format as data.frame
load_uk_data_text <- GET(uk_dashboard_url)
uk_data <- read.csv(text = content(load_uk_data_text, "text"), stringsAsFactors = FALSE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I currently get this output from uk_data object. Could you verify if you still get a data frame output?

> uk_data
  [1] X..DOCTYPE.html..html.lang.en.class.govuk.template.__..(...)   
  [2] queries...state..data..success.true    
  [3] data...status.Green    
  [4] geography_name.North.East    
  [5] geography_code.E12000001    
  [6] refresh_date.null.   
  [7] X.status.Green    
  [8] geography_name.North.West    
  [9] geography_code.E12000002    
 [10] refresh_date.null..1    
 [11] X.status.Green.1    
 [12] geography_name.Yorkshire.and.The.Humber    
 [13] geography_code.E12000003    
 [14] refresh_date.null..2    
 [15] X.status.Green.2    
 [16] geography_name.East.Midlands    
 [17] geography_code.E12000004    
 [18] refresh_date.null..3    
 [19] X.status.Green.3    
 [20] geography_name.West.Midlands    
 [21] geography_code.E12000005    
 [22] refresh_date.null..4    
 [23] X.status.Green.4    
 [24] geography_name.East.of.England    
 [25] geography_code.E12000006    
 [26] refresh_date.null..5    
 [27] X.status.Green.5    
 [28] geography_name.London    
 [29] geography_code.E12000007    
 [30] refresh_date.null..6    
 [31] X.status.Green.6    
 [32] geography_name.South.East    
 [33] geography_code.E12000008    
 [34] refresh_date.null..7    
 [35] X.status.Green.7    
 [36] geography_name.South.West    
 [37] geography_code.E12000009    
 [38] refresh_date.null... 
 [39] dataUpdateCount.1    
 [40] dataUpdatedAt.1717579083479    
 [41] error.null    
 [42] errorUpdateCount.0    
 [43] errorUpdatedAt.0    
 [44] fetchFailureCount.0    
 [45] fetchFailureReason.null    
 [46] fetchMeta.null    
 [47] isInvalidated.false    
 [48] status.success    
 [49] fetchStatus.idle.   
 [50] queryKey..weather.alert.list    
 [51] heat.   
 [52] queryHash...weather.alert.list...heat..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to prioritize reusing existing data, similar data may be available in {incidence2} and {cfr}

incidence2::covidregionaldataUK contains one-year data for the UK, and cfr::covid_data contains three-year data for 19 countries, including the UK.

library(tidyverse)

incidence2::covidregionaldataUK %>% 
  # preprocess missing values
  tidyr::replace_na(list(deaths_new = 0)) %>%
  # compute the daily incidence
  incidence2::incidence(
    date_index = "date",
    counts = "deaths_new",
    count_values_to = "confirm",
    date_names_to = "date"
  ) %>%
  # complete range of dates
  incidence2::complete_dates()
#> # incidence:  490 x 3
#> # count vars: deaths_new
#>    date       count_variable confirm
#>    <date>     <fct>            <dbl>
#>  1 2020-01-30 deaths_new           0
#>  2 2020-01-31 deaths_new           0
#>  3 2020-02-01 deaths_new           0
#>  4 2020-02-02 deaths_new           0
#>  5 2020-02-03 deaths_new           0
#>  6 2020-02-04 deaths_new           0
#>  7 2020-02-05 deaths_new           0
#>  8 2020-02-06 deaths_new           0
#>  9 2020-02-07 deaths_new           0
#> 10 2020-02-08 deaths_new           0
#> # ℹ 480 more rows
cfr::covid_data %>% 
  # compute the daily incidence
  incidence2::incidence(
    date_index = "date",
    counts = "deaths",
    groups = "country"
  ) %>% 
  # complete range of dates
  incidence2::complete_dates()
#> # incidence:  20,786 x 4
#> # count vars: deaths
#> # groups:     country
#>    date_index country   count_variable count
#>    <date>     <chr>     <fct>          <dbl>
#>  1 2020-01-03 Argentina deaths             0
#>  2 2020-01-03 Brazil    deaths             0
#>  3 2020-01-03 Colombia  deaths             0
#>  4 2020-01-03 France    deaths             0
#>  5 2020-01-03 Germany   deaths             0
#>  6 2020-01-03 India     deaths             0
#>  7 2020-01-03 Indonesia deaths             0
#>  8 2020-01-03 Iran      deaths             0
#>  9 2020-01-03 Italy     deaths             0
#> 10 2020-01-03 Mexico    deaths             0
#> # ℹ 20,776 more rows

Created on 2024-06-06 with reprex v2.1.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@avallecam - Cool. FWIW - In version 2.3.0 I've also added an argument to complete_dates within the function call itself. If useful you can also use named vectors of the form c(new_name = "old_name") to rename on the fly. E.g. for the first example you could do

incidence2::covidregionaldataUK %>% 
  # preprocess missing values
  tidyr::replace_na(list(deaths_new = 0)) %>%
  # compute the daily incidence renaming deaths_new and completing dates
  incidence2::incidence(
    date_index = "date",
    counts = c(deaths = "deaths_new"),
    count_values_to = "confirm",
    date_names_to = "date",
    complete_dates = TRUE
  )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elegant! thanks for sharing the new features


## Steps in detail
- *Example 1* aims to reconstruct SARS-CoV-2 infection dynamics in the UK from daily data on deaths
- First, we load daily data on reported COVID deaths from the [UK COVID dashboard](https://coronavirus.data.gov.uk/details/deaths?areaType=overview&areaName=United%20Kingdom) by copying the 'download data as csv' link, then using the `httr` package to import into R and format as a `data.frame`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I try to access this links, I get immediately redirected to https://ukhsa-dashboard.data.gov.uk/

Should we try an alternative way? or provide more details on how to access it using the API?

Copy link
Member

@avallecam avallecam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this first review, I propose edits to estimate_infections.qmd to make this able to run. Some additional edits not added in this review are:

If you prefer, for best use of our time, I can take charge of this PR, make the edits, and push them directly as commits. (I needed to make them locally to proof that they work)

One suggestion change I want to emphasize is the plotting. One alternative is adding layers from the EpiNow2 outputs. If not, we can use ggplot2 for a custom plot from the output. Any strong opinions about this?

analyses/reconstruct_transmission/estimate_infections.qmd Outdated Show resolved Hide resolved
analyses/reconstruct_transmission/estimate_infections.qmd Outdated Show resolved Hide resolved
analyses/reconstruct_transmission/estimate_infections.qmd Outdated Show resolved Hide resolved
analyses/reconstruct_transmission/estimate_infections.qmd Outdated Show resolved Hide resolved
analyses/reconstruct_transmission/estimate_infections.qmd Outdated Show resolved Hide resolved
analyses/reconstruct_transmission/estimate_infections.qmd Outdated Show resolved Hide resolved
analyses/reconstruct_transmission/estimate_infections.qmd Outdated Show resolved Hide resolved
analyses/reconstruct_transmission/estimate_infections.qmd Outdated Show resolved Hide resolved
analyses/reconstruct_transmission/estimate_infections.qmd Outdated Show resolved Hide resolved
@adamkucharski
Copy link
Member Author

Thanks for those comments! I've committed the edits. I haven't yet added the following, as wasn't sure of most efficient way:

  • lintr checks
  • code to copy chunks

Feel free to add commits on these, then can merge!

@avallecam
Copy link
Member

avallecam commented Jun 21, 2024

Nice! about your questions:

  • lintr checks

We can do this by installing the {lintr} package, and if using Rstudio, we can run an Rstudio addin called "Lint current file" displaying the Rstudio Markers pane with edit suggestions.

  • code to copy chunks

Yes, sorry the link was not specific. I meant this 4-lines chunk after the yaml

Feel free to add commits on these, then can merge!

Noted. We'll do this after next week

@avallecam avallecam added the priority First in the to-do list label Sep 26, 2024
adamkucharski and others added 18 commits October 3, 2024 18:17
Inferring infection dynamics + inferring R and k from clusters
Following Kucharski et al, PLOS Path, 2014 approach
Also remove WIP social contact file from this branch
Inferring infection dynamics + inferring R and k from clusters
Following Kucharski et al, PLOS Path, 2014 approach
Also remove WIP social contact file from this branch
Co-authored-by: Andree Valle Campos <[email protected]>
Co-authored-by: Andree Valle Campos <[email protected]>
Co-authored-by: Andree Valle Campos <[email protected]>
Co-authored-by: Andree Valle Campos <[email protected]>
Co-authored-by: Andree Valle Campos <[email protected]>
@avallecam avallecam force-pushed the additional-inference branch from 0609e43 to d9f86f5 Compare October 3, 2024 17:28
@avallecam
Copy link
Member

this is ready to merge. pending change is to update renv.lock to latest {epiparameter} version. but keeping this to do after merging this PR. now in issue #75

@avallecam avallecam self-requested a review October 3, 2024 17:34
Copy link
Member

@avallecam avallecam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requested changes added. format now match required format output. The entry reproduction_number_serological_data.qmd may require some object renaming in data_summary and summary_data to clearly differentiate between the observed and estimated values. But we can do this in a separate PR too.

@avallecam avallecam merged commit b08aa93 into main Oct 3, 2024
2 checks passed
@avallecam avallecam deleted the additional-inference branch October 3, 2024 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority First in the to-do list
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants