Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #127 #152

Closed
wants to merge 32 commits into from
Closed

Fixes #127 #152

wants to merge 32 commits into from

Conversation

bahadzie
Copy link
Member

@bahadzie bahadzie commented Jan 15, 2024

- Add function to calculate timing and size of epidemic peak
- BUG FIX Update testthat/_snaps/ to pass devtools::check()
@Bisaloo
Copy link
Member

Bisaloo commented Jan 17, 2024

Hi @bahadzie, rather than removing the files in _snaps/ altogether, you should run the tests interactively on your local machine, and run testthat::snapshot_accept('vaccination') if you want to validate the change in the recorded snapshot.

More info in the dedicated testthat vignette: https://testthat.r-lib.org/articles/snapshotting.html

@pratikunterwegs
Copy link
Collaborator

Hi @bahadzie and @Bisaloo - just checking whether this PR is still current and if it needs to be reviewed? @Bisaloo are you reviewing or would it help for me to step in?

@Bisaloo
Copy link
Member

Bisaloo commented Jan 22, 2024

No, this was just a drive by comment based on the check results. I have not reviewed the code. I'll leave it up to your expertise.

@pratikunterwegs
Copy link
Collaborator

pratikunterwegs commented Jan 22, 2024

Okay thanks, we have a P3 meeting scheduled later so will discuss next steps for this PR then. Edit add: of course we'll restore the snapshots.

Copy link
Collaborator

@pratikunterwegs pratikunterwegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bahadzie, thanks for making this PR and taking a stab at this issue.

This function works mostly alright for the case of a single demographic group in a model where time increments by 1.0, but would not find the correct epidemic peak time for time increments other than 1.0 (I think), nor for multiple demographic groups. This is because of how timing <- which(wide %in% peak) is calculated - you're getting the index, not the time value. I've attached an example from the vignettes below - the figure shows all three peaks at about 370, but the function calculates peaks of 367, 972, and 1576. I've made some suggestions for edits to the function which should make it more readable and perform as expected.

  • This function can be collapsed, for its simplest case in a single-peak epidemic, by using convenient {data.table} syntax from the suggestion.
  • You should also add tests for this function. There should be tests for
  • the output type, dimensions, and predictable column contents (e.g. demography group names) and types (timing and value should be numerics);
  • time and peak value limits (i.e., 0 < peak time < max time; n_initially_infected $\leq$ peak value < max population size; etc.);
  • predictable cases (e.g., a case where infectiousness = 0.0, so the 'peak' is at $t = 0$; a case where an intervention is applied to one demographic group whose peak is then later than other groups; etc.).
  • You should revert commit 9a349b2, as these snapshots are important and must be retained.
library(epidemics)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

# load contact and population data from socialmixr::polymod
polymod <- socialmixr::polymod
contact_data <- socialmixr::contact_matrix(
  polymod,
  countries = "United Kingdom",
  age.limits = c(0, 20, 40),
  symmetric = TRUE
)
#> Using POLYMOD social contact data. To cite this in a publication, use the 'cite' function
#> Removing participants that have contacts without age information. To change this behaviour, set the 'missing.contact.age' option

# prepare contact matrix
contact_matrix <- t(contact_data$matrix)

# prepare the demography vector
demography_vector <- contact_data$demography$population
names(demography_vector) <- rownames(contact_matrix)

# view contact matrix and demography
contact_matrix
#>                  
#> contact.age.group     [,1]     [,2]     [,3]
#>           [0,20)  7.883663 2.794154 1.565665
#>           [20,40) 3.120220 4.854839 2.624868
#>           40+     3.063895 4.599893 5.005571

demography_vector
#>   [0,20)  [20,40)      40+ 
#> 14799290 16526302 28961159
# initial conditions
initial_i <- 1e-6
initial_conditions <- c(
  S = 1 - initial_i, E = 0, I = initial_i, R = 0, V = 0
)

# build for all age groups
initial_conditions <- rbind(
  initial_conditions,
  initial_conditions,
  initial_conditions
)

# assign rownames for clarity
rownames(initial_conditions) <- rownames(contact_matrix)

# view initial conditions
initial_conditions
#>                S E     I R V
#> [0,20)  0.999999 0 1e-06 0 0
#> [20,40) 0.999999 0 1e-06 0 0
#> 40+     0.999999 0 1e-06 0 0

uk_population <- population(
  name = "UK",
  contact_matrix = contact_matrix,
  demography_vector = demography_vector,
  initial_conditions = initial_conditions
)

uk_population
#> <population> object
#> 
#>  Population name:
#> "UK"
#> 
#>  Demography 
#> [0,20): 14,799,290 (20%)
#> [20,40): 16,526,302 (30%)
#> 40+: 28,961,159 (50%)
#> 
#>  Contact matrix 
#>                  
#> contact.age.group   [0,20)  [20,40)      40+
#>           [0,20)  7.883663 2.794154 1.565665
#>           [20,40) 3.120220 4.854839 2.624868
#>           40+     3.063895 4.599893 5.005571

# run epidemic simulation with no vaccination or intervention
data <- model_default_cpp(
  population = uk_population,
  time_end = 600, increment = 1.0
)

# plot figure of epidemic curve
filter(data, compartment %in% c("exposed", "infectious")) %>%
  ggplot(
    aes(
      x = time,
      y = value,
      col = demography_group,
      linetype = compartment
    )
  ) +
  geom_line() +
  labs(
    x = "Simulation time (days)",
    linetype = "Compartment",
    y = "Individuals"
  )

# get the timing and peak of the infectious compartment
epidemic_peak(data)
#>         timing     peak
#> [0,20)     367 384139.9
#> [20,40)    972 357215.9
#> 40+       1576 474553.3

Created on 2024-01-22 with reprex v2.0.2

R/helpers.R Outdated

#' Get the timing and size of a compartment's peak
#'
#' Get the timing and size of a compartment's peak for all demographic groups.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please update this to clarify that the peak found is the highest peak. For example, in an epidemic with two peaks such as this example of seasonality in infectiousness, the function would find the second peak.
  • Please add a Details section laying out the use case, and the important caveats; e.g., best used for a single peak epidemic etc.

R/helpers.R Outdated
#'
#' @param data A `<data.frame>` of model output, typically
#' the output of [model_default_cpp()] or similar functions.
#' @param compartment The compartment of interest ;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' @param compartment The compartment of interest ;
#' @param compartment A character vector for the compartments of interest.

Making this a character vector allows you to find peaks in multiple compartments at once, e.g. hospitalised and deaths.

R/helpers.R Outdated
#' @param data A `<data.frame>` of model output, typically
#' the output of [model_default_cpp()] or similar functions.
#' @param compartment The compartment of interest ;
#' @return A `<data.frame>` of the timing and peak of the selected compartment
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mention the column names in this data.frame.

R/helpers.R Outdated
#'
#' # run epidemic simulation with no vaccination or intervention
#' data <- model_default_cpp(
#' population = uk_population
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' population = uk_population
#' population = uk_population, time_end = 600

Increasing the time_end allows the true peak to be found, as the default value time_end = 100 is too low for the epidemic to peak.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

R/helpers.R Outdated
colnames(data),
must.include = c("time", "demography_group", "compartment", "value")
)
checkmate::assert_character(compartment, len = 1L)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
checkmate::assert_character(compartment, len = 1L)
checkmate::assert_character(
compartments, min.len = 1,
any.missing = FALSE, unique = TRUE,
null.ok = FALSE
)

Allow more than one compartment to be added, allow only unique compartments, and prevent missing values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

R/helpers.R Outdated
Comment on lines 347 to 361
age_groups <- unique(data[["demography_group"]])
n_groups <- length(age_groups)
columns <- c("demography_group", "time", "value")
# compartment of interest
long_data <- data[data[["compartment"]] == compartment, columns]
nrows <- length(long_data[, "value"])
# long to wide conversion
wide <- matrix(long_data[, "value"], nrows / n_groups, n_groups, byrow = TRUE)
colnames(wide) <- age_groups

col_max <- function(data) sapply(as.data.frame(data), max, na.rm = TRUE)
peak <- col_max(as.data.frame(wide))
timing <- which(wide %in% peak)

cbind(timing, peak)
Copy link
Collaborator

@pratikunterwegs pratikunterwegs Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
age_groups <- unique(data[["demography_group"]])
n_groups <- length(age_groups)
columns <- c("demography_group", "time", "value")
# compartment of interest
long_data <- data[data[["compartment"]] == compartment, columns]
nrows <- length(long_data[, "value"])
# long to wide conversion
wide <- matrix(long_data[, "value"], nrows / n_groups, n_groups, byrow = TRUE)
colnames(wide) <- age_groups
col_max <- function(data) sapply(as.data.frame(data), max, na.rm = TRUE)
peak <- col_max(as.data.frame(wide))
timing <- which(wide %in% peak)
cbind(timing, peak)
data.table::setDT(data)
data = data[compartment %in% compartments,
list(time = data.table::first(time[value == max(value)]),
value = data.table::first(max(value))),
by = c("demography_group", "compartment")
]
data.table::setDF(data)[]

This code using {data.table} should give you the highest peak time and value for each demographic group. Suggest using first() or similar to prevent multiple maximum values from being found.

Copy link
Member Author

@bahadzie bahadzie Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Although I get a note about undefined global functions or variables compartment, time and value.

Not sure how to adjust the {data.table} syntax to pass {lintr} rules

R/helpers.R Outdated
#'
#' # get the timing and peak of the exposed compartment
#' epidemic_peak(data, compartment = "exposed")
epidemic_peak <- function(data, compartment = "infectious") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
epidemic_peak <- function(data, compartment = "infectious") {
epidemic_peak <- function(data, compartments = "infectious") {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

 - Use data.table syntax from suggestion
 - Add tests for function
 - Revert snapshots from previous commit
Fixes #125: Calculate expected vaccinations and check model correctness
Copy link
Collaborator

@pratikunterwegs pratikunterwegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bahadzie, thanks for adding these changes. Could you:

  1. Update the PR description to say what the changes are and ideally tag the issues being fixed;
  2. Rebase this branch on main;
  3. Update your version of {testthat} (v3.2.1 appears latest) as this seems to be causing changes to the test snapshots; you would also need to revert changes to the test snapshots, and then check if any snapshots really need to be updated. In general, I would be very conservative in accepting changes to a test snapshot, and I would always mention why in the PR or in the commit message;
  4. Update the code in epidemic_peak() to access columns using one of the two methods mentioned in this SO post by Matt Dowle. In today's meeting I said I preferred setting variables to NULL at the top of the function (Dowle's option 2), but I see that in new_infections() I have gone for option 1 instead which uses get().

Once 2 -- 4 are done, we shall get a better idea of whether checks pass - I expect they will.

adamkucharski and others added 7 commits February 7, 2024 18:02
- Add function to calculate timing and size of epidemic peak
Fixes #125
- Calculate expected vaccinations and check model correctness
- Addresses all PR review comments
@bahadzie
Copy link
Member Author

Hi @bahadzie, thanks for adding these changes. Could you:

1. Update the PR description to say what the changes are and ideally tag the issues being fixed;

2. Rebase this branch on `main`;

3. Update your version of {testthat} (v3.2.1 appears latest) as this seems to be causing changes to the test snapshots; you would also need to revert changes to the test snapshots, and then check if any snapshots really need to be updated. In general, I would be very conservative in accepting changes to a test snapshot, and I would always mention why in the PR or in the commit message;

4. Update the code in `epidemic_peak()` to access columns using [one of the two methods mentioned in this SO post](https://stackoverflow.com/a/8096882/7410366) by Matt Dowle. In today's meeting I said I preferred setting variables to `NULL` at the top of the function (Dowle's option 2), but I see that in `new_infections()` I have gone for option 1 instead which uses `get()`.

Once 2 -- 4 are done, we shall get a better idea of whether checks pass - I expect they will.

@bahadzie bahadzie closed this Feb 12, 2024
@bahadzie bahadzie reopened this Feb 12, 2024
@bahadzie
Copy link
Member Author

Hi @bahadzie, thanks for adding these changes. Could you:

1. Update the PR description to say what the changes are and ideally tag the issues being fixed;

2. Rebase this branch on `main`;

3. Update your version of {testthat} (v3.2.1 appears latest) as this seems to be causing changes to the test snapshots; you would also need to revert changes to the test snapshots, and then check if any snapshots really need to be updated. In general, I would be very conservative in accepting changes to a test snapshot, and I would always mention why in the PR or in the commit message;

4. Update the code in `epidemic_peak()` to access columns using [one of the two methods mentioned in this SO post](https://stackoverflow.com/a/8096882/7410366) by Matt Dowle. In today's meeting I said I preferred setting variables to `NULL` at the top of the function (Dowle's option 2), but I see that in `new_infections()` I have gone for option 1 instead which uses `get()`.

Once 2 -- 4 are done, we shall get a better idea of whether checks pass - I expect they will.

Done

@pratikunterwegs
Copy link
Collaborator

Thanks @bahadzie - I think you might have rebased on an older commit; could you rebase on the current head of origin/main? That would probably resolve the flagged conflict.

@bahadzie
Copy link
Member Author

Hi @pratikunterwegs I've rebased on main.

It's currently running the Github Actions. I'll keep you posted.

@pratikunterwegs
Copy link
Collaborator

Thanks, will take a look tomorrow, checks are passing.

Copy link
Collaborator

@pratikunterwegs pratikunterwegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these inputs @bahadzie. Just looking at @BlackEdder's comments on #125, I think the expectation is for a function like expected_vaccinations(population, vaccination), which calculates the expected number of vaccinated individuals in each age group, before/without running an epidemic model. This would only work for a single dose model, so not Vacamole - although perhaps I'm wrong here, and this could be done as $\nu_1 \nu_2 * S$?

The calculation of an effective vaccination rate should be an internal step, as it determines the transition rates within the model, so it's not expected to an exported function as far as I understand.

@BlackEdder, could you come in and say whether it's necessary to calculate an effective vaccination rate internally? Currently, users pass cohort-specific, per-interval (1 day) vaccination rates ($\nu_i$), which is applied only to susceptibles: $V_i = \nu_i S_i$. If I understand correctly, your suggestion is to rescale $\nu$ so that if $k$ doses are allocated per age group per day, those are all 'used up' in the model, even when the number of available vaccination targets falls. Or have I not understood correctly? We can continue this discussion in a new issue if more useful.

@pratikunterwegs
Copy link
Collaborator

Just looking at this PR again, I think it might be better to restrict this PR to the epidemic peak function by resetting this PR to before the vaccination rate fn is added, I expect that should be 8bee333. I think #202 should resolve the effective vaccination issue documented in #198.

pratikunterwegs added a commit that referenced this pull request Jun 5, 2024
Co-authored-by: Banky Ahadzie <[email protected]>
pratikunterwegs added a commit that referenced this pull request Jun 5, 2024
Co-authored-by: Banky Ahadzie <[email protected]>
@pratikunterwegs
Copy link
Collaborator

Key commits from this branch copied into #240 and merged into main.

@pratikunterwegs pratikunterwegs deleted the bahadzie/issue127 branch June 10, 2024 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants