Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add episode on contact matrices #63

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

amanda-minter
Copy link
Collaborator

This PR adds an episode on contact matrices, fixes issues #47 and #30 and incorporates the edits made in the closed PR #52.

Copy link

github-actions bot commented Nov 7, 2024

Thank you!

Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}.

If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

  • 🎯 correct output
  • 🖼️ correct figures
  • ❓ new warnings
  • ‼️ new errors

Rendered Changes

🔍 Inspect the changes: https://github.com/epiverse-trace/tutorials-late/compare/md-outputs..md-outputs-PR-63

The following changes were observed in the rendered markdown documents:

 config.yaml                |   4 +-
 contact-matrices.md (new)  | 349 +++++++++++++++++++++++++++++++++++++++++++++
 md5sum.txt                 |   7 +-
 modelling-interventions.md |   4 +-
 simulating-transmission.md |  17 ++-
 5 files changed, 368 insertions(+), 13 deletions(-)
What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible.

This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2024-12-05 15:17:50 +0000

github-actions bot pushed a commit that referenced this pull request Nov 7, 2024
github-actions bot pushed a commit that referenced this pull request Nov 7, 2024
Copy link
Member

@adamkucharski adamkucharski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is looking good – just had a few minor suggestions, and some comments about consistency in contact indices (which could arguably go either way, but we should probably go with popular formulation!)

episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/simulating-transmission.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
github-actions bot pushed a commit that referenced this pull request Nov 12, 2024
C is now the model term for the contact matrix, which is the transpose of `contact_matrix$data`
github-actions bot pushed a commit that referenced this pull request Nov 14, 2024
Copy link
Contributor

@sbfnk sbfnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really nice - left a few more comments.

episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved
episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved

Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. Normalisation means converting to a value to be between 0 and 1. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values.

In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. $C[i,j]$ is defined as contacts to $i$ from $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • "defined as contacts to $i$ from $j$" - I don't think this from/to notion is useful as it implies that it's one of the two that initiates contact - see also discussion at from/to, contacting/contacted socialcontactdata/contactmatrix#14
  • you could (if you don't think it adds confusion) point to the split argument in contact_matrix which does the normalisation for you, although it's definitely also useful to show here how to do iot.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've altered the text slightly to 'represents the contacts between populations $i$ and $j$' so it is more generic, wording this episode has been tricky - if you have any more suggestions for clarity let me know!

I've also added a callout on using split, I think it is a useful addition to know the normalisation can be done within the function contact_data(). Related to this, I think there could be come confusion about where normalisation takes place in different analyses e.g. in epidemics it happens within the model function call , I've added a callout box to the first tutorial on using epidemics to highlight that the contact matrix normalisation does not need to be done.

Copy link
Contributor

@sbfnk sbfnk Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reading the various uses here how about speaking about contacts of group i with group j (which I though you used somewhere but now I can't find it) which to me does not imply any directionality but makes it clear that in rows we're specifically looking at group i. So perhaps adopt this one throughout?

github-actions bot pushed a commit that referenced this pull request Nov 28, 2024
github-actions bot pushed a commit that referenced this pull request Nov 28, 2024
github-actions bot pushed a commit that referenced this pull request Nov 29, 2024
github-actions bot pushed a commit that referenced this pull request Nov 29, 2024
github-actions bot pushed a commit that referenced this pull request Nov 29, 2024
Copy link
Contributor

@sbfnk sbfnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've suggested a consistent notation throughout. @amanda-minter do you think this works? I feel I've gone around this too many times to be a good judge any more.

One way or the other it would be good I think to make sure the terminology is the same in these sections and perhaps define somewhere early e.g. "We call $C[i, j]$ the average number of contacts of group $i$ with group $j$ the number of contacts between the two groups, averaged across all individuals in group $i$."

episodes/contact-matrices.Rmd Outdated Show resolved Hide resolved

Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values.

In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).
In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified valued of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts of population $j$ with population $i$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_normalised$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: I think this is the wrong way around - C[i, j] in socialmixr is as in the equations above I think (but please correct me if I'm wrong).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's the same then why do we need to transpose the matrix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always find $i$ and $j$ a potential headache (which it's why this will be so useful to have written down!)

Taking step back, we just need the FOI to be defined sensibly, i.e.: $\sum_j C_{i,j} I_j/N_j$. So $C_{i,j}$ should be contacts that group $j$ (the infectious ones) make with group $i$ (the susceptible ones) - this is from equation A3 in Wallinga et al (2006)

The contact_matrix() function gives the following structure:

#> $matrix
#>          contact.age.group
#> age.group      [0,1)     [1,5)   [5,15)      15+
#>    [0,1)  0.40000000 0.8000000 1.266667 5.933333
#>    [1,5)  0.11250000 1.9375000 1.462500 5.450000
#>    [5,15) 0.02450980 0.5049020 7.946078 6.215686
#>    15+    0.03230337 0.3581461 1.290730 9.594101

So $C[i,j]$ is contacts made by group $i$ with group $j$? Which I think would mean it needs transposing?

For completeness (and just to remind myself), {epidemics} has this internal processing in .prepare_population(), which normalises by dominant eigenvalue and scales based on $w_{tot}/w_i$ (to use the Walling et al notation):

contact_matrix <- (contact_matrix / max(Re(eigen(contact_matrix)$values))) /
    x[["demography_vector"]]

Copy link
Contributor

@sbfnk sbfnk Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we just need the FOI to be defined sensibly, i.e.: $\sum_j C_{i,j} I_j/N_j$. So $C_{i,j}$ should be contacts that group $j$ (the infectious ones) make with group $i$ (the susceptible ones)

Shouldn't this be the other way round, i.e. $C_{ij}$ here is the average number of contacts with group $j$ that a suspectible in group $i$ has (then multiplied with the probability that the contact is infectious, i.e. $I_j/N_j$)?

Here's an example:

There are two groups, $i$ and $j$. There are 1 person in group $i$ ($N_i=1$) and 100 people in group $j$ ($N_j = 100$). Person $i$ meets all 100 people in group $j$ every day: according to socialmixr notation that means $C_{ij}=100$ and $C_{ji}=1$ (on the scale of days). The total number of contacts between the two groups per day is $C_{ij} N_i = C_{ji} N_j = 100$.

Sticking with this notation:
The FOI on person $i$ is proportional to 100 * (proportion of $j$ that is ill), or $C_{ij} I_j / N_j$
The FOI on people in group $j$ is proportional to 1 * (1 if $i$ is ill, 0 otherwise), or $C_{ji} I_i / N_i$

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always find $i$ and $j$ a potential headache (which it's why this will be so useful to have written down!)

Can definitely agree on that!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a useful illustration, thanks. That does make more sense given it's FOI onto the susceptible (and maybe we should include as an example once converge on answer!) Suspect I was thinking in terms of 'outbound' contacts (as in the next generation matrix and R estimation case) rather than 'inbound' contacts for the SIR model.

I feel I'm still missing something with regards to the Wallinga framing though, because those are the contacts made per person by group $j$ with group $i$ (i.e. $m_{ij}$ to use the Wallinga et al notation), rather than per capita contact rate $c_{ij} = m_ij / w_i$ where $w_i$ is the proportion of population in group $i$, and equal to $\beta_{ij}$ but for a scaling factor. So in above example, the contact rate $c_{ij} = 1 \times 101/1 \approx 100$ and $c_{ji} = 100 \times 101/100 \approx 100$.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sticking with $C_{ij} = m_{ji}$ and $N_i=\omega_i$ and $I_i=y_i$ (LHS: socialmixr, RHS: Wallinga) we have for the term in the sum above:

$$ C_{ij} \frac{I_j}{N_j} = m_{ji} \frac{y_j}{\omega_j} = m_{ij} \frac{\omega_j}{\omega_i} \frac{y_j}{\omega_j} = m_{ij} \frac{y_j}{\omega_i} $$

Where we've used the symmetry $m_{ji} \omega_i = m_{ij} \omega_j$. That's the same as the equation under "Log-likelihood functions for the transmission parameters" in the Wallinga et al. paper (noting that we've swapped the index in the denominator).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense. So it basically comes down to whether the $N_j$ (or symmetry-derived equivalent) is wrapped into the $\beta_{ij}$ term. If defined separately, i.e. $\sum_j C_{i,j} I_j/N_j$ as above, then as you say, contact rate should be defined from-S-to-I.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is useful for the purpose of teaching but I find it easier sometimes to work from the symmetric encounter matrix, i.e. the number of encounters between group $i$ and group $j$ per unit time. If we call this $E_{i,j}$ then it is symmetric $E_{i,j}= E_{j,i}$ and so is the term in the force of infection which is proportional to $\frac{E_{ij}I_j}{N_iN_j}$. This highlights that the row vs. column notation is purely about how the matrix is normalised i.e. whether we write it as $\frac{E_{ij}}{N_i}$ or $\frac{E_{ij}}{N_j}$ (which thus determines which of the $N$ terms remains in the force of infection) and not about contacts from/to etc.

github-actions bot pushed a commit that referenced this pull request Dec 5, 2024
github-actions bot pushed a commit that referenced this pull request Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants