add episode on contact matrices #63

amanda-minter · 2024-11-07T14:32:06Z

This PR adds an episode on contact matrices, fixes issues #47 and #30 and incorporates the edits made in the closed PR #52.

github-actions · 2024-11-07T14:32:20Z

Thank you!

Thank you for your pull request 😃

🤖 This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}.

If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

🎯 correct output
🖼️ correct figures
❓ new warnings
‼️ new errors

Rendered Changes

🔍 Inspect the changes: https://github.com/epiverse-trace/tutorials-late/compare/md-outputs..md-outputs-PR-63

The following changes were observed in the rendered markdown documents:

 compare-interventions.md                         |    4 +-
 config.yaml (gone)                               |   83 -
 contact-matrices.md (new)                        |  349 +++++
 fig/model-choices-rendered-unnamed-chunk-3-1.png |  Bin 19654 -> 16153 bytes
 fig/model-choices-rendered-unnamed-chunk-4-1.png |  Bin 9157 -> 9187 bytes
 md5sum.txt                                       |   31 +-
 modelling-interventions.md                       |    4 +-
 renv.lock (gone)                                 | 1756 ----------------------
 simulating-transmission.md                       |   17 +-
 9 files changed, 379 insertions(+), 1865 deletions(-)

What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible.

This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

⏱️ Updated at 2025-01-29 11:04:25 +0000

adamkucharski

Thanks, this is looking good - just had a few minor suggestions, and some comments about consistency in contact indices (which could arguably go either way, but we should probably go with the popular formulation!)

episodes/contact-matrices.Rmd

episodes/simulating-transmission.Rmd

episodes/contact-matrices.Rmd

sbfnk

Looks really nice - left a few more comments.

episodes/contact-matrices.Rmd

sbfnk · 2024-11-26T16:55:55Z

episodes/contact-matrices.Rmd

+
+Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. Normalisation means converting to a value to be between 0 and 1. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values.
+
+In the case of the above model, we want to define $\beta  C_{i,j}$ so that the model has a specified value of $R_0$. $C[i,j]$ is defined as contacts to $i$ from $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_{normalised}$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$  (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).


"defined as contacts to $i$ from $j$" - I don't think this from/to notion is useful as it implies that it's one of the two that initiates contact - see also discussion at from/to, contacting/contacted socialcontactdata/contactmatrix#14

you could (if you don't think it adds confusion) point to the split argument in contact_matrix which does the normalisation for you, although it's definitely also useful to show here how to do it.

I've altered the text slightly to 'represents the contacts between populations $i$ and $j$' so it is more generic, wording this episode has been tricky - if you have any more suggestions for clarity let me know!

I've also added a callout on using split, I think it is a useful addition to know the normalisation can be done within the function contact_matrix(). Related to this, I think there could be some confusion about where normalisation takes place in different analyses, e.g. in epidemics it happens within the model function call. I've added a callout box to the first tutorial on using epidemics to highlight that the contact matrix normalisation does not need to be done.

Re-reading the various uses here, how about speaking about contacts of group i with group j (which I thought you used somewhere but now I can't find it), which to me does not imply any directionality but makes it clear that in rows we're specifically looking at group i. So perhaps adopt this one throughout?

sbfnk

I've suggested a consistent notation throughout. @amanda-minter do you think this works? I feel I've gone around this too many times to be a good judge any more.

One way or the other it would be good I think to make sure the terminology is the same in these sections and perhaps define somewhere early e.g. "We call $C[i, j]$ the average number of contacts of group $i$ with group $j$: the number of contacts between the two groups, averaged across all individuals in group $i$."

episodes/contact-matrices.Rmd

sbfnk · 2024-12-02T18:13:05Z

episodes/contact-matrices.Rmd

+
+Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values.
+
+In the case of the above model, we want to define $\beta  C_{i,j}$ so that the model has a specified value of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_{normalised}$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$  (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).


Suggested change

In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified value of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_{normalised}$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).

In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified value of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts of population $j$ with population $i$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_{normalised}$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$ (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).

PS: I think this is the wrong way around - C[i, j] in socialmixr is as in the equations above I think (but please correct me if I'm wrong).

If it's the same then why do we need to transpose the matrix?

Always find $i$ and $j$ a potential headache (which is why this will be so useful to have written down!)

Taking a step back, we just need the FOI to be defined sensibly, i.e.: $\sum_j C_{i,j} I_j/N_j$. So $C_{i,j}$ should be contacts that group $j$ (the infectious ones) make with group $i$ (the susceptible ones) - this is from equation A3 in Wallinga et al (2006)

The contact_matrix() function gives the following structure:

#> $matrix
#>          contact.age.group
#> age.group      [0,1)     [1,5)   [5,15)      15+
#>    [0,1)  0.40000000 0.8000000 1.266667 5.933333
#>    [1,5)  0.11250000 1.9375000 1.462500 5.450000
#>    [5,15) 0.02450980 0.5049020 7.946078 6.215686
#>    15+    0.03230337 0.3581461 1.290730 9.594101

So $C[i,j]$ is contacts made by group $i$ with group $j$? Which I think would mean it needs transposing?

For completeness (and just to remind myself), {epidemics} has this internal processing in .prepare_population(), which normalises by dominant eigenvalue and scales based on $w_{tot}/w_i$ (to use the Wallinga et al notation):

contact_matrix <- (contact_matrix / max(Re(eigen(contact_matrix)$values))) / x[["demography_vector"]]

we just need the FOI to be defined sensibly, i.e.: $\sum_j C_{i,j} I_j/N_j$. So $C_{i,j}$ should be contacts that group $j$ (the infectious ones) make with group $i$ (the susceptible ones)

Shouldn't this be the other way round, i.e. $C_{ij}$ here is the average number of contacts with group $j$ that a susceptible in group $i$ has (then multiplied with the probability that the contact is infectious, i.e. $I_j/N_j$)?

Here's an example:

There are two groups, $i$ and $j$. There is 1 person in group $i$ ($N_i=1$) and 100 people in group $j$ ($N_j = 100$). Person $i$ meets all 100 people in group $j$ every day: according to socialmixr notation that means $C_{ij}=100$ and $C_{ji}=1$ (on the scale of days). The total number of contacts between the two groups per day is $C_{ij} N_i = C_{ji} N_j = 100$.

Sticking with this notation:
The FOI on person $i$ is proportional to 100 * (proportion of $j$ that is ill), or $C_{ij} I_j / N_j$
The FOI on people in group $j$ is proportional to 1 * (1 if $i$ is ill, 0 otherwise), or $C_{ji} I_i / N_i$
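
The two-group example above can be checked numerically. This is a minimal base R sketch, not code from the episode: the variable names are made up for illustration, within-group contacts are set to zero for simplicity, and the numbers of infectious people are invented values.

```r
# Group sizes from the example: 1 person in group i, 100 people in group j
N <- c(i = 1, j = 100)

# Contact matrix with rows as the group whose contacts are counted:
# C["i", "j"] is the daily number of contacts a person in i has with group j.
# Within-group contacts are set to zero (an assumption for this sketch).
C <- matrix(
  c(0, 100,
    1,   0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("i", "j"), c("i", "j"))
)

# Reciprocity: the total number of contacts agrees when counted from either side
total_from_i <- C["i", "j"] * N[["i"]]
total_from_j <- C["j", "i"] * N[["j"]]

# Invented numbers of infectious people in each group
n_infectious <- c(i = 1, j = 10)

# Force of infection on each group, up to a constant:
# sum over j of C[i, j] * I[j] / N[j]
foi <- as.vector(C %*% (n_infectious / N))
names(foi) <- names(N)
```

With these invented values, the term for person $i$ is 100 times the proportion of group $j$ that is infectious, and the term for group $j$ is 1 times the infection status of person $i$.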

Thanks, that makes sense. So it basically comes down to whether the $N_j$ (or symmetry-derived equivalent) is wrapped into the $\beta_{ij}$ term. If defined separately, i.e. $\sum_j C_{i,j} I_j/N_j$ as above, then as you say, contact rate should be defined from-S-to-I.

Not sure if this is useful for the purpose of teaching but I find it easier sometimes to work from the symmetric encounter matrix, i.e. the number of encounters between group $i$ and group $j$ per unit time. If we call this $E_{i,j}$ then it is symmetric $E_{i,j}= E_{j,i}$ and so is the term in the force of infection which is proportional to $\frac{E_{ij}I_j}{N_iN_j}$. This highlights that the row vs. column notation is purely about how the matrix is normalised i.e. whether we write it as $\frac{E_{ij}}{N_i}$ or $\frac{E_{ij}}{N_j}$ (which thus determines which of the $N$ terms remains in the force of infection) and not about contacts from/to etc.
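
A small sketch of the encounter-matrix view, assuming a made-up symmetric matrix `E` and made-up group sizes `N` (neither comes from the episode). It only illustrates that the row and column conventions are two normalisations of the same symmetric object.

```r
# Hypothetical symmetric encounter matrix: E[i, j] is the total number of
# encounters between groups i and j per unit time (made-up values)
E <- matrix(c( 40, 100,
              100, 900), nrow = 2, byrow = TRUE)
N <- c(10, 100)  # made-up group sizes

# Dividing a matrix by a vector in R recycles down the columns,
# so each row i is divided by N[i]: contacts per person in group i
C_by_row <- E / N

# Dividing each column j by N[j]: contacts per person in group j
C_by_col <- sweep(E, 2, N, "/")

# The symmetric per-pair term that multiplies I_j in the force of infection
E_per_pair <- E / outer(N, N)  # E[i, j] / (N[i] * N[j])
```

Here `C_by_col` is the transpose of `C_by_row`, and dividing `C_by_row` by the column group sizes recovers the symmetric `E_per_pair`.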

I've gone through the {epidemics} and {finalsize} examples a bit more, thinking about how these steps are introduced in vignettes. The challenge is that these packages allow the user to specify in terms of $R_0$ (which is useful), but that means normalising the matrix, then converting back into the correct form for the contact rate you describe above @sbfnk. This is where I think the transpose comes in, so that it's switching between $C_{ij}=R_{ij}$ for the eigenvalue normalisation and $C_{ij}/N_j$ (i.e. contact rate per capita) for the model.

But because the result is the symmetric per capita matrix that goes into the model, the end result is equivalent:

# 1. Current epidemics and finalsize implementation
contact_matrix <- t(contact_data[["matrix"]])
contact_matrix <- contact_matrix / max(Re(eigen(contact_matrix)$values))/demography_vector

# 2. Per-capita formulation
# Normalise the matrix to ensure correct R0 in model
scaling_factor <- 1 / max(Re(eigen(contact_data[["matrix"]])$values))
contact_matrix2 <- contact_data[["matrix.per.capita"]]*scaling_factor
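
The equivalence of the two formulations can be checked on a toy example without downloading survey data. In this sketch, `contact_mat` stands in for `contact_data[["matrix"]]` and is built from a made-up symmetric encounter matrix so that it satisfies reciprocity. `per_capita` is an assumed stand-in for `contact_data[["matrix.per.capita"]]`, taken here to mean each column $j$ divided by $N_j$.

```r
# Toy reciprocal contact matrix built from a symmetric encounter matrix
# (made-up values; stands in for contact_data[["matrix"]])
E <- matrix(c( 40, 100,
              100, 900), nrow = 2, byrow = TRUE)
demography_vector <- c(10, 100)
contact_mat <- E / demography_vector  # C[i, j] = E[i, j] / N[i]

# Assumed stand-in for contact_data[["matrix.per.capita"]]: C[i, j] / N[j]
per_capita <- sweep(contact_mat, 2, demography_vector, "/")

# 1. Transpose, normalise by the dominant eigenvalue, divide rows by demography
m1 <- t(contact_mat)
m1 <- m1 / max(Re(eigen(m1)$values)) / demography_vector

# 2. Per-capita formulation scaled by the same dominant eigenvalue
scaling_factor <- 1 / max(Re(eigen(contact_mat)$values))
m2 <- per_capita * scaling_factor

# The two routes agree up to floating point error
same_result <- isTRUE(all.equal(m1, m2))
```

The agreement relies on reciprocity ($C_{ij} N_i = C_{ji} N_j$); a matrix that does not satisfy it would not be expected to give identical results from the two routes.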

So in terms of implications for this episode, the main thing is just to make sure that the notation for the matrix in the model is in terms of contacts susceptibles make with infectives?

Bringing it back to those two key distinctions (which we could tweak to mention, given the above discussion!):

1. Convert contact matrix into expected number of secondary cases: the R calculation involves an infective meeting susceptibles

2. Convert contact matrix into contact rates: the epidemic model involves a susceptible meeting infectives

Also tagging @rozeggo and @BlackEdder for info.

Do we actually need to distinguish between the two? Given that the eigenvalues will be invariant under transpose (1) could be done in either orientation (as could (2) if the index of the normalising population size is swapped).

There's probably a simpler framing we could use, but I think we need to explain the two steps (eigenvalue calculation, then normalisation by demography over the correct matrix dimension), otherwise it could lead people to assume the following is correct?

# 1. Current epidemics and finalsize implementation
contact_matrix <- contact_data[["matrix"]] # Edit: no transpose
contact_matrix <- contact_matrix / max(Re(eigen(contact_matrix)$values))/demography_vector
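
A toy check of both points in this exchange, using a made-up reciprocal matrix rather than real survey data (`contact_mat` and `demography_vector` are illustrative stand-ins). It shows that the eigenvalues do not change under transposition, but that dividing by the demography vector without transposing first gives a different matrix.

```r
# Made-up reciprocal contact matrix (rows = group whose contacts are counted)
demography_vector <- c(10, 100)
contact_mat <- matrix(c(4, 10,
                        1,  9), nrow = 2, byrow = TRUE)

# Eigenvalues are invariant under transposition
eig_original <- max(Re(eigen(contact_mat)$values))
eig_transposed <- max(Re(eigen(t(contact_mat))$values))

# With transpose: rows of t(C) are divided by the matching group size
with_transpose <- t(contact_mat) / eig_transposed / demography_vector

# Without transpose: same scaling applied to the untransposed matrix
no_transpose <- contact_mat / eig_original / demography_vector

# The transposed route gives a symmetric per-capita matrix; the other does not
symmetric_with <- isSymmetric(with_transpose)
symmetric_without <- isSymmetric(no_transpose)
```

So the orientation does not matter for the eigenvalue step, but it does matter for which group size each entry is divided by.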

Perhaps the below is clearest, because it doesn't involve any explicit transposes or normalisation? We just need to explain intuitively what the two matrices represent?

# 2. Per-capita formulation
# Normalise the matrix to ensure correct R0 in model
scaling_factor <- 1 / max(Re(eigen(contact_data[["matrix"]])$values))
contact_matrix2 <- contact_data[["matrix.per.capita"]]*scaling_factor

adamkucharski · 2025-01-20T10:35:51Z

episodes/contact-matrices.Rmd

+
+Rather than just using the raw number of contacts, we can instead normalise the contact matrix to make it easier to work in terms of $R_0$. In particular, we normalise the matrix by scaling it so that if we were to calculate the average number of secondary cases based on this normalised matrix, the result would be 1 (in mathematical terms, we are scaling the matrix so the largest eigenvalue is 1). This transformation scales the entries but preserves their relative values.
+
+In the case of the above model, we want to define $\beta  C_{i,j}$ so that the model has a specified value of $R_0$. The entry of the contact matrix $C[i,j]$ represents the contacts between populations $i$ and $j$, which is equivalent to `contact_data$matrix[j,i]` so the first step is to transpose the contact data matrix (`contact_data$matrix`) so the row/column entries are now in the form $C[i,j]$. Then we normalise the matrix $C$ so the maximum eigenvalue is one and call this matrix $C_{normalised}$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ is calculated from the scaling factor and the value of $\gamma$  (i.e. mathematically we have the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is $\beta / \gamma$).


Suggestion for final version of this paragraph (wrapping up above discussions). I've edited so $C[i,j]$ = `contact_data$matrix[j,i]` as this is how it's defined in line 165 above (and for a training episode, makes it easier to follow the logic).

In the case of the above model, we want to define $\beta C_{i,j}$ so that the model has a specified value of $R_0$. If the entry of the contact matrix $C[i,j]$ represents the contacts of population $i$ with $j$, it is equivalent to `contact_data$matrix[i,j]`, and the maximum eigenvalue of this matrix represents the typical magnitude of contacts, not typical magnitude of transmission. We must therefore normalise the matrix $C$ so the maximum eigenvalue is one; we call this matrix $C_{normalised}$. Because the rate of recovery is $\gamma$, individuals will be infectious on average for $1/\gamma$ days. So $\beta$ as a model input is calculated from $R_0$, the scaling factor and the value of $\gamma$ (i.e. mathematically we use the fact that the dominant eigenvalue of the matrix $R_0 \times C_{normalised}$ is equal to $\beta / \gamma$).
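
The normalisation step in this paragraph could be sketched in base R as follows. The matrix, `r0` and `gamma` are made-up illustrative values, and the variable names are not taken from the episode.

```r
# Made-up contact matrix and epidemiological parameters
C <- matrix(c(4, 10,
              1,  9), nrow = 2, byrow = TRUE)
r0 <- 1.5       # assumed basic reproduction number
gamma <- 1 / 5  # assumed recovery rate (infectious for 5 days on average)

# Scale C so that its dominant eigenvalue is one
dominant_eigenvalue <- max(Re(eigen(C)$values))
C_normalised <- C / dominant_eigenvalue

# With the normalised matrix, beta follows from R0 and gamma
beta <- r0 * gamma

# Check: the dominant eigenvalue of (beta / gamma) * C_normalised recovers R0
r0_check <- max(Re(eigen(beta / gamma * C_normalised)$values))

# Scaling preserves the relative values of the entries
ratio_before <- C[1, 2] / C[2, 1]
ratio_after <- C_normalised[1, 2] / C_normalised[2, 1]
```

Because `C_normalised` has a dominant eigenvalue of one, multiplying it by $\beta / \gamma$ gives a matrix whose dominant eigenvalue is the chosen $R_0$.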


amanda-minter requested review from sbfnk, adamkucharski and avallecam November 7, 2024 14:32

github-actions bot pushed a commit that referenced this pull request Nov 7, 2024

differences for PR #63

82a48e8

amanda-minter requested a review from Degoot-AM November 7, 2024 15:40

github-actions bot pushed a commit that referenced this pull request Nov 7, 2024

differences for PR #63

cc89750

adamkucharski reviewed Nov 8, 2024

View reviewed changes

github-actions bot pushed a commit that referenced this pull request Nov 12, 2024

differences for PR #63

bd0f82e

github-actions bot pushed a commit that referenced this pull request Nov 14, 2024

differences for PR #63

01a9ef2

github-actions bot pushed a commit that referenced this pull request Nov 15, 2024

differences for PR #63

8cc45cd

amanda-minter marked this pull request as ready for review November 15, 2024 08:51

amanda-minter mentioned this pull request Nov 15, 2024

Clarify how pre-processing steps are related to model_default equations epiverse-trace/epidemics#257

Open

sbfnk approved these changes Nov 26, 2024

View reviewed changes

github-actions bot pushed a commit that referenced this pull request Nov 28, 2024

differences for PR #63

fd41ba4

github-actions bot pushed a commit that referenced this pull request Nov 28, 2024

differences for PR #63

aef3ed9

github-actions bot pushed a commit that referenced this pull request Nov 29, 2024

differences for PR #63

99d34bd

github-actions bot pushed a commit that referenced this pull request Nov 29, 2024

differences for PR #63

8d8582b

github-actions bot pushed a commit that referenced this pull request Nov 29, 2024

differences for PR #63

42736f7

sbfnk reviewed Dec 2, 2024

View reviewed changes

github-actions bot pushed a commit that referenced this pull request Dec 5, 2024

differences for PR #63

af70bce

github-actions bot pushed a commit that referenced this pull request Dec 5, 2024

differences for PR #63

7deafed

adamkucharski reviewed Jan 20, 2025

View reviewed changes

github-actions bot pushed a commit that referenced this pull request Jan 21, 2025

differences for PR #63

34f674d

avallecam mentioned this pull request Jan 28, 2025

Set up dependabot #78

Merged

amanda-minter added 5 commits January 29, 2025 10:57

add template for new episode

a9d785b

add episode to appear first

2ba2315

move some content between episodes

ba1c45a

moving some contact matrix content from `simulating-transmission.Rmd` to`contact-matrices.Rmd`

add content on SIR versus age structure SIR

937ecd9

fixes #47

add callout on how to normalise a matrix

cc5b670

amanda-minter and others added 26 commits January 29, 2025 10:57

add section on socialmixr

a385554

added section on socialmixr, including how to download surveys.

edit and move normalisation callout

d9062bc

add link to simulating transmission

eb0c077

edits to text

ef53dbb

add list of example analyses

09c4c61

delete trailing whitespace

ff81e66

Apply suggestions from code review

4f0502e

Co-authored-by: Sebastian Funk <[email protected]>

add callout on synthetic matrices

13ad53c

add edits by @adamkucharski

630ffd9

add additional detail on normalisation

db83d88

majority text suggestion from @adamkucharski

lint file

4210300

add text on contact matrix conversions

fafedb0

text by @adamkucharski

fix contact matrix notation

5d54b01

now that the pre-transpose matrix is defined as Cij, the post transpose matrix used in the ODEs should be Cji

minor edit to text

8a82cc2

update teaching times

f51a894

Apply suggestions from code review

3a35d3b

Co-authored-by: Adam Kucharski <[email protected]>

update contact matrix notation

63e830f

C is now the model term for the contact matrix, which is the transpose of `contact_matrix$data`

update equations for model_default in relevant episodes

c5b4e77

Update episodes/contact-matrices.Rmd

d820963

Co-authored-by: Sebastian Funk <[email protected]>

Update contact-matrices.Rmd

4aaad03

add callout on splitting contact matrices using {socialmixr}

c1e151b

Update contact-matrices.Rmd

f018952

add callout to simulating transmission on normalisation

9aadf08

Update episodes/contact-matrices.Rmd

7e82965

Co-authored-by: Sebastian Funk <[email protected]>

add callout on notation

1653fcd

Update contact-matrices.Rmd

86d52fd

github-actions bot pushed a commit that referenced this pull request Jan 29, 2025

differences for PR #63

965943f

avallecam merged commit 4b89549 into epiverse-trace:main Jan 30, 2025
4 checks passed

adamkucharski mentioned this pull request Feb 4, 2025

Move contact matrix transpose inside model function? epiverse-trace/epidemics#259

Open

sbfnk mentioned this pull request Mar 3, 2025

from/to, contacting/contacted socialcontactdata/contactmatrix#14

Open


