-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
143 lines (101 loc) · 6.4 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
fig.align = "center",
dpi = 300,
cache.path = "README_cache/"
)
```
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/dvir)](https://CRAN.R-project.org/package=dvir)
[![](https://cranlogs.r-pkg.org/badges/grand-total/dvir?color=yellow)](https://cran.r-project.org/package=dvir)
[![](https://cranlogs.r-pkg.org/badges/last-month/dvir?color=yellow)](https://cran.r-project.org/package=dvir)
<!-- badges: end -->
# Disaster Victim Identification in R
The purpose of **dvir** is to implement state-of-the-art algorithms for DNA-based disaster victim identification (DVI). In particular, **dvir** performs *joint* identification of multiple victims.
The methodology and algorithms of **dvir** are described in [Vigeland & Egeland (2021): DNA-based Disaster Victim Identification](https://www.researchsquare.com/article/rs-296414/v1).
The **dvir** package is part of the [pedsuite](https://magnusdv.github.io/pedsuite/), a collection of R packages for pedigree analysis. Much of the machinery behind **dvir** is imported from other pedsuite packages, especially **pedtools** for handling pedigree data, and **forrel** for the calculation of likelihood ratios. A comprehensive presentation of these packages, and much more, can be found in the book [Pedigree Analysis in R](https://shop.elsevier.com/books/pedigree-analysis-in-r/vigeland/978-0-12-824430-2).
## Installation
To get the current official version of **dvir**, install from CRAN as follows:
```{r, eval = FALSE}
install.packages("dvir")
```
Alternatively, the latest development version may be obtained from GitHub:
```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("magnusdv/dvir")
```
## Tutorial example
In the following we will use a toy DVI example from the [paper](https://www.researchsquare.com/article/rs-296414/v1) (see above) to illustrate how to use **dvir**.
To get started, we load the **dvir** package.
```{r}
library(dvir)
```
### Introduction
We consider the DVI problem shown below, in which three victim samples (V1, V2, V3) are to be matched against three missing persons (M1, M2, M3) belonging to two different families.
```{r example-plot1, echo = FALSE, fig.height = 2.9, out.width = "85%"}
plotDVI(example2, marker = 1, titles = c("PM data", "AM data"), cex = 1.1, frames = FALSE)
```
The hatched symbols indicate genotyped individuals. In this simple example we consider only a single marker, with 10 equifrequent alleles denoted 1, 2,..., 10. The available genotypes are shown in the figure.
DNA profiles from victims are generally referred to as _post mortem_ (PM) data, while the _ante mortem_ (AM) data contains profiles from the reference individuals R1 and R2.
### Assignments
A possible solution to the DVI problem is called an _assignment_. In our toy example, there are _a priori_ 14 possible assignments, which can be listed as follows:
```{r, echo = F}
generatePairings(example2) |> expand.grid.nodup()
```
Each row indicates the missing persons corresponding to V1, V2 and V3 (in that order) with `*` meaning _not identified_. For example, the first line contains the _null model_ corresponding to none of the victims being identified, while the last line gives the assignment where `(V1, V2, V3) = (M1, M2, M3)`,
For each assignment `a` we can calculate the likelihood, denoted `L(a)`. The null likelihood is denoted `L0`.
### Goals
We consider the following to be two of the main goals in the analysis of a DVI case with multiple missing persons:
1) Rank the assignments according to how likely they are.
We measure this by calculating the LR comparing each assignment `a` to the null model: `LR = L(a)/L0`.
1) Find the *posterior pairing probabilities* `P(Vi = Mj | data)` for all combinations of i and j,
and the *posterior non-pairing probabilities* `P(Vi = '*' | data)` for all i.
### The data
The pedigrees and genotypes for this toy example are available within **dvir** as a built-in dataset, under the name `example2`.
```{r}
example2
```
Internally, all DVI datasets in **dvir** have the structure of a list, with elements `pm` (the victim data), `am` (the reference data) and `missing` (a vector naming the missing persons):
We can inspect the data by printing each object. For instance, in this case `am` is a list of two pedigrees:
```{r}
example2$am
```
Note that the two pedigrees are printed in so-called _ped format_, with columns `id` (ID label), `fid` (father), `mid` (mother), `sex` (1 = male; 2 = female) and `L1` (genotypes at locus `L1`).
The code generating this dataset can be found in the github repository of **dvir**, more specifically here: https://github.com/magnusdv/dvir/blob/master/data-raw/example2.R.
A great way to inspect a DVI dataset is to plot it with the function `plotDVI()`.
```{r example-plot2, fig.height = 2.5, fig.width = 6, out.width = "80%"}
plotDVI(example2)
```
The `plotDVI()` function offers many parameters for tweaking the plot; see the help page `?plotDVI()` for details.
### Joint identification
The `jointDVI()` function performs joint identification of all three victims, given the data. It returns a data frame ranking all assignments with nonzero likelihood:
```{r}
jointRes = jointDVI(example2, verbose = FALSE)
# Print the result
jointRes
```
The output shows that the most likely joint solution is (V1, V2, V3) = (M1, M2, M3), with an LR of 250 compared
to the null model.
The function `plotSolution()` shows the most likely solution:
```{r solution, fig.height = 3, fig.width = 7, out.width = "75%"}
plotSolution(example2, jointRes, marker = 1)
```
By default, the plot displays the assignment in the first row of `jointRes`. To examine the second most likely, add `k = 2` (and so on to go further down the list).
### Posterior pairing probabilities
Next, we compute the posterior pairing (and non-pairing) probabilities. This is done by feeding the output from `jointDVI()` into the function `Bmarginal()`.
```{r}
Bmarginal(jointRes, example2$missing, prior = NULL)
```
Here we used a default flat prior for simplicity, assigning equal prior probabilities to all assignments.
we see that the posterior pairing probabilities for the most likely solution are
* _P_(V1 = M1 | data) = 0.88,
* _P_(V2 = M2 | data) = 0.95,
* _P_(V3 = M2 | data) = 0.83.
----