forked from epiverse-trace/linelist
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
232 lines (159 loc) · 7.45 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r readmesetup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# *linelist*: Tagging and Validating Epidemiological Data <img src="man/figures/linelist_hex.png" align="right" width="120" />
<!-- badges: start -->
[![Digital Public Good](https://raw.githubusercontent.com/epiverse-trace/linelist/main/man/figures/dpg_badge.png)](https://digitalpublicgoods.net/registry/linelist.html)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![cran-check](https://cranchecks.info/badges/summary/linelist)](https://cran.r-project.org/web/checks/check_results_linelist.html)
[![R-CMD-check](https://github.com/epiverse-trace/linelist/workflows/R-CMD-check/badge.svg)](https://github.com/epiverse-trace/linelist/actions)
[![codecov](https://codecov.io/gh/epiverse-trace/linelist/branch/main/graph/badge.svg?token=JGTCEY0W02)](https://codecov.io/gh/epiverse-trace/linelist)
[![lifecycle-experimental](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-experimental.svg)](https://www.reconverse.org/lifecycle.html#experimental)
[![month-download](https://cranlogs.r-pkg.org/badges/linelist)](https://cran.r-project.org/package=linelist)
[![total-download](https://cranlogs.r-pkg.org/badges/grand-total/linelist)](https://cran.r-project.org/package=linelist)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6556047.svg)](https://doi.org/10.5281/zenodo.6556047)
<!-- badges: end -->
*linelist* provides a safe entry point to the *Epiverse* software ecosystem,
adding a foundational layer through *tagging*, *validation*, and *safeguarding*
epidemiological data, to help make data pipelines more straightforward and
robust.
## Installation
### Stable version
Our stable versions are released on CRAN, and can be installed using:
```{r, eval=FALSE}
install.packages("linelist", build_vignettes = TRUE)
```
### Development version
The development version of *linelist* can be installed from
[GitHub](https://github.com/) with:
```{r, eval=FALSE}
if (!require(remotes)) {
install.packages("remotes")
}
remotes::install_github("epiverse-trace/linelist", build_vignettes = TRUE)
```
## Usage
<img src="man/figures/linelist_infographics.png" width="60%" />
*linelist* works by tagging key epidemiological data in a `data.frame` or a
`tibble` to facilitate and strengthen data pipelines. The resulting object is a
`linelist` object, which extends `data.frame` (or `tibble`) by providing three
types of features:
1. a **tagging system** to identify key data, enabling access to these data using
their tags rather than actual names, which may change over time and across
datasets
2. **validation** of the tagged variables (making sure they are present and of the
right type/class)
3. **safeguards** against accidental losses of tagged variables in common data
handling operations
The short example below illustrates these different features. See the
[Documentation](#Documentation) section for more in-depth examples and details
about `linelist` objects.
```{r error = TRUE, fig.height = 4, fig.width = 8, out.width = "60%"}
# load packages and a dataset for the example
# -------------------------------------------
library(pacman)
p_load(dplyr)
p_load(magrittr)
p_load(outbreaks)
p_load(incidence2)
p_load(linelist)
dataset <- outbreaks::mers_korea_2015$linelist
head(dataset)
# check known tagged variables
# ----------------------------
tags_names()
# build a linelist
# ----------------
x <- dataset %>%
tibble() %>%
make_linelist(date_onset = "dt_onset", # date of onset
date_reporting = "dt_report", # date of reporting
occupation = "age" # mistake
)
x
tags(x) # check available tags
# validation of tagged variables
# ------------------------------
## (this flags a likely mistake: occupation should not be an integer)
validate_linelist(x)
# change tags: fix mistakes, add new ones
# ---------------------------------------
x <- x %>%
set_tags(occupation = NULL, # tag removal
gender = "sex", # new tag
outcome = "outcome"
)
# safeguards against actions losing tags
# --------------------------------------
## attemping to remove geographical info but removing dates by mistake
x_no_geo <- x %>%
select(-(5:8))
## for stronger pipelines, trigger errors upon loss
x_no_geo <- x %>%
lost_tags_action("error") %>%
select(-(5:8))
x_no_geo <- x %>%
select(-(5:7))
## to revert to default behaviour (warning upon error)
lost_tags_action()
# access content by tags, and build downstream pipelines
# ------------------------------------------------------
x_no_geo %>%
select_tags(date_onset, outcome)
x_no_geo %>%
tags_df()
x_no_geo %>%
tags_df() %>%
incidence("date_onset", groups = c("gender", "outcome")) %>%
facet_plot(facets = "gender", fill = outcome)
```
## Documentation
More detailed documentation can be found at:
https://epiverse-trace.github.io/linelist/
In particular:
* A general introduction to *linelist* ([link](https://epiverse-trace.github.io/linelist/articles/linelist_introduction.html))
* The reference manual ([link](https://epiverse-trace.github.io/linelist/reference/index.html))
## Getting help
To ask questions or give us some feedback, please use the github
[issues](https://github.com/epiverse-trace/linelist/issues) system.
## Data privacy
Case line lists may contain personally identifiable information (PII). While
*linelist* provides a way to store this data in R, it does not currently provide
tools for data anonymization. The user is responsible for respecting individual
privacy and ensuring PII is handled with the required level of confidentiality,
in compliance with applicable laws and regulations for storing and sharing PII.
Note that PII is rarely needed for common analytics tasks, so that in many
instances it may be advisable to remove PII from the data before sharing them
with analytics teams.
## Development
### Lifecycle
This package is currently *experimental*, as defined by the [RECON software
lifecycle](https://www.reconverse.org/lifecycle.html). This means that essential
features and mechanisms should not change drastically, but depending on user
feedback, functions may be renamed, arguments may change, some functionalities
may be added, etc.
### Contributions
Contributions are welcome via [pull requests](https://github.com/epiverse-trace/linelist/pulls).
Contributors to the project include:
* Thibaut Jombart (author)
* David Mascarina (logo)
* Emma Marty (communication)
### Code of Conduct
Please note that the linelist project is released with a
[Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.
### Notes
This package is a reboot of the RECON package
[linelist](https://github.com/reconhub/linelist). Unliked its predecessor, the
new package focuses on the implementation of a *linelist* class. The data
cleaning features of the original package will eventually be re-implemented for
*linelist* objects, albeit likely in a separate package.