Commit ebb3990

Merge pull request #1 from ecohealthalliance/dev
Merge dev to master for LEMIS v2.0.0 update
2 parents 85eb4cc + 6a1157f commit ebb3990

27 files changed, +2548 -196 lines changed

.circleci/config.yml

Lines changed: 6 additions & 6 deletions
@@ -3,18 +3,18 @@ jobs:
   build:
     working_directory: ~/main
     docker:
-      - image: rocker/verse:3.4.0
+      - image: rocker/verse:latest
     steps:
       - checkout
       - restore_cache:
           keys:
-            - deps1-{{ .Branch }}-{{ checksum "DESCRIPTION" }}-{{ checksum ".circleci/config.yml" }}
-            - deps1-{{ .Branch }}
-            - deps1-
+            - deps1-$R_VERSION-{{ .Branch }}-{{ checksum "DESCRIPTION" }}-{{ checksum ".circleci/config.yml" }}
+            - deps1-$R_VERSION-{{ .Branch }}
+            - deps1-$R_VERSION
       - run:
           command: |
-            R -e "devtools::install_deps(dependencies = TRUE)"
-            R -e "devtools::install_github('MangoTheCat/goodpractice')"
+            R -e "devtools::install_deps(dependencies=TRUE)"
+            R -e "devtools::install_cran('goodpractice')"
       - run:
           command: |
             R -e 'devtools::check()'

DESCRIPTION

Lines changed: 5 additions & 3 deletions
@@ -1,11 +1,12 @@
 Package: lemis
 Type: Package
 Title: The LEMIS Wildlife Trade Database
-Version: 1.0.0
+Version: 2.0.0
 Authors@R: c(
     person("Noam", "Ross", , "[email protected]", role = c("aut", "cre")),
     person("Allison", "White", , "[email protected]", role = c("aut")),
     person("Carlos", "Zambrana-Torrelio", , "[email protected]", role = c("aut")),
+    person("Evan", "Eskew", , "[email protected]", role = c("aut")),
     person("EcoHealth Alliance", role="cph"))
 Description: Provides cleaned data and metadata from the US Fish and Wildlife
     Service's data on wildlife and wildlife product imports and exports.
@@ -23,7 +24,7 @@ Imports:
     htmlwidgets,
     stringi
 Remotes:
-    ropenscilabs/datastorr@i14-private-repos,
+    ropenscilabs/datastorr,
     krlmlr/fstplyr
 Roxygen: list(markdown = TRUE)
 RoxygenNote: 6.0.1
@@ -37,5 +38,6 @@ Suggests:
     testthat,
     lintr,
     taxize,
-    tidyverse
+    dplyr,
+    readr
 VignetteBuilder: knitr

NEWS.md

Lines changed: 18 additions & 0 deletions
@@ -1,3 +1,21 @@
+# lemis 2.0.0
+
+This is the first major update to the **lemis** package. Changes include:
+
+* Addition of data from late 2013 and all of 2014.
+* A major reorganization of **lemis** data processing for release. Briefly, all data importation and cleaning steps are now automated with scripts located in the `data-raw/` subdirectory. While this should be mostly irrelevant to the end user, it means that all data processing code is now fully contained within the **lemis** package repository. This should make it easier to incorporate future data into the pipeline.
+* Improved error handling for non-standard data values. Previously, some records with non-standard values in specific fields were dropped from the data. Now the **lemis** cleaning workflow incorporates error checking for non-standard values across all the fields of data for which valid values are described in USFWS spreadsheets. Non-standard values are converted to `non-standard value` (as opposed to being converted to `NA` or dropped) and a `cleaning_notes` column has been added to describe the original value.
+* Note that data versions from previous releases can still be had with `lemis_data('v1.0.0')`
+
+## Bug fixes
+
+* Browsable tables in HTML should now work under systems with all pandoc versions.
+
+## Minor changes
+
+* Reduced dependencies by removing tidyverse package
+* Updated test infrastructure to R 3.5.1
+
 # lemis 1.0.0
 
 * Initial release
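
The non-standard value handling described in the 2.0.0 entry of NEWS.md can be illustrated with a minimal base R sketch. This is not the code in `data-raw/clean_lemis.R`; the helper name `flag_non_standard()`, the toy `unit` column, the set of valid codes, and the wording of the note are all assumptions made for illustration. Only the `non-standard value` replacement string and the `cleaning_notes` column name come from the release notes.

```r
# Hypothetical helper (not part of lemis): replace values that are not in
# a set of valid codes with "non-standard value" and record the original
# value in a cleaning_notes column instead of dropping the record.
flag_non_standard <- function(df, field, valid_values) {
  values <- df[[field]]
  # NA is treated as missing, not as a non-standard value
  is_bad <- !is.na(values) & !(values %in% valid_values)

  if (!"cleaning_notes" %in% names(df)) {
    df$cleaning_notes <- NA_character_
  }
  if (!any(is_bad)) {
    return(df)
  }

  # Note wording is an assumption made for this sketch
  note <- paste0("Original value in '", field, "' column: ", values[is_bad])
  existing <- df$cleaning_notes[is_bad]
  df$cleaning_notes[is_bad] <- ifelse(
    is.na(existing), note, paste(existing, note, sep = ", ")
  )
  df[[field]][is_bad] <- "non-standard value"
  df
}

# Toy data: "XX" is not a valid code in this made-up example
toy <- data.frame(unit = c("KG", "NO", "XX", NA), stringsAsFactors = FALSE)
cleaned <- flag_non_standard(toy, "unit", valid_values = c("KG", "NO"))
print(cleaned)
```

The point of this design, as the release notes describe it, is that the record stays in the data and the original value remains recoverable from the notes column.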

R/codes.R

Lines changed: 27 additions & 20 deletions
@@ -19,7 +19,7 @@
 #' if(in_pkgdown) {
 #' mytext <- c('In RStudio, this help file includes a searchable table of values.')
 #' } else {
-#' mytext <- lemis:::rd_datatable(lemis:::lemis_codes())
+#' mytext <- lemis:::rd_datatable(lemis::lemis_codes())
 #' }
 #' mytext
 #' }
@@ -49,7 +49,7 @@ lemis_codes <- function() {
 #' if(in_pkgdown) {
 #' mytext <- c('In RStudio, this help file includes a searchable table of values.')
 #' } else {
-#' mytext <- lemis:::rd_datatable(lemis:::lemis_metadata())
+#' mytext <- lemis:::rd_datatable(lemis::lemis_metadata())
 #' }
 #' mytext
 #' }
@@ -90,35 +90,42 @@ tabular <- function(df, col_names = TRUE, ...) {
 
   paste(
     "\\tabular{", paste(col_align, collapse = ""), "}{\n ",
-    contents, "\n}\n", sep = ""
+    contents, "\n}\n",
+    sep = ""
   )
 }
 
-
-#'@importFrom DT datatable
-#'@noRd
-rd_datatable <- function(df, width="100%", ...) {
-  wrap_widget(datatable(df, width=width, ...))
+#' @importFrom DT datatable
+#' @noRd
+rd_datatable <- function(df, width = "100%", ...) {
+  wrap_widget(datatable(df, width = width, ...))
 }
 
-#'@importFrom stringi stri_subset_regex
-#'@importFrom htmlwidgets saveWidget
-#'@noRd
+#' @importFrom stringi stri_subset_regex
+#' @importFrom htmlwidgets saveWidget
+#' @noRd
 wrap_widget <- function(widget) {
-  tmp <- tempfile(fileext=".html")
-  saveWidget(widget, tmp)
-  widg <- paste(stringi::stri_subset_regex(readLines(tmp), "^</?(!DOCTYPE|meta|body|html|head|title)",negate=TRUE), collapse="\n")
-  paste('\\out{', escape_rd(widg), '}\n', sep="\n")
+  tmp <- tempfile(fileext = ".html")
+  htmlwidgets::saveWidget(widget, tmp)
+  widg <- paste(
+    stringi::stri_subset_regex(readLines(tmp),
+      "^</?(!DOCTYPE|meta|body|html|head|title)",
+      negate = TRUE),
+    collapse = "\n")
+  paste("\\out{", escape_rd(widg), "}\n", sep = "\n")
 }
 
-#'@importFrom stringi stri_replace_all_fixed
-#'@noRd
+#' @importFrom stringi stri_replace_all_fixed
+#' @noRd
 escape_rd <- function(x) {
   stri_replace_all_fixed(
     stri_replace_all_fixed(
       stri_replace_all_fixed(
         stri_replace_all_fixed(x, "\\", "\\\\"),
-        "%", "\\%"),
-      "{", "\\{"),
-    "}", "\\}")
+        "%", "\\%"
+      ),
+      "{", "\\{"
+    ),
+    "}", "\\}"
+  )
 }

R/datastorr.R

Lines changed: 5 additions & 5 deletions
@@ -21,7 +21,7 @@
 #' use a different path).
 #'
 #' @export
-lemis_data <- function(version=NULL, path=NULL) {
+lemis_data <- function(version = NULL, path = NULL) {
   datastorr::github_release_get(lemis_info(path), version)
 }
 
@@ -35,19 +35,19 @@ lemis_data <- function(version=NULL, path=NULL) {
 #' for the most recent github version.
 #'
 #' @importFrom datastorr github_release_versions
-lemis_versions <- function(local=TRUE, path=NULL) {
+lemis_versions <- function(local = TRUE, path = NULL) {
   datastorr::github_release_versions(lemis_info(path), local)
 }
 
 #' @export
 #' @rdname lemis_data
-lemis_version_current <- function(local=TRUE, path=NULL) {
+lemis_version_current <- function(local = TRUE, path = NULL) {
   datastorr::github_release_version_current(lemis_info(path), local)
 }
 
 #' @export
 #' @rdname lemis_data
-lemis_del <- function(version, path=NULL) {
+lemis_del <- function(version, path = NULL) {
   datastorr::github_release_del(lemis_info(path), version)
 }
 
@@ -70,6 +70,6 @@ lemis_info <- function(path) {
 #' @title Make a data release.
 #' @param ... Parameters passed through to \code{\link{github_release_create}}
 #' @param path Path to the data (see \code{\link{lemis}}).
-lemis_release <- function(..., path=NULL) {
+lemis_release <- function(..., path = NULL) {
   datastorr::github_release_create(lemis_info(path), ...)
 }

R/sysdata.rda

-62 Bytes
Binary file not shown.

README.Rmd

Lines changed: 3 additions & 3 deletions
@@ -4,8 +4,6 @@ output: github_document
 
 <!-- README.md is generated from README.Rmd. Please edit that file -->
 
-[![CircleCI](https://circleci.com/gh/ecohealthalliance/lemis.svg?style=svg&circle-token=23cd13e8d5276a8100a83984982d065d1773fd77)](https://circleci.com/gh/ecohealthalliance/lemis)
-
 ```{r setup, include = FALSE}
 knitr::opts_chunk$set(
   collapse = TRUE,
@@ -18,6 +16,8 @@ library(magrittr)
 
 # lemis
 
+[![CircleCI](https://circleci.com/gh/ecohealthalliance/lemis.svg?style=shield&circle-token=23cd13e8d5276a8100a83984982d065d1773fd77)](https://circleci.com/gh/ecohealthalliance/lemis)
+
 ```{r authors, echo = FALSE, results = 'asis'}
 unclass(desc::desc_get_authors(here::here("DESCRIPTION"))) %>%
   purrr::keep(~"aut" %in% .$role) %>%
@@ -60,7 +60,7 @@ Our [paper (Smith et. al. 2017)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC53
 
 ## About
 
-Please give us feedback or ask questions by filing [issues](https://github.com/ecohealthalliance/lemis/issues)
+Please give us feedback or ask questions by filing [issues](https://github.com/ecohealthalliance/lemis/issues).
 
 **lemis** is developed at [EcoHealth Alliance](https://github.com/ecohealthalliance). Please note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project, you agree to abide by its terms.
 

README.md

Lines changed: 54 additions & 23 deletions
@@ -1,33 +1,48 @@
 
 <!-- README.md is generated from README.Rmd. Please edit that file -->
-[![CircleCI](https://circleci.com/gh/ecohealthalliance/lemis.svg?style=svg&circle-token=23cd13e8d5276a8100a83984982d065d1773fd77)](https://circleci.com/gh/ecohealthalliance/lemis)
 
-lemis
-=====
+# lemis
 
-Authors: *Noam Ross, Allison White, Carlos Zambrana-Torrelio*
+[![CircleCI](https://circleci.com/gh/ecohealthalliance/lemis.svg?style=shield&circle-token=23cd13e8d5276a8100a83984982d065d1773fd77)](https://circleci.com/gh/ecohealthalliance/lemis)
 
-The **lemis** package provides access to U.S. Fish and Wildlife Service (USFWS) data on wildlife and wildlife product imports to and exports from the United States. This data was obtained via more than 14 years of Freedom of Information Act (FOIA) requests by EcoHealth Alliance.
+Authors: *Noam Ross, Allison White, Carlos Zambrana-Torrelio, Evan
+Eskew*
 
-Installation
-------------
+The **lemis** package provides access to U.S. Fish and Wildlife Service
+(USFWS) data on wildlife and wildlife product imports to and exports
+from the United States. This data was obtained via more than 14 years of
+Freedom of Information Act (FOIA) requests by EcoHealth Alliance.
+
+## Installation
 
 Install the **lemis** package with this command:
 
 ``` r
 source("https://install-github.me/ecohealthalliance/lemis")
 ```
 
-As this is currently a private repository, you must have a GitHub personal access token set up to install and use the package. Instructions for this can be found [here](http://happygitwithr.com/github-pat.html#step-by-step).
+As this is currently a private repository, you must have a GitHub
+personal access token set up to install and use the package.
+Instructions for this can be found
+[here](http://happygitwithr.com/github-pat.html#step-by-step).
 
-Usage
------
+## Usage
 
-The main function in **lemis** is `lemis_data()`. This returns the main cleaned LEMIS database as a **dplyr** tibble.
+The main function in **lemis** is `lemis_data()`. This returns the main
+cleaned LEMIS database as a **dplyr** tibble.
 
-**lemis** makes use of [**datastorr**](https://github.com/ropenscilabs/datastorr) to manage data download. The first time you run `lemis_data()`, the package will download the most recent version of the database (~160 MB at present). Subsequent calls will load the database from storage on your computer.
+**lemis** makes use of
+[**datastorr**](https://github.com/ropenscilabs/datastorr) to manage
+data download. The first time you run `lemis_data()`, the package will
+download the most recent version of the database (~160 MB at present).
+Subsequent calls will load the database from storage on your computer.
 
-The LEMIS database is stored as an efficiently compressed [`.fst` file](https://github.com/fstpackage/fst), and loading it loads it a [remote dplyr source](https://github.com/krlmlr/fstplyr). This means that it does not load fully into memory but can be filtered and manipulated on-disk. If you wish to manipulate it as a data frame, simply call `dplyr::collect()` to load it fully into memory, like so:
+The LEMIS database is stored as an efficiently compressed [`.fst`
+file](https://github.com/fstpackage/fst), and loading it loads it a
+[remote dplyr source](https://github.com/krlmlr/fstplyr). This means
+that it does not load fully into memory but can be filtered and
+manipulated on-disk. If you wish to manipulate it as a data frame,
+simply call `dplyr::collect()` to load it fully into memory, like so:
 
 ``` r
 all_lemis <- lemis_data() %>%
@@ -36,15 +51,31 @@ all_lemis <- lemis_data() %>%
 
 Note that the full database will be ~1 GB in memory.
 
-`lemis_codes()` returns a data frame with descriptions of the codes used by USFWS in the various columns of `lemis_data()`. This is useful for lookup or joining with the main data for more descriptive outputs. The `?lemis_code` help file also has a searchable table of these codes.
-
-Our [paper (Smith et. al. 2017)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357285/) provides a broader introduction to this data and its relevance to infectious disease. See the [vignette](https://github.com/ecohealthalliance/lemis/tree/master/inst/doc/the-lemis-database.md) for a more in-depth tutorial and example use cases for the package. See the [developer README](https://github.com/ecohealthalliance/lemis/tree/master/data-raw/README.md) for more on the data cleaning process.
-
-About
------
-
-Please give us feedback or ask questions by filing [issues](https://github.com/ecohealthalliance/lemis/issues)
-
-**lemis** is developed at [EcoHealth Alliance](https://github.com/ecohealthalliance). Please note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project, you agree to abide by its terms.
+`lemis_codes()` returns a data frame with descriptions of the codes used
+by USFWS in the various columns of `lemis_data()`. This is useful for
+lookup or joining with the main data for more descriptive outputs. The
+`?lemis_code` help file also has a searchable table of these codes.
+
+Our [paper (Smith et.
+al. 2017)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357285/)
+provides a broader introduction to this data and its relevance to
+infectious disease. See the
+[vignette](https://github.com/ecohealthalliance/lemis/tree/master/inst/doc/the-lemis-database.md)
+for a more in-depth tutorial and example use cases for the package. See
+the [developer
+README](https://github.com/ecohealthalliance/lemis/tree/master/data-raw/README.md)
+for more on the data cleaning process.
+
+## About
+
+Please give us feedback or ask questions by filing
+[issues](https://github.com/ecohealthalliance/lemis/issues).
+
+**lemis** is developed at [EcoHealth
+Alliance](https://github.com/ecohealthalliance). Please note that this
+project is released with a [Contributor Code of
+Conduct](CODE_OF_CONDUCT.md). By participating in this project, you
+agree to abide by its
+terms.
 
 [![http://www.ecohealthalliance.org/](inst/figs/eha-footer.png)](http://www.ecohealthalliance.org/)
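
The README says the output of `lemis_codes()` is useful for joining with the main data to get more descriptive outputs. The base R sketch below shows what such a join might look like on toy data. Both data frames are made up, and the column names (`field`, `code`, `value`, `unit`, `quantity`) are assumptions for illustration rather than the documented structure of `lemis_codes()` or `lemis_data()`.

```r
# Toy stand-in for a few collected rows of the main data (hypothetical columns)
shipments <- data.frame(
  unit = c("KG", "NO", "KG"),
  quantity = c(2.5, 10, 1),
  stringsAsFactors = FALSE
)

# Toy stand-in for a code lookup table (hypothetical columns and values)
codes <- data.frame(
  field = c("unit", "unit"),
  code = c("KG", "NO"),
  value = c("Kilograms", "Number of specimens"),
  stringsAsFactors = FALSE
)

# Keep only the codes that describe the column of interest, rename them so
# the key matches, then left-join so every shipment row is preserved
unit_codes <- codes[codes$field == "unit", c("code", "value")]
names(unit_codes) <- c("unit", "unit_description")
described <- merge(shipments, unit_codes, by = "unit", all.x = TRUE)
print(described)
```

With the real package, the same idea would presumably apply after calling `dplyr::collect()` on a filtered subset, so that the join runs on an in-memory data frame.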

data-raw/README.md

Lines changed: 8 additions & 7 deletions
@@ -1,14 +1,15 @@
 # Data processing for **lemis**
 
-The scripts in this directory process the data for the **lemis** package.
+The scripts in this directory import, clean, and process the data for the **lemis** package. While the initial version of **lemis** relied on the related [WildDB](https://github.com/ecohealthalliance/WildDB/) repository for data cleaning, as of **lemis** v2.0.0, these processing steps have been moved to the `data-raw/` subdirectory of the **lemis** package repository for consistency and ease of use. The **lemis** data preparation workflow requires execution of three scripts in succession:
 
-`scrape_codes.R` uses the **tabulizer** package to extract codes from
-the PDF codebook to generate `lemis_codes()` and `lemis_metadata()`.
+- `import_lemis.R` downloads a local copy of all raw LEMIS files, which are kept on an [Amazon Web Services S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/eha.wild.db/). These data consist of multiple Excel files for a given year of FOIA requests. In addition, each spreadsheet file may contain different numbers of sheets with relevant LEMIS data. `import_lemis.R` imports all of this raw data (in `data-raw/by_year/`) and merges all data into yearly CSV files (in `data-raw/csv_by_year/`).
 
-Raw LEMIS files are kept on AWS S3 at <https://s3.console.aws.amazon.com/s3/buckets/eha.wild.db/>. The `cleaned_data` directory there contains the data after extracting from XLS and going through some basic processing. The processing changes from year to year and so is not automated for all files. It is described in detail at <https://github.com/ecohealthalliance/WildDB/tree/master/scripts/data_cleaning>.
+- `clean_lemis.R` merges the yearly LEMIS CSV files into a single dataframe of all LEMIS data and performs various cleaning steps. Following generation of the single cleaned LEMIS data file, the local copies of intermediate files in `data-raw/by_year/` and `data-raw/csv_by_year/` can be safely deleted (since they can always be regenerated from `import_lemis.R` and `clean_lemis.R`).
 
-`process_lemis.R` imports the cleaned LEMIS data and performs a few more processing tasks before compressing the data into an `.fst` file. These should be moved upstream into the base data cleaning at the next iteration.
+- `process_lemis.R` processes the cleaned LEMIS data for use in a **lemis** package release by compressing the data into an `.fst` file.
 
-Once the `.fst` file is generated, it can be attached to this package as a release using `datastorr::github_release_create`. Please read the help file for this function before doing so. Also, before release, one should update the package version in `DESCRIPTION` and commit all changes to GitHub.
+In addition, there is the script `scrape_codes.R`, which uses the **tabulizer** package to extract codes from the PDF codebook to generate `lemis_codes()` and `lemis_metadata()`.
 
-v1.0.0 of **lemis** has the 2000-2013 data set.
+Once the `.fst` file is generated, it can be attached to the package as a release using `datastorr::github_release_create()`. Please read the help file for this function before doing so. Also, before release, one should update the package version in `DESCRIPTION` and commit all changes to GitHub.
+
+v2.0.0 of **lemis** has the 2000-2014 data set.