This is the first major update to the **lemis** package. Changes include:
* Addition of data from late 2013 and all of 2014.
* A major reorganization of **lemis** data processing for release. Briefly, all data importation and cleaning steps are now automated with scripts located in the `data-raw/` subdirectory. While this should be mostly irrelevant to the end user, it means that all data processing code is now fully contained within the **lemis** package repository. This should make it easier to incorporate future data into the pipeline.
* Improved error handling for non-standard data values. Previously, some records with non-standard values in specific fields were dropped from the data. Now the **lemis** cleaning workflow incorporates error checking for non-standard values across all the fields of data for which valid values are described in USFWS spreadsheets. Non-standard values are converted to `non-standard value` (as opposed to being converted to `NA` or dropped) and a `cleaning_notes` column has been added to describe the original value.
* Note that data versions from previous releases can still be accessed with `lemis_data('v1.0.0')`.
## Bug fixes
* Browsable tables in HTML should now work on systems with any pandoc version.
## Minor changes
* Reduced dependencies by removing the **tidyverse** package.
The **lemis** package provides access to U.S. Fish and Wildlife Service (USFWS) data on wildlife and wildlife product imports to and exports from the United States. This data was obtained via more than 14 years of Freedom of Information Act (FOIA) requests by EcoHealth Alliance.
Authors: *Noam Ross, Allison White, Carlos Zambrana-Torrelio, Evan Eskew*
Installation
------------
As this is currently a private repository, you must have a GitHub personal access token set up to install and use the package. Instructions for this can be found [here](http://happygitwithr.com/github-pat.html#step-by-step).
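With a token in place, installation follows the usual GitHub-package pattern. The following is a sketch, assuming installation via **devtools** (which reads a token from the `GITHUB_PAT` environment variable); the repository path is taken from the issue links in this README:

```r
# Sketch: install lemis from its private GitHub repository.
# Assumes a personal access token is available, e.g. set as the
# GITHUB_PAT environment variable, which devtools picks up.
# install.packages("devtools")  # if not already installed
devtools::install_github("ecohealthalliance/lemis")

library(lemis)
```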
The main function in **lemis** is `lemis_data()`. This returns the main cleaned LEMIS database as a **dplyr** tibble.
**lemis** makes use of [**datastorr**](https://github.com/ropenscilabs/datastorr) to manage data download. The first time you run `lemis_data()`, the package will download the most recent version of the database (~160 MB at present). Subsequent calls will load the database from storage on your computer.
The LEMIS database is stored as an efficiently compressed [`.fst` file](https://github.com/fstpackage/fst), and loading it creates a [remote dplyr source](https://github.com/krlmlr/fstplyr). This means that it is not loaded fully into memory but can be filtered and manipulated on-disk. If you wish to manipulate it as a data frame, simply call `dplyr::collect()` to load it fully into memory, like so:
```r
all_lemis <- lemis_data() %>%
  dplyr::collect()
```
Note that the full database will be ~1 GB in memory.
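Because the remote source can be filtered on-disk before collecting, you can avoid loading the full table when you only need a subset. A hypothetical sketch (the column name and value here are illustrative assumptions, not confirmed fields of the database; see `lemis_metadata()` and the package help for the actual columns):

```r
library(dplyr)
library(lemis)

# Filter the on-disk source first, then collect only the matching
# rows into memory. "shipment_year" is a hypothetical column name
# used purely for illustration.
recent <- lemis_data() %>%
  filter(shipment_year == 2014) %>%
  collect()
```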
`lemis_codes()` returns a data frame with descriptions of the codes used by USFWS in the various columns of `lemis_data()`. This is useful for lookup or joining with the main data for more descriptive outputs. The `?lemis_code` help file also has a searchable table of these codes.
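For example, code descriptions can be joined onto the main data with **dplyr**. This sketch assumes hypothetical column names (`field`, `code`, and `value` in the codes table; `purpose` in the main data); check `lemis_codes()` and `?lemis_code` for the actual names:

```r
library(dplyr)
library(lemis)

# Hypothetical join: attach human-readable descriptions for the
# "purpose" codes. All column names here are illustrative
# assumptions, not confirmed fields.
purpose_codes <- lemis_codes() %>%
  filter(field == "purpose") %>%
  select(code, purpose_description = value)

lemis_data() %>%
  left_join(purpose_codes, by = c("purpose" = "code")) %>%
  collect()
```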
Our [paper (Smith et al. 2017)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357285/) provides a broader introduction to this data and its relevance to infectious disease. See the [vignette](https://github.com/ecohealthalliance/lemis/tree/master/inst/doc/the-lemis-database.md) for a more in-depth tutorial and example use cases for the package. See the [developer README](https://github.com/ecohealthalliance/lemis/tree/master/data-raw/README.md) for more on the data cleaning process.
About
-----
Please give us feedback or ask questions by filing [issues](https://github.com/ecohealthalliance/lemis/issues).
**lemis** is developed at [EcoHealth Alliance](https://github.com/ecohealthalliance). Please note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project, you agree to abide by its terms.
The scripts in this directory import, clean, and process the data for the **lemis** package. While the initial version of **lemis** relied on the related [WildDB](https://github.com/ecohealthalliance/WildDB/) repository for data cleaning, as of **lemis** v2.0.0, these processing steps have been moved to the `data-raw/` subdirectory of the **lemis** package repository for consistency and ease of use. The **lemis** data preparation workflow requires execution of three scripts in succession:
- `import_lemis.R` downloads a local copy of all raw LEMIS files, which are kept in an [Amazon Web Services S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/eha.wild.db/). These data consist of multiple Excel files for a given year of FOIA requests. In addition, each spreadsheet file may contain different numbers of sheets with relevant LEMIS data. `import_lemis.R` imports all of this raw data (in `data-raw/by_year/`) and merges it into yearly CSV files (in `data-raw/csv_by_year/`).
- `clean_lemis.R` merges the yearly LEMIS CSV files into a single data frame of all LEMIS data and performs various cleaning steps. Following generation of the single cleaned LEMIS data file, the local copies of intermediate files in `data-raw/by_year/` and `data-raw/csv_by_year/` can be safely deleted (since they can always be regenerated with `import_lemis.R` and `clean_lemis.R`).
- `process_lemis.R` processes the cleaned LEMIS data for use in a **lemis** package release by compressing the data into an `.fst` file.
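Taken together, the preparation workflow amounts to running the three scripts in order. A sketch, assuming the scripts are self-contained and run from the package repository root:

```r
# Run the lemis data preparation pipeline in order, from the
# root of the package repository.
source("data-raw/import_lemis.R")   # download and merge raw files
source("data-raw/clean_lemis.R")    # clean and combine yearly CSVs
source("data-raw/process_lemis.R")  # compress cleaned data to .fst
```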
In addition, there is the script `scrape_codes.R`, which uses the **tabulizer** package to extract codes from the PDF codebook to generate `lemis_codes()` and `lemis_metadata()`.
Once the `.fst` file is generated, it can be attached to the package as a release using `datastorr::github_release_create()`. Please read the help file for this function before doing so. Also, before release, one should update the package version in `DESCRIPTION` and commit all changes to GitHub.