This is the first major update to the **lemis** package. Changes include:
* Addition of data from late 2013 and all of 2014.
* A major reorganization of **lemis** data processing for release. Briefly, all data importation and cleaning steps are now automated with scripts located in the `data-raw/` subdirectory. While this should be mostly irrelevant to the end user, it means that all data processing code is now fully contained within the **lemis** package repository. This should make it easier to incorporate future data into the pipeline.
* Improved error handling for non-standard data values. Previously, some records with non-standard values in specific fields were dropped from the data. Now the **lemis** cleaning workflow incorporates error checking for non-standard values across all the fields of data for which valid values are described in USFWS spreadsheets. Non-standard values are converted to `non-standard value` (as opposed to being converted to `NA` or dropped) and a `cleaning_notes` column has been added to describe the original value.
* Note that data versions from previous releases can still be accessed with `lemis_data('v1.0.0')`.
## Bug fixes
* Browsable tables in HTML should now work on systems with any pandoc version.
## Minor changes
* Reduced dependencies by removing the **tidyverse** package.
The **lemis** package provides access to U.S. Fish and Wildlife Service (USFWS) data on wildlife and wildlife product imports to and exports from the United States. This data was obtained via more than 14 years of Freedom of Information Act (FOIA) requests by EcoHealth Alliance.
Authors: *Noam Ross, Allison White, Carlos Zambrana-Torrelio, Evan Eskew*
Installation
------------
As this is currently a private repository, you must have a GitHub personal access token set up to install and use the package. Instructions for this can be found [here](http://happygitwithr.com/github-pat.html#step-by-step).
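With a token in place, installation follows the usual GitHub-package pattern. The following is a sketch, assuming installation via **devtools** (which reads a token from the `GITHUB_PAT` environment variable); the repository path is taken from the issue links in this README:

```r
# Sketch: install lemis from its private GitHub repository.
# Assumes a personal access token is available, e.g. set as the
# GITHUB_PAT environment variable, which devtools picks up.
# install.packages("devtools")  # if not already installed
devtools::install_github("ecohealthalliance/lemis")

library(lemis)
```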
The main function in **lemis** is `lemis_data()`. This returns the main cleaned LEMIS database as a **dplyr** tibble.
**lemis** makes use of [**datastorr**](https://github.com/ropenscilabs/datastorr) to manage data download. The first time you run `lemis_data()`, the package will download the most recent version of the database (~160 MB at present). Subsequent calls will load the database from storage on your computer.
The LEMIS database is stored as an efficiently compressed [`.fst` file](https://github.com/fstpackage/fst), and loading it creates a [remote dplyr source](https://github.com/krlmlr/fstplyr). This means that it is not loaded fully into memory but can be filtered and manipulated on-disk. If you wish to manipulate it as a data frame, simply call `dplyr::collect()` to load it fully into memory, like so:
```r
all_lemis <- lemis_data() %>%
  dplyr::collect()
```
Note that the full database will be ~1 GB in memory.
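Because the remote source can be filtered on-disk before collecting, you can avoid loading the full table when you only need a subset. A hypothetical sketch (the column name and value here are illustrative assumptions, not confirmed fields of the database; see `lemis_metadata()` and the package help for the actual columns):

```r
library(dplyr)
library(lemis)

# Filter the on-disk source first, then collect only the matching
# rows into memory. "shipment_year" is a hypothetical column name
# used purely for illustration.
recent <- lemis_data() %>%
  filter(shipment_year == 2014) %>%
  collect()
```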
`lemis_codes()` returns a data frame with descriptions of the codes used by USFWS in the various columns of `lemis_data()`. This is useful for lookup or joining with the main data for more descriptive outputs. The `?lemis_code` help file also has a searchable table of these codes.
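For example, code descriptions can be joined onto the main data with **dplyr**. This sketch assumes hypothetical column names (`field`, `code`, and `value` in the codes table; `purpose` in the main data); check `lemis_codes()` and `?lemis_code` for the actual names:

```r
library(dplyr)
library(lemis)

# Hypothetical join: attach human-readable descriptions for the
# "purpose" codes. All column names here are illustrative
# assumptions, not confirmed fields.
purpose_codes <- lemis_codes() %>%
  filter(field == "purpose") %>%
  select(code, purpose_description = value)

lemis_data() %>%
  left_join(purpose_codes, by = c("purpose" = "code")) %>%
  collect()
```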
Our [paper (Smith et al. 2017)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357285/) provides a broader introduction to this data and its relevance to infectious disease. See the [vignette](https://github.com/ecohealthalliance/lemis/tree/master/inst/doc/the-lemis-database.md) for a more in-depth tutorial and example use cases for the package. See the [developer README](https://github.com/ecohealthalliance/lemis/tree/master/data-raw/README.md) for more on the data cleaning process.
About
-----
Please give us feedback or ask questions by filing [issues](https://github.com/ecohealthalliance/lemis/issues).
**lemis** is developed at [EcoHealth Alliance](https://github.com/ecohealthalliance). Please note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project, you agree to abide by its terms.
The scripts in this directory import, clean, and process the data for the **lemis** package. While the initial version of **lemis** relied on the related [WildDB](https://github.com/ecohealthalliance/WildDB/) repository for data cleaning, as of **lemis** v2.0.0, these processing steps have been moved to the `data-raw/` subdirectory of the **lemis** package repository for consistency and ease of use. The **lemis** data preparation workflow requires execution of three scripts in succession:
- `import_lemis.R` downloads a local copy of all raw LEMIS files, which are kept in an [Amazon Web Services S3 bucket](https://s3.console.aws.amazon.com/s3/buckets/eha.wild.db/). These data consist of multiple Excel files for a given year of FOIA requests. In addition, each spreadsheet file may contain different numbers of sheets with relevant LEMIS data. `import_lemis.R` imports all of this raw data (in `data-raw/by_year/`) and merges it into yearly CSV files (in `data-raw/csv_by_year/`).
- `clean_lemis.R` merges the yearly LEMIS CSV files into a single data frame of all LEMIS data and performs various cleaning steps. Following generation of the single cleaned LEMIS data file, the local copies of intermediate files in `data-raw/by_year/` and `data-raw/csv_by_year/` can be safely deleted (since they can always be regenerated with `import_lemis.R` and `clean_lemis.R`).
- `process_lemis.R` processes the cleaned LEMIS data for use in a **lemis** package release by compressing the data into an `.fst` file.
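Taken together, the preparation workflow amounts to running the three scripts in order. A sketch, assuming the scripts are self-contained and run from the package repository root:

```r
# Run the lemis data preparation pipeline in order, from the
# root of the package repository.
source("data-raw/import_lemis.R")   # download and merge raw files
source("data-raw/clean_lemis.R")    # clean and combine yearly CSVs
source("data-raw/process_lemis.R")  # compress cleaned data to .fst
```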
In addition, there is the script `scrape_codes.R`, which uses the **tabulizer** package to extract codes from the PDF codebook to generate `lemis_codes()` and `lemis_metadata()`.
Once the `.fst` file is generated, it can be attached to the package as a release using `datastorr::github_release_create()`. Please read the help file for this function before doing so. Also, before release, one should update the package version in `DESCRIPTION` and commit all changes to GitHub.