Data for this repo is staged in /data.
Final data is also stored in gdrive here.
The final data produced by this ingestion is visualized further using this shiny data dashboard.
Data from WIN is pulled manually for each program and put into data/.
This data is staged at this box.com link.
Additional data is provided in custom formats by some providers:
- AOML SFER data harvested from this github repo (private)
- Older historical data (from STORET) has been collected into this box.com folder.
- newer FIU data from a custom file format
- MiamiBeach data is a custom format
- AOML_FBBB :
- Missing Units for most analytes
- BBAP, BROWARD, DEP, DERM, FIU_WQMP, PALMBEACH :
- source data has rows with missing critical fields when Activity.Type is "Blank", "Replicate", or similar. These rows are dropped by getWINData().
- This filtering is also applied to other WIN datasets.
- DEP :
- small number of rows missing Lat+Lon
- STORET data (BROWARD_STORET, DERM_BBWQ_STORET, PALMBEACH_STORET)
- no latitude+longitude included in raw source files
- FIU_WQMP_RECENT (data/FIU_recent_all.csv) :
- no lat+lon included in raw source files
- raw source file has no units
- WIN dataset PALMBEACH :
- has lat+lon provided only in DD MM SS, conversion to decimal not implemented
- SFER data has no units
- MiamiBeach: some site names appear both with and without an extra leading '#' (site1 and #site1)
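The missing DD MM SS conversion noted for PALMBEACH could be sketched roughly as follows. This is only an illustration: `dms_to_decimal` is a hypothetical helper that does not exist in this repo, and it assumes the three parts are whitespace-separated with a leading minus sign marking west or south. The actual PALMBEACH format may differ.

```r
# Hypothetical helper (not part of this repo): convert "DD MM SS" strings
# to decimal degrees. Assumes whitespace-separated degrees, minutes, seconds
# and a leading "-" on the degrees for west/south coordinates.
dms_to_decimal <- function(dms) {
  parts <- strsplit(trimws(dms), "[[:space:]]+")
  vapply(parts, function(p) {
    nums <- suppressWarnings(as.numeric(p))
    # Anything that is not exactly three numbers becomes NA.
    if (length(nums) != 3 || anyNA(nums)) return(NA_real_)
    # Take the sign from the string so that "-0 30 00" stays negative.
    sign <- if (startsWith(p[1], "-")) -1 else 1
    sign * (abs(nums[1]) + nums[2] / 60 + nums[3] / 3600)
  }, numeric(1))
}

dms_to_decimal(c("26 42 30", "-80 03 00"))
```

Values that do not parse come back as `NA` rather than raising an error, so malformed rows can be inspected after the conversion.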
- getData files attempt to align all columns to WIN column names
- for column mappings between projects see the relevant R/get*Data.R and R/align_*_df.R files
- most exported .csv files do not contain all columns; getData returns many more. For all data see allDataRaw.csv.
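As a rough illustration of what aligning a provider's columns to WIN column names can look like, here is a minimal base R sketch. The function `align_columns` and the provider-side names (`station`, `lat`, `parameter`, and so on) are invented for the example; the real mappings live in the R/get*Data.R and R/align_*_df.R files.

```r
# Hypothetical mapping: WIN column name = provider column name.
# The provider names on the right are made up for illustration.
column_map <- c(
  Monitoring.Location.ID  = "station",
  Org.Decimal.Latitude    = "lat",
  Org.Decimal.Longitude   = "lon",
  DEP.Analyte.Name        = "parameter",
  DEP.Result.Value.Number = "value",
  DEP.Result.Unit         = "units"
)

align_columns <- function(df, column_map) {
  # Rename the provider columns that are present to their WIN names.
  present <- column_map[column_map %in% names(df)]
  names(df)[match(present, names(df))] <- names(present)
  # Add WIN columns the provider lacks as NA so data frames can be row-bound.
  for (col in setdiff(names(column_map), names(df))) {
    df[[col]] <- NA
  }
  # This sketch keeps only the mapped columns, in a fixed order.
  df[, names(column_map), drop = FALSE]
}

raw <- data.frame(station = "site1", lat = 25.7, lon = -80.2,
                  parameter = "Chlorophyll a", value = 1.2,
                  stringsAsFactors = FALSE)
aligned <- align_columns(raw, column_map)
names(aligned)
```

Filling absent columns with `NA` mirrors the quirks listed above, where some sources ship without units or coordinates.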
Chlorophyll a values are special because some are corrected for pheophytin.
At the time of writing, it is not known whether the values are corrected or uncorrected for some programs:
- FIU_Estuaries
- MiamiBeach
- SFER
For these programs the chlorophyll_a values are not included.
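A minimal sketch of that exclusion, assuming a `program` column and an analyte name containing "chlorophyll" in `DEP.Analyte.Name`. The column name `program`, the analyte label, and the function `drop_unknown_chl` are assumptions for illustration and may not match the repo's code.

```r
# Programs whose chlorophyll a correction status is unknown (listed above).
unknown_chl_programs <- c("FIU_Estuaries", "MiamiBeach", "SFER")

# Hypothetical filter: assumes columns `program` and `DEP.Analyte.Name`.
drop_unknown_chl <- function(df) {
  is_chl <- grepl("chlorophyll", df$DEP.Analyte.Name, ignore.case = TRUE)
  # Drop only chlorophyll rows from the affected programs; keep everything else.
  df[!(is_chl & df$program %in% unknown_chl_programs), , drop = FALSE]
}

obs <- data.frame(
  program = c("SFER", "SFER", "DERM"),
  DEP.Analyte.Name = c("Chlorophyll a", "Salinity", "Chlorophyll a"),
  DEP.Result.Value.Number = c(1.1, 35, 0.8),
  stringsAsFactors = FALSE
)
kept <- drop_unknown_chl(obs)
kept
```

Other analytes from the same programs pass through untouched; only the ambiguous chlorophyll a rows are removed.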
Rscript -e "testthat::test_dir(here::here('tests/testthat'))"
or, from an R session:
testthat::test_dir(here::here('tests/testthat'))

- Add provider data files to ./data/.
- Check R/getListOfPrograms.R
- If custom file reader is needed
- create file get{provider}Data.R
- map columns to standard column names
DEP.Result.ID Activity.ID year month day time Activity.Start.Date.Time lat_deg lat_min Org.Decimal.Latitude lon_deg lon_min Org.Decimal.Longitude Monitoring.Location.ID Activity.Type Activity.Depth Activity.Depth.Unit Activity.Depth.Top.Bottom.Unit Sample.Collection.Type Activity.Top.Depth Activity.Bottom.Depth Value.Qualifier Result.Comments DEP.Analyte.Name DEP.Result.Value.Number DEP.Result.Unit
- include relevant logic in getData.R
- new get{provider}Data call
- analyte name mappings
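For orientation, a custom reader following the steps above might look roughly like this. Everything provider-specific here is hypothetical: the provider name "Example", the input column names (`site`, `sampled_at`, `result`, ...), and the file layout. Only the output column names come from the standard list above, and only a subset of them is filled in.

```r
# Hypothetical R/getExampleData.R for a made-up provider called "Example".
# Input column names are assumptions; output names are standard columns.
getExampleData <- function(path) {
  raw <- read.csv(path, stringsAsFactors = FALSE)
  data.frame(
    Monitoring.Location.ID   = raw$site,
    Activity.Start.Date.Time = raw$sampled_at,
    Org.Decimal.Latitude     = raw$lat,
    Org.Decimal.Longitude    = raw$lon,
    DEP.Analyte.Name         = raw$analyte,
    DEP.Result.Value.Number  = as.numeric(raw$result),
    DEP.Result.Unit          = raw$unit,
    stringsAsFactors = FALSE
  )
}

# Usage with a throwaway file standing in for a real provider export.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "site,sampled_at,lat,lon,analyte,result,unit",
  "site1,2020-01-15 10:30,25.76,-80.19,Salinity,35.1,ppt"
), tmp)
example_df <- getExampleData(tmp)
str(example_df)
```

The remaining steps, calling the new reader from getData.R and mapping the provider's analyte names, are repo-specific and are not shown here.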