USF-IMARS/FCRWQDC_data_ingest

Data

Data for this repo is staged in /data.

Final data is also stored in gdrive here.

The final data produced by this ingestion is visualized further using this shiny data dashboard.

Data from WIN is pulled manually for each program and put into data/. This data is staged at this box.com link.

Additional data is provided in custom formats by some providers:

  • AOML SFER data, harvested from this GitHub repo (private)
  • Older historical data (from STORET), collected into this box.com folder
  • Newer FIU data, in a custom file format
  • MiamiBeach data, in a custom format

Known Issues

  • AOML_FBBB :
    • Missing Units for most analytes
  • BBAP, BROWARD, DEP, DERM, FIU_WQMP, PALMBEACH :
    • source data has rows with missing critical fields when Activity.Type is "Blank", "Replicate", or similar. These rows are dropped by getWINData().
      • This filtering is also applied to other WIN datasets.
  • DEP :
    • small number of rows missing Lat+Lon
  • STORET data (BROWARD_STORET, DERM_BBWQ_STORET, PALMBEACH_STORET)
    • no latitude+longitude included in raw source files
  • FIU_WQMP_RECENT (data/FIU_recent_all.csv )
    • no lat+lon included in raw source files
    • raw source file has no units
  • WIN dataset PALMBEACH :
    • lat+lon provided only in DD MM SS; conversion to decimal degrees is not implemented
  • SFER :
    • data has no units
  • MiamiBeach :
    • some site names have an extra '#' in front (e.g. both site1 and #site1 appear)
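
Two of the issues above (DD MM SS coordinates and the stray '#' on site names) could be handled with small helpers. The following is a minimal sketch in base R, not code from this repo: the function names `dms_to_decimal` and `normalize_site_id` are hypothetical, and it assumes the coordinate strings are whitespace-separated degrees, minutes, and seconds with any sign carried on the degrees field.

```r
# Hypothetical helper: convert "DD MM SS" strings to decimal degrees.
# Assumes whitespace-separated fields; a leading "-" on the degrees field
# is applied to the whole value. Unparseable input yields NA.
dms_to_decimal <- function(dms) {
  parts <- strsplit(trimws(as.character(dms)), "\\s+")
  vapply(parts, function(p) {
    if (length(p) != 3) return(NA_real_)
    v <- suppressWarnings(as.numeric(p))
    if (anyNA(v)) return(NA_real_)
    sgn <- if (startsWith(p[1], "-")) -1 else 1
    sgn * (abs(v[1]) + v[2] / 60 + v[3] / 3600)
  }, numeric(1))
}

# Hypothetical helper: strip a single leading '#' so that "#site1" and
# "site1" are treated as the same site.
normalize_site_id <- function(site) {
  sub("^#", "", trimws(as.character(site)))
}

dms_to_decimal(c("25 46 30", "-80 15 30"))
normalize_site_id(c("site1", "#site1"))
```

If the source stores western longitudes without a sign, the sign would have to be applied separately; that is not handled here.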

Notes about the final data

  • getData files attempt to align all columns to WIN column names
    • for column mappings between projects see relevant R/get*Data.R and R/align_*_df.R files
  • Most exported .csv files contain only a subset of the columns returned by getData. For all columns, see allDataRaw.csv.
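
As an illustration of what aligning columns to WIN names involves, here is a minimal base R sketch. It is not the repo's implementation (see the R/get*Data.R and R/align_*_df.R files for the real mappings); `align_columns`, `column_map`, and the provider column names `station`, `latitude`, `longitude`, and `value` are hypothetical.

```r
# Hypothetical mapping: names are provider column names,
# values are the WIN column names they should be renamed to.
column_map <- c(
  station   = "Monitoring.Location.ID",
  latitude  = "Org.Decimal.Latitude",
  longitude = "Org.Decimal.Longitude",
  value     = "DEP.Result.Value.Number"
)

# Rename any columns found in the mapping; leave other columns untouched.
align_columns <- function(df, map) {
  hits <- names(df) %in% names(map)
  names(df)[hits] <- unname(map[names(df)[hits]])
  df
}

raw <- data.frame(
  station = "A1", latitude = 25.7, longitude = -80.2,
  value = 1.3, extra = "x", stringsAsFactors = FALSE
)
aligned <- align_columns(raw, column_map)
names(aligned)
```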

Chlorophyll a

Chlorophyll a values are special because some are corrected for pheophytin.

At the time of writing, it is not known whether the values are corrected or uncorrected for the following programs:

  • FIU_Estuaries
  • MiamiBeach
  • SFER

For these programs the chlorophyll_a values are not included.
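
The exclusion described above can be sketched as a row filter. This is an assumption-laden example rather than the repo's code: the function name `drop_unknown_chla`, the `program` column, and matching analytes by the pattern "chlorophyll" in `DEP.Analyte.Name` are all hypothetical choices.

```r
# Programs whose chlorophyll a correction status is unknown (from the list above).
unknown_correction <- c("FIU_Estuaries", "MiamiBeach", "SFER")

# Hypothetical filter: drop chlorophyll rows for programs with unknown
# correction status. Column names and the analyte pattern are assumptions.
drop_unknown_chla <- function(df,
                              program_col = "program",
                              analyte_col = "DEP.Analyte.Name") {
  is_chla <- grepl("chlorophyll", df[[analyte_col]], ignore.case = TRUE)
  exclude <- is_chla & df[[program_col]] %in% unknown_correction
  df[!exclude, , drop = FALSE]
}

example <- data.frame(
  program = c("SFER", "SFER", "DERM", "MiamiBeach"),
  DEP.Analyte.Name = c("Chlorophyll a", "Salinity", "Chlorophyll a", "Chlorophyll a"),
  stringsAsFactors = FALSE
)
kept <- drop_unknown_chla(example)
```

Other analytes from the same programs are kept; only the chlorophyll rows are removed.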

Tests

Run the test suite from a shell:

Rscript -e "testthat::test_dir(here::here('tests/testthat'))"

or from within an R session:

testthat::test_dir(here::here('tests/testthat'))

Common Workflows

Add a Provider

  1. Add provider data files to ./data/.
  2. Check R/getListOfPrograms.R
  3. If a custom file reader is needed:
    • create file get{provider}Data.R
      • map columns to the standard column names: DEP.Result.ID, Activity.ID, year, month, day, time, Activity.Start.Date.Time, lat_deg, lat_min, Org.Decimal.Latitude, lon_deg, lon_min, Org.Decimal.Longitude, Monitoring.Location.ID, Activity.Type, Activity.Depth, Activity.Depth.Unit, Activity.Depth.Top.Bottom.Unit, Sample.Collection.Type, Activity.Top.Depth, Activity.Bottom.Depth, Value.Qualifier, Result.Comments, DEP.Analyte.Name, DEP.Result.Value.Number, DEP.Result.Unit
    • include the relevant logic in getData.R
      • the new get{provider}Data call
      • analyte name mappings
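
A custom reader along the lines of step 3 might look like the sketch below. It is a hypothetical skeleton, not one of the repo's readers: `getExampleProviderData` and the input columns `site`, `lat`, `lon`, `analyte`, `value`, and `unit` are invented for illustration. Standard columns the provider does not supply are filled with NA so every reader returns the same shape.

```r
# Standard column names, as listed in step 3 above.
standard_columns <- c(
  "DEP.Result.ID", "Activity.ID", "year", "month", "day", "time",
  "Activity.Start.Date.Time", "lat_deg", "lat_min", "Org.Decimal.Latitude",
  "lon_deg", "lon_min", "Org.Decimal.Longitude", "Monitoring.Location.ID",
  "Activity.Type", "Activity.Depth", "Activity.Depth.Unit",
  "Activity.Depth.Top.Bottom.Unit", "Sample.Collection.Type",
  "Activity.Top.Depth", "Activity.Bottom.Depth", "Value.Qualifier",
  "Result.Comments", "DEP.Analyte.Name", "DEP.Result.Value.Number",
  "DEP.Result.Unit"
)

# Hypothetical reader for a provider whose CSV has the columns
# site, lat, lon, analyte, value, unit (all assumed for this example).
getExampleProviderData <- function(path) {
  raw <- read.csv(path, stringsAsFactors = FALSE)
  out <- data.frame(
    Monitoring.Location.ID  = raw$site,
    Org.Decimal.Latitude    = raw$lat,
    Org.Decimal.Longitude   = raw$lon,
    DEP.Analyte.Name        = raw$analyte,
    DEP.Result.Value.Number = raw$value,
    DEP.Result.Unit         = raw$unit,
    stringsAsFactors = FALSE
  )
  # Fill standard columns the provider does not supply with NA,
  # then return the columns in the standard order.
  missing_cols <- setdiff(standard_columns, names(out))
  out[missing_cols] <- NA
  out[standard_columns]
}

# Usage with a throwaway file standing in for a provider's data.
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(site = "A1", lat = 25.7, lon = -80.2,
             analyte = "Salinity", value = 35.1, unit = "ppt"),
  tmp, row.names = FALSE
)
provider_df <- getExampleProviderData(tmp)
```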

About

data ingestion and initial analysis from FL WIN water quality database
