
# Web Services for the BiGCZ

Most of this content is from Emilio and was written/updated in 2015. Exceptions are noted below, especially in sections 1 and 3.

## 1. ODM2 Web Services

These web services are currently being developed by SDSC (Choonhan and Dave).

- ODM2REST API / web services
- ODM2 WOFpy WOF 1.x endpoints

## 2. WOF/WaterML 1.x Access from CZOData

See discussions and information at the CZOData/CZOWOFWaterML1 repo.

## 3. IEDA and National Geothermal Data System listing from Steve Richards, 6/17/2016

## 4. IGSN/SESAR web services, MG-RAST

### Web services

The vizer-based BiGCZ Portal proof-of-concept prototype (pre-pre-alpha!) demonstrates an initial approach to ingesting IGSN/SESAR (samples) and MG-RAST (metagenomic data) web services. For IGSNs, here's an initial (and now old, from 2014-10) IPython notebook demonstrating service requests and parsing.

### BiGCZ GeoJSON files as intermediates between web service responses and vizer

(GeoJSON discussion/notes originally from here; they date back to about 2015-01, so they need updating.) A handy reason for converting to GeoJSON as an intermediate format is that GitHub provides native map rendering of GeoJSON files, so you get a simple map visualization without doing any extra work (and the markers are clickable, so you can explore the attributes). These files can also be downloaded and easily opened in desktop GIS clients like QGIS. http://geojson.io is another nice online tool for interactively creating and examining GeoJSON data. (Note: vizer doesn't consume GeoJSON directly; it uses a different, custom JSON format.)
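As a concrete illustration of the intermediate format, here is a minimal sketch of building a GeoJSON FeatureCollection from sample points. The record layout and property names (e.g., `igsn`, `material`) are hypothetical, not the actual BiGCZ files; the coordinates are made up:

```python
import json

def samples_to_geojson(samples):
    """Convert (sample_id, lon, lat, properties) records into a
    GeoJSON FeatureCollection (illustrative field names)."""
    features = []
    for sample_id, lon, lat, props in samples:
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(props, igsn=sample_id),
        })
    return {"type": "FeatureCollection", "features": features}

# One Shale Hills-style point (hypothetical IGSN and coordinates)
fc = samples_to_geojson([("SSH000001", -77.90, 40.66, {"material": "soil"})])
print(json.dumps(fc, indent=2))
```

A file with this structure is exactly what GitHub's map renderer and geojson.io will display directly.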

- IGSN samples GeoJSON: I've examined the web service responses (requested by IGSN number; here's an example) for the Shale Hills CZO IGSNs that Megan sent me in November. A little under half of the ~1,700 IGSNs generated XML parsing errors with the standard Python XML parser I used; my limited testing suggests these are due to invalid (not properly escaped) "<" and ">" characters in the XML responses, so I ignored those. Of the remaining ~950 IGSNs, ~400 had no latitude and longitude entries ("Not Provided"). To examine the remaining IGSN sample responses, I first converted them into a standard GeoJSON structure, then converted and subset them to what we'll use in the initial visualization portal test/pilot. The GeoJSON file available here has all ~550 IGSNs with lat & lon entries.
- MG-RAST metagenomic GeoJSON: Started with an MG-RAST metagenome MIxS request, issued in the browser with a limit of 10,000 records. (Folker had mentioned in November that there were probably ~20,000 total records for that request; the request was http://api.metagenomics.anl.gov//metagenome?verbosity=mixs&limit=100, except using limit=10000. It looks like curl on the shell would've worked, but I tried wget and didn't succeed.) The MIxS sequence metadata provides a manageable amount of metadata for initial exploration. I then eliminated records with invalid latitude or longitude values (372 records), and kept only records with country in ('USA', 'United States of America') (5,393 records). The geographical distribution of locations (points all over the world, but mostly in the USA) showed that country was not the geographical location of the site, but more likely the home base of the project PI; so I applied a rough bounding-box filter to retain only sites within the USA lower 48: (lon > -125.68 and lon < -65.04) and (lat > 24.53 and lat < 50.06). The MG-RAST collection includes all kinds of genomic sequences, including ones from sources that are not of interest to the BiGCZ project (e.g., human tissue); after exploring source, material, biome and similar "type" metadata vocabularies in the responses, I further applied a filter based on env_package_type: env_package_type in ('air', 'built environment', 'microbial mat|biofilm', 'plant-associated', 'sediment', 'soil', 'water'). These last two filters (bounding box and environment package type) greatly reduced the number of records to a final total of 883, which includes a substantial number of marine sites.
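The IGSN processing described above — skipping responses that fail to parse and dropping records without coordinates — can be sketched roughly like this. Note the tag names (`latitude`, `longitude`) and the sample XML are placeholders for illustration, not the actual SESAR response schema:

```python
import xml.etree.ElementTree as ET

def extract_coordinates(xml_text):
    """Return (lat, lon) from a sample response, or None if the XML is
    malformed (e.g. an unescaped '<') or coordinates are missing.
    Tag names here are assumptions, not the real SESAR schema."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # invalid XML: skip it, as in the processing above
    lat = root.findtext(".//latitude")
    lon = root.findtext(".//longitude")
    if lat in (None, "Not Provided") or lon in (None, "Not Provided"):
        return None  # no usable location entry
    return float(lat), float(lon)

good = "<sample><latitude>40.66</latitude><longitude>-77.90</longitude></sample>"
bad = "<sample><name>depth < 5 m</name></sample>"   # unescaped '<' breaks parsing
missing = "<sample><latitude>Not Provided</latitude></sample>"
print(extract_coordinates(good), extract_coordinates(bad), extract_coordinates(missing))
```

The try/except around the parse is what lets the pipeline ignore the roughly half of responses with escaping problems instead of aborting.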
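The two MG-RAST filters described above (lower-48 bounding box and env_package_type whitelist) can be written as simple predicates. The bounds and package names come straight from the notes; the record field names (`latitude`, `longitude`, `env_package_type` as dict keys) are assumptions about the parsed MIxS response:

```python
# Environment packages of interest, as listed in the notes above
KEEP_PACKAGES = {'air', 'built environment', 'microbial mat|biofilm',
                 'plant-associated', 'sediment', 'soil', 'water'}

def in_lower48(lat, lon):
    """Rough continental-USA bounding box, used to drop records whose
    'country' field reflects the PI's home base rather than the site."""
    return (-125.68 < lon < -65.04) and (24.53 < lat < 50.06)

def keep_record(rec):
    """rec is a dict with 'latitude', 'longitude', 'env_package_type'
    keys (assumed field names for the parsed MIxS metadata)."""
    return (in_lower48(rec['latitude'], rec['longitude'])
            and rec['env_package_type'] in KEEP_PACKAGES)

records = [  # made-up example records
    {'latitude': 40.66, 'longitude': -77.90, 'env_package_type': 'soil'},       # keep
    {'latitude': 48.85, 'longitude': 2.35,   'env_package_type': 'soil'},       # outside bbox
    {'latitude': 33.50, 'longitude': -112.0, 'env_package_type': 'human-gut'},  # wrong package
]
kept = [r for r in records if keep_record(r)]
print(len(kept))  # 1
```

Applied to the full ~5,393-record country-filtered set, these two predicates are what reduced the collection to the final 883 records.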

I did these requests and processing in IPython notebooks. I can share those eventually, after I've cleaned them up; right now they're very messy.

## 5. ulmo, pyoos, and other general data access libraries

- Check out ulmo and pyoos. The former is focused mainly on hydrological and meteorological data, the latter on oceanographic data; but they definitely overlap, and have a couple of duplicated readers.
- "SciPy discussions on Python water & met data access (including IOOS SOS) client libraries". A burst of enthusiasm of mine from July 2013. Still relevant, and cool developments have taken place since then.
- There are now many ulmo and pyoos Jupyter notebooks from the community that can be shared and highlighted. I'll do this later.
- 11/6: Stub for an update coming soon: I recently created a Jupyter notebook that demos ulmo and pyoos together, then combines time series responses from both (at least on a plot). I'll link to it once I've finalized the notebook and pushed it online.