K.-Michael Aye edited this page Jun 6, 2014 · 28 revisions

Format

The target format is the Hierarchical Data Format (HDF), version 5, a well-established data format with good reading routines for Python, Matlab and IDL.

Parsing

The first step is a straightforward parsing of the CSV output of the Mongo database dump. While parsing, values of 'null' are replaced by numpy.NaN. I made the conscious decision NOT to replace None in the marking column with NaN because that detail is in itself usable data.
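The replacement rule above can be sketched with the standard library alone. This is only an illustration, not the code used in planet4_reduction.py: the function name parse_dump, the sample columns image_id and x, and the sample values are made up, and math.nan stands in for numpy.NaN so the sketch has no dependencies.

```python
import csv
import io
import math

def parse_dump(csv_file):
    """Read CSV rows, turning the literal string 'null' into NaN.

    The string 'None' is deliberately left untouched, so a marking
    of 'None' survives parsing as usable information.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        rows.append({key: (math.nan if value == "null" else value)
                     for key, value in row.items()})
    return rows

# Tiny made-up sample; the real dump has different and many more columns.
sample = io.StringIO(
    "image_id,marking,x\n"
    "img1,None,null\n"
    "img2,fan,12.5\n"
)
rows = parse_dump(sample)
```

After this call, the x value of the first row is NaN while its marking is still the string 'None'.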

The acquisition_date column is currently parsed into a Python datetime. This can be switched off by calling the reduction routine with the option --raw_times.
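A minimal sketch of how such an optional conversion could look. The helper name convert_times and the timestamp format "%Y-%m-%d %H:%M:%S" are assumptions made for illustration; the actual format in the database dump may differ.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to whatever the dump really contains.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def convert_times(rows, raw_times=False):
    """Parse 'acquisition_date' into datetime objects unless raw_times is set."""
    if raw_times:
        # Mirror the --raw_times option: leave the strings as they are.
        return rows
    for row in rows:
        row["acquisition_date"] = datetime.strptime(
            row["acquisition_date"], TIME_FORMAT)
    return rows

parsed = convert_times([{"acquisition_date": "2014-06-06 12:30:00"}])
raw = convert_times([{"acquisition_date": "2014-06-06 12:30:00"}],
                    raw_times=True)
```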

Filtering / Cleaning

Application

The application is called planet4_reduction.py and when called with -h for help, it provides the following output:

usage: planet4_reduction.py [-h] [--raw_times] [--keep_dirt] csv_fname

positional arguments:
  csv_fname    Provide the filename of the database dump csv-file here.

optional arguments:
  -h, --help   show this help message and exit
  --raw_times  Do not parse the times into a Python datetime object. For the
               stone-age. ;) Default: parse into datetime object.
  --keep_dirt  Do not filter for dirty data. Keep everything. Default: Do the
               filtering.
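
For readers who want to build a similar interface, help output of this shape can be produced with the standard argparse module. The sketch below is a reconstruction from the help text above, not the source of planet4_reduction.py; the function name build_parser is hypothetical.

```python
import argparse

def build_parser():
    """Build a parser with the same arguments as listed in the help output."""
    parser = argparse.ArgumentParser(prog="planet4_reduction.py")
    parser.add_argument(
        "csv_fname",
        help="Provide the filename of the database dump csv-file here.")
    parser.add_argument(
        "--raw_times", action="store_true",
        help="Do not parse the times into a Python datetime object. "
             "Default: parse into datetime object.")
    parser.add_argument(
        "--keep_dirt", action="store_true",
        help="Do not filter for dirty data. Keep everything. "
             "Default: Do the filtering.")
    return parser

# Example call with a made-up file name.
args = build_parser().parse_args(["dump.csv", "--raw_times"])
```

Both options are plain switches, so they default to False and the default behaviour is to parse times and to filter.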

Reduction levels

I produce different versions of the reduced dataset with increasing levels of reduction, resulting in smaller files that are faster to read.

For all file names, the date part indicates the date of the database dump, which is delivered periodically by Stuart.

Level 1

All data is included apart from what was removed in the filtering step above.

Product file name is planet_four_level_1_20xx-xx-xx.h5

Level 2

This product is reduced to the data records that are finished in Planet4 terms, which is currently defined as having 30 individual analyses performed on a specific Planet4 subframe.

Product file name is planet_four_level_2_20xx-xx-xx.h5
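
The "finished" criterion could be sketched as follows. Several things here are assumptions: the column names image_id (the subframe) and classification_id (one individual analysis) are hypothetical, and "having 30 analyses" is read as "at least 30 distinct analyses".

```python
from collections import defaultdict

def select_finished(rows, subframe_key="image_id",
                    analysis_key="classification_id", required=30):
    """Keep only rows of subframes with at least `required` distinct analyses."""
    analyses = defaultdict(set)
    for row in rows:
        analyses[row[subframe_key]].add(row[analysis_key])
    finished = {subframe for subframe, ids in analyses.items()
                if len(ids) >= required}
    return [row for row in rows if row[subframe_key] in finished]

# Made-up data: subframe "a" has 30 analyses, subframe "b" only 5.
rows = ([{"image_id": "a", "classification_id": i} for i in range(30)]
        + [{"image_id": "b", "classification_id": i} for i in range(5)])
level2 = select_finished(rows)
```

Counting distinct analysis identifiers rather than rows matters if one analysis can contribute several marking records to the same subframe.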

Level 3

This product is reduced further from Level 2 by only including data records with markings!='None'. In other words, each data record of this data product has marking data in it.

Product file name is planet_four_level_3_20xx-xx-xx.h5
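
A small sketch of the Level 3 selection and of the file naming scheme shared by all levels. The helper names select_marked and product_name are hypothetical, the column key "marking" follows the Parsing section, and the sample rows are made up.

```python
from datetime import date

def select_marked(rows):
    """Drop records whose marking is the string 'None'."""
    return [row for row in rows if row["marking"] != "None"]

def product_name(level, dump_date):
    """Build a product file name from the reduction level and the dump date."""
    return f"planet_four_level_{level}_{dump_date.isoformat()}.h5"

rows = [{"marking": "None"}, {"marking": "fan"}, {"marking": "None"}]
level3 = select_marked(rows)
fname = product_name(3, date(2014, 6, 6))
```

This filter only works because 'None' was kept as a string during parsing instead of being replaced by NaN.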
