
Processing while ingesting

Anonymous edited this page May 1, 2016 · 9 revisions

DaCHS allows you to process your data while it is being ingested, i.e., inside your RD. This processing can happen at different levels, from type casting of variables through unit conversion to complex computations involving several variables. For simple, common tasks (e.g., unit conversion), DaCHS even provides an API; for more complex tasks, you can write your own Python code.

In your RD there are two places where you can trigger such processing.

One is the apply element, inside the data element's rowmaker; the other is the rowfilter element, inside the grammar element [1]:

apply
apply elements let you embed Python code: wrap your code fragment in a code element, as shown below. The namespace in which the code runs exposes the data coming from the grammar through the vars dictionary, whose key/value pairs can not only be read but also modified or added.
  <apply name="myProcessing">
    <code>
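      # Hypothetical example: cast a grammar field to float and
      # derive a new key from it (mJy -> Jy).
      if vars.get("flux_mJy") is not None:
          vars["flux_Jy"] = float(vars["flux_mJy"])/1000.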
    </code>
  </apply>
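As a rough illustration of how vars behaves, the following plain-Python sketch mimics the body of an apply element. It is not DaCHS code: the function apply_my_processing, the fields flux_mJy and flux_Jy, and the sample record are hypothetical, and vars is passed in explicitly here, whereas inside an RD DaCHS provides it.

```python
def apply_my_processing(vars):
    """Hypothetical stand-in for the body of an <apply> element.

    `vars` (deliberately shadowing the builtin, to mirror the name used
    in RDs) is a plain dict of grammar output; values may arrive as
    strings.
    """
    # Type cast an existing key in place.
    vars["flux_mJy"] = float(vars["flux_mJy"])
    # Add a new key derived from it (unit conversion: mJy -> Jy).
    vars["flux_Jy"] = vars["flux_mJy"] / 1000.0


# Simulate one record as it might come out of a grammar.
record = {"id": "src-1", "flux_mJy": "2500"}
apply_my_processing(record)
print(record)  # {'id': 'src-1', 'flux_mJy': 2500.0, 'flux_Jy': 2.5}
```

Because the dictionary is modified in place, nothing needs to be returned; later code sees the new and changed keys.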

TODO: the apply element can also take a procDef attribute [2]

TODO: document the namespace, i.e., the variables and structures available for manipulation

rowfilter
rowfilter has the same structure as apply, but it can only be used inside a grammar element. There you can access only what comes from the grammar: each record is available as a dictionary named row, and you read its values with row[key] [*]. The code block must contain at least one yield statement, usually at its end. In fact, you can use as many yield statements as you like, and each one that is executed emits a row.
  <rowfilter name="myRowGen">
    <code>
      # inspect, modify or extend row here
      yield row
    </code>
  </rowfilter>
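The yield semantics can be modeled with an ordinary Python generator. In this sketch (plain Python, not the DaCHS machinery), my_row_gen stands in for a rowfilter body and run_rowfilter for the grammar feeding it; the field names obsId, bands and band are made up. A single input record can produce several rows, or none at all if no yield is reached.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, str]


def my_row_gen(row: Row) -> Iterator[Row]:
    """Hypothetical stand-in for the body of a <rowfilter> element."""
    if row.get("obsId"):
        # One output row per band listed in the record.
        for band in row["bands"].split(","):
            new_row = dict(row)  # copy, so yielded rows are independent
            new_row["band"] = band.strip()
            yield new_row
    # Records without an obsId reach no yield and are dropped.


def run_rowfilter(grammar_rows: Iterable[Row],
                  rowfilter: Callable[[Row], Iterator[Row]]) -> Iterator[Row]:
    """Feed each grammar row through the filter and flatten the output."""
    for row in grammar_rows:
        yield from rowfilter(row)


grammar_rows = [
    {"obsId": "a1", "bands": "g, r"},
    {"obsId": "", "bands": "i"},
    {"obsId": "b2", "bands": "z"},
]
for out in run_rowfilter(grammar_rows, my_row_gen):
    print(out["obsId"], out["band"])
# a1 g
# a1 r
# b2 z
```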

Like apply elements, rowfilter elements can be declared any number of times, including not at all; they are executed in sequence.

The keys available in the row dictionary are determined by the grammar in use and, if present, by preceding procedures.

For instance, if you declare two rowfilter elements and the first one uses the procedure definition (procDef) //products#define, the second rowfilter will see row containing the grammar's source data plus the keys defined by //products#define.
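Chaining can be pictured as each rowfilter consuming the output of the previous one. The sketch below models this in plain Python; first_filter only stands in for a procedure such as //products#define and adds a hypothetical key, derived, rather than the keys that procedure actually defines. The helpers _run_one and run_in_sequence are likewise illustrative.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
RowFilter = Callable[[Row], Iterator[Row]]


def first_filter(row: Row) -> Iterator[Row]:
    # Stand-in for a procedure that defines extra keys (hypothetical
    # key; NOT what //products#define really adds).
    row["derived"] = str(row["path"]).upper()
    yield row


def second_filter(row: Row) -> Iterator[Row]:
    # Records which keys were visible when this filter ran.
    row["seen_keys"] = sorted(row)
    yield row


def _run_one(upstream: Iterator[Row], rowfilter: RowFilter) -> Iterator[Row]:
    for row in upstream:
        yield from rowfilter(row)


def run_in_sequence(rows: Iterable[Row], filters: List[RowFilter]) -> Iterator[Row]:
    """Apply rowfilters in declaration order; each sees the previous output."""
    stream: Iterator[Row] = iter(rows)
    for rowfilter in filters:
        stream = _run_one(stream, rowfilter)
    return stream


grammar_rows = [{"path": "data/img1.fits"}]
out = list(run_in_sequence(grammar_rows, [first_filter, second_filter]))
print(out[0]["seen_keys"])  # ['derived', 'path']
```

Swapping the order of the two filters would leave derived out of seen_keys, which is why declaration order matters.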

setup & bind
Besides the code element, which holds the Python code fragment, you can use the setup element: a dedicated code block for setting up the (global) namespace, so that what it defines can be used by subsequent code blocks. Also worth noting is the bind element, a direct way of assigning (new) values to variables in the namespace.
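The division of labor between setup, bind and the per-row code can be sketched with a Python closure. This is a model under assumptions, not how DaCHS implements it: make_procedure, the parameter names pattern and divisor, and the label field are hypothetical, bind is represented by a plain dictionary of parameter values, and setup is assumed to run once when the procedure is built rather than once per row.

```python
import re
from typing import Callable, Dict

Row = Dict[str, object]


def make_procedure(bindings: Dict[str, str]) -> Callable[[Row], Row]:
    """Hypothetical model of a procedure with setup and bind.

    `bindings` plays the role of bind elements (parameter values set in
    the RD); the statements before `def code` play the role of setup.
    """
    # "setup" (assumed): runs once, when the procedure is built.
    pattern = re.compile(bindings["pattern"])
    divisor = float(bindings["divisor"])

    def code(row: Row) -> Row:
        # Per-row code: relies on names prepared during setup.
        match = pattern.search(str(row["label"]))
        row["value"] = float(match.group(1)) / divisor if match else None
        return row

    return code


# Hypothetical bindings: pull an exposure in ms out of a label, convert to s.
proc = make_procedure({"pattern": r"exp=(\d+)", "divisor": "1000"})
print(proc({"label": "img exp=1500"}))      # {'label': 'img exp=1500', 'value': 1.5}
print(proc({"label": "no exposure here"}))  # {'label': 'no exposure here', 'value': None}
```

Compiling the regular expression in the setup part, instead of in the per-row code, is a typical use of such a setup step.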

Notes

[1] The structure of an RD: data
[2] procDef internal link
[*] key is the name of a field as provided by the grammar in use (or added by a preceding procedure).