-
Notifications
You must be signed in to change notification settings - Fork 13
Domain model
The following documentations aims to outline the way in which SpenDB stores and queries data. Note that this documentation is aimed at developers who want to modify core functions of the platform. For anyone who wants to simply load data or use publicly accessible APIs, see the help under Web API.
SpenDB provides a multi-tenant OLAP-style data store, a concept sometimes referred to as a data mart. The system aims to allow the addition of further datasets at run-time and via the web, while keeping loaded data immutable. For each dataset, a reference is managed within the core data model, and a specific fact table is generated to keep the actual data.
While OpenSpending used a star schema representation with multiple tables generated per dataset, SpenDB does not attempt to normalise the data as it is loaded.
SpenDB keeps an extensive set of metadata for each Dataset
. The metadata is used to query the physical model and to provide general information about the dataset.
The metadata is stored in a specified object structure, which is often represented as JSON. The basic layout is this:
"dataset": {
... basic dataset attributes ...
},
"model": {
"dimensions": {
... dimension descriptions ...
},
"measures": {
... measure descriptions ...
}
}
Each of these sections is documented below.
The core Dataset
attributes are very generic and easily explained:
"dataset": {
"name": "machine-name",
"label": "Nicer, human-friendly Title",
"currency": "EUR",
"description": "This can be Markdown-formatted"
}
The name
of the dataset will be part of each URL that refers to it, so it makes sense to choose a concise term without any special characters, such as spaces, symbols or text with accents or umlauts.
currency
is expected to be a valid, three-letter currency code, e.g. EUR or USD. All measures are by default assumed to be specified in this currency, unless otherwise noted.
The second section of the model, model
, serves a duplicate function: it is both used to define how the data should be modelled in SpenDB and how values for each attribute can be located within the generated fact table. Future versions of SpenDB may break this up, defining both a model
and mapping
.
For dimensions and measures, a simple model format is available:
"model": {
"measures": {
"amount": {
"label": "Amount paid",
"description": "...",
"column": "amt",
"type": "float"
}
},
"dimensions": {
"time": {
"label": "Time of transaction",
"description": "...",
"attributes": {
"year": {
"type": "integer",
"column": "year_paid"
}
}
},
"transaction": {
"type": "value",
"label": "Transaction ID",
"description": "12-digit identifier for each entry.",
"attributes": {
"label": {
"type": "string",
"column": "tx_id",
"key": true
}
}
}
}
}
The model above defines three concepts, one measure and two dimensions. The meaning of type
, label
and description
are somewhat self-explanatory. column
is used to define the source column, based on a normalised form of the source data file column name.
A valid input CSV file for the model defined above might look like this:
tx_id,year_paid,amt
D38DEF-ZZ,2008,5044.0
AAA372-39,2011,43.5
(missing),2009,2854922.0
When loading a :py:class:~.Dataset
, SpenDB will generate a single table (and columns) to represent the data. A table called <dataset_name>__fact
will be generated for each dataset with an id
column. The id
is auto-generated for each entry.
SpenDB also gives special importance to a set of other attributes of compound dimensions so that it makes sense to define as many of them as possible:
-
name
must be a unique, identifying key for each member of the dimension. -
label
is assumed to be a human-readable identifier that will be used as a title and heading for the dimension member pahe and references to the member in general. -
uri
contains a unified URI for the entity mentioned in this dimension, e.g. an OpenCorporates URI or a reference to a classificiation scheme. -
color
will be used when the dimension member is included in visualizations. If no color is set, it will be selected from a pre-defined palette. -
parent
is reserved for future use.
If you are stuck with the installation or usage of SpenDB, please: search for existing bug reports or file an issue, or message the spendb mailing list.