This app helps to keep data in OpenSpending as fresh as possible, by pro-actively fetching fiscal data from various data-portals and other official sources. By linking the data in OpenSpending to the exact source, we also gain more credibility for the data stored in the platform.
We use datapackage-pipelines
and specifically datapackage-pipelines-fiscal
to create and run data pipelines from fiscal.source-spec.yaml
files. For more information about source-spec files and the appropriate format to use, see the datapackage-pipelines-fiscal
repository README.
fiscal.source-spec.yaml
files define data sources, and processing steps to perform on the data, before uploading created datapackages to OpenSpending. The application looks for source-spec files in /source-specs
. So, where are they? Source-spec files are now kept in a separate repository, os-source-specs. The production deployment of os-data-importers uses a small repository-agent application to keep these files up to date on the server.
To update a datapackage on OpenSpending, either because the source-spec has changed, or because the data itself has been updated, you must make a change to the fiscal.source-spec.yaml
file and commit it to the master branch of the os-source-specs repository. If only the data has changed, but not the source-spec, the easiest change to make is to increment the source-spec's revision
property. Once committed to the repository, the repository-agent will check for changes every 5 minutes, and pull the most recent version. os-data-importers will see that the source-spec has been updated (it is marked as 'dirty'), and will run the pipeline to update the data.
A few factors determine when your pipeline will next be run.
os-data-importers is restarted: if the os-data-importers application is restarted, usually because it has been redeployed, all pipelines will be re-run.
fiscal.source-spec.yaml has been updated: os-data-importers checks for new and updated source-spec content from its known repositories every 5 minutes. If your fiscal.source.spec.yaml file has been updated, the pipeline will be marked as dirty, and rerun.
The env var $OS_DPP_DISABLE_PIPELINES='True'
will prevent pipeline schedulers from being initialised. This is useful if you want to retain the pipeline server endpoint for your application, but not run the actual pipelines (e.g. for a staging server).
A docker-compose.dev.yaml
file is provided to start up os-data-importers with a few necessary services (Redis, Postgresql, Elasticsearch, and the repository-agent) without having to run the entire OpenSpending suite. This can be useful during source-spec development to iterate on the specs. Run like this:
# Start the services os-data-importers uses first:
$ docker-compose -f docker-compose.dev.yaml up -d es redis fakes3 db
# Give it a few seconds for the services to become available, then
# start up os-data-importers and the repository-agent:
$ docker-compose -f docker-compose.dev.yaml up os-data-importers repository-agent
You can access the pipelines dashboard at: http://localhost:5000
.
If you're developing specs locally, you can point the repository agent to a local version of the specs repository on the host machine using a volume. Note, the local path on the host machine must be a valid git repository. Create a docker-compose.local.yaml
file and add the following:
version: "3.4"
services:
repository-agent:
environment:
REPO_AGENT_REPOS: /localrepo#simple # points to 'simple' branch of local repo
volumes:
- /path/to/local/source-spec/repo/os-source-specs:/localrepo
Start os-data-importers and repository-agent with the local file:
docker-compose -f docker-compose.dev.yaml -f docker-compose.local.yaml up os-data-importers repository-agent
Settings in docker-compose.local.yaml
will override settings in docker-compose.dev.yaml
. docker-compose.local.yaml
is ignored by git and won't be commited.