ETL pipeline, graphical explorer and general toolbox for investigations with Follow the Money data
Tutorial: https://docs.investigraph.dev/tutorial/
Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.
investigraph is an ETL framework that allows research teams to build their own data catalog as easily and reproducibly as possible. The investigraph framework provides logic for extracting, transforming and loading any data source into followthemoney entities.
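To illustrate the three stages, here is a minimal sketch that uses only the standard library. It is not investigraph's code. The functions `extract`, `transform`, `load` and `make_id`, the CSV columns and the ID scheme are all hypothetical. The output dicts only mimic the `id`/`schema`/`properties` shape of serialized followthemoney entities.

```python
import csv
import hashlib
import io
import json
from typing import Any, Iterable, Iterator

# Hypothetical source data: a small CSV export of company registrations.
SOURCE_CSV = """company,country,director
Acme Ltd,gb,Jane Doe
Beta GmbH,de,Max Muster
"""


def make_id(*parts: str) -> str:
    """Derive a stable entity ID from identifying values (illustrative scheme only)."""
    key = ".".join(parts).encode("utf-8")
    return hashlib.sha1(key).hexdigest()


def extract(raw: str) -> Iterator[dict[str, str]]:
    """Extract: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(raw))


def transform(row: dict[str, str]) -> Iterator[dict[str, Any]]:
    """Transform: map one row to three FtM-shaped entity dicts."""
    company_id = make_id("company", row["company"], row["country"])
    person_id = make_id("person", row["director"])
    yield {"id": company_id, "schema": "Company",
           "properties": {"name": [row["company"]], "jurisdiction": [row["country"]]}}
    yield {"id": person_id, "schema": "Person",
           "properties": {"name": [row["director"]]}}
    yield {"id": make_id("directorship", person_id, company_id), "schema": "Directorship",
           "properties": {"director": [person_id], "organization": [company_id]}}


def load(entities: Iterable[dict[str, Any]], out: io.TextIOBase) -> int:
    """Load: write entities as JSON lines and return how many were written."""
    count = 0
    for entity in entities:
        out.write(json.dumps(entity, sort_keys=True) + "\n")
        count += 1
    return count


def run(raw: str) -> str:
    """Chain the three stages and return the JSON-lines output as a string."""
    buffer = io.StringIO()
    entities = (entity for row in extract(raw) for entity in transform(row))
    load(entities, buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(run(SOURCE_CSV), end="")
```

Because IDs are derived from the source values rather than generated randomly, re-running the pipeline on the same input yields identical entities. This is one way to keep the results reproducible.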
For most common data source formats, this ETL process requires no programming knowledge, thanks to a simple YAML specification interface. However, if a specific dataset cannot be parsed with the built-in logic, a developer can plug in custom Python scripts at specific points in the pipeline to handle even the most unusual edge cases in data processing.
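The split between declarative configuration and custom code might look roughly like this. The sketch is hypothetical and does not reproduce investigraph's actual YAML schema or handler signature. `SPEC` is a plain dict standing in for a parsed YAML file, so no YAML parser is needed. The names `mapping_transform`, `custom_transform` and `run` are made up for illustration.

```python
from typing import Any, Callable, Iterator, Optional

Record = dict[str, Any]
Entity = dict[str, Any]
Handler = Callable[[Record, dict[str, Any]], Iterator[Entity]]

# Hypothetical stand-in for a parsed YAML spec: which column feeds which property.
SPEC: dict[str, Any] = {
    "schema": "Person",
    "id_column": "person_id",
    "properties": {"name": "full_name", "nationality": "country"},
}


def mapping_transform(record: Record, spec: dict[str, Any]) -> Iterator[Entity]:
    """Built-in logic: copy columns to properties exactly as the spec declares."""
    properties = {
        prop: [record[column]]
        for prop, column in spec["properties"].items()
        if record.get(column)  # skip empty or missing values
    }
    yield {"id": str(record[spec["id_column"]]), "schema": spec["schema"],
           "properties": properties}


def custom_transform(record: Record, spec: dict[str, Any]) -> Iterator[Entity]:
    """Custom handler for an edge case: names stored as 'LAST, First'."""
    last, sep, first = record["full_name"].partition(",")
    if sep:  # only rewrite names that actually contain a comma
        record = {**record, "full_name": f"{first.strip()} {last.strip().title()}"}
    # Reuse the built-in mapping once the record has been cleaned up.
    yield from mapping_transform(record, spec)


def run(records: list[Record], spec: dict[str, Any],
        handler: Optional[Handler] = None) -> list[Entity]:
    """Use the plugged-in handler if given, otherwise the built-in mapping."""
    transform = handler or mapping_transform
    return [entity for record in records for entity in transform(record, spec)]


if __name__ == "__main__":
    records = [{"person_id": 1, "full_name": "DOE, Jane", "country": "gb"}]
    print(run(records, SPEC))                    # name stays ["DOE, Jane"]
    print(run(records, SPEC, custom_transform))  # name becomes ["Jane Doe"]
```

The custom handler only fixes the unusual input and then delegates to the built-in mapping. In this design, the configuration remains the single place where columns are mapped to properties.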
- Standardized process to convert different datasets into a uniform and thus comparable format
- Control of this process by non-technical people
- Creation of your own (internal) data catalog
- Regular, automatic updates of the data
- A growing community that makes more and more datasets accessible
- Access to a public (open-source) data catalog operated by investigraph
- investigraph-etl - ETL-style pipeline framework for followthemoney data, based on prefect.io
- investigraph-eu - Catalog of European datasets powered by investigraph
- runpandarun - A simple interface written in python for reproducible i/o workflows around tabular data via pandas
- ftmq - An attempt towards a followthemoney query DSL
- investigraph-site - Landing page for investigraph (next.js app)
- investigraph-api - Public API instance to use as a test playground
- ftm-joy-ui - React components based on Joy UI for rendering followthemoney data
- ftmstore-fastapi - Lightweight API that exposes an ftm store at a public endpoint. Will be improved during this project.
This project builds on top of great technology. Contributions to 3rd party libraries are listed below.
This documentation can be rendered via mkdocs using the mkdocs-material theme.
Local development:
pip install -r requirements.txt
mkdocs serve
Follow the documentation at mkdocs-material
mkdocs build
Media Tech Lab Bayern batch #3