Tool set for the OMNILab data warehouse.
The OMNILab data warehouse is designed with three layers:
- Layer0: Original raw data from multiple sources (NFS).
- Layer1: Independent wide tables for each source after simple ETL (HDFS).
- Layer2: A set of small tables (models) built by combining multiple Layer1 tables to fit different applications (HDFS).
In most scenarios, data administrators hold the access rights to Layer0 and Layer1, while data users have access to the small tables at Layer2 to meet their requirements.
Data users can also contribute new data models to Layer2 when an application calls for a new type of table. Generating the new model may involve data sources that are not yet in the warehouse. In that case, the user should contact the admin to add the new data sources to Layer0 or Layer1.
- etlers: source code of ETL tools.
- deploy: folder to deploy the binary ETL tools referred to by porters.
- porters: automatic scripts to port a new repo periodically with ETL tools.
- repos: documentation for each repo.
- global_config.sh: global settings used by porters.
- workflow.sh: global workflow to run periodically.
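As a rough illustration of how a periodic workflow might drive the porters, the sketch below runs a list of porter scripts in order and reports which ones failed. It is not the actual content of workflow.sh: the function name `run_porters`, the status messages, and the script names in the example call are all assumptions made for illustration.

```shell
#!/bin/bash
# Sketch of a workflow runner. Function and file names are hypothetical,
# not taken from the real workflow.sh.

# run_porters PORTERS_DIR SCRIPT...
# Runs each porter script in the given order and prints its status.
# Returns 0 only if every porter succeeded, 1 otherwise.
run_porters() {
    local porters_dir="$1"
    shift
    local failed=0
    local name
    for name in "$@"; do
        if bash "$porters_dir/$name"; then
            echo "done: $name"
        else
            echo "failed: $name"
            failed=1
        fi
    done
    return "$failed"
}

# Example call (script names are placeholders):
# run_porters ./porters port_source_a.sh port_source_b.sh
```

Running the porters one by one and continuing after a failure keeps one broken source from blocking the others. Whether the real workflow behaves this way, and how it is scheduled (cron would be one option), is not stated here.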
- Add a related ETL program or script to etlers. Each program deserves an independent folder.
- Add a shell script in porters to call your ETL program automatically.
- Append the shell script at the right position in workflow.sh.
- Add documentation of the new repo to repos.
- Contact the admin to redeploy this tool set.
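The second step above (a porter script that calls your ETL program) could look roughly like the following. This is a minimal sketch under stated assumptions: the function name `port_repo`, its arguments, and the convention that the ETL program takes an input file path and writes to stdout are all hypothetical. It also writes to a local directory for simplicity. Since Layer1 lives on HDFS, a real porter would presumably push the results there (for example with `hdfs dfs -put`).

```shell
#!/bin/bash
# Sketch of a porter script, e.g. porters/port_mydata.sh (hypothetical name).

# port_repo SRC_DIR DST_DIR ETL_CMD
# Applies ETL_CMD to every file in SRC_DIR that has no output in DST_DIR yet,
# so repeated periodic runs only process new files.
port_repo() {
    local src_dir="$1" dst_dir="$2" etl_cmd="$3"
    local raw out
    mkdir -p "$dst_dir"
    for raw in "$src_dir"/*; do
        [ -f "$raw" ] || continue      # skip directories and unmatched globs
        out="$dst_dir/$(basename "$raw")"
        if [ -e "$out" ]; then
            continue                   # already ported in an earlier run
        fi
        # Assumption: the ETL program reads a file path and writes to stdout.
        # Write to a temporary file first so a failed run leaves no partial output.
        if "$etl_cmd" "$raw" > "$out.tmp"; then
            mv "$out.tmp" "$out"
        else
            rm -f "$out.tmp"
            echo "ETL failed for $raw" >&2
            return 1
        fi
    done
    return 0
}

# Example call (paths and program name are placeholders):
# port_repo /path/to/layer0/mydata /path/to/layer1/mydata ./deploy/mydata_etl
```

Skipping files that already have an output makes the script safe to rerun from a periodic workflow. The real porters may track progress differently.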
- Xiaming Chen, [email protected]
- Haiyang Wang, [email protected]