Simple ETL pipeline
Install requirements
pip install -r requirements.txt
Set up Git hooks
pre-commit install  # set up Git hooks
Please make sure pip-tools is properly installed in your virtual environment with
pip install pip-tools
To add a new dependency, add the required package to the requirements.in file. The requirements.txt file can then be regenerated with pip-tools:
pip-compile --output-file requirements.txt --quiet requirements.in
pip-sync requirements.txt
Or, using the Makefile:
make requirements
In this repo, we use a series of code quality checks that enforce a consistent standard in the code. More precisely, type hints are checked with mypy, linting is done with flake8, and automatic code formatting is done with black. Each of these steps can be run either from the command line or with make:
Check name | Terminal | Makefile |
---|---|---|
Typehint | mypy --config-file pyproject.toml app/ | make typehint |
Linting | pflake8 --config=pyproject.toml app/ tests/ | make lint |
Code formatting | black app/ tests/ | make black |
Unit tests | coverage run && coverage report | make test |
Note that these checks also run before each commit via the pre-commit module. To set up pre-commit so that it runs before each commit, install the hooks with pre-commit install.
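For cases where make is not available, the checks from the table above can be chained in a small Python script. This is only a sketch: the commands are copied from the table, while the names CHECKS, run_checks, and the injectable runner parameter are hypothetical and not part of this repo. Each check stops at its first failing command, which mirrors the && in the unit test row.

```python
import shlex
import subprocess
from typing import Callable, List, Tuple

# Commands copied from the checks table; the labels are hypothetical.
CHECKS: List[Tuple[str, List[str]]] = [
    ("typehint", ["mypy --config-file pyproject.toml app/"]),
    ("lint", ["pflake8 --config=pyproject.toml app/ tests/"]),
    ("black", ["black app/ tests/"]),
    ("test", ["coverage run", "coverage report"]),
]


def _subprocess_runner(command: str) -> int:
    """Run one command without a shell and return its exit code."""
    return subprocess.run(shlex.split(command)).returncode


def run_checks(runner: Callable[[str], int] = _subprocess_runner) -> List[str]:
    """Run every check in order and return the labels of the failed ones."""
    failed: List[str] = []
    for label, commands in CHECKS:
        for command in commands:
            if runner(command) != 0:
                # Skip the remaining commands of this check, like `&&` does.
                failed.append(label)
                break
    return failed
```

Calling run_checks() from the project folder returns an empty list when every check passes. Unlike make, this sketch keeps going after a failed check so that all failures are reported in one run.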
After recreating the Python environment, move into this project folder and run:
uvicorn app.main:app --reload --workers 1 --host 0.0.0.0 --port 8080
Alternatively, to start the application through the Makefile's dockercompose target, move into this project folder and run:
make dockercompose
The project uses the following environment variables:
ENV variable name | Type | Description |
---|---|---|
ROOT_FOLDER | str | Root path of the application folder cherrynpl-etl |
PYTHONPATH | str | Must be set to the same value as ROOT_FOLDER |
APP_CONF_DIR | str | Path where app.yaml is kept |
MONGODB_USER | str | User name for MongoDB |
MONGODB_PASSWORD | str | Password for MongoDB |
DSMONGO_URL | str | MongoDB URL, for example localhost:27107/admin |
MONGODB_NAME | str | Database name for MongoDB |
DOCUMENT_COLLECTION | str | Name of the collection where documents will be saved |
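As an illustration of how these variables fit together, the sketch below reads them into a small settings object and fails early when one is missing. It is not the code used in this repo: Settings, load_settings, and mongo_uri are hypothetical names, and the way the user, password, and DSMONGO_URL are combined is an assumption based on the standard MongoDB connection string format. PYTHONPATH is left out because it is consumed by the interpreter rather than by application code.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote_plus

# Variables from the table above that the application itself would read.
REQUIRED = (
    "ROOT_FOLDER",
    "APP_CONF_DIR",
    "MONGODB_USER",
    "MONGODB_PASSWORD",
    "DSMONGO_URL",
    "MONGODB_NAME",
    "DOCUMENT_COLLECTION",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; field names are the lower-cased variable names."""

    root_folder: str
    app_conf_dir: str
    mongodb_user: str
    mongodb_password: str
    dsmongo_url: str
    mongodb_name: str
    document_collection: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, raising if any variable is unset or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return Settings(**{name.lower(): env[name] for name in REQUIRED})


def mongo_uri(settings: Settings) -> str:
    """Assumed layout: mongodb://user:password@DSMONGO_URL, credentials escaped."""
    user = quote_plus(settings.mongodb_user)
    password = quote_plus(settings.mongodb_password)
    return f"mongodb://{user}:{password}@{settings.dsmongo_url}"
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in unit tests without touching the real environment.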
- Navigate to the Swagger UI at localhost:8080/docs after running make dockercompose.
- POST /orchestrate-pipeline: This endpoint triggers the ETL job and responds with a string body containing the job id of the ETL job.
- GET /logs/{id}: This endpoint returns the log generated during the ETL job. Enter the job id returned by POST /orchestrate-pipeline.
- GET /get-docments/: This endpoint fetches the transformed documents from the MongoDB collection and returns them as a list of JSON objects. An optional parameter sets the number n of documents to return.
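The same sequence can be driven from a script instead of the Swagger page. The following is a minimal sketch using only the standard library. It assumes that the service listens on localhost:8080, that every response body is JSON, and that the optional document count is passed as a query parameter named n; the parameter name and the helper names endpoint, call, and run_pipeline_and_fetch are guesses, not taken from this repo.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Optional, Tuple

BASE_URL = "http://localhost:8080"  # assumed from the Swagger URL above


def endpoint(path: str, **query: Any) -> str:
    """Build a full URL, dropping query parameters whose value is None."""
    params = {key: value for key, value in query.items() if value is not None}
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def call(method: str, url: str) -> Any:
    """Send a request without a body and decode the response as JSON."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_pipeline_and_fetch(
    limit: Optional[int] = None,
    send: Callable[[str, str], Any] = call,
) -> Tuple[Any, Any, Any]:
    """Trigger the ETL job, then read its log and the stored documents."""
    job_id = send("POST", endpoint("/orchestrate-pipeline"))
    # Escape the job id so it is safe to place in the URL path.
    job_path = urllib.parse.quote(str(job_id), safe="")
    logs = send("GET", endpoint(f"/logs/{job_path}"))
    # "n" is a hypothetical name for the optional document count parameter.
    docs = send("GET", endpoint("/get-docments/", n=limit))
    return job_id, logs, docs
```

With the service running, run_pipeline_and_fetch(limit=5) performs the three calls in order. If the ETL job runs in the background, the log and documents fetched immediately after the POST may still be incomplete, so a real client would likely poll the log endpoint first.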