
cherrynpl-etl

Simple ETL pipeline

Project setup

Install requirements

pip install -r requirements.txt

Setup Git hooks

pre-commit install  # set up git hooks

Keep requirements clean and updated

Please make sure pip-tools is installed in your virtual environment:

pip install pip-tools

New dependencies should be added to the requirements.in file; requirements.txt can then be regenerated from it with pip-tools:

pip-compile --output-file requirements.txt --quiet requirements.in
pip-sync requirements.txt

Or, using the Makefile:

make requirements

Local code quality checks

In this repo, we use a series of code quality checks to enforce a consistent standard. More precisely, type checking is performed with mypy, linting with flake8, and automatic code formatting with black. Each of these steps can be run either via the command line or with make:

| Check name      | Terminal                                    | Makefile       |
| --------------- | ------------------------------------------- | -------------- |
| Type checking   | `mypy --config-file pyproject.toml app/`    | `make typehint`|
| Linting         | `pflake8 --config=pyproject.toml app/ tests/` | `make lint`  |
| Code formatting | `black app/ tests/`                         | `make black`   |
| Unit tests      | `coverage run && coverage report`           | `make test`    |

Note that these checks also run before each commit via the pre-commit module. To set this up and have the checks run before each commit, install the hooks with pre-commit install.

How to start fastapi app locally

After recreating the Python environment, move into this project folder and run:

uvicorn app.main:app --reload --workers 1 --host 0.0.0.0 --port 8080

How to start the application using docker

After recreating the Python environment, move into this project folder and run:

make dockercompose

ENV variables

The ENV variables used in the project are the following:

| ENV variable name   | Type | Description                                            |
| ------------------- | ---- | ------------------------------------------------------ |
| ROOT_FOLDER         | str  | Root path of the application folder cherrynpl-etl      |
| PYTHONPATH          | str  | PYTHONPATH variable; set to ROOT_FOLDER                |
| APP_CONF_DIR        | str  | Path where app.yaml is kept                            |
| MONGODB_USER        | str  | Username for MongoDB                                   |
| MONGODB_PASSWORD    | str  | Password for MongoDB                                   |
| DSMONGO_URL         | str  | MongoDB URL, e.g. localhost:27017/admin                |
| MONGODB_NAME        | str  | Database name for MongoDB                              |
| DOCUMENT_COLLECTION | str  | Name of the collection where documents will be saved   |
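As a rough sketch of how these variables might be consumed, the MongoDB credentials and URL can be combined into a connection string. The URI layout and function name below are assumptions for illustration, not taken from the app's code:

```python
import os

def mongo_uri() -> str:
    """Build a MongoDB connection string from the ENV variables above.

    Hypothetical helper: the real app may assemble the URI differently.
    """
    user = os.environ["MONGODB_USER"]
    password = os.environ["MONGODB_PASSWORD"]
    url = os.environ["DSMONGO_URL"]  # e.g. localhost:27017/admin
    return f"mongodb://{user}:{password}@{url}"
```
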

How to trigger ETL pipeline from Swagger UI

  1. Navigate to the Swagger UI at localhost:8080/docs after executing make dockercompose.
  2. POST /orchestrate-pipeline: triggers the ETL job and responds with a string body containing the job id of the ETL job.
  3. GET /logs/{id}: returns the log generated during the ETL job. Enter the job id obtained in step 2.
  4. GET /get-docments/: fetches the transformed documents from the MongoDB collection and returns them as a list of JSON objects. An optional parameter limits the number of documents returned.
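The job-id/log pattern behind steps 2 and 3 can be sketched with an in-memory registry. Everything here (names, storage, log lines) is invented for illustration; the real app presumably persists logs elsewhere:

```python
import uuid

# Hypothetical in-memory job registry, mapping job id -> log lines.
_jobs: dict[str, list[str]] = {}

def orchestrate_pipeline() -> str:
    """Start an ETL job and return its id (sketch of POST /orchestrate-pipeline)."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = ["job started"]
    # ... extract / transform / load steps would append log lines here ...
    _jobs[job_id].append("job finished")
    return job_id

def get_logs(job_id: str) -> list[str]:
    """Return the log lines for a job (sketch of GET /logs/{id})."""
    return _jobs.get(job_id, [])
```

A client would call orchestrate_pipeline once, keep the returned id, and poll get_logs with it, mirroring the two endpoints above.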
