
dkalamar/scalable-airflow-template

 
 


Scalable Airflow Setup Template

This repo's goal is to get your Airflow on Kubernetes setup up and running quickly, in a way that scales.

Features

👶 Easy Setup: Using cookiecutter to fill in the blanks.

🔥 Disposable Infrastructure: Using Helm and a few premade commands, we can easily tear down and redeploy the entire infrastructure.

🚀 Cost-Efficient: We use Kubernetes as the task engine. The Airflow scheduler runs each task in a new pod and deletes the pod upon completion, which lets us scale with the workload while using the minimum amount of resources.

🔩 Decoupled Executor: Another great advantage of using Kubernetes as the task runner is that it decouples orchestration from execution. You can read more about it in We're All Using Airflow Wrong and How to Fix It.

🏃 Dynamically Updated Workflows: We use Git-Sync containers, which let us update workflows through git alone, with no need to redeploy Airflow on each workflow change.

Installation

$ cookiecutter https://github.com/talperetz/scalable-airflow-template

Cookiecutter Options Explained

  • airflow_executor: Tasks can run on Kubernetes with either the Celery or the Kubernetes executor. To learn more, check out Scale Your Data Pipelines with Airflow and Kubernetes.
  • local_airflow_image_name: Local image name. Required if you want to build your own Airflow image.
  • airflow_image_repository: ECR repository URL. Required if you want to build your own Airflow image.
  • git_repo_to_sync_dags: Link to the GitHub repository (your scalable_airflow project) that holds your workflows; Git-Sync pulls the DAGs from it.
  • git_username_in_base_64: You can convert strings to base64 via shell with:
$ echo -n "github_username" | base64
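  • git_password_in_base_64: GitHub no longer accepts account passwords for Git operations over HTTPS, so use a personal access token here. You can convert it to base64 via shell with:
$ echo -n "github_personal_access_token" | base64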
  • fernet_key: Fill the fernet_key option with the output of this command (requires the cryptography Python package):
$ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
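The encodings above are easy to get subtly wrong: a trailing newline from a plain `echo`, or GNU `base64` wrapping output longer than 76 characters, ends up inside the encoded value. The sketch below is a hypothetical helper script, not part of the template, that encodes a value without those pitfalls, decodes it back for a round-trip check, and checks that a `fernet_key` has the expected shape (32 random bytes in URL-safe base64, i.e. 44 characters). It assumes a `base64` that accepts `-d` for decoding (GNU coreutils does; older macOS releases used `-D` instead).

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the template) for preparing
# cookiecutter values. Assumes a base64 that supports -d for decoding.

# Encode a string as base64 with no trailing newline and no line wrapping.
to_b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a base64 string back to its original text.
from_b64() {
  printf '%s' "$1" | base64 -d
}

# Succeed if $1 looks like a Fernet key: 44 URL-safe base64 characters
# that decode to exactly 32 bytes.
is_fernet_key() {
  key=$1
  [ "${#key}" -eq 44 ] || return 1
  # Map the URL-safe alphabet (_ and -) back to standard base64 (/ and +).
  nbytes=$(printf '%s' "$key" | tr '_-' '/+' | base64 -d 2>/dev/null | wc -c | tr -d ' ')
  [ "$nbytes" -eq 32 ]
}

# Example: encode a placeholder username and confirm it round-trips.
encoded=$(to_b64 "github_username")
echo "git_username_in_base_64: $encoded"
[ "$(from_b64 "$encoded")" = "github_username" ] && echo "round-trip OK"
```

Running `is_fernet_key "$key"` on the output of the Fernet command above should succeed; a non-zero status means the value was truncated or mangled when pasted.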

Usage

Prerequisites

$ brew install kubectl
$ brew install helm

For a custom Airflow image you'll also need:
  • A Kubernetes cluster set up with an autoscaler
  • An ECR repository for the Docker image

It is also recommended to set up the Kubernetes Dashboard.
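Before running the make targets, it can help to confirm the required tools are on your PATH and that kubectl is pointed at the cluster you intend to deploy to. The hypothetical `require` and `check_tools` helpers below are a minimal sketch of such a pre-flight check; the tool list (including `cookiecutter` for installation; add `docker` for the custom-image path) is an assumption to adjust for your setup.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the template).

# Print a message and return non-zero if a command is not on PATH.
require() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing required tool: $1" >&2
    return 1
  fi
}

# Check every tool passed as an argument, reporting all missing ones at once.
check_tools() {
  status=0
  for tool in "$@"; do
    require "$tool" || status=1
  done
  return "$status"
}

# Tools used in this README; add docker if you build a custom image.
if check_tools cookiecutter kubectl helm make; then
  # Show which cluster context kubectl will use before deploying anything.
  kubectl config current-context
fi
```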

Default Airflow Image

$ make ns    # creates a namespace for easy build and teardown
$ make deploy

At this point you should see the stack deployed to Kubernetes.
To see Airflow's UI:

$ make ui pod=[webserver-pod-name]
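`make ui` needs the webserver pod's name. A minimal sketch for finding it, assuming the pods live in the namespace created by `make ns` (written here as the placeholder `NAMESPACE`) and that the webserver pod's name contains `web`; check the output of `kubectl get pods` for your release and adjust the pattern if needed. The `first_pod_matching` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods -o name` output on stdin
# (lines look like "pod/<name>") and print the first name matching a pattern.
first_pod_matching() {
  pattern=${1:-web}
  grep -- "$pattern" | head -n 1 | sed 's#^pod/##'
}

# Usage (NAMESPACE is a placeholder for the namespace created by `make ns`):
#   pod=$(kubectl get pods -n NAMESPACE -o name | first_pod_matching web)
#   make ui pod="$pod"
```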

Custom Airflow Image

After changing config/docker/Dockerfile and scripts/entrypoint.sh, build your custom Airflow image:

$ make build

Push the image to ECR:

$ make push
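Pushing to ECR requires Docker to be authenticated against the registry. If the `make push` target does not already handle this (check the Makefile), the sketch below logs in with AWS CLI v2. `ecr_registry` and `ecr_login` are hypothetical helpers, `AWS_ACCOUNT_ID` and `AWS_REGION` are placeholders, and the hostname follows ECR's standard `<account>.dkr.ecr.<region>.amazonaws.com` format for commercial AWS regions.

```shell
#!/bin/sh
# Hypothetical helper: build the ECR registry hostname for an account/region.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com' "$1" "$2"
}

# Log Docker in to ECR using AWS CLI v2's get-login-password.
ecr_login() {
  account=$1
  region=$2
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$(ecr_registry "$account" "$region")"
}

# Usage (placeholders must be set first):
#   ecr_login "$AWS_ACCOUNT_ID" "$AWS_REGION" && make push
```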

Deploy to Kubernetes:

$ make deploy

To see Airflow's UI:

$ make ui pod=[webserver-pod-name]

Fine-Tuning the Setup

This template uses:

Airflow Helm Chart: Airflow stable helm chart

Docker Image: https://github.com/puckel/docker-airflow

For more details and fine-tuning options, please refer to the links above.

