Scalable Airflow Setup Template

This repo's goal is to get your Airflow on Kubernetes setup running quickly and in a scalable way.

Features

👶 Easy Setup: Using cookiecutter to fill in the blanks.

🔥 Disposable Infrastructure: Using Helm and a few premade commands, we can easily destroy and redeploy the entire infrastructure.

🚀 Cost-Efficient: We use Kubernetes as the tasks' engine. The Airflow scheduler runs each task on a new pod and deletes the pod upon completion, allowing us to scale with the workload while using the minimal amount of resources.

🔩 Decoupled Executor: Another great advantage of using Kubernetes as the task runner is decoupling orchestration from execution. You can read more about it in "We're All Using Airflow Wrong and How to Fix It".

🏃 Dynamically Updated Workflows: We use Git-Sync containers, which let us update the workflows using Git alone. There is no need to redeploy Airflow on each workflow change.
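A minimal sketch of the publishing pattern that sync sidecars such as Git-Sync rely on: each new commit is checked out into a fresh directory, then the symlink Airflow reads DAGs through is atomically repointed at it, so the scheduler never sees a half-updated folder. This is an illustration in Python, not Git-Sync's own code; the `publish_checkout` helper and the directory layout are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def publish_checkout(link: Path, new_target: Path) -> None:
    """Atomically repoint `link` at `new_target` (hypothetical helper).

    A temporary symlink is created next to `link` and renamed over it.
    os.replace is atomic on POSIX, so readers see either the old or the
    new DAG folder, never a missing or partial one.
    """
    tmp_link = link.with_name(link.name + ".tmp")
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()  # clean up a leftover from an interrupted swap
    tmp_link.symlink_to(new_target, target_is_directory=True)
    os.replace(tmp_link, link)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    link = root / "dags"  # hypothetical path the scheduler reads DAGs from
    for rev in ("rev1", "rev2"):
        checkout = root / rev  # stand-in for a fresh checkout of one commit
        checkout.mkdir()
        (checkout / "example_dag.py").write_text(f"# DAG file at {rev}\n")
        publish_checkout(link, checkout)
        print(os.readlink(link), (link / "example_dag.py").read_text().strip())
```

Because the old checkout stays on disk until it is cleaned up, a task that already opened a DAG file keeps reading a consistent revision while new reads follow the link to the latest one.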

Installation

$ cookiecutter https://github.com/talperetz/scalable-airflow-template

Cookiecutter Options Explained

  • airflow_executor: Tasks can run on Kubernetes with either the Celery or the Kubernetes executor. To learn more, check out "Scale Your Data Pipelines with Airflow and Kubernetes".
  • local_airflow_image_name: image name. Required if you want to build your own Airflow image.
  • airflow_image_repository: ECR repository link. Required if you want to build your own Airflow image.
  • git_repo_to_sync_dags: link to the scalable_airflow repository on GitHub that holds your new workflows.
  • git_username_in_base_64: your GitHub username, base64-encoded. You can convert strings to base64 via the shell with:
$ echo -n "github_username" | base64
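  • git_password_in_base_64: your GitHub password, base64-encoded (GitHub no longer accepts account passwords for Git over HTTPS, so use a personal access token). You can convert strings to base64 via the shell with: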
$ echo -n "github_password" | base64
  • fernet_key: You can fill the fernet_key option with the output of this command:
$ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
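If you prefer to prepare these values in Python instead of the shell, a minimal standard-library sketch follows. `to_base64` mirrors `echo -n "..." | base64` (no trailing newline gets encoded), and `make_fernet_key` assumes the Fernet key format used by the `cryptography` package: 32 random bytes, URL-safe base64-encoded. The helper names are hypothetical.

```python
import base64
import binascii
import os


def to_base64(value: str) -> str:
    """Equivalent of `echo -n "value" | base64` (no trailing newline)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def make_fernet_key() -> str:
    """32 random bytes, URL-safe base64-encoded (the Fernet key format)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key: str) -> bool:
    """Check that `key` decodes to exactly 32 bytes of URL-safe base64."""
    try:
        return len(base64.urlsafe_b64decode(key.encode("ascii"))) == 32
    except (binascii.Error, ValueError):  # bad padding or non-ASCII input
        return False


if __name__ == "__main__":
    print(to_base64("github_username"))
    key = make_fernet_key()
    print(key, is_valid_fernet_key(key))
```

`is_valid_fernet_key` is useful as a quick check before pasting a key into the cookiecutter prompt, since a malformed key only surfaces later when Airflow tries to encrypt connections.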

Usage

Prerequisites

$ brew install kubectl
$ brew install helm

For a custom Airflow image you'll also need:
  • A Kubernetes cluster set up with an autoscaler
  • An ECR repository for the Docker image

It is also recommended to set up the Kubernetes Dashboard.

Default Airflow Image

$ make deploy

At this point you should see the stack deployed to Kubernetes.
To see Airflow's UI:

$ make ui pod=[webserver-pod-name]
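`make ui` needs the webserver pod's name, which `kubectl get pods` lists. The sketch below picks it out of the JSON printed by `kubectl get pods -o json`; it assumes the chart's webserver pod has "web" in its name, which you should check against your own deployment. Both helper functions are hypothetical.

```python
import json
import subprocess
from typing import Optional


def find_webserver_pod(pods_json: str, marker: str = "web") -> Optional[str]:
    """Return the first Running pod whose name contains `marker`, or None.

    `pods_json` is the output of `kubectl get pods -o json`. The "web"
    marker is an assumption about the chart's pod naming; adjust if needed.
    """
    for pod in json.loads(pods_json).get("items", []):
        name = pod["metadata"]["name"]
        phase = pod.get("status", {}).get("phase")
        if marker in name and phase == "Running":
            return name
    return None


def webserver_pod_from_cluster(namespace: Optional[str] = None) -> Optional[str]:
    """Query the current kubectl context (hypothetical; needs kubectl)."""
    cmd = ["kubectl", "get", "pods", "-o", "json"]
    if namespace:
        cmd += ["--namespace", namespace]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return find_webserver_pod(result.stdout)
```

Pass the returned name to `make ui pod=<name>`. Filtering on the `Running` phase skips webserver pods that are still starting or are being replaced during a redeploy.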

Custom Airflow Image

After changing config/docker/Dockerfile and scripts/entrypoint.sh, build your custom Airflow image:

$ make build

Push to ECR

$ make push
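`make push` uploads the image to the repository set in `airflow_image_repository`. Private ECR repository URIs follow the `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>` pattern, so a quick check before pushing can catch a mistyped link. The sketch below composes and loosely validates a full image reference; the helper name, the `latest` default tag, and the example account ID are hypothetical, and the Makefile's own tagging scheme may differ.

```python
import re

# Loose check for a private ECR repository URI:
# <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>
# (China regions use the amazonaws.com.cn suffix).
ECR_REPO = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)"
    r"\.amazonaws\.com(?:\.cn)?/(?P<repo>[a-z0-9][a-z0-9._/-]*)$"
)

# Docker image tags: up to 128 word characters, dots, and dashes,
# not starting with a dot or dash.
TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def ecr_image_ref(repository: str, tag: str = "latest") -> str:
    """Validate an ECR repository URI and append an image tag (hypothetical)."""
    if not ECR_REPO.match(repository):
        raise ValueError(f"not an ECR repository URI: {repository!r}")
    if not TAG.fullmatch(tag):
        raise ValueError(f"invalid image tag: {tag!r}")
    return f"{repository}:{tag}"
```

For example, `ecr_image_ref("123456789012.dkr.ecr.us-east-1.amazonaws.com/airflow", "v1")` yields the reference to tag and push, while a Docker Hub name is rejected early instead of failing at push time.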

Deploy to Kubernetes

$ make deploy

To see Airflow's UI:

$ make ui pod=[webserver-pod-name]

Fine-Tuning the Setup

This template uses:

Airflow Helm Chart: the stable Airflow Helm chart

Docker Image: https://github.com/puckel/docker-airflow

For more details and for fine-tuning the setup, please refer to the resources above.