astronomer/cosmos-benchmark

Overview

This project provides a representative dbt project for evaluating the performance of Cosmos, a tool that runs dbt Core or dbt Fusion projects as Apache Airflow DAGs and Task Groups with a few lines of code.
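
To give a sense of what those few lines look like, here is a minimal sketch of a Cosmos DAG for the dbt project in this repository. The dag_id, schedule, paths, and profile arguments below are illustrative assumptions, not the exact values used in dags/:

```python
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import GoogleCloudServiceAccountFileProfileMapping

# Map the Airflow connection "bigquery_conn" onto a dbt BigQuery profile.
# The profile name, target, and dataset are assumptions for illustration.
profile_config = ProfileConfig(
    profile_name="fhir_dbt_analytics",
    target_name="dev",
    profile_mapping=GoogleCloudServiceAccountFileProfileMapping(
        conn_id="bigquery_conn",
        profile_args={"dataset": "my_dataset"},  # placeholder dataset
    ),
)

# Cosmos renders each dbt model in the project as its own Airflow task.
example_dag = DbtDag(
    dag_id="fhir_dbt_analytics",
    project_config=ProjectConfig("/usr/local/airflow/dbt/fhir-dbt-analytics"),
    profile_config=profile_config,
    schedule=None,
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
```

Because Cosmos expands the dbt project into individual Airflow tasks, it makes a useful target for benchmarking dbt runs against plain dbt Core.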

Project Contents

This project contains the following files and folders:

  • benchmark: Contains step-by-step scripts to benchmark dbt Core and dbt Core with Cosmos using fhir-dbt-analytics
  • dags: This folder contains the Python files that represent Airflow DAGs. These are simple DAGs that leverage Astronomer Cosmos.
  • dbt: This folder contains the following dbt projects:
    • fhir-dbt-analytics: A dbt Core project developed by Google that interfaces with BigQuery. It contains:
      • 2 seeds
      • 52 sources
      • 185 models
  • Dockerfile: This file contains a versioned Astro Runtime Docker image that provides a differentiated Airflow experience.
  • include: This folder contains any additional files that you want to include as part of your project. In this particular case, it contains configuration files.
  • packages.txt: Install OS-level packages needed for your project by adding them to this file. It is empty by default.
  • requirements.txt: Install Python packages needed for your project by adding them to this file. It is empty by default.
  • airflow_settings.yaml: Use this local-only file to specify Airflow Connections, Variables, and Pools instead of entering them in the Airflow UI as you develop DAGs in this project.

Run Your Project Locally

Follow these three steps:

  1. Initialise submodules by running:
git submodule init
git submodule update
  2. Start Airflow on your local machine by running:
astro dev start

This command will spin up five Docker containers on your machine, each for a different Airflow component:

  • Postgres: Airflow's Metadata Database
  • Scheduler: The Airflow component responsible for monitoring and triggering tasks
  • DAG Processor: The Airflow component responsible for parsing DAGs
  • API Server: The Airflow component responsible for serving the Airflow UI and API
  • Triggerer: The Airflow component responsible for triggering deferred tasks

When all five containers are ready, the command will open the Airflow UI in your browser at http://localhost:8080/. You should also be able to access your Postgres database at 'localhost:5432/postgres' with username 'postgres' and password 'postgres'.

Note: If you already have either of the above ports allocated, you can either stop your existing Docker containers or change the port.
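
Once the containers are up, a short script like the one below can confirm the metadata database is reachable from the host. This is just a sketch; it assumes psycopg2-binary is installed in your local Python environment:

```python
# Sanity-check the local Airflow metadata database started by `astro dev start`.
# Host, port, and credentials match the defaults mentioned above.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="postgres",
    user="postgres",
    password="postgres",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM dag;")  # Airflow's table of parsed DAGs
    print("DAGs registered:", cur.fetchone()[0])
conn.close()
```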

  3. Create a BigQuery connection in Airflow named bigquery_conn. This can be done by following these instructions. The screenshot below shows an example of how the setup can look, assuming you pre-generated a BigQuery service account JSON key file:
(Screenshot: example BigQuery connection in the Airflow UI)
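
As an alternative to clicking through the UI, Airflow can also read connections from AIRFLOW_CONN_* environment variables in JSON form. The sketch below prints the value you could set as AIRFLOW_CONN_BIGQUERY_CONN, e.g. in the project's .env file, which the Astro CLI loads into the containers; the key path and project id are placeholders:

```python
# Build the JSON value for an AIRFLOW_CONN_BIGQUERY_CONN environment variable.
# The key_path and project below are placeholders for your own values.
import json

conn_json = json.dumps(
    {
        "conn_type": "google_cloud_platform",
        "extra": {
            "key_path": "/usr/local/airflow/include/gcp_key.json",
            "project": "my-gcp-project",
        },
    }
)
print(f"AIRFLOW_CONN_BIGQUERY_CONN='{conn_json}'")
```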

Deploy Your Project to Astronomer

If you have an Astronomer account, deploying this project to Astro is simple. For deployment instructions, refer to the Astronomer documentation: https://www.astronomer.io/docs/astro/deploy-code/
