This project aims to have a representative dbt project to evaluate the performance of Cosmos, a tool to run dbt Core or dbt Fusion projects as Apache Airflow DAGs and Task Groups with a few lines of code.
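As an illustration of what "a few lines of code" means in practice, the sketch below shows how a dbt project such as fhir-dbt-analytics could be rendered as an Airflow DAG with Cosmos. It is not copied from this repository's `dags/` folder; the project path, profile name, GCP project, and dataset are placeholder assumptions.

```python
# Illustrative sketch only (not taken from this repo's dags/ folder):
# rendering the fhir-dbt-analytics dbt project as an Airflow DAG with Cosmos.
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import GoogleCloudServiceAccountFileProfileMapping

profile_config = ProfileConfig(
    profile_name="fhir_dbt_analytics",  # placeholder profile name
    target_name="dev",
    profile_mapping=GoogleCloudServiceAccountFileProfileMapping(
        conn_id="bigquery_conn",  # the Airflow connection created in the setup steps below
        profile_args={"project": "my-gcp-project", "dataset": "my_dataset"},  # placeholders
    ),
)

fhir_dbt_analytics = DbtDag(
    dag_id="fhir_dbt_analytics",
    project_config=ProjectConfig("/usr/local/airflow/dbt/fhir-dbt-analytics"),  # placeholder path
    profile_config=profile_config,
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
```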
This project contains the following files and folders:
- benchmark: Contains step-by-step scripts to benchmark standalone dbt Core and dbt Core run via Cosmos, using the fhir-dbt-analytics project
- dags: This folder contains the Python files that represent Airflow DAGs. These are simple DAGs that leverage Astronomer Cosmos.
- dbt: This folder contains the following dbt project:
- fhir-dbt-analytics: A dbt Core project developed by Google that interfaces with BigQuery. It contains:
- 2 seeds
- 52 sources
- 185 models
- Dockerfile: This file contains a versioned Astro Runtime Docker image that provides a differentiated Airflow experience.
- include: This folder contains any additional files that you want to include as part of your project. In this particular case, it contains configuration files.
- packages.txt: Install OS-level packages needed for your project by adding them to this file. It is empty by default.
- requirements.txt: Install Python packages needed for your project by adding them to this file. It is empty by default.
- airflow_settings.yaml: Use this local-only file to specify Airflow Connections, Variables, and Pools instead of entering them in the Airflow UI as you develop DAGs in this project.
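As an illustration of how airflow_settings.yaml is structured, the sketch below declares the `bigquery_conn` connection used in the setup steps further down. All values (key path, GCP project, pool, variable) are placeholder assumptions, and the exact fields accepted may vary with your Astro CLI version.

```yaml
# Illustrative sketch only; all values are placeholders.
airflow:
  connections:
    - conn_id: bigquery_conn
      conn_type: google_cloud_platform
      conn_extra:
        key_path: /usr/local/airflow/include/gcp_service_account.json
        project: my-gcp-project
  pools:
    - pool_name: dbt_pool
      pool_slot: 5
      pool_description: Example pool for dbt tasks
  variables:
    - variable_name: example_variable
      variable_value: example_value
```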
Follow these three steps:
- Initialise submodules by running `git submodule init` followed by `git submodule update`.
- Start Airflow on your local machine by running `astro dev start`. This command will spin up five Docker containers on your machine, each for a different Airflow component:
- Postgres: Airflow's Metadata Database
- Scheduler: The Airflow component responsible for monitoring and triggering tasks
- DAG Processor: The Airflow component responsible for parsing DAGs
- API Server: The Airflow component responsible for serving the Airflow UI and API
- Triggerer: The Airflow component responsible for triggering deferred tasks
When all five containers are ready, the command will open the browser to the Airflow UI at http://localhost:8080/. You should also be able to access your Postgres Database at 'localhost:5432/postgres' with username 'postgres' and password 'postgres'.
Note: If you already have either of the above ports allocated, you can either stop your existing Docker containers or change the port.
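For example, with a recent Astro CLI the local ports can typically be remapped before starting the project; the port numbers below are arbitrary examples and the exact config keys may differ depending on your CLI version.

```shell
# Example only: move the Airflow UI and Postgres off the default ports
# before running `astro dev start`, in case 8080 or 5432 are already in use.
astro config set webserver.port 8081
astro config set postgres.port 5433
```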
- Create a BigQuery connection in Airflow named `bigquery_conn`. This can be done by following these instructions. This is an example of how the setup can look, assuming you have pre-generated a BigQuery service account JSON key file:
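If you prefer to define the connection outside the Airflow UI, a sketch like the following exports an equivalent connection as an environment variable. The key file path and GCP project are placeholders, and the JSON connection format assumes Airflow 2.3 or later.

```shell
# Illustrative only: declare bigquery_conn via an environment variable using
# Airflow's JSON connection format. Replace key_path and project with your values.
export AIRFLOW_CONN_BIGQUERY_CONN='{"conn_type": "google_cloud_platform", "extra": {"key_path": "/usr/local/airflow/include/gcp_service_account.json", "project": "my-gcp-project"}}'
```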
If you have an Astronomer account, deploying this project to Astro is simple. For deployment instructions, refer to the Astronomer documentation: https://www.astronomer.io/docs/astro/deploy-code/
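Assuming the Astro CLI is installed and a Deployment already exists, the push itself is usually a single command; see the documentation above for the full flow.

```shell
# Authenticate with Astro, then push the project to an existing Deployment.
astro login
astro deploy
```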