Commit 454f966

Basic dbt integration guide (#24014)
## Summary & Motivation

A basic dbt integration guide that walks users through:

- Loading a dbt project into Dagster
- Setting upstream asset dependencies
- Setting downstream asset dependencies

Note that the `manifest.json` file is being committed so that tests pass.

## How I Tested These Changes

## Changelog [New | Bug | Docs]

NOCHANGELOG
1 parent b00abe6 commit 454f966

18 files changed, +451 -1 lines changed
Lines changed: 83 additions & 1 deletion
@@ -1,4 +1,86 @@
---
title: Transforming data with dbt
sidebar_position: 20
last_update:
  date: 2024-08-26
  author: Nick Roach
---

Dagster orchestrates dbt alongside other technologies, so you can schedule dbt with Spark, Python, etc. in a single data pipeline. Dagster's asset-oriented approach allows Dagster to understand dbt at the level of individual dbt models.

## What you'll learn

- How to import a basic dbt project into Dagster
- How to set upstream and downstream dependencies on non-dbt assets
- How to schedule your dbt assets

<details>
  <summary>Prerequisites</summary>

To follow the steps in this guide, you'll need:

- A basic understanding of dbt, DuckDB, and Dagster concepts such as [assets](/todo) and [resources](/todo)
- To install the [dbt](https://docs.getdbt.com/docs/core/installation-overview) and [DuckDB CLIs](https://duckdb.org/docs/api/cli/overview.html)
- To install the following packages:

```shell
pip install dagster duckdb plotly dagster-dbt dbt-duckdb
```
</details>

## Setting up a basic dbt project

Start by downloading this basic dbt project, which includes a few models and a DuckDB backend:

```bash
git clone https://github.com/dagster-io/basic-dbt-project
```

The project structure should look like this:

```
├── README.md
├── dbt_project.yml
├── profiles.yml
├── models
│   └── example
│       ├── my_first_dbt_model.sql
│       ├── my_second_dbt_model.sql
│       └── schema.yml
```

First, you need to point Dagster at the dbt project and ensure Dagster has what it needs to build an asset graph. Create a `definitions.py` in the same directory as the dbt project:

<CodeExample filePath="guides/etl/transform-dbt/dbt_definitions.py" language="python" title="definitions.py" />
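
`dagster-dbt` learns which models exist, and how they depend on one another, from the project's compiled `manifest.json`. As a rough, stdlib-only illustration of the information that file carries (not of how `dagster-dbt` is implemented), the sketch below extracts model-to-model dependencies from a trimmed, hand-written stand-in for a manifest. The `MANIFEST_JSON` contents and the `model_dependencies` helper are hypothetical; only the `nodes`, `resource_type`, `name`, and `depends_on.nodes` keys follow dbt's manifest layout.

```python
import json

# Hypothetical, heavily trimmed stand-in for dbt's target/manifest.json.
MANIFEST_JSON = """
{
  "nodes": {
    "model.basic_dbt_project.my_first_dbt_model": {
      "resource_type": "model",
      "name": "my_first_dbt_model",
      "depends_on": {"nodes": []}
    },
    "model.basic_dbt_project.my_second_dbt_model": {
      "resource_type": "model",
      "name": "my_second_dbt_model",
      "depends_on": {"nodes": ["model.basic_dbt_project.my_first_dbt_model"]}
    }
  }
}
"""


def model_dependencies(manifest: dict) -> dict:
    """Map each model name to the names of the nodes it depends on."""
    deps = {}
    for node in manifest["nodes"].values():
        if node["resource_type"] != "model":
            continue  # skip tests, seeds, and other node types
        # Unique IDs look like "model.<project>.<name>"; keep only the name.
        deps[node["name"]] = [uid.split(".")[-1] for uid in node["depends_on"]["nodes"]]
    return deps


dependencies = model_dependencies(json.loads(MANIFEST_JSON))
for model, upstream in dependencies.items():
    print(model, "<-", upstream)
```

Each key in the resulting mapping corresponds to one dbt model, which is the granularity at which Dagster represents the project as assets.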

## Adding upstream dependencies

Oftentimes, you'll want Dagster to generate data that will be used by downstream dbt models. To do this, add an upstream asset that the dbt project will use as a source:

<CodeExample filePath="guides/etl/transform-dbt/dbt_definitions_with_upstream.py" language="python" title="definitions.py" />

Next, you'll add a dbt model that will source the `raw_customers` asset and define the dependency for Dagster. Create the dbt model:

<CodeExample filePath="guides/etl/transform-dbt/basic-dbt-project/models/example/customers.sql" language="sql" title="customers.sql" />

Then, create a `_source.yml` file that points dbt to the upstream `raw_customers` asset:

<CodeExample filePath="guides/etl/transform-dbt/basic-dbt-project/models/example/_source.yml" language="yaml" title="_source.yml" />

{/* TODO: Maybe screenshot to show the lineage? */}
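
To make the role of the `meta.dagster.asset_key` entry concrete, here is a minimal sketch of the lookup it enables: given a source table's configuration as a plain dictionary, find the Dagster asset key it declares. The `source_table` dictionary mirrors the `raw_customers` entry in `_source.yml`; the `asset_key_from_meta` helper is hypothetical and only illustrates the idea, it is not `dagster-dbt`'s implementation.

```python
from typing import List, Optional

# Mirrors the `raw_customers` entry in `_source.yml`, written as a Python dict.
source_table = {
    "name": "raw_customers",
    "meta": {"dagster": {"asset_key": ["raw_customers"]}},
}


def asset_key_from_meta(table: dict) -> Optional[List[str]]:
    """Return the asset key declared under meta.dagster.asset_key, if any."""
    return table.get("meta", {}).get("dagster", {}).get("asset_key")


declared_key = asset_key_from_meta(source_table)       # key declared in the YAML
missing_key = asset_key_from_meta({"name": "orders"})  # no override declared
print(declared_key, missing_key)
```

Because the declared key matches the key of the `raw_customers` asset, Dagster can treat that asset as the upstream of any dbt model that selects from this source.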

## Adding downstream dependencies

You may also have assets that depend on the output of dbt models. Next, create an asset that depends on the result of the new `customers` model. This asset will create a histogram of the first names of the customers:

<CodeExample filePath="guides/etl/transform-dbt/dbt_definitions_with_downstream.py" language="python" title="definitions.py" />
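
The counting step behind such a histogram can be sketched without a database or a plotting library. The example below is an illustration under stated assumptions: the `rows` list is made-up sample data standing in for the `customers` model's output, and `first_name_counts` is a hypothetical helper. A real downstream asset would instead query the `customers` table and pass the result to a charting library such as plotly, which is installed in the prerequisites.

```python
from collections import Counter

# Made-up sample rows shaped like the `customers` model:
# (customer_id, first_name, last_name)
rows = [
    (1, "Michael", "P."),
    (2, "Shawn", "M."),
    (3, "Michael", "S."),
]


def first_name_counts(customers):
    """Count how many customers share each first name."""
    return Counter(first_name for _, first_name, _ in customers)


counts = first_name_counts(rows)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```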

## Scheduling dbt models

You can schedule your dbt models by using `dagster-dbt`'s `build_schedule_from_dbt_selection` function:

<CodeExample filePath="guides/etl/transform-dbt/dbt_definitions_with_schedule.py" language="python" title="Scheduling our dbt models" />

## Next steps

[comment]: <> (TODO: Add link to dbt partitioning guide)

docs/vale/styles/config/vocabularies/Dagster/accept.txt

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@ gRPC
 REST API
 [Ss]ubprocess
 Serverless
+CLI[s]
+uncomment

 Airbyte
 AirFlow
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
jaffle_shop/README.md
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
target/
dbt_packages/
logs/
*.duckdb
order_count_chart.html
@@ -0,0 +1,36 @@
# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'basic_dbt_project'
version: '1.0.0'

# This setting configures which "profile" dbt uses for this project.
profile: 'basic_dbt_project'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets: # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  basic_dbt_project:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view
@@ -0,0 +1,10 @@
version: 2
sources:
  - name: raw
    tables:
      - name: raw_customers
        # highlight-start
        meta: # This metadata:
          dagster: # Tells dbt where this model's source data is, and
            asset_key: ["raw_customers"] # Tells Dagster which asset represents the source data
        # highlight-end
@@ -0,0 +1,5 @@
select
    id as customer_id,
    first_name,
    last_name
from {{ source('raw', 'raw_customers') }} -- Define the raw_customers asset as a source
@@ -0,0 +1,13 @@
{{ config(materialized='table') }}

with source_data as (

    select 1 as id
    union all
    select null as id

)

select *
from source_data
where id is not null
@@ -0,0 +1,3 @@
select *
from {{ ref('my_first_dbt_model') }}
where id = 1
@@ -0,0 +1,21 @@
version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        data_tests:
          - unique
          - not_null

  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        data_tests:
          - unique
          - not_null
