Skip to content

Commit c2c21d0

Browse files
dehumepskinnerthyme
authored andcommitted
[du] add backoff for duckdb connections (dagster-io#26408)
## Summary & Motivation Update code snippets for sections 4 and 6 to wrap DuckDB connection in backoff to prevent race conditions when materializing all assets (Not an issue after these assets switch to using resources). ## How I Tested These Changes ## Changelog > Insert changelog entry or delete this section.
1 parent 255a6b0 commit c2c21d0

File tree

5 files changed

+53
-7
lines changed

5 files changed

+53
-7
lines changed

docs/dagster-university/pages/dagster-essentials/lesson-4/coding-practice-taxi-zones-asset.md

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,13 @@ def taxi_zones() -> None:
4141
);
4242
"""
4343

44-
conn = duckdb.connect(os.getenv("DUCKDB_DATABASE"))
44+
conn = backoff(
45+
fn=duckdb.connect,
46+
retry_on=(RuntimeError, duckdb.IOException),
47+
kwargs={
48+
"database": os.getenv("DUCKDB_DATABASE"),
49+
},
50+
max_retries=10,
51+
)
4552
conn.execute(query)
4653
```

docs/dagster-university/pages/dagster-essentials/lesson-4/coding-practice-trips-by-week-asset.md

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -62,12 +62,20 @@ from datetime import datetime, timedelta
6262
from . import constants
6363

6464
import pandas as pd
65+
from dagster._utils.backoff import backoff
6566

6667
@asset(
6768
deps=["taxi_trips"]
6869
)
6970
def trips_by_week() -> None:
70-
conn = duckdb.connect(os.getenv("DUCKDB_DATABASE"))
71+
conn = backoff(
72+
fn=duckdb.connect,
73+
retry_on=(RuntimeError, duckdb.IOException),
74+
kwargs={
75+
"database": os.getenv("DUCKDB_DATABASE"),
76+
},
77+
max_retries=10,
78+
)
7179

7280
current_date = datetime.strptime("2023-03-01", constants.DATE_FORMAT)
7381
end_date = datetime.strptime("2023-04-01", constants.DATE_FORMAT)

docs/dagster-university/pages/dagster-essentials/lesson-4/loading-data-into-a-database.md

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@ Now that you have a query that produces an asset, let’s use Dagster to manage
1313
```python
1414
import duckdb
1515
import os
16+
from dagster._utils.backoff import backoff
1617
```
1718

1819
2. Copy and paste the code below into the bottom of the `trips.py` file. Note how this code looks similar to the asset definition code for the `taxi_trips_file` and the `taxi_zones` assets:
@@ -42,7 +43,14 @@ Now that you have a query that produces an asset, let’s use Dagster to manage
4243
);
4344
"""
4445

45-
conn = duckdb.connect(os.getenv("DUCKDB_DATABASE"))
46+
conn = backoff(
47+
fn=duckdb.connect,
48+
retry_on=(RuntimeError, duckdb.IOException),
49+
kwargs={
50+
"database": os.getenv("DUCKDB_DATABASE"),
51+
},
52+
max_retries=10,
53+
)
4654
conn.execute(query)
4755
```
4856

@@ -54,7 +62,7 @@ Now that you have a query that produces an asset, let’s use Dagster to manage
5462

5563
3. Next, a variable named `query` is created. This variable contains a SQL query that creates a table named `trips`, which sources its data from the `data/raw/taxi_trips_2023-03.parquet` file. This is the file created by the `taxi_trips_file` asset.
5664

57-
4. A variable named `conn` is created, which defines the connection to the DuckDB database in the project. To do this, it uses the `.connect` method from the `duckdb` library, passing in the `DUCKDB_DATABASE` environment variable to tell DuckDB where the database is located.
65+
4. A variable named `conn` is created, which defines the connection to the DuckDB database in the project. To do this, we first wrap everything with the Dagster utility function `backoff`. Using the backoff function ensures that multiple assets can use DuckDB safely without locking resources. The backoff function takes in the function we want to call (in this case the `.connect` method from the `duckdb` library), any errors to retry on (`RuntimeError` and `duckdb.IOException`), the max number of retries, and finally, the arguments to supply to the `.connect` DuckDB method. Here we are passing in the `DUCKDB_DATABASE` environment variable to tell DuckDB where the database is located.
5866

5967
The `DUCKDB_DATABASE` environment variable, sourced from your project’s `.env` file, resolves to `data/staging/data.duckdb`. **Note**: We set up this file in Lesson 2 - refer to this lesson if you need a refresher. If this file isn’t set up correctly, the materialization will result in an error.
6068

docs/dagster-university/pages/dagster-essentials/lesson-6/setting-up-a-database-resource.md

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,14 @@ Throughout this module, you’ve used DuckDB to store and transform your data. E
1414
)
1515
def taxi_trips() -> None:
1616
...
17-
conn = duckdb.connect(os.getenv("DUCKDB_DATABASE"))
17+
conn = backoff(
18+
fn=duckdb.connect,
19+
retry_on=(RuntimeError, duckdb.IOException),
20+
kwargs={
21+
"database": os.getenv("DUCKDB_DATABASE"),
22+
},
23+
max_retries=10,
24+
)
1825
...
1926
```
2027

docs/dagster-university/pages/dagster-essentials/lesson-6/using-resources-in-assets.md

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,14 @@ def taxi_trips() -> None:
4848
);
4949
"""
5050

51-
conn = duckdb.connect(os.getenv("DUCKDB_DATABASE"))
51+
conn = backoff(
52+
fn=duckdb.connect,
53+
retry_on=(RuntimeError, duckdb.IOException),
54+
kwargs={
55+
"database": os.getenv("DUCKDB_DATABASE"),
56+
},
57+
max_retries=10,
58+
)
5259
conn.execute(query)
5360
```
5461

@@ -100,7 +107,14 @@ To refactor `taxi_trips` to use the `database` resource, we had to:
100107
3. Replace the lines that connect to DuckDB and execute a query:
101108

102109
```python
103-
conn = duckdb.connect(os.getenv("DUCKDB_DATABASE"))
110+
conn = backoff(
111+
fn=duckdb.connect,
112+
retry_on=(RuntimeError, duckdb.IOException),
113+
kwargs={
114+
"database": os.getenv("DUCKDB_DATABASE"),
115+
},
116+
max_retries=10,
117+
)
104118
conn.execute(query)
105119
```
106120

@@ -111,6 +125,8 @@ To refactor `taxi_trips` to use the `database` resource, we had to:
111125
conn.execute(query)
112126
```
113127

128+
Notice that we no longer need to use the `backoff` function. The Dagster `DuckDBResource` handles this functionality for us.
129+
114130
---
115131

116132
## Before you continue

0 commit comments

Comments
 (0)