Enable retry support for Microbatch models #10751
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##             main   #10751      +/-   ##
==========================================
+ Coverage   89.00%   89.03%   +0.02%
==========================================
  Files         181      182       +1
  Lines       23126    23195      +69
==========================================
+ Hits        20583    20651      +68
- Misses       2543     2544       +1
…d circular imports
In our next commit we're gonna modify `dbt/contracts/graph/nodes.py` to import the `BatchType` as part of our work to implement dbt retry for microbatch model nodes. Unfortunately, that import in `nodes.py` creates a circular dependency, because `dbt/artifacts/schemas/results.py` imports from `nodes.py` and `dbt/artifacts/schemas/run/v5/run.py` imports from that `results.py`. Now this _shouldn't_ be necessary, as nothing in artifacts should import from the rest of dbt-core. However, it currently does. We should fix this, but that is out of scope for this segment of work.
… microbatch models
c0e16de to 8d7ab70
I have 2-ish tests to add, and then we should be good to go!
…-refresh behavior
This is necessary because of retry. Say on the initial run the microbatch model succeeds on 97% of its batches. Then on retry it does the last 3%. If the retry of the microbatch model executes in full refresh mode, it _might_ blow away the 97% of work that has already been done. This edge case seems to be adapter specific.
…ialSuccess
In the previous commit we made it so that retries of microbatch models wouldn't run in full refresh mode when the microbatch model to retry has batches already specified from the prior run. This is only problematic when the run being retried was a full refresh AND all the batches for a given microbatch model failed. In that case WE DO want to do a full refresh for the given microbatch model. To better outline the problem, consider the following:
* a microbatch model had a begin of `2020-01-01` and has been running this way for a while
* the begin config has changed to `2024-01-01` and `dbt run --full-refresh` gets run
* every batch for the microbatch model fails
* on `dbt retry` the relation is said to exist, and the now out-of-range data (2020-01-01 through 2023-12-31) is never purged
To avoid this, all we have to do is ONLY pass the batch information for partially successful microbatch models. Note: microbatch models only have a partially successful status IFF they have both successful and failed batches.
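The rule from the two commit messages above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Batch` alias, the `PARTIAL_SUCCESS` string, and both function names are hypothetical stand-ins chosen for the example.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical: a batch is identified by its (start, end) time window.
Batch = Tuple[datetime, datetime]

# Hypothetical status string for a model with both successful and failed batches.
PARTIAL_SUCCESS = "partial success"


def batch_map_for_retry(
    prior: Dict[str, Tuple[str, List[Batch]]]
) -> Dict[str, List[Batch]]:
    """Keep failed batches ONLY for partially successful models.

    `prior` maps a model's unique id to (status, failed batches) from the
    run being retried. Models whose batches all failed are left out, so
    they are retried as whole models.
    """
    return {
        unique_id: failed
        for unique_id, (status, failed) in prior.items()
        if status == PARTIAL_SUCCESS
    }


def effective_full_refresh(
    full_refresh_flag: bool, unique_id: str, batch_map: Dict[str, List[Batch]]
) -> bool:
    """Honor --full-refresh unless the model is retrying specific batches."""
    return full_refresh_flag and unique_id not in batch_map
```

With this shape, a model that failed every batch during a `--full-refresh` run has no entry in the batch map, so the retry still runs it in full refresh mode and the out-of-range data gets purged.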
I still need to open a …
core/dbt/events/types.py (Outdated)
@@ -1293,9 +1293,12 @@ def code(self) -> str:
         return "Q012"

     def message(self) -> str:
-        if self.status == "error":
+        if self.status == "error":  # or 'PARTIAL SUCCESS' in self.status:
is this comment still relevant?
It is not, I'll get it fixed.

"begin": {
    "default": null
},
It looks like this change was already made in https://github.com/dbt-labs/schemas.getdbt.com, so I'm unsure why this PR is doing it for dbt-core but 🤷
The updates to schemas.getdbt.com can be found here: dbt-labs/schemas.getdbt.com#61
Resolves #10624
Resolves #10715

Problem
dbt was taking an all-or-nothing approach to microbatch models when running `dbt retry`. We want the command to instead be smarter, only retrying batches that failed for a given microbatch model.

Solution
* … `PartialSuccess`
* … `batch_results` to the RunResult object
* … `failed` batches to the corresponding model during the `retry` task
* … `PartialSuccess` microbatch model

Checklist
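As a rough illustration of the Problem and Solution sections above, the sketch below records per-batch outcomes on a run result and re-executes only the failed batches on retry. It is an assumption-laden toy, not dbt-core's implementation: `BatchResults`, this simplified `RunResult`, `run_batches`, `retry_failed_batches`, and the status strings are all hypothetical names made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Optional, Tuple

# Hypothetical: a batch is identified by its (start, end) time window.
Batch = Tuple[datetime, datetime]


@dataclass
class BatchResults:
    successful: List[Batch] = field(default_factory=list)
    failed: List[Batch] = field(default_factory=list)


@dataclass
class RunResult:
    # Simplified stand-in for a run result; only the fields needed here.
    unique_id: str
    status: str
    batch_results: Optional[BatchResults] = None


def run_batches(
    unique_id: str, batches: List[Batch], execute: Callable[[Batch], None]
) -> RunResult:
    """Execute each batch independently and record which ones failed."""
    results = BatchResults()
    for batch in batches:
        try:
            execute(batch)
        except Exception:
            results.failed.append(batch)
        else:
            results.successful.append(batch)

    # Partial success only when there are both successes and failures.
    if results.failed and results.successful:
        status = "partial success"
    elif results.failed:
        status = "error"
    else:
        status = "success"
    return RunResult(unique_id, status, results)


def retry_failed_batches(
    previous: RunResult, execute: Callable[[Batch], None]
) -> RunResult:
    """Re-run only the batches that failed in the previous result."""
    if previous.batch_results is None:
        raise ValueError("previous result carries no batch information")
    return run_batches(previous.unique_id, previous.batch_results.failed, execute)
```

Keeping the failed batches on the result object is what lets a later retry skip work that already succeeded, instead of treating the whole model as failed.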