Implement Insert Overwrite Strategy for Incremental Models in Dataform #1692
This is a blocker for us as well. A merge operation that deletes all rows and replaces them entirely is a waste of money and resources when we could just do a full partition overwrite.
I know this one is somewhat stale, but it's worth noting that this can be implemented manually using a But I agree that Dataform should absolutely support this pattern natively (a la dbt's |
Objective
In our transition from dbt to Dataform, we've identified a critical feature gap that affects our ability to efficiently manage large-scale data transformations. Specifically, we are seeking the implementation of an Insert Overwrite strategy in Dataform, similar to dbt's insert_overwrite incremental strategy. This strategy is essential for our use cases, where data volumes are significant and the absence of unique identifiers complicates the use of merge operations.
Current Challenge
Our current workflow involves incremental updates to large tables, where simply appending new data or performing key-based merge operations is not viable due to performance concerns and the nature of our data. The dbt insert_overwrite strategy has been instrumental in addressing this challenge by allowing us to run a MERGE operation that conditionally deletes existing partitions in the target table based on the data staged in a temporary table and then inserts the new data.
This approach ensures high-performance data updates without requiring unique row identifiers and minimizes the impact on query performance during the update process.
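The delete-then-insert MERGE described above can be sketched as a small JavaScript helper of the kind a Dataform project might keep in an includes file. This is only an illustration under stated assumptions: BigQuery is the warehouse, the target table is partitioned on a single column, and the staging table has exactly the same columns as the target. The function name buildInsertOverwriteScript, its option names, and the table names in the example call are hypothetical and are not part of Dataform's API.

```javascript
// Hypothetical helper (not a Dataform API): builds a BigQuery script that
// replaces only the partitions present in a staging table.
function buildInsertOverwriteScript({ target, staging, partitionColumn, partitionType = "DATE" }) {
  for (const [name, value] of Object.entries({ target, staging, partitionColumn })) {
    if (!value) throw new Error(`Missing required option: ${name}`);
  }
  // Wrap a table name in backticks, as BigQuery expects for qualified names.
  const quote = (tableName) => "`" + tableName + "`";

  return [
    // Collect the distinct partition values found in the staging table.
    `DECLARE partitions_to_replace ARRAY<${partitionType}>;`,
    `SET partitions_to_replace = (`,
    `  SELECT ARRAY_AGG(DISTINCT ${partitionColumn} IGNORE NULLS) FROM ${quote(staging)}`,
    `);`,
    ``,
    // ON FALSE means no source row ever matches a target row, so no unique
    // key is needed: target rows in affected partitions are deleted and
    // every staged row is inserted.
    `MERGE ${quote(target)} AS T`,
    `USING ${quote(staging)} AS S`,
    `ON FALSE`,
    `WHEN NOT MATCHED BY SOURCE AND T.${partitionColumn} IN UNNEST(partitions_to_replace) THEN`,
    `  DELETE`,
    `WHEN NOT MATCHED BY TARGET THEN`,
    `  INSERT ROW;`, // assumes staging and target share the same column list
  ].join("\n");
}

// Example call with made-up table names.
const exampleScript = buildInsertOverwriteScript({
  target: "my_project.analytics.events",
  staging: "my_project.analytics.events_staging",
  partitionColumn: "event_date",
});
console.log(exampleScript);
```

Partitions that do not appear in the staging table are left untouched, which is what distinguishes this pattern from a full table replacement.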
Proposed Solution
We propose that Dataform incorporate an Insert Overwrite strategy for incremental models that mirrors the dbt approach. This would involve:
Justification
The Insert Overwrite strategy is crucial for handling large datasets where performance and data freshness are paramount. Alternative solutions, such as full table replacements or complex merge operations, are not feasible due to their impact on resource utilization and operational efficiency.
Additional Context
This feature is especially relevant for our workflows involving historical data analysis and real-time data processing, where maintaining data integrity and query performance is essential.
Also, official feature request:
https://issuetracker.google.com/issues/330743873