Describe the bug
Incremental loading for a satellite with batch-loaded data (multiple rows for the same PK with different hashdiffs but the same load_datetime) does not work as it should. Because of the way the CTEs are written, rows of unique data that should be included are skipped: the window functions applied in separate CTEs do not maintain a consistent row ordering when the load_datetime values are tied.
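The exact SQL generated by the sat macro is not reproduced here, but a minimal sketch of the failure mode described above might look like the following. Everything in it (the inline data, the CTE names first_record_per_pk and changed_records, and the closing UNION) is invented for illustration; what it demonstrates is that when load_datetime values tie, two window functions computed in separate CTEs are free to order the tied rows differently, so their combined result can silently lose a row.

```sql
-- Minimal sketch of the failure mode, NOT the SQL that automate_dv generates.
-- CTE and column names and the final UNION are illustrative stand-ins.
WITH source_data AS (
    -- One PK loaded three times in the same batch: distinct hashdiffs,
    -- identical load_datetime.
    SELECT * FROM (VALUES
        ('pk1', 'hashdiff_a', '2024-05-01 00:00:00'::TIMESTAMP),
        ('pk1', 'hashdiff_b', '2024-05-01 00:00:00'::TIMESTAMP),
        ('pk1', 'hashdiff_c', '2024-05-01 00:00:00'::TIMESTAMP)
    ) AS t (pk, hashdiff, load_datetime)
),

first_record_per_pk AS (
    -- One CTE decides which row is "first" for each PK ...
    SELECT pk, hashdiff, load_datetime
    FROM source_data
    QUALIFY ROW_NUMBER() OVER (PARTITION BY pk ORDER BY load_datetime) = 1
),

changed_records AS (
    -- ... while a second CTE compares each row to "the previous" one.
    -- Because load_datetime is tied, each window function resolves the
    -- ordering independently; the row that gets a NULL LAG here (and is
    -- therefore filtered out) need not be the row picked as "first" above.
    SELECT pk, hashdiff, load_datetime
    FROM source_data
    QUALIFY hashdiff != LAG(hashdiff) OVER (PARTITION BY pk ORDER BY load_datetime)
)

-- Depending on how each CTE happened to order the tied rows, this can return
-- 2 rows instead of the 3 distinct hashdiffs that exist in the batch.
SELECT * FROM first_record_per_pk
UNION
SELECT * FROM changed_records;
```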
Environment
dbt version: 1.7.13
automate_dv version: 0.10.2
Database/Platform: Snowflake
To Reproduce
Steps to reproduce the behavior:

1. Create a raw staging table with multiple entries that share the same PK and load_datetime but have different payloads (a data sketch follows below).
2. Create a stg table using the raw staging table as its source, hashing the PK and the payload columns.
3. Create a sat table with incremental loading and source filtering.
4. Check the row counts of the stg and sat tables; they will differ from what is expected.

Example I used to replicate this error, plus a potential solution: example.zip
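For step 1, hypothetical raw staging rows such as these (table, column names, and values are invented here; the reporter's actual models are in example.zip) give one business key three different payloads that share a single load_datetime, which after staging and hashing becomes one PK with three distinct hashdiffs:

```sql
-- Hypothetical raw staging data for step 1 (example names and values only):
-- one business key, three payload variants, one shared load_datetime.
CREATE OR REPLACE TABLE raw_customer_staging AS
SELECT * FROM (VALUES
    ('CUST-001', 'Alice',  'alice@example.com',  '2024-05-01 00:00:00'::TIMESTAMP),
    ('CUST-001', 'Alicia', 'alice@example.com',  '2024-05-01 00:00:00'::TIMESTAMP),
    ('CUST-001', 'Alice',  'alicia@example.com', '2024-05-01 00:00:00'::TIMESTAMP)
) AS t (customer_id, customer_name, customer_email, load_datetime);
```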
Expected behavior
All unique rows should be selected. Unique rows are currently being filtered out because applying the window functions in two separate CTEs means that the row ordering can differ between them. This can be fixed by applying the ROW_NUMBER() function in the source_data CTE and then using that value in the ORDER BY clause of the LAG function in the unique_source_records CTE. See the attached files for an example.
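A rough sketch of the suggested change, under the assumption of a simplified single-pass pipeline with illustrative names (stg_example, source_row_num); the reporter's working version is in the attached example.zip. The ROW_NUMBER() computed once in source_data fixes a deterministic position for every row, and reusing it in the LAG's ORDER BY means both window functions agree on the ordering even when load_datetime values tie.

```sql
-- Sketch only: illustrative model/column names, not the actual macro change.
WITH source_data AS (
    SELECT pk, hashdiff, payload, load_datetime,
           -- Deterministic position per PK; hashdiff breaks load_datetime ties.
           ROW_NUMBER() OVER (
               PARTITION BY pk
               ORDER BY load_datetime, hashdiff
           ) AS source_row_num
    FROM stg_example            -- hypothetical staging model
),

unique_source_records AS (
    SELECT pk, hashdiff, payload, load_datetime
    FROM source_data
    -- Keep the first row per PK plus every row whose hashdiff differs from
    -- the previous row, using the SAME ordering that produced source_row_num.
    QUALIFY source_row_num = 1
         OR hashdiff != LAG(hashdiff) OVER (
                PARTITION BY pk
                ORDER BY load_datetime, source_row_num
            )
)

SELECT * FROM unique_source_records;
```

With a batch like the hypothetical one above (three distinct hashdiffs sharing one load_datetime), this keeps all three rows regardless of how the warehouse happens to order the ties.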
AB#5344
Thank you for this report! On the surface this does look like a legitimate bug - we will look to test this on our end as soon as we can and get back to you.