Describe the bug
Incremental loading for a satellite with batch-loaded data (multiple rows for the same PK with different hashdiffs but the same load_datetime) does not work as it should. Because of the way the CTEs are written, rows of unique data that should be included are skipped: the window functions applied in separate CTEs do not maintain a consistent row ordering when the load_datetime values are tied.
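The exact SQL generated by the sat macro is not reproduced here, but a minimal sketch of the failure mode described above might look like the following. Everything in it (the inline data, the CTE names first_record_per_pk and changed_records, and the closing UNION) is invented for illustration; what it demonstrates is that when load_datetime values tie, two window functions computed in separate CTEs are free to order the tied rows differently, so their combined result can silently lose a row.

```sql
-- Minimal sketch of the failure mode, NOT the SQL that automate_dv generates.
-- CTE and column names and the final UNION are illustrative stand-ins.
WITH source_data AS (
    -- One PK loaded three times in the same batch: distinct hashdiffs,
    -- identical load_datetime.
    SELECT * FROM (VALUES
        ('pk1', 'hashdiff_a', '2024-05-01 00:00:00'::TIMESTAMP),
        ('pk1', 'hashdiff_b', '2024-05-01 00:00:00'::TIMESTAMP),
        ('pk1', 'hashdiff_c', '2024-05-01 00:00:00'::TIMESTAMP)
    ) AS t (pk, hashdiff, load_datetime)
),

first_record_per_pk AS (
    -- One CTE decides which row is "first" for each PK ...
    SELECT pk, hashdiff, load_datetime
    FROM source_data
    QUALIFY ROW_NUMBER() OVER (PARTITION BY pk ORDER BY load_datetime) = 1
),

changed_records AS (
    -- ... while a second CTE compares each row to "the previous" one.
    -- Because load_datetime is tied, each window function resolves the
    -- ordering independently; the row that gets a NULL LAG here (and is
    -- therefore filtered out) need not be the row picked as "first" above.
    SELECT pk, hashdiff, load_datetime
    FROM source_data
    QUALIFY hashdiff != LAG(hashdiff) OVER (PARTITION BY pk ORDER BY load_datetime)
)

-- Depending on how each CTE happened to order the tied rows, this can return
-- 2 rows instead of the 3 distinct hashdiffs that exist in the batch.
SELECT * FROM first_record_per_pk
UNION
SELECT * FROM changed_records;
```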
Environment
dbt version: 1.7.13
automate_dv version: 0.10.2
Database/Platform: Snowflake
To Reproduce
Steps to reproduce the behavior:

1. Create a raw staging table with multiple entries that share the same PK and load_datetime but have different payloads (a data sketch follows below).
2. Create a stg table using the raw staging table as its source, hashing the PK and the payload columns.
3. Create a sat table with incremental loading and source filtering.
4. Check the row counts of the stg and sat tables; they will differ from what is expected.

Example I used to replicate this error, plus a potential solution: example.zip
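For step 1, hypothetical raw staging rows such as these (table, column names, and values are invented here; the reporter's actual models are in example.zip) give one business key three different payloads that share a single load_datetime, which after staging and hashing becomes one PK with three distinct hashdiffs:

```sql
-- Hypothetical raw staging data for step 1 (example names and values only):
-- one business key, three payload variants, one shared load_datetime.
CREATE OR REPLACE TABLE raw_customer_staging AS
SELECT * FROM (VALUES
    ('CUST-001', 'Alice',  'alice@example.com',  '2024-05-01 00:00:00'::TIMESTAMP),
    ('CUST-001', 'Alicia', 'alice@example.com',  '2024-05-01 00:00:00'::TIMESTAMP),
    ('CUST-001', 'Alice',  'alicia@example.com', '2024-05-01 00:00:00'::TIMESTAMP)
) AS t (customer_id, customer_name, customer_email, load_datetime);
```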
Expected behavior
All unique rows should be selected. Unique rows are currently being filtered out because applying the window functions in two separate CTEs means that the row ordering can differ between them. This can be fixed by applying the ROW_NUMBER() function in the source_data CTE and then using that value in the ORDER BY clause of the LAG function in the unique_source_records CTE. See the attached files for an example.
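A rough sketch of the suggested change, under the assumption of a simplified single-pass pipeline with illustrative names (stg_example, source_row_num); the reporter's working version is in the attached example.zip. The ROW_NUMBER() computed once in source_data fixes a deterministic position for every row, and reusing it in the LAG's ORDER BY means both window functions agree on the ordering even when load_datetime values tie.

```sql
-- Sketch only: illustrative model/column names, not the actual macro change.
WITH source_data AS (
    SELECT pk, hashdiff, payload, load_datetime,
           -- Deterministic position per PK; hashdiff breaks load_datetime ties.
           ROW_NUMBER() OVER (
               PARTITION BY pk
               ORDER BY load_datetime, hashdiff
           ) AS source_row_num
    FROM stg_example            -- hypothetical staging model
),

unique_source_records AS (
    SELECT pk, hashdiff, payload, load_datetime
    FROM source_data
    -- Keep the first row per PK plus every row whose hashdiff differs from
    -- the previous row, using the SAME ordering that produced source_row_num.
    QUALIFY source_row_num = 1
         OR hashdiff != LAG(hashdiff) OVER (
                PARTITION BY pk
                ORDER BY load_datetime, source_row_num
            )
)

SELECT * FROM unique_source_records;
```

With a batch like the hypothetical one above (three distinct hashdiffs sharing one load_datetime), this keeps all three rows regardless of how the warehouse happens to order the ties.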
AB#5344
Thank you for this report! On the surface this does look like a legitimate bug - we will look to test this on our end as soon as we can and get back to you.