Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference Hub not updated during incremental load in Snowflake #128

Open
OGrohmann opened this issue Sep 15, 2023 · 6 comments · May be fixed by #189
Open

Reference Hub not updated during incremental load in Snowflake #128

OGrohmann opened this issue Sep 15, 2023 · 6 comments · May be fixed by #189
Labels
bug Something isn't working

Comments

@OGrohmann
Copy link

We are using the ref_hub macro in an incremental model in our Snowflake environment to build our reference tables. We have observed, that the initial load for the reference hub works fine, however, during the incremental loads new ref hub keys are not being added. The SQL syntax looks ok, however it seems not to work with Snowflake.
This issue is only observed in the reference hub model. The ref satellite is being correctly updated.
However, as a consequence new entries are missing in the final reference table, as the new hub entries are missing.
Steps to reproduce:

  1. Create new reference hub, example:

{{ config(materialized='incremental') }}

{%- set yaml_metadata -%}
source_models: staging_model
ref_keys:
- KEY1
- KEY2
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{{ datavault4dbt.ref_hub(source_models=metadata_dict['source_models'],
ref_keys=metadata_dict['ref_keys']) }}

  1. Build the model
  2. Add new entries in the source with new key combination
  3. Re-run model to trigger incremental update
  4. Observe that new entries have not been added.
@bschlottfeldt
Copy link
Contributor

Hi @OGrohmann , thank you for reporting the issue, gonna test here on my side and let you know what I find.
Kind regards,
Barbara

@bschlottfeldt bschlottfeldt self-assigned this Sep 15, 2023
@bschlottfeldt bschlottfeldt added the bug Something isn't working label Sep 15, 2023
@bschlottfeldt
Copy link
Contributor

bschlottfeldt commented Sep 20, 2023

Hi @OGrohmann just tested both the initial load and incremental load here for the reference hub and it works fine here (also using Snowflake adapter).

My guess is the Load Datetimestamp for your records in your stage that you want to add to your ref hub are either earlier or equal to the Load Datetimestamp already in your Reference Hub.
Please make sure that the Load Datetimestamp in your stage for the new records is greater than the ones already in your Reference Hub, otherwise the incremental logic wont add your records.
Let me know if this is the case and if it solves your issue.

Kind regards,
Barbara

@OGrohmann
Copy link
Author

OGrohmann commented Oct 16, 2023

Hi @bschlottfeldt ,
sorry for the late feedback. We tried to provide an example to reproduce the issue and were quite surprised that we couldn't reproduce it in our artificial example. Finally we found it out how to reproduce. In your stage you have to assign the data type "TEXT" to one of your reference key columns. This is a synonymous for STRING or VARCHAR in Snowflake. However, the ghost record value for TEXT is NULL, whereas for VARCHAR it will be (unknown)/(error). Seems like this is causing the issue for incremental loads. However, it can be easily fixed by a workaround, i.e. replacing the TEXT data type with VARCHAR.
Can you please check and confirm the issue?
Thanks, Oliver.

@bschlottfeldt
Copy link
Contributor

@OGrohmann hi Oliver, i will test it next week, sorry about the late response. Thanks for pointing out the issue.
Kind regards,
Barbara

@bschlottfeldt bschlottfeldt removed their assignment Jan 22, 2024
@bschlottfeldt
Copy link
Contributor

@OGrohmann i can confirm the problem happens as you have described, this part of the compiled code in the incremental run of ref_hub macro wont return any results if one of the keys in the reference hub was inserted as null. Since the ghost record default datatype for TEXT in Snowflake was defined by our macro as NULL, it inserts null as the reference key for the ghost record. The presence of a null in the query of distinct_target_ref_keys the filter for NOT IN wont work, it will produce no results, therefore no records will be inserted in the incremental run, even though there are new records. FYI @tkirschke @thoffmann-sf this should be prevented in the ref_hub macro by adding a where refkey is not null clause in the distinct_target_ref_keys. and also should be fixed by defining the default value for ghost record of datatype TEXT as '(unknown)' for the zero key and '(error)' for the error key.

records_to_insert AS (
    SELECT
        refkey,
        ldts,
        rsrc
    FROM earliest_ref_key_over_all_sources
    WHERE CONCAT_WS('||', refkey) NOT IN (SELECT * FROM distinct_target_ref_keys)
)

Kind regards,
Barbara

@tkirschke
Copy link
Member

Hi @OGrohmann ,

I have added a Branch to this Issue where I tried to fix the problem, by excluding NULL refkeys, and defining the default value for TEXT columns, as @bschlottfeldt suggested.

Can you check if the fix works for you?

Best regards,
Tim

@tkiehn tkiehn linked a pull request Jun 21, 2024 that will close this issue
@tkiehn tkiehn linked a pull request Jun 21, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants