Merge pull request #92 from dbt-labs/add-detect-column-differences
Add `compare_which_columns_differ`
dave-connors-3 authored Mar 19, 2024
2 parents 96b3b95 + f502c14 commit 1f6552f
Showing 9 changed files with 157 additions and 1 deletion.
44 changes: 44 additions & 0 deletions README.md
@@ -16,6 +16,7 @@ Useful macros when performing data audits
- [compare\_all\_columns (source)](#compare_all_columns-source)
- [Usage:](#usage-1)
- [Arguments:](#arguments)
- [compare\_which\_columns\_differ (source)](#compare_which_columns_differ-source)
- [## compare\_row\_counts (source)](#-compare_row_counts-source)
- [Usage:](#usage-2)
- [Arguments:](#arguments-1)
@@ -372,6 +373,49 @@ flag.
dbt test --select stg_customers --store-failures
```

## compare_which_columns_differ ([source](macros/compare_which_columns_differ.sql))
This macro generates SQL that detects which columns shared by two relations contain value-level differences.
It reports only whether each column differs, not the magnitude of the difference.
This can be useful when comparing the development and production versions of a model.


```sql
{% set prod_relation=adapter.get_relation(
    database=target.database,
    schema="prod_schema",
    identifier="fct_orders"
) -%}

{% set dev_relation=ref('fct_orders') %}

{{ audit_helper.compare_which_columns_differ(
    a_relation=prod_relation,
    b_relation=dev_relation,
    primary_key="order_id",
    exclude_columns=["tax_amount"]
) }}
```

Results:

| column_name | has_difference |
|-------------|----------------|
| order_id    | False          |
| customer_id | False          |
| order_date  | True           |
| status      | False          |
| amount      | True           |

Arguments:
* `a_relation` and `b_relation`: The [relations](https://docs.getdbt.com/reference#relation)
you want to compare.
* `primary_key` (required): The primary key of the model, used to join the two relations so that the same rows are compared.
* `exclude_columns` (optional): Any columns you wish to exclude from the comparison.
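
Because the macro source in this commit joins the two relations with an inner join on `primary_key`, rows whose key exists in only one relation are presumably never compared. The Python sketch below illustrates that behavior with the standard-library `sqlite3` module. The tables, the rows, and the `max(...)` aggregate standing in for `dbt.bool_or` are all assumptions made for illustration, not output of the macro itself.

```python
import sqlite3

# Hypothetical relations: order 2 exists only in a, order 3 exists only in b.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table a (order_id integer, status text);
    create table b (order_id integer, status text);
    insert into a values (1, 'placed'), (2, 'shipped');
    insert into b values (1, 'placed'), (3, 'returned');
    """
)

# max(<0/1 predicate>) is a stand-in for dbt.bool_or, which SQLite lacks.
row = conn.execute(
    """
    select
        max(a.status != b.status) as status_has_difference,
        count(*) as rows_compared
    from a
    inner join b on a.order_id = b.order_id
    """
).fetchone()
conn.close()

status_has_difference = bool(row[0])
rows_compared = row[1]

# Only order 1 survives the join, so the unmatched rows are not flagged.
print(status_has_difference, rows_compared)  # False 1
```

If missing or extra rows matter for an audit, a separate row-level or row-count check, such as the `compare_row_counts` macro documented next in this README, may be worth running alongside this one.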
## ## compare_row_counts ([source](macros/compare_row_counts.sql))
This macro does a simple comparison of the row counts in two relations.

17 changes: 17 additions & 0 deletions integration_tests/models/compare_which_columns_differ.sql
@@ -0,0 +1,17 @@
{% set a_relation=ref('data_compare_which_columns_differ_a') %}

{% set b_relation=ref('data_compare_which_columns_differ_b') %}

-- lowercase for CI

select
    lower(column_name) as column_name,
    has_difference
from (

    {{ audit_helper.compare_which_columns_differ(
        a_relation=a_relation,
        b_relation=b_relation,
        primary_key="id"
    ) }}
) as macro_output
@@ -0,0 +1,18 @@
{% set a_relation=ref('data_compare_which_columns_differ_a') %}

{% set b_relation=ref('data_compare_which_columns_differ_b') %}


select
    lower(column_name) as column_name,
    has_difference
from (

    {{ audit_helper.compare_which_columns_differ(
        a_relation=a_relation,
        b_relation=b_relation,
        primary_key="id",
        exclude_columns=["becomes_null"]
    ) }}

) as macro_output
12 changes: 11 additions & 1 deletion integration_tests/models/schema.yml
@@ -81,7 +81,17 @@ models:
      - dbt_utils.equality:
          compare_model: ref('expected_results__compare_without_summary')

  - name: compare_which_columns_differ
    tests:
      - dbt_utils.equality:
          compare_model: ref('expected_results__compare_which_columns_differ')

  - name: compare_which_columns_differ_exclude_cols
    tests:
      - dbt_utils.equality:
          compare_model: ref('expected_results__compare_which_columns_differ_exclude_cols')

  - name: compare_row_counts
    tests:
      - dbt_utils.equality:
          compare_model: ref('expected_results__compare_row_counts')
@@ -0,0 +1,5 @@
id,value_changes,becomes_null,becomes_not_null,does_not_change
1,pink,22,a,dave
2,blue,33,,dave
3,green,44,c,dave
4,yellow,55,d,dave
@@ -0,0 +1,5 @@
id,value_changes,becomes_null,becomes_not_null,does_not_change
1,red,22,a,dave
2,blue,,b,dave
3,green,44,c,dave
4,yellow,55,d,dave
@@ -0,0 +1,6 @@
column_name,has_difference
id,false
value_changes,true
becomes_null,true
becomes_not_null,true
does_not_change,false
@@ -0,0 +1,5 @@
column_name,has_difference
id,false
value_changes,true
becomes_not_null,true
does_not_change,false
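
As a rough cross-check of the seed data against the two expected-results files, the sketch below loads the seed rows into an in-memory SQLite database and applies the same null-aware comparison that the macro in this commit renders per column. It is not the dbt macro itself. The function name `which_columns_differ` is hypothetical, treating the first seed file as relation `a` and the second as relation `b` is an assumption (the file names are not shown above), and `max(...)` stands in for `dbt.bool_or`, which SQLite does not provide.

```python
import sqlite3

COLUMNS = ["id", "value_changes", "becomes_null", "becomes_not_null", "does_not_change"]

# Rows copied from the two seed CSVs; which file is "a" and which is "b" is assumed.
SEED_A = [
    (1, "pink", 22, "a", "dave"),
    (2, "blue", 33, None, "dave"),
    (3, "green", 44, "c", "dave"),
    (4, "yellow", 55, "d", "dave"),
]
SEED_B = [
    (1, "red", 22, "a", "dave"),
    (2, "blue", None, "b", "dave"),
    (3, "green", 44, "c", "dave"),
    (4, "yellow", 55, "d", "dave"),
]


def which_columns_differ(rows_a, rows_b, exclude_columns=()):
    """Return (column_name, has_difference) pairs for rows matched on id."""
    conn = sqlite3.connect(":memory:")
    placeholders = ", ".join("?" for _ in COLUMNS)
    for table, rows in (("a", rows_a), ("b", rows_b)):
        conn.execute(f"create table {table} ({', '.join(COLUMNS)})")
        conn.executemany(f"insert into {table} values ({placeholders})", rows)

    compared = [c for c in COLUMNS if c not in exclude_columns]
    # One aggregate per column: true if any matched row differs, nulls included.
    aggregates = ",\n".join(
        f"max((a.{c} != b.{c})"
        f" or (a.{c} is null and b.{c} is not null)"
        f" or (a.{c} is not null and b.{c} is null)) as {c}_has_difference"
        for c in compared
    )
    row = conn.execute(
        f"select {aggregates} from a inner join b on a.id = b.id"
    ).fetchone()
    conn.close()
    return [(c, bool(flag)) for c, flag in zip(compared, row)]


for column_name, has_difference in which_columns_differ(SEED_A, SEED_B):
    print(column_name, has_difference)
```

Passing `exclude_columns=["becomes_null"]` drops that column from the result, mirroring the second expected-results file.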
46 changes: 46 additions & 0 deletions macros/compare_which_columns_differ.sql
@@ -0,0 +1,46 @@
{% macro compare_which_columns_differ(a_relation, b_relation, primary_key, exclude_columns=[]) %}
    {{ return(adapter.dispatch('compare_which_columns_differ', 'audit_helper')(a_relation, b_relation, primary_key, exclude_columns)) }}
{% endmacro %}

{% macro default__compare_which_columns_differ(a_relation, b_relation, primary_key, exclude_columns=[]) %}

{% set column_names = dbt_utils.get_filtered_columns_in_relation(from=a_relation, except=exclude_columns) %}

with bool_or as (

    select
        true as anchor
        {% for column in column_names %}
            {% set column_name = adapter.quote(column) %}
            {% set compare_statement %}
                ((a.{{ column_name }} != b.{{ column_name }})
                    or (a.{{ column_name }} is null and b.{{ column_name }} is not null)
                    or (a.{{ column_name }} is not null and b.{{ column_name }} is null))
            {% endset %}

        , {{ dbt.bool_or(compare_statement) }} as {{ column | lower }}_has_difference

        {% endfor %}
    from {{ a_relation }} as a
    inner join {{ b_relation }} as b
        on a.{{ primary_key }} = b.{{ primary_key }}

)

{% for column in column_names %}

    select
        '{{ column }}' as column_name,
        {{ column | lower }}_has_difference as has_difference

    from bool_or

    {% if not loop.last %}

        union all

    {% endif %}

{% endfor %}

{% endmacro %}
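
The three-branch `compare_statement` in the macro is presumably there because a plain `!=` evaluates to NULL when either side is NULL, and a boolean aggregate then skips that row. The sketch below contrasts the two predicates using Python's standard-library `sqlite3`. The tables and rows are hypothetical, and `max(...)` is an assumed stand-in for `dbt.bool_or`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table a (id integer, val text);
    create table b (id integer, val text);
    -- id 2 changes from NULL in a to 'y' in b
    insert into a values (1, 'x'), (2, null);
    insert into b values (1, 'x'), (2, 'y');
    """
)

naive, null_aware = conn.execute(
    """
    select
        max(a.val != b.val) as naive,
        max(
            (a.val != b.val)
            or (a.val is null and b.val is not null)
            or (a.val is not null and b.val is null)
        ) as null_aware
    from a
    inner join b on a.id = b.id
    """
).fetchone()
conn.close()

# The naive predicate misses the NULL -> 'y' change; the null-aware one catches it.
print(bool(naive), bool(null_aware))  # False True
```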
