diff --git a/README.md b/README.md index 52546ef2..f966e833 100644 --- a/README.md +++ b/README.md @@ -346,6 +346,30 @@ the lower bound of the next record (common for date ranges). | 0 | 1 | | 2 | 3 | | 4 | 5 | + +#### unique_combination_of_columns ([source](macros/schema_tests/unique_combination_of_columns.sql)) +This test confirms that the combination of columns is unique. For example, the +combination of month and product is unique, however neither column is unique +in isolation. + +We generally recommend testing this uniqueness condition by either: +* generating a [surrogate_key](#surrogate_key-source) for your model and testing +the uniqueness of said key, OR +* passing the `unique` test a coalesce of the columns (as discussed [here](https://docs.getdbt.com/docs/testing#section-testing-expressions)). + +However, these approaches can become non-perfomant on large data sets, in which +case we recommend using this test instead. + +**Usage:** +```yaml +- name: revenue_by_product_by_month + tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - month + - product +``` + --- ### SQL helpers #### get_query_results_as_dict ([source](macros/sql/get_query_results_as_dict.sql)) diff --git a/integration_tests/data/schema_tests/data_unique_combination_of_columns.csv b/integration_tests/data/schema_tests/data_unique_combination_of_columns.csv new file mode 100644 index 00000000..09d64b53 --- /dev/null +++ b/integration_tests/data/schema_tests/data_unique_combination_of_columns.csv @@ -0,0 +1,7 @@ +month,product,revenue +2019-01-01,jaffle,500 +2019-01-01,lamington,100 +2019-01-01,pavlova,600 +2019-02-01,jaffle,300 +2019-02-01,lamington,300 +2019-02-01,pavlova,400 diff --git a/integration_tests/models/schema_tests/schema.yml b/integration_tests/models/schema_tests/schema.yml index 0ee26da0..ab7fff81 100644 --- a/integration_tests/models/schema_tests/schema.yml +++ b/integration_tests/models/schema_tests/schema.yml @@ -80,3 +80,10 @@ models: upper_bound_column: coalesce(valid_to, '2099-01-01') partition_by: subscription_id gaps: required + + - name: data_unique_combination_of_columns + tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - month + - product diff --git a/macros/schema_tests/unique_combination_of_columns.sql b/macros/schema_tests/unique_combination_of_columns.sql new file mode 100644 index 00000000..b7326470 --- /dev/null +++ b/macros/schema_tests/unique_combination_of_columns.sql @@ -0,0 +1,22 @@ +{% macro test_unique_combination_of_columns(model) %} + +{%- set columns = kwargs.get('combination_of_columns', kwargs.get('arg')) %} + +{%- set columns_csv=columns | join(', ') %} + +with validation_errors as ( + + select + {{ columns_csv }} + from {{ model }} + + group by {{ columns_csv }} + having count(*) > 1 + +) + +select count(*) +from validation_errors + + +{% endmacro %}