feat: clone with new owner (Montreal-Analytics#27)
* feat: clone with new owner

* chore: add more doc and comments

* feat: simplify the functions

* fix: update ownership model

* rename macro

* style

* qualify macros

* typos
JFrackson authored Jan 31, 2023
1 parent 37f03fc commit 114923b
Showing 9 changed files with 247 additions and 20 deletions.
35 changes: 35 additions & 0 deletions 2-step_cloning_pattern.md
@@ -0,0 +1,35 @@
# 2-Step dbt Cloning Pattern

_Credit: [This cloning pattern is inspired by Dan Gooden’s article here from the Airtasker Tribe blog.](https://medium.com/airtribe/test-sql-pipelines-against-production-clones-using-dbt-and-snowflake-2f8293722dd4)_

Cloning is a cost- and time-efficient way of developing dbt models on Snowflake, but it can be challenging when your cloning needs traverse environments with different access controls: for example, when you want to clone a production database for use in development.

A solution for this is to run a 2-step cloning pattern:

1. A production role clones the production database or schema and then changes the ownership of its sub-objects to a developer role, thus creating a developer clone of production. The cloned object is still owned by the production role (which preserves the privilege to drop or replace that clone), but the developer role now has full access to its sub-objects.
2. Developers use the developer role to clone that developer clone database or schema, thus creating a new personal developer clone for development. The developer role has full ownership of this cloned database and all its sub-objects (see the SQL sketch after this list).

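For illustration, here is a minimal sketch of the Snowflake statements these two steps effectively issue at the database level. It is only a sketch: the `clone_database` and `grant_ownership_on_schema_objects` macros in this package wrap this work for you, and the `production`, `production_clone`, `developer_clone_me`, `analytics`, and `developer_role` names are placeholders.

```sql
-- Step 1: run as the production role. The clone stays owned by the production role,
-- which preserves the privilege to later replace or drop it.
CREATE OR REPLACE DATABASE production_clone CLONE production;

-- For each schema in the clone, hand its tables and views to the developer role
-- and give that role full privileges on the database itself.
GRANT OWNERSHIP ON ALL TABLES IN SCHEMA production_clone.analytics
    TO ROLE developer_role REVOKE CURRENT GRANTS;
GRANT OWNERSHIP ON ALL VIEWS IN SCHEMA production_clone.analytics
    TO ROLE developer_role REVOKE CURRENT GRANTS;
GRANT ALL PRIVILEGES ON DATABASE production_clone TO ROLE developer_role;

-- Step 2: run as the developer role (assumes that role may create databases).
-- The personal clone and everything in it is owned by the developer role.
CREATE OR REPLACE DATABASE developer_clone_me CLONE production_clone;
```
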
This pattern can be used to clone either a schema or a database. If all the dbt models are stored within a single schema, schema-level cloning is a good option. When dbt is configured to write to multiple schemata, database-level cloning is the more production-like option.

This pattern optimizes for the following:

- **Access Control:** no need to compromise on your access control system, for example by giving your developer role extensive access to production. This pattern takes environmental separation as a given.
- **Flexible Availability:** step 1 can be run on any preferred schedule: the developer clone can be updated hourly, daily, weekly, or on any other cadence. This first clone is ideally refreshed after a complete dbt run so the data is as fresh as possible.
- **Developer Flexibility:** developers can take personal clones whenever they need to, and can take multiple clones if they need more than one concurrent development environment. These personal clones should be rotated regularly to keep data fresh and production-like.

## Setup:

1. Update one of your production jobs to include step 1 of the cloning pattern. Here is an example implementation for database-level cloning from production to production_clone:

```bash
dbt build &&
dbt run-operation clone_database \
--args "{'source_database': 'production', 'destination_database': 'production_clone', 'new_owner_role': 'developer_role'}"
```

2. As needed, locally run step 2 of the cloning pattern to create or update personal development clones. Here is an example implementation for database-level cloning from production_clone to an ephemeral database called developer_clone_me:

```bash
dbt run-operation clone_database \
--args "{'source_database': 'production_clone', 'destination_database': 'developer_clone_me'}"
```
69 changes: 58 additions & 11 deletions README.md
@@ -6,7 +6,7 @@ This [dbt](https://github.com/dbt-labs/dbt-core) package contains Snowflake-spec
Check [dbt Hub](https://hub.getdbt.com/montreal-analytics/snowflake_utils/latest/) for the latest installation instructions, or [read the docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.

## Prerequisites
Snowflake Utils is compatible with dbt 0.20.0 and later.
Snowflake Utils is compatible with dbt 1.1.0 and later.

----

@@ -68,28 +68,60 @@ When a variable is configured for a conditon _and_ that condition is matched whe
When compiling or generating docs, the console reports that dbt is using the incremental run warehouse. This is not actually the case: during these operations, only the target warehouse is activated.

### snowflake_utils.clone_schema ([source](macros/clone_schema.sql))
This macro clones the source schema into the destination schema.
This macro is a part of the recommended 2-step Cloning Pattern for dbt development, explained in detail [here](2-step_cloning_pattern.md).

This macro clones the source schema into the destination schema and optionally grants ownership over its tables and views to a new owner.

Note: the owner of the cloned schema is the role that executed the command, but if `new_owner_role` is set, ownership of its tables and views is transferred to that role. This is important for maintaining and replacing clones and is explained in more detail [here](2-step_cloning_pattern.md).

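As a rough illustration, when `new_owner_role` is set the macro ends up issuing statements along these lines for the cloned schema (see the `grant_ownership_on_schema_objects` source in this commit; the database, schema, and role names below are placeholders):

```sql
-- The schema itself keeps its current owner; only its tables and views change hands.
GRANT USAGE ON SCHEMA temp_database.ci_schema TO ROLE developer_role;
GRANT OWNERSHIP ON ALL TABLES IN SCHEMA temp_database.ci_schema
    TO ROLE developer_role REVOKE CURRENT GRANTS;
GRANT OWNERSHIP ON ALL VIEWS IN SCHEMA temp_database.ci_schema
    TO ROLE developer_role REVOKE CURRENT GRANTS;
GRANT ALL PRIVILEGES ON SCHEMA temp_database.ci_schema TO ROLE developer_role;
```
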
#### Arguments
* `source_schema` (required): The source schema name
* `destination_schema` (required): The destination schema name
* `source_database` (optional): The source database name
* `destination_database` (optional): The destination database name
* `source_database` (optional): The source database name; default value is your profile's target database.
* `destination_database` (optional): The destination database name; default value is your profile's target database.
* `new_owner_role` (optional): The new ownership role name. If no value is passed, the ownership will remain unchanged.

#### Usage

Call the macro as an [operation](https://docs.getdbt.com/docs/using-operations):

```
dbt run-operation clone_schema \
--args "{'source_schema': 'analytics', 'destination_schema': 'ci_schema'}"
# set the databases and new_owner_role
dbt run-operation clone_schema \
--args "{'source_schema': 'analytics', 'destination_schema': 'ci_schema', 'source_database': 'production', 'destination_database': 'temp_database', 'new_owner_role': 'developer_role'}"
```


### snowflake_utils.clone_database ([source](macros/clone_database.sql))
This macro is a part of the recommended 2-step Cloning Pattern for dbt development, explained in detail [here](2-step_cloning_pattern.md).

This macro clones the source database into the destination database and optionally grants a new owner full privileges on its schemata and ownership of their tables and views.

Note: the owner of the cloned database is the role that executed the command, but if `new_owner_role` is set, ownership of its sub-objects (its schemata's tables and views) is transferred to that role. This is important for maintaining and replacing clones and is explained in more detail [here](2-step_cloning_pattern.md).

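To sanity-check the resulting ownership split on a clone, the grants can be inspected directly in Snowflake. A minimal sketch, with illustrative object and role names:

```sql
-- OWNERSHIP on the database should still list the role that ran the clone,
-- while the new owner role holds the remaining privileges.
SHOW GRANTS ON DATABASE developer_clone;

-- Tables and views inside each schema should list the new owner role as their owner.
SHOW GRANTS ON SCHEMA developer_clone.analytics;
SHOW GRANTS ON TABLE developer_clone.analytics.my_model;
```
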
#### Arguments
* `source_database` (required): The source database name
* `destination_database` (required): The destination database name
* `new_owner_role` (optional): The new ownership role name. If no value is passed, the ownership will remain unchanged.

#### Usage

Call the macro as an [operation](https://docs.getdbt.com/docs/using-operations):

```
# for multiple arguments, use the dict syntax
dbt run-operation clone_schema --args "{'source_schema': 'analytics', 'destination_schema': 'ci_schema'}"
dbt run-operation clone_database \
--args "{'source_database': 'production_clone', 'destination_database': 'developer_clone'}"
# set the databases
dbt run-operation clone_schema --args "{'source_schema': 'analytics', 'destination_schema': 'ci_schema', 'source_database': 'production', 'destination_database': 'temp_database'}"
# set the new_owner_role
dbt run-operation clone_database \
--args "{'source_database': 'production_clone', 'destination_database': 'developer_clone', 'new_owner_role': 'developer_role'}"
```

### snowflake_utils.drop_schema ([source](macros/drop_schema.sql))
This macro drops a schema in the selected database (defaults to target database if no database is selected).
This macro drops a schema in the selected database (defaults to target database if no database is selected). A schema can only be dropped by the role that owns it.

#### Arguments
* `schema_name` (required): The schema to drop
@@ -100,8 +132,23 @@ This macro drops a schema in the selected database (defaults to target
Call the macro as an [operation](https://docs.getdbt.com/docs/using-operations):

```
# for multiple arguments, use the dict syntax
dbt run-operation drop_schema --args "{'schema_name': 'customers_temp', 'database': 'production'}"
dbt run-operation drop_schema \
--args "{'schema_name': 'customers_temp', 'database': 'production'}"
```

### snowflake_utils.drop_database ([source](macros/drop_database.sql))
This macro drops a database. A database can only be dropped by the role that owns it.

#### Arguments
* `database_name` (required): The database name

#### Usage

Call the macro as an [operation](https://docs.getdbt.com/docs/using-operations):

```
dbt run-operation drop_database \
--args "{'database_name': 'production_clone'}"
```

### snowflake_utils.apply_meta_as_tags ([source](macros/apply_meta_as_tags.sql))
5 changes: 2 additions & 3 deletions dbt_project.yml
@@ -1,11 +1,10 @@
name: 'snowflake_utils'
version: '0.3.0'
version: '0.4.0'

config-version: 2

require-dbt-version: ">=0.17.0"
require-dbt-version: ">=1.1.0"

source-paths: ["models"]
target-path: "target"
clean-targets: ["target", "dbt_modules"]
test-paths: ["test"]
Expand Down
64 changes: 64 additions & 0 deletions macros/clone_database.sql
@@ -0,0 +1,64 @@
{#
-- This macro clones the source database into the destination database and
-- optionally grants ownership over it, its schemata, and its schemata's tables
-- and views to a new owner.
#}
{% macro clone_database(
source_database,
destination_database,
new_owner_role=''
) %}

{% if source_database and destination_database %}

{{ (log("Cloning existing database " ~ source_database ~
" into database " ~ destination_database, info=True)) }}

{% call statement('clone_database', fetch_result=True, auto_begin=False) -%}
CREATE OR REPLACE DATABASE {{ destination_database }}
CLONE {{ source_database }};
{%- endcall %}

{%- set result = load_result('clone_database') -%}
{{ log(result['data'][0][0], info=True)}}

{% else %}

{{ exceptions.raise_compiler_error("Invalid arguments. Missing source database and/or destination database") }}

{% endif %}

{% if new_owner_role != '' %}

{% set list_schemas_query %}
-- get all schemata within the cloned database to then iterate through them and
-- change their ownership
SELECT schema_name
FROM {{ destination_database }}.information_schema.schemata
WHERE schema_name != 'INFORMATION_SCHEMA'
{% endset %}

{% set results = run_query(list_schemas_query) %}

{% if execute %}
{# Return the first column #}
{% set schemata_list = results.columns[0].values() %}
{% else %}
{% set schemata_list = [] %}
{% endif %}

{% for schema_name in schemata_list %}

{{ snowflake_utils.grant_ownership_on_schema_objects(new_owner_role, schema_name, destination_database) }}

{% endfor %}

{{ log("Grant ownership on " ~ destination_database ~ " to " ~ new_owner_role, info=True)}}

{% call statement('clone_database', fetch_result=True, auto_begin=False) -%}
GRANT ALL PRIVILEGES ON DATABASE {{ destination_database }} TO {{ new_owner_role }};
{%- endcall %}

{% endif %}

{% endmacro %}
18 changes: 17 additions & 1 deletion macros/clone_schema.sql
@@ -1,4 +1,14 @@
{% macro clone_schema(source_schema, destination_schema, source_database=target.database, destination_database=target.database) %}
{#
-- This macro clones the source schema into the destination schema and
-- optionally grants ownership over it and its tables and views to a new owner.
#}
{% macro clone_schema(
source_schema,
destination_schema,
source_database=target.database,
destination_database=target.database,
new_owner_role=''
) %}

{% if source_schema and destination_schema %}

@@ -19,4 +29,10 @@

{% endif %}

{% if new_owner_role != '' %}

{{ snowflake_utils.grant_ownership_on_schema_objects(new_owner_role, destination_schema, destination_database) }}

{% endif %}

{% endmacro %}
23 changes: 23 additions & 0 deletions macros/drop_database.sql
@@ -0,0 +1,23 @@
{#
-- This macro drops a database.
#}
{% macro drop_database(database_name) %}

{% if database_name %}

{{ log("Dropping database " ~ database_name ~ "...", info=True) }}

{% call statement('drop_database', fetch_result=True, auto_begin=False) -%}
DROP DATABASE {{ database_name }}
{%- endcall %}

{%- set result = load_result('drop_database') -%}
{{ log(result['data'][0][0], info=True)}}

{% else %}

{{ exceptions.raise_compiler_error("Invalid arguments. Missing database name") }}

{% endif %}

{% endmacro %}
15 changes: 11 additions & 4 deletions macros/drop_schema.sql
@@ -1,11 +1,18 @@
{% macro drop_schema(schema_name, database=target.database) %}
{#
-- This macro drops a schema in the selected database (defaults to target
-- database if no database is selected).
#}
{% macro drop_schema(
schema_name,
database_name=target.database
) %}

{% if schema_name %}

{{ log("Dropping schema " ~ database ~ "." ~ schema_name ~ "...", info=True) }}
{{ log("Dropping schema " ~ database_name ~ "." ~ schema_name ~ "...", info=True) }}

{% call statement('drop_schema', fetch_result=True, auto_begin=False) -%}
DROP SCHEMA {{ database }}.{{ schema_name }}
DROP SCHEMA {{ database_name }}.{{ schema_name }}
{%- endcall %}

{%- set result = load_result('drop_schema') -%}
@@ -17,4 +24,4 @@

{% endif %}

{% endmacro %}
{% endmacro %}
36 changes: 36 additions & 0 deletions macros/grant_ownership_on_schema_objects.sql
@@ -0,0 +1,36 @@
{#
-- This macro grants ownership over a schema's tables and views and is
-- optionally called by the clone_schema and clone_database macros.
#}
{% macro grant_ownership_on_schema_objects(
new_owner_role,
destination_schema,
destination_database=target.database
) %}

{% if new_owner_role and destination_schema %}

{{ (log("Granting ownership on " ~ destination_database ~ "." ~ destination_schema ~
" and its tables and views to " ~ new_owner_role, info=True)) }}

{% call statement('grant_ownership_on_schema_objects', fetch_result=True, auto_begin=False) -%}
GRANT USAGE ON SCHEMA {{ destination_database }}.{{ destination_schema }}
TO {{ new_owner_role }};
GRANT OWNERSHIP ON ALL TABLES IN SCHEMA {{ destination_database }}.{{ destination_schema }}
TO {{ new_owner_role }} REVOKE CURRENT GRANTS;
GRANT OWNERSHIP ON ALL VIEWS IN SCHEMA {{ destination_database }}.{{ destination_schema }}
TO {{ new_owner_role }} REVOKE CURRENT GRANTS;
GRANT ALL PRIVILEGES ON SCHEMA {{ destination_database }}.{{ destination_schema }}
TO {{ new_owner_role }};
{%- endcall %}

{%- set result = load_result('grant_ownership_on_schema_objects') -%}
{{ log(result['data'][0][0], info=True)}}

{% else %}

{{ exceptions.raise_compiler_error("Invalid arguments. Missing new owner role and/or destination schema") }}

{% endif %}

{% endmacro %}
2 changes: 1 addition & 1 deletion packages.yml
@@ -1,3 +1,3 @@
packages:
- package: dbt-labs/dbt_utils
version: ">=0.7.0"
version: [">=0.7.0", "<1.1.0"]
