Repository for the dbt courses

dbt documentation

https://courses.getdbt.com/courses https://docs.getdbt.com/guides/best-practices https://docs.getdbt.com/docs/introduction https://docs.getdbt.com/reference/references-overview

Snowflake instance

https://qr92018.eu-west-2.aws.snowflakecomputing.com/

Setup

Install dbt in a virtual environment

pip install --no-dependencies -r requirements.txt

Create a ~/.dbt/profiles.yml

personal_profile:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: ****
      user: ****
      password: ****
      role: ****
      database: ****
      warehouse: ****
      schema: ****
      threads: 8
      client_session_keep_alive: False
      query_tag: ****
      connect_retries: 0
      connect_timeout: 10
      retry_on_database_errors: False
      retry_all: False
      reuse_connections: False

Check connection

dbt deps
dbt debug

Create database, schemas and tables

dbt run-operation create_db_objects
dbt run-operation create_sample_src_tables

Build and test dbt models.

dbt build

Notes on materializations in dbt

Materialization defines how dbt builds the models (select statements).

Tables: Copy of the data from upstream into a NEW table.
- Higher build time and more storage usage, but shorter query time
- If new record added in upstream stable, model/table is rebuilt from scratch.
Views: No copy, data remains in the upstream table, and the query is run every time we do a select on the view.
- Quicker build time and less storage usage, but longer query time
- If new record added in upstream stable, it will be part of the view/query, but slower.
Ephemeral: Brings CTE into downstream models (another select statement), but no persistent storage, do not exist in the DB.
- Can be used if too many CTEs in a model, and it makes sense to break it down, or that the CTE can be reused in other downstream models, but we don't want to store the data. Help reduces the amount of tables in the WH, but harder to debug, cannot be queried directly.
Incremental: Only brings in new records in upstream tables, do not rebuild the table from scratch
Snapshot: Look at changed records, if anything changes, bring the updated records as a NEW row - no loss of information
- Used for type 2 slowly changing dimension tables: https://docs.getdbt.com/docs/build/snapshots
- Preserve the history of changing fields (e.g. dobjects, sec master)
- Snapshots: https://courses.getdbt.com/courses/take/advanced-materializations/lessons/30195287-implementing-snapshots
- Usually applied on source tables

Generally, start with a view, when it takes too long to query, "upgrade" the model to a table. When the table takes too long to build, consider upgrading to an incremental table. Upgrading to an incremental table is not straightforward, follow the dbt documentation

Node selection syntax

https://docs.getdbt.com/reference/node-selection/syntax

Tests

Native tests and packages (dbt_expectations) in yml
Singular tests: sql files in the tests/ folder, with reference to models. Tests business logic and shows up in the lineage graph.
Generic tests, macro-like, parameterized sql, which are reusable across models. Usually start with a singular test (pure sql) then "promote" to a generic tests for reusability.
Override native tests by creating a generic tests with the same name, e.g 'not_null', for customization

Name		Name	Last commit message	Last commit date
Latest commit History 196 Commits
.github		.github
.state		.state
analyses		analyses
macros		macros
models		models
scripts		scripts
seeds		seeds
snapshots		snapshots
tests		tests
.gitignore		.gitignore
.sqlfluff		.sqlfluff
.sqlfluffignore		.sqlfluffignore
.yamllint		.yamllint
CHANGELOG.md		CHANGELOG.md
Makefile		Makefile
README.md		README.md
dbt_project.yml		dbt_project.yml
packages.yml		packages.yml
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Repository for the dbt courses

dbt documentation

Snowflake instance

Setup

Notes on materializations in dbt

Node selection syntax

Tests

About

Releases

Packages

Contributors 2

Languages

gbourniq/dbt-fundamentals

Folders and files

Latest commit

History

Repository files navigation

Repository for the dbt courses

dbt documentation

Snowflake instance

Setup

Notes on materializations in dbt

Node selection syntax

Tests

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages