Chore: more Benefits data migration #3543

Merged: thekaveman merged 5 commits into main from chore/benefits-data-migration on Nov 13, 2024

@thekaveman (Member) commented Nov 13, 2024

Description

This PR restructures (again) the SQL for the fct_benefits_events model. Follows from:

Here we split the processing of Amplitude data into two phases, implemented as intermediate CTEs:

  1. Data definition: extract JSON columns and COALESCE old columns into new ones as the schema evolves.
  2. Data migration: update data values that have changed over time (e.g. renaming a flow) or create new records from historical records.
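
The two phases above can be sketched as a pair of chained CTEs. This is a minimal illustration run against an in-memory SQLite database from Python, not the model's actual BigQuery SQL. The table `raw_events`, the columns `flow_old` and `flow_new`, and the values `legacy_flow` and `current_flow` are all hypothetical, and JSON extraction is left out for brevity.

```python
import sqlite3

# Hypothetical raw data: an older column (flow_old) was later replaced by flow_new.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_events (event_id INTEGER, flow_old TEXT, flow_new TEXT)"
)
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        (1, "legacy_flow", None),   # old column, old value
        (2, None, "current_flow"),  # new column, current value
        (3, "current_flow", None),  # old column, already-current value
    ],
)

QUERY = """
WITH events_raw AS (
    -- Phase 1, data definition: COALESCE the old column into the new one
    SELECT event_id, COALESCE(flow_new, flow_old) AS flow
    FROM raw_events
),
events AS (
    -- Phase 2, data migration: rewrite values that were renamed over time
    SELECT
        event_id,
        CASE flow WHEN 'legacy_flow' THEN 'current_flow' ELSE flow END AS flow
    FROM events_raw
)
SELECT event_id, flow FROM events ORDER BY event_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Keeping the column reconciliation in one CTE and the value rewrites in the next means each later rename only touches the second phase.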

Resolves cal-itp/benefits#2521

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

If making changes to dbt models, please run the command poetry run dbt run -s CHANGED_MODEL and include the output in this section of the PR.

$ poetry run dbt run -s +fct_benefits_events
02:13:25  Running with dbt=1.5.1
02:13:27  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
02:13:27  Found 422 models, 963 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 174 sources, 4 exposures, 0 metrics, 0 groups
02:13:27  
02:13:30  Concurrency: 8 threads (target='dev')
02:13:30  
02:13:30  1 of 2 START sql view model kegan_staging.stg_amplitude__benefits_events ....... [RUN]
02:13:31  1 of 2 OK created sql view model kegan_staging.stg_amplitude__benefits_events .. [CREATE VIEW (0 processed) in 0.96s]
02:13:31  2 of 2 START sql table model kegan_mart_benefits.fct_benefits_events ........... [RUN]
02:13:44  2 of 2 OK created sql table model kegan_mart_benefits.fct_benefits_events ...... [CREATE TABLE (26.9m rows, 73.1 GiB processed) in 12.94s]
02:13:44  
02:13:44  Finished running 1 view model, 1 table model in 0 hours 0 minutes and 17.00 seconds (17.00s).
02:13:44  
02:13:44  Completed successfully
02:13:44  
02:13:44  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Post-merge follow-ups

Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.

  • No action required
  • Actions required (specified below)

- CTE fct_benefits_events_raw extracts JSON columns and COALESCEs old columns
- CTE fct_benefits_events applies migration / cleanup for data values in fct_benefits_events_raw
- CTE fct_benefits_historic_enrollments converts old-style enrollments in fct_benefits_events to current-style

The final table is the combination of two CTEs: fct_benefits_events + fct_benefits_historic_enrollments
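
The shape described in these commit messages might look roughly like the sketch below, again using an in-memory SQLite database from Python rather than the real model. It assumes the "combination" is a `UNION ALL` and that the original old-style rows are kept alongside the derived ones; neither is confirmed by the PR. The table `events_raw` and the event names `enrollment_v1` and `enrollment_v2` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events_raw (event_id INTEGER, event_type TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO events_raw VALUES (?, ?, ?)",
    [
        (1, "started", None),
        (2, "enrollment_v1", None),       # hypothetical old-style enrollment
        (3, "enrollment_v2", "success"),  # hypothetical current-style enrollment
    ],
)

QUERY = """
WITH events AS (
    -- stands in for the cleaned-up events CTE
    SELECT event_id, event_type, status FROM events_raw
),
historic_enrollments AS (
    -- derive a new current-style record from each old-style enrollment
    SELECT event_id, 'enrollment_v2' AS event_type, 'success' AS status
    FROM events
    WHERE event_type = 'enrollment_v1'
)
SELECT event_id, event_type, status FROM events
UNION ALL
SELECT event_id, event_type, status FROM historic_enrollments
ORDER BY event_id, event_type
"""

final_rows = conn.execute(QUERY).fetchall()
for row in final_rows:
    print(row)
```

Event 2 appears twice in the result: once as the original old-style row and once as the derived current-style row.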
@thekaveman thekaveman self-assigned this Nov 13, 2024

Warehouse report 📦

DAG

Legend (in order of precedence)

| Resource type | Indicator | Resolution |
| --- | --- | --- |
| Large table-materialized model | Orange | Make the model incremental |
| Large model without partitioning or clustering | Orange | Add partitioning and/or clustering |
| View with more than one child | Yellow | Materialize as a table or incremental |
| Incremental | Light green | |
| Table | Green | |
| View | White | |

@thekaveman thekaveman merged commit 1f20b4e into main Nov 13, 2024
4 checks passed
@thekaveman thekaveman deleted the chore/benefits-data-migration branch November 13, 2024 20:34
Development

Successfully merging this pull request may close these issues.

Further analytics updates for Metabase pipeline