Lead score propensity use case for non-ecommerce #273

Open · wants to merge 152 commits into base: main

Commits (152)
5157d42
predicting for only the users with traffic in the past 72h - purchase…
Apr 24, 2024
a67b8a5
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Apr 24, 2024
a23a337
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Apr 29, 2024
7da1640
running inference only for users events in the past 72h
Apr 29, 2024
321a69e
including 72h users for all models predictions
May 1, 2024
f20f2bc
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 1, 2024
da686f1
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 8, 2024
4d96cac
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 8, 2024
ba0cfeb
considering null values in TabWorkflow models
May 8, 2024
3548ccc
deleting unused pipfile
May 9, 2024
8ec9bea
upgrading lib versions
May 9, 2024
42f1938
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 9, 2024
352e2b4
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 10, 2024
02f02ea
implementing reporting preprocessing as a new pipeline
May 13, 2024
2e6bcfd
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 15, 2024
fb6834e
adding more code documentation
May 17, 2024
8a8c357
adding important information on the main README.md and DEVELOPMENT.md
May 18, 2024
efa2750
adding schedule run name and more code documentation
May 22, 2024
3c0badc
implementing a new scheduler using the vertex ai sdk & adding user_id…
May 22, 2024
c1787d8
adding more code documentation
May 22, 2024
8090fd7
adding code doc to the python custom component
May 22, 2024
99a20fa
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 23, 2024
362cc59
adding more code documentation
May 23, 2024
b89e26a
fixing aggregated predictions query
May 23, 2024
d20b18a
adding renovate.json
May 23, 2024
9557ef7
removing unnecessary resources from deployment
May 23, 2024
f85166f
Writing MDS guide
May 23, 2024
85e60ce
adding the MDS developer and troubleshooting documentation
May 24, 2024
96d82f3
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 24, 2024
ed2eacb
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 29, 2024
17e2eca
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 29, 2024
a25842b
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo May 31, 2024
a0859f6
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jun 3, 2024
e971d96
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jun 3, 2024
178f834
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jun 4, 2024
9343f66
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jun 5, 2024
5e3b7ba
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jun 6, 2024
58b99c2
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jun 17, 2024
a013492
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jun 21, 2024
6b85fa8
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jul 9, 2024
75ab01a
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jul 18, 2024
5a0abb9
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jul 23, 2024
73a515e
fixing deployment for activation pipelines and gemini dataset
Jul 24, 2024
94a1707
Update README.md
chmstimoteo Jul 24, 2024
ec2f090
Update README.md
chmstimoteo Jul 24, 2024
2186491
Update README.md
chmstimoteo Jul 24, 2024
09dc6cb
Update README.md
chmstimoteo Jul 24, 2024
97c1df3
removing deprecated api
Jul 24, 2024
08c9057
fixing purchase propensity pipelines names
Jul 24, 2024
05d59fa
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jul 25, 2024
c9067fa
adding extra condition for when there is not enough data for the wind…
Jul 25, 2024
99a9d35
adding more instructions for post deployment and fixing issues when G…
Jul 25, 2024
7cc407b
removing unnecessary comments
Jul 25, 2024
4eac341
adding the number of past days to process in the variables files
Jul 26, 2024
f9b4c09
adding comment about combining data from different ga4 export dataset…
Jul 26, 2024
8c3bd41
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Jul 26, 2024
629d7f2
fixing small issues with feature engineering and ml pipelines
Jul 30, 2024
df2eb3d
fixing hyper parameter tuning for kmeans modeling
Aug 1, 2024
07befc5
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 1, 2024
2873f86
fixing optuna parameters
Aug 1, 2024
0fefa9e
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 1, 2024
8457f2b
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 1, 2024
2f01e78
adding cloud shell image
Aug 8, 2024
b8e6380
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 8, 2024
c27a37b
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 9, 2024
f87c33e
fixing the list of all possible users in the propensity training prep…
Aug 9, 2024
0f830c0
additional guardrails for when there is not enough data
Aug 12, 2024
2ce2929
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 12, 2024
0f552c7
adding more documentation
Aug 13, 2024
d37df6b
adding more doc to feature store
Aug 14, 2024
ea8e356
add feature store documentation
Aug 16, 2024
e55e35f
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 16, 2024
b45521a
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 19, 2024
c9a998c
adding ml pipelines docs
Aug 19, 2024
57dcc05
adding ml pipelines docs
Aug 19, 2024
a55bdca
adding more documentation
Aug 20, 2024
a18d497
Merge branch 'main' into main
chmstimoteo Aug 23, 2024
769b1ee
adding user agent client info
Aug 23, 2024
9e4ebea
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 23, 2024
df18aba
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 23, 2024
8d81c35
fixing scope of client info
Aug 23, 2024
aa8d84a
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 23, 2024
5b0940b
fix
Aug 23, 2024
575b258
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 23, 2024
aa40e21
removing client_info from vertex components
Aug 24, 2024
686ac12
fixing versioning of tf submodules
Aug 26, 2024
201c548
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 26, 2024
d9c53cf
reconfiguring meta providers
Aug 26, 2024
49b10fe
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Aug 27, 2024
87f50e5
fixing issue 187
Sep 11, 2024
c2745e2
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Sep 11, 2024
1bd0c1d
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Sep 16, 2024
ee00d79
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Sep 20, 2024
4653d35
Merge branch 'main' of https://github.com/chmstimoteo/marketing-analy…
Sep 25, 2024
5e8c2e4
chore(deps): upgrade terraform providers and modules version
Oct 1, 2024
164e293
chore(deps): set the provider version
Oct 1, 2024
0749f1c
chore: formatting
Oct 1, 2024
add1311
fix: brand naming
Oct 1, 2024
460b26a
fix: typo
Oct 1, 2024
a8fc1b9
Merge pull request #4 from GoogleCloudPlatform/upgrade-terraform-version
chmstimoteo Oct 4, 2024
447bc7b
Merge branch 'main' of https://github.com/chmstimoteo/marketing-analy…
Oct 4, 2024
007bdf5
fixing secrets issue
Oct 4, 2024
6fab05f
Merge branch 'main' into main
chmstimoteo Oct 4, 2024
d1328eb
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Oct 4, 2024
6f6a22e
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Oct 8, 2024
d7c8f30
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Oct 10, 2024
161aa3f
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Oct 23, 2024
038af7a
implementing secrets region as tf variable
Oct 23, 2024
ce2fa21
implementing secrets region as tf variable
Oct 23, 2024
a823752
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Oct 24, 2024
8ba5a97
last changes requested by lgrangeau
Oct 24, 2024
53a6ccc
documenting keys location better
Oct 24, 2024
6005429
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Oct 28, 2024
39aba30
implementing vpc peering network
Oct 29, 2024
7504d93
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Oct 29, 2024
34ff531
Update README.md
chmstimoteo Nov 6, 2024
7bcd76b
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Nov 14, 2024
69546e7
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Nov 15, 2024
5c5e68c
Merge branch 'GoogleCloudPlatform:multi-property' into multi-property
chmstimoteo Nov 15, 2024
31666a1
Rebase Main into Multi-property (#243)
chmstimoteo Nov 15, 2024
15f7c71
Merge branch 'GoogleCloudPlatform:multi-property' into multi-property
chmstimoteo Nov 15, 2024
3932a3a
supporting property id in the resources
Nov 15, 2024
0ca2ce3
Merge pull request #5 from chmstimoteo/multi-property
chmstimoteo Nov 15, 2024
589ea32
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Nov 21, 2024
feb23c9
fixing iam member roles issues
Nov 22, 2024
e4110c5
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Nov 22, 2024
a423bb6
fixing issue with service account iam resources
Nov 22, 2024
a038e17
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Nov 22, 2024
67b3f6b
fixing issue with connection between vertex and bq
Nov 22, 2024
e27048c
Merge branch 'main' of https://github.com/chmstimoteo/marketing-analy…
Nov 22, 2024
5e7667c
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Nov 22, 2024
c1ccf19
Merge branch 'GoogleCloudPlatform:main' into main
chmstimoteo Dec 4, 2024
94c758b
Update README.md
chmstimoteo Dec 4, 2024
0a0f3f7
Merge branch 'GoogleCloudPlatform:main' into lead-score-propensity
chmstimoteo Dec 9, 2024
6246d7e
Merge branch 'GoogleCloudPlatform:main' into lead-score-propensity
chmstimoteo Dec 9, 2024
9fc6201
Update export-procedures.tf
chmstimoteo Dec 9, 2024
859dc0c
Merge branch 'GoogleCloudPlatform:main' into lead-score-propensity
chmstimoteo Dec 9, 2024
0170ba4
Merge branch 'GoogleCloudPlatform:main' into lead-score-propensity
chmstimoteo Dec 9, 2024
1e6dfcb
adding additional parameters
Dec 10, 2024
83300ed
merging
Dec 10, 2024
cbe0c51
setting variables to other config schedule values
Dec 10, 2024
f7d8a68
implemented terraforming of lead score propensity
Dec 12, 2024
380ab6d
Merge branch 'GoogleCloudPlatform:main' into lead-score-propensity
chmstimoteo Dec 13, 2024
644887e
fixing scheduler state
Dec 13, 2024
ab407a0
continue building pipeline code
Dec 13, 2024
5c7aea2
addming bigquery resources for the lead score use case
Dec 16, 2024
ca3a14c
continue implementing sql code
Dec 17, 2024
b31ea28
all compiled and deployed
Dec 17, 2024
ec8bc04
fixing missings features and tables
Dec 18, 2024
3c9a954
fixing a bug in the backfill
Dec 18, 2024
40de2a5
Merge branch 'GoogleCloudPlatform:main' into lead-score-propensity
chmstimoteo Dec 18, 2024
cdef7d3
fixing last issues
Dec 19, 2024
Files changed

513 changes: 463 additions & 50 deletions config/config.yaml.tftpl

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions infrastructure/terraform/.terraform.lock.hcl

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions infrastructure/terraform/README.md
@@ -106,6 +106,7 @@ Also, this method allows you to extend this solution and develop it to satisfy y
Terraform stores state about managed infrastructure to map real-world resources to the configuration, keep track of metadata, and improve performance. Terraform stores this state in a local file by default, but you can also use a Terraform remote backend to store state remotely. [Remote state](https://developer.hashicorp.com/terraform/cdktf/concepts/remote-backends) makes it easier for teams to work together because all members have access to the latest state data in the remote store.

```bash
SOURCE_ROOT="${HOME}/${REPO}"
cd ${SOURCE_ROOT}
scripts/generate-tf-backend.sh
```
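
For reference, `scripts/generate-tf-backend.sh` generates the remote state backend configuration described above. A minimal sketch of the kind of `backend.tf` it could produce, assuming a GCS bucket for state (the bucket name and prefix below are illustrative assumptions, not the script's actual output):

```hcl
# Hypothetical backend.tf written by scripts/generate-tf-backend.sh.
# Bucket name and prefix are assumed values for illustration only.
terraform {
  backend "gcs" {
    bucket = "my-project-terraform-state" # assumed state bucket
    prefix = "marketing-analytics"        # assumed state path prefix
  }
}
```

With a block like this in place, running `terraform init` migrates any existing local state into the bucket so the whole team works from the same remote state.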
25 changes: 24 additions & 1 deletion infrastructure/terraform/modules/activation/export-procedures.tf
@@ -119,10 +119,33 @@ resource "google_bigquery_routine" "export_churn_propensity_procedure" {
routine_type = "PROCEDURE"
language = "SQL"
definition_body = data.template_file.churn_propensity_csv_export_query.rendered
description = "Export purchase propensity predictions as CSV for GA4 User Data Import"
description = "Export churn propensity predictions as CSV for GA4 User Data Import"
arguments {
name = "prediction_table_name"
mode = "IN"
data_type = jsonencode({ "typeKind" : "STRING" })
}
}

data "template_file" "lead_score_propensity_csv_export_query" {
template = file("${local.source_root_dir}/templates/activation_user_import/lead_score_propensity_csv_export.sqlx")
vars = {
ga4_stream_id = var.ga4_stream_id
export_bucket = module.pipeline_bucket.name
}
}

resource "google_bigquery_routine" "export_lead_score_propensity_procedure" {
project = null_resource.check_bigquery_api.id != "" ? module.project_services.project_id : var.project_id
dataset_id = module.bigquery.bigquery_dataset.dataset_id
routine_id = "export_lead_score_propensity_predictions"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = data.template_file.lead_score_propensity_csv_export_query.rendered
description = "Export lead score propensity predictions as CSV for GA4 User Data Import"
arguments {
name = "prediction_table_name"
mode = "IN"
data_type = jsonencode({ "typeKind" : "STRING" })
}
}
40 changes: 39 additions & 1 deletion infrastructure/terraform/modules/activation/main.tf
@@ -24,6 +24,8 @@ locals {
cltv_query_template_file = "cltv_query_template.sqlx"
purchase_propensity_query_template_file = "purchase_propensity_query_template.sqlx"
purchase_propensity_vbb_query_template_file = "purchase_propensity_vbb_query_template.sqlx"
lead_score_propensity_query_template_file = "lead_score_propensity_query_template.sqlx"
lead_score_propensity_vbb_query_template_file = "lead_score_propensity_vbb_query_template.sqlx"
churn_propensity_query_template_file = "churn_propensity_query_template.sqlx"
activation_container_image_id = "activation-pipeline"
docker_repo_prefix = "${var.location}-docker.pkg.dev/${var.project_id}"
@@ -750,7 +752,7 @@ data "template_file" "churn_propensity_query_template_file" {
}
}

- # This resource creates a bucket object using as content the purchase_propensity_query_template_file file.
+ # This resource creates a bucket object using as content the churn_propensity_query_template_file file.
resource "google_storage_bucket_object" "churn_propensity_query_template_file" {
name = "${local.configuration_folder}/${local.churn_propensity_query_template_file}"
content = data.template_file.churn_propensity_query_template_file.rendered
@@ -791,6 +793,40 @@ resource "google_storage_bucket_object" "purchase_propensity_vbb_query_template_file" {
bucket = module.pipeline_bucket.name
}

data "template_file" "lead_score_propensity_query_template_file" {
template = file("${local.template_dir}/activation_query/${local.lead_score_propensity_query_template_file}")

vars = {
mds_project_id = var.mds_project_id
mds_dataset_suffix = var.mds_dataset_suffix
}
}

# This resource creates a bucket object using as content the lead_score_propensity_query_template_file file.
resource "google_storage_bucket_object" "lead_score_propensity_query_template_file" {
name = "${local.configuration_folder}/${local.lead_score_propensity_query_template_file}"
content = data.template_file.lead_score_propensity_query_template_file.rendered
bucket = module.pipeline_bucket.name
}

# This data resource renders the lead_score_propensity_vbb_query_template_file template.
data "template_file" "lead_score_propensity_vbb_query_template_file" {
template = file("${local.template_dir}/activation_query/${local.lead_score_propensity_vbb_query_template_file}")

vars = {
mds_project_id = var.mds_project_id
mds_dataset_suffix = var.mds_dataset_suffix
activation_project_id = var.project_id
dataset = module.bigquery.bigquery_dataset.dataset_id
}
}

resource "google_storage_bucket_object" "lead_score_propensity_vbb_query_template_file" {
name = "${local.configuration_folder}/${local.lead_score_propensity_vbb_query_template_file}"
content = data.template_file.lead_score_propensity_vbb_query_template_file.rendered
bucket = module.pipeline_bucket.name
}

# This data resource renders a template file and stores the rendered content in a variable.
data "template_file" "activation_type_configuration" {
template = file("${local.template_dir}/activation_type_configuration_template.tpl")
@@ -802,6 +838,8 @@ data "template_file" "activation_type_configuration" {
purchase_propensity_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.purchase_propensity_query_template_file.output_name}"
purchase_propensity_vbb_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.purchase_propensity_vbb_query_template_file.output_name}"
churn_propensity_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.churn_propensity_query_template_file.output_name}"
lead_score_propensity_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.lead_score_propensity_query_template_file.output_name}"
lead_score_propensity_vbb_query_template_gcs_path = "gs://${module.pipeline_bucket.name}/${google_storage_bucket_object.lead_score_propensity_vbb_query_template_file.output_name}"
}
}

@@ -90,6 +90,32 @@ resource "google_bigquery_dataset" "churn_propensity" {
}
}

# This resource creates a BigQuery dataset called `lead_score_propensity`.
resource "google_bigquery_dataset" "lead_score_propensity" {
dataset_id = local.config_bigquery.dataset.lead_score_propensity.name
friendly_name = local.config_bigquery.dataset.lead_score_propensity.friendly_name
project = null_resource.check_bigquery_api.id != "" ? local.lead_score_propensity_project_id : local.feature_store_project_id
description = local.config_bigquery.dataset.lead_score_propensity.description
location = local.config_bigquery.dataset.lead_score_propensity.location
# The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
# In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.feature_store.max_time_travel_hours configuration.
max_time_travel_hours = local.config_bigquery.dataset.lead_score_propensity.max_time_travel_hours
# The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
# In this case, the delete_contents_on_destroy attribute is set to false, which means that the contents of the dataset will not be deleted when the dataset is destroyed.
delete_contents_on_destroy = false

labels = {
version = "prod"
}

# The lifecycle block allows you to configure the lifecycle of the dataset.
# In this case, the ignore_changes attribute is set to all, which means that
# Terraform will ignore any changes to the dataset and will not attempt to update the dataset.
lifecycle {
ignore_changes = all
}
}

# This resource creates a BigQuery dataset called `customer_lifetime_value`.
resource "google_bigquery_dataset" "customer_lifetime_value" {
dataset_id = local.config_bigquery.dataset.customer_lifetime_value.name
@@ -300,7 +326,8 @@ module "gemini_insights" {
location = local.config_bigquery.dataset.gemini_insights.location
# The delete_contents_on_destroy attribute specifies whether the contents of the dataset should be deleted when the dataset is destroyed.
# In this case, the delete_contents_on_destroy attribute is set to false, which means that the contents of the dataset will not be deleted when the dataset is destroyed.
- delete_contents_on_destroy = true
+ delete_contents_on_destroy = false
+ deletion_protection = true

dataset_labels = {
version = "prod",
@@ -314,7 +341,7 @@
# The max_time_travel_hours attribute specifies the maximum number of hours that data in the dataset can be accessed using time travel queries.
# In this case, the maximum time travel hours is set to the value of the local file config.yaml section bigquery.dataset.gemini_insights.max_time_travel_hours configuration.
max_time_travel_hours = local.config_bigquery.dataset.gemini_insights.max_time_travel_hours
- deletion_protection = false
+ deletion_protection = true
time_partitioning = null,
range_partitioning = null,
expiration_time = null,