Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery: Support WRITE_TRUNCATE as writeDisposition mode #1359

Open
dali-rmw opened this issue Aug 10, 2022 · 8 comments
Open

BigQuery: Support WRITE_TRUNCATE as writeDisposition mode #1359

dali-rmw opened this issue Aug 10, 2022 · 8 comments
Assignees

Comments

@dali-rmw
Copy link

No description provided.

@kolina
Copy link
Contributor

kolina commented Feb 16, 2023

Hi, @dali-rmw! Can you clarify in which scenarios you'd like to use this setting? If I understand correctly from the docs passing writeDisposition=WRITE_TRUNCATE for BQ jobs allows rewriting a destination table for query results. But in Dataform you can create an action and set the type field of the query to table

config { type: "table" }
SELECT 1 AS TEST

In this case the output table for such action will be rewritten on every run so it is basically an equivalent thing. More details here.

If it's not that you want feel free to describe your scenario :)

@bigrogerio
Copy link

hi @dali-rmw! i use dataform in BigQuery/GCP too.

As @kolina said, when you define the config snippet as table, BigQuery always understands this as a writeDisposition=WRITE_TRUNCATE param.

If you want to append data to an existing table, you have to use theincremental type in the config snippet. This would be the same as writeDisposition=WRITE_APPEND param.

Best regards.

@joajen
Copy link

joajen commented Apr 25, 2023

I have a scenario where WRITE_TRUNCATE and config{type:"table"} are not exactly the same thing.

I have a BQ materialized view which uses a particular table table_A as one of its sources.
Table_A gets its data overnight via a script which uses write_truncate to overwrite the data in the table with new data.

Now, when I try that with Dataform config{type: "table"} it actually does a REPLACE of Table_A.

The problem is that materialized views do not like to have their source tables changed. If you try to select from that materialized view you get Materialized view project_A.schema_A.mat_view references table project_A.schema_A.table_A which was deleted and recreated. The view must be deleted and recreated as well.

While I could drop and create the materialized view, this particular one is expensive to run as a one off - we end up not really getting the benefits of the materialized view.

@andres-lowrie
Copy link
Contributor

@joajen This is an interesting use case I'm curious would the expected output be that dataform create the materialized view and pass along options similar to these to BigQuery api

https://cloud.google.com/bigquery/docs/materialized-views-manage#refresh

or are you talking about that table_A upstream dependency of the mat view and would like dataform to write_trunc that table instead of create or replace it?

@joajen
Copy link

joajen commented Sep 6, 2023

@andres-lowrie
Thank you. The second point. I want the upstream dependency (ie base table table_A) to be write_truncate.

BQ has limitations on the materialized views - if you delete and recreate a base table the materialized view refresh will fail. See the last point at https://cloud.google.com/bigquery/docs/materialized-views-intro#limitations

@dali-rmw
Copy link
Author

dali-rmw commented Nov 6, 2023

Hello,

As mentioned by @joajen, the behavior of a table type involves the deletion and recreation of the table. The issue I'm facing is that I create my tables using Terraform. When I specify the type as "table" in Dataform, it results in Dataform deleting and recreating the table. Consequently, my Terraform state registers this as a deleted table and attempts to recreate it each time I run Terraform.

@Ekrekr
Copy link
Contributor

Ekrekr commented Mar 27, 2024

This is a very reasonable request, but it's complex in its implementation, because we can either:

However these options would be best defined at the config per-table level, but:

  • If defined at the table level, Core would have to be changed too.
  • We'd probably want to put a default for these options.
  • We'd need multiple defaults, as options for each API that we call.

I'll keep thinking about what the best solution for this would be.

@Ekrekr
Copy link
Contributor

Ekrekr commented Apr 2, 2024

Note: this implementation for this is similar to the request in #1385, as it interacts with the API in the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants