Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'Incremental' tables should support schema changes #373

Open
BenBirt opened this issue Aug 16, 2019 · 5 comments
Open

'Incremental' tables should support schema changes #373

BenBirt opened this issue Aug 16, 2019 · 5 comments

Comments

@BenBirt
Copy link
Collaborator

BenBirt commented Aug 16, 2019

(disclaimer: I haven't tested how the following works in all cases or against all warehouses, it may be that we already support what follows in some ways / or not, I'm not sure yet.)

Right now it's unclear what happens to an incremental table if the user changes its schema (i.e. adds a new column or deletes an old one, or perhaps changes the type of an existing one). We should check the behaviour of these cases against our currently-supported warehouses.

It would be great if we could support automatically running e.g. ALTER TABLE statements to get these cases to work. The tricky thing here will be figuring out precisely what table alterations need to happen, whether we need to diff old vs. new schemas somehow, or whether we can just update the schema of the table to a new state idempotently without caring what the old state of the table might be.

If it turns out to be impossible / really hard to support this kind of behaviour, we should at least check that dataform does something that the user might expect, e.g. throw an error depending on the case.

@BenBirt BenBirt added the P2 label Aug 19, 2019
@BenBirt BenBirt added P3 and removed P2 labels Sep 2, 2019
@BenBirt BenBirt closed this as completed Dec 11, 2019
@lewish lewish reopened this Apr 22, 2020
@BenBirt BenBirt added P4 and removed P3 labels Oct 23, 2020
@deniszaboronsky
Copy link

@BenBirt I came on to raise this issue. In BigQuery (Dataform version 1.22.0) we are seeing that new columns require us to re-create the base table as otherwise what Dataform compiles to is:

INSERT INTO target_table (Col1, Col2, Col3)
AS (
SELECT
    Col1, Col2, Col3, Col4
FROM some_table)

Where the query used to be

SELECT
    Col1, Col2, Col3
FROM some_table

and is now

SELECT
    Col1, Col2, Col3, Col4
FROM some_table

This of course means that we lose the new column that we had hoped to add. Just though it was worth flagging that we would love an alter table statement. Let me know if any extra info would be useful

@Ekrekr Ekrekr added feature-request and removed P4 labels Feb 10, 2023
@SourabhKr
Copy link

@Ekrekr is there any update on this request?
The requirement here is very basic and usually there are workarounds running in all implementations.

@Ekrekr
Copy link
Contributor

Ekrekr commented Mar 26, 2024

@Ekrekr is there any update on this request?

Still not yet.

The requirement here is very basic

Not really, there's a lot of usability implications that would need to be tested.

Contributions are welcome!

@chateletlealSephora
Copy link

chateletlealSephora commented May 16, 2024

Hello,

Is there any updates regarding this feature.

dbt manages it through configuration, why not dataform ? Event if it doesn't work for records or repeated records, it is really useful for a part of the schema changes use-cases.

https://docs.getdbt.com/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change

image

@federicojasson
Copy link

I wanted to add something to this issue.

The workaround I've implemented is to add idempotent DDL statements in an operation.

Example:

IF (SELECT COUNT(*) FROM some_dataset.INFORMATION_SCHEMA.TABLES WHERE table_name = 'some_table') > 0 THEN
  ALTER TABLE some_dataset.some_table
  ADD COLUMN IF NOT EXISTS some_column STRING;
END IF;

This works, the column is added, but the first time the incremental action is run, it doesn't pick up the new column. My guess is that the table schema is being fetched too early in the execution pipeline.

Note: I made sure the incremental action is run after the operation using dependencies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants