This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Cannot override discovered schema #128

Open
deanmorin opened this issue Oct 7, 2021 · 2 comments · May be fixed by #129
Labels
bug Something isn't working

Comments

@deanmorin
Contributor

deanmorin commented Oct 7, 2021

Describe the bug
Changing the schema for a stream in the catalog file has no effect, since it's always overwritten with the discovered stream in refresh_streams_schema.

To Reproduce
Steps to reproduce the behavior:

  1. Create a test Postgres database with a table:

    CREATE TABLE a (a integer PRIMARY KEY, data jsonb);
    INSERT INTO a VALUES (1, '{}');
    
  2. Create config files for the tap and target, for example:

    tap_config.json

    {
      "host": "127.0.0.1",
      "port": 5432,
      "user": "myuser",
      "password": "mypass",
      "dbname": "tap_postgres",
      "filter_schemas": "public",
      "logical_poll_total_seconds": 60
    }

    target_config.json

    {
      "host": "127.0.0.1",
      "port": 5432,
      "user": "myuser",
      "password": "mypass",
      "dbname": "target_postgres",
      "default_target_schema": "public"
    }
  3. Install the tap and create catalog.json

    $ mkvirtualenv tap-postgres
    $ pip install pipelinewise-tap-postgres==1.8.1
    $ tap-postgres --config tap_config.json --discover > catalog.json
    # Modify the catalog
    # In the metadata section where breadcrumb = [],  add:
    #             "selected": true,
    #             "replication-method": "FULL_TABLE",
    # and under schema->properties->data->type change it to:
    #             ["null", "string"]
    $ deactivate
  4. Install the target

    $ mkvirtualenv target-postgres
    $ pip install pipelinewise-target-postgres==2.1.1
    $ deactivate
  5. Run the pipeline

    $ ~/.virtualenvs/tap-postgres/bin/tap-postgres \
          --config tap_config.json \
          --properties catalog.json \
        | ~/.virtualenvs/target-postgres/bin/target-postgres \
          --config target_config.json
  6. Check the table created in the target

    target_postgres=# SELECT pg_typeof("data") FROM a;
     pg_typeof
    -----------
     jsonb
    (1 row)
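
The manual catalog edit in step 3 can also be scripted. The following is a minimal sketch, assuming the usual Singer catalog layout (a top-level `streams` list whose entries carry `stream`, `schema` and `metadata` keys). The helper name `override_column_type` is illustrative and not part of the tap.

```python
def override_column_type(catalog, stream_name, column, new_type):
    """Select a stream for FULL_TABLE replication and override one column's type.

    Assumption: catalog["streams"] is a list of dicts with "stream",
    "schema" and "metadata" keys, as in a typical Singer catalog.
    """
    for stream in catalog["streams"]:
        if stream.get("stream") != stream_name:
            continue
        # The table-level metadata entry is the one with an empty breadcrumb.
        for entry in stream["metadata"]:
            if entry["breadcrumb"] == []:
                entry["metadata"]["selected"] = True
                entry["metadata"]["replication-method"] = "FULL_TABLE"
        # Override the JSON-schema type of the chosen column.
        stream["schema"]["properties"][column]["type"] = new_type
        return catalog
    raise KeyError(f"stream {stream_name!r} not found in catalog")
```

To apply it to the file from step 3, load `catalog.json` with `json.load`, call `override_column_type(catalog, "a", "data", ["null", "string"])`, and write the result back with `json.dump`.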
    

Expected behavior
If a catalog file is provided, its schema should take precedence over the discovered schema for that stream. The data type in the target should be character varying.
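
One way to picture the expected precedence is the sketch below. It is not the tap's implementation: `merge_stream_schema` is a hypothetical helper, both arguments are assumed to be plain JSON-schema dicts with a `properties` mapping, and the sample types are illustrative rather than actual tap output. The discovered schema supplies the defaults, and any property present in the catalog file overrides it.

```python
import copy


def merge_stream_schema(discovered_schema, catalog_schema):
    """Return the discovered schema with catalog-provided properties taking precedence."""
    merged = copy.deepcopy(discovered_schema)
    merged_props = merged.setdefault("properties", {})
    for name, prop in catalog_schema.get("properties", {}).items():
        # Assumption: only columns that still exist in the database are
        # overridden; catalog entries for dropped columns are ignored.
        if name in merged_props:
            merged_props[name] = copy.deepcopy(prop)
    return merged


# Illustrative values only, not actual discovery output.
discovered = {
    "properties": {
        "a": {"type": ["integer"]},
        "data": {"type": ["null", "object"]},
    }
}
from_catalog = {"properties": {"data": {"type": ["null", "string"]}}}

merged = merge_stream_schema(discovered, from_catalog)
print(merged["properties"]["data"]["type"])
```

With a merge along these lines, the `data` column would reach the target as a string type while the untouched `a` column keeps its discovered type.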

Screenshots
N/A

Your environment

  • Version of tap: [e.g. 1.8.1]
  • Version of python [e.g. 3.9.7]

Additional context
I discovered this while using Meltano.

@deanmorin deanmorin added the bug Something isn't working label Oct 7, 2021
deanmorin added a commit to deanmorin/pipelinewise-tap-postgres that referenced this issue Oct 7, 2021
Fixes transferwise#128

- Moves the bulk of the code out of the context manager
  (open_connection) since it isn't needed.
- Creates private functions for merging existing metadata and schema to
  dicts with those in `new_discovery`.
- Aliases `metadata` (from singer) to `metadata_util`. I kept using
  `metadata` as a local var when developing by accident and breaking
  things.
@deanmorin deanmorin linked a pull request Oct 7, 2021 that will close this issue
13 tasks
astrojuanlu added a commit to astrojuanlu/pipelinewise-tap-postgres that referenced this issue May 11, 2022
@Limess

Limess commented Nov 8, 2022

This issue prevents us from upgrading this tap, as it broke behaviour in an older version. We have a requirement to mutate the catalog with our current ingestion workflow.

@astrojuanlu

See more discussion in #129 (comment)

3 participants