This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Keep non-discoverable metadata from the nodes #88

Open
wants to merge 1 commit into
base: master


@groman-me groman-me commented Apr 27, 2021

Problem

#59 introduced changes to update streams' schemas and metadata before sync.
refresh_streams_schema in stream_utils.py preserves a stream's non-discoverable metadata but discards field-level metadata. So even if a field is excluded with "selected": false in the provided catalog, the tap still includes it.

Proposed changes

For every field in the newly discovered schema, keep the non-discoverable metadata from the original catalog.
For instance, this preserves the selected key, which is later used to determine whether a column should be synced or not.

In a nutshell: it restores field metadata behaviour to what it was before #59.
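The merge described above can be sketched roughly as follows. This is a minimal illustration on plain dicts, not the code from this PR: the helper names to_map and merge_field_metadata, the contents of DISCOVERABLE_KEYS, and the sample columns are all assumptions made for the example. Metadata entries follow the usual Singer shape of a breadcrumb plus a metadata dict.

```python
# Hypothetical key set; the tap's actual list of discoverable keys may differ.
DISCOVERABLE_KEYS = {"inclusion", "selected-by-default", "sql-datatype"}


def to_map(raw_metadata):
    """Index Singer-style metadata entries by breadcrumb tuple."""
    return {tuple(e["breadcrumb"]): dict(e["metadata"]) for e in raw_metadata}


def merge_field_metadata(original, discovered):
    """Return discovered metadata plus non-discoverable keys from the original."""
    original_map = to_map(original)
    merged = []
    for entry in discovered:
        breadcrumb = tuple(entry["breadcrumb"])
        combined = dict(entry["metadata"])  # discovered values win for discoverable keys
        for key, value in original_map.get(breadcrumb, {}).items():
            if key not in DISCOVERABLE_KEYS:
                combined[key] = value  # e.g. "selected" set by the user
        merged.append({"breadcrumb": list(breadcrumb), "metadata": combined})
    return merged


# Illustrative data only: a catalog provided by the user ...
original = [
    {"breadcrumb": [], "metadata": {"selected": True, "replication-method": "FULL_TABLE"}},
    {"breadcrumb": ["properties", "id"], "metadata": {"selected": True, "sql-datatype": "integer"}},
    {"breadcrumb": ["properties", "secret"], "metadata": {"selected": False, "sql-datatype": "text"}},
]
# ... and what a fresh discovery might return for the same stream.
discovered = [
    {"breadcrumb": [], "metadata": {}},
    {"breadcrumb": ["properties", "id"], "metadata": {"inclusion": "automatic", "sql-datatype": "bigint"}},
    {"breadcrumb": ["properties", "secret"], "metadata": {"inclusion": "available", "sql-datatype": "text"}},
    {"breadcrumb": ["properties", "added_later"], "metadata": {"inclusion": "available", "sql-datatype": "text"}},
]

merged = to_map(merge_field_metadata(original, discovered))
```

In this sketch the excluded column keeps "selected": false, the refreshed sql-datatype comes from discovery, and a column that only exists in the discovered schema gets no selected key at all.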

The risk: this can be a breaking change when a provided catalog has wrong or outdated field metadata. Examples: 1) field metadata contains wrong keys that are currently overwritten by the newly discovered schema; 2) metadata excludes some fields, but they are currently still synced. After the change, the excluded columns will no longer be synced.
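To gauge exposure to the second case, one could list the fields a provided catalog marks as excluded before adopting the change. The helper below is a hypothetical sketch, not part of the tap; it assumes a Singer-style catalog dict with a streams list, tap_stream_id, and breadcrumb-based metadata entries, and the sample stream is made up.

```python
def deselected_fields(catalog):
    """Map each stream id to the fields marked "selected": false in its metadata."""
    result = {}
    for stream in catalog.get("streams", []):
        fields = [
            entry["breadcrumb"][1]
            for entry in stream.get("metadata", [])
            if len(entry["breadcrumb"]) == 2
            and entry["breadcrumb"][0] == "properties"
            and entry["metadata"].get("selected") is False
        ]
        if fields:
            result[stream["tap_stream_id"]] = fields
    return result


# Illustrative catalog; stream and column names are invented for the example.
catalog = {
    "streams": [
        {
            "tap_stream_id": "public-users",
            "metadata": [
                {"breadcrumb": [], "metadata": {"selected": True}},
                {"breadcrumb": ["properties", "id"], "metadata": {"selected": True}},
                {"breadcrumb": ["properties", "secret"], "metadata": {"selected": False}},
            ],
        }
    ]
}

excluded = deselected_fields(catalog)
```

Any column reported here is one that is currently still synced but would stop being synced once field metadata is preserved.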

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

This can be a breaking change if field metadata is used in an unexpected way.

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@ers81239 ers81239 left a comment

I noticed this PR after creating my own for the same issue. Pull #102 looks to take a similar approach except that I also handle the case of field-level metadata for fields that aren't discovered. I also renamed the variables in this function for clarity.
