Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add sql-datatype to the SDK discovery and catalog #1872

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

BuzzCutNorman
Copy link
Contributor

@BuzzCutNorman BuzzCutNorman commented Jul 21, 2023

This is an attempt to add the sql_datatype to the discovery process and will be reflected in the catalog.

Closes #1323


📚 Documentation preview 📚: https://meltano-sdk--1872.org.readthedocs.build/en/1872/

@codecov
Copy link

codecov bot commented Jul 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.91%. Comparing base (8474a24) to head (fe8e2a7).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1872      +/-   ##
==========================================
+ Coverage   90.88%   90.91%   +0.02%     
==========================================
  Files          62       62              
  Lines        5165     5182      +17     
  Branches      667      671       +4     
==========================================
+ Hits         4694     4711      +17     
  Misses        330      330              
  Partials      141      141              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BuzzCutNorman
Copy link
Contributor Author

I have run across instances where the discovery process runs into SQL data types that SQLAlchemy cannot handle. When this occurs, the following warnings are logged to the console.

PS C:\development\projects\my-test-stuff> meltano invoke tap-mssql --discover > discovery.json
2023-07-21T19:40:07.498116Z [info     ] Environment 'dev' is active
C:\development\MeltanoContribute\sdk\singer_sdk\connectors\sql.py:463: SAWarning: Did not recognize type 'hierarchyid' of column 'OrganizationNode'
  for column_def in inspected.get_columns(table_name, schema=schema_name):
C:\development\MeltanoContribute\sdk\singer_sdk\connectors\sql.py:463: SAWarning: Did not recognize type 'geography' of column 'SpatialLocation'
  for column_def in inspected.get_columns(table_name, schema=schema_name):
C:\development\MeltanoContribute\sdk\singer_sdk\connectors\sql.py:463: SAWarning: Did not recognize type 'hierarchyid' of column 'DocumentNode'
  for column_def in inspected.get_columns(table_name, schema=schema_name):

The sql-datatype is presented as a NullType()

        {
          "breadcrumb": [
            "properties",
            "OrganizationNode"
          ],
          "metadata": {
            "inclusion": "available",
            "sql-datatype": "NullType()"
          }
        },

cc @edgarrmondragon

@BuzzCutNorman
Copy link
Contributor Author

The warnings are being captured, the message expanded, and presented back to the console at the info level . The warnings now look like this.

PS C:\development\projects\dev-tap-mssql-discover-sqltype> meltano invoke tap-mssql --discover > discovery.jsonl
2023-08-08T17:09:38.074328Z [info     ] Environment 'dev' is active
2023-08-08 10:09:39,636 | INFO     | sqlconnector         | Discovery warning: 'HumanResources-Employee' Did not recognize type 'hierarchyid' of column 'OrganizationNode'
2023-08-08 10:09:39,730 | INFO     | sqlconnector         | Discovery warning: 'Person-Address' Did not recognize type 'geography' of column 'SpatialLocation'
2023-08-08 10:09:39,859 | INFO     | sqlconnector         | Discovery warning: 'Production-Document' Did not recognize type 'hierarchyid' of column 'DocumentNode'
2023-08-08 10:09:39,890 | INFO     | sqlconnector         | Discovery warning: 'Production-ProductDocument' Did not recognize type 'hierarchyid' of column 'DocumentNode'

@BuzzCutNorman
Copy link
Contributor Author

After looking at this more I update the message to this format:

PS C:\development\projects\dev-tap-mssql-discover-sqltype> meltano invoke tap-mssql --discover > discovery.jsonl
2023-08-08T17:50:34.642130Z [info     ] Environment 'dev' is active
2023-08-08 10:50:36,401 | INFO     | sqlconnector         | Discovery warning: Did not recognize type 'hierarchyid' of column 'OrganizationNode' in 'HumanResources-Employee'
2023-08-08 10:50:36,512 | INFO     | sqlconnector         | Discovery warning: Did not recognize type 'geography' of column 'SpatialLocation' in 'Person-Address'
2023-08-08 10:50:36,684 | INFO     | sqlconnector         | Discovery warning: Did not recognize type 'hierarchyid' of column 'DocumentNode' in 'Production-Document'
2023-08-08 10:50:36,747 | INFO     | sqlconnector         | Discovery warning: Did not recognize type 'hierarchyid' of column 'DocumentNode' in 'Production-ProductDocument'

Copy link
Collaborator

@edgarrmondragon edgarrmondragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BuzzCutNorman! Left a few suggestions and opened #1903 which will make sense once taps output this type of metadata 😁

singer_sdk/connectors/sql.py Outdated Show resolved Hide resolved
Comment on lines 494 to 497
for column_def in inspected.get_columns(table_name, schema=schema_name):
sql_datatypes[
str(column_def["name"])
] = self.discover_catalog_entry_sql_datatype(column_def["type"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you feel comfortable adding tests for this? A good place might be in tests/core/test_connector_sql.py since this is a connector method. I think it could even be parametrized with many examples of column types.

Copy link

codspeed-hq bot commented Nov 28, 2024

CodSpeed Performance Report

Merging #1872 will not alter performance

Comparing BuzzCutNorman:1323-add-sql-datatype-to-discovery-and-catalog (fe8e2a7) with main (8474a24)

Summary

✅ 6 untouched benchmarks

@edgarrmondragon edgarrmondragon force-pushed the 1323-add-sql-datatype-to-discovery-and-catalog branch from 553dec4 to fc5ecc6 Compare November 28, 2024 16:05
@edgarrmondragon
Copy link
Collaborator

One thing that unfortunately isn't very clear to me is how are targets supposed to consume this information since a target isn't aware of the tap's metadata. It might be better if this metadata lived in the schema, at least that way it would be emitted with SCHEMA messages and could be used by the target.

The official Singer docs only explain the field as Represents the datatype of a database column, and pipelinewise's taps implement but the target don't seem to use it in any way:

https://github.com/search?q=org%3Atransferwise+sql-datatype+language%3APython&type=code&l=Python

Otherwise, who is the consumer of this metadata?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants