Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Use empty dict for schema of JSON/JSONB columns #239

Conversation

edgarrmondragon
Copy link
Member

No description provided.

@visch
Copy link
Member

visch commented Sep 18, 2023

Hmm but you still want the target to know this is an object right? Are you saying all of the values here

id (int) data (jsonb)
-------- ------------
1        {"nested": "data"}
2        ["an", "array"]
3        3.14

Don't currently work? If so that's surprising to me!

If they don't work the type_dict function should get updated. Another solution that solves my problem and I think yours would be something like returning

{"type": ["object", "null"]}

Which at least lets the target know that the data should be an object and can be put into the target system (database etc) as a json / variant object.

By just returning "" we're saying that the type can be anything not just an object which isn't true.

@edgarrmondragon
Copy link
Member Author

By just returning "" we're saying that the type can be anything not just an object which isn't true.

I think it's actually true because according to the spec, object represents a map:

Objects are the mapping type in JSON. They map “keys” to “values”. In JSON, the “keys” must always be strings. Each of these pairs is conventionally referred to as a “property”.

But a JSON/JSONB column can have not just objects, but also arrays, strings and numbers.

Hmm but you still want the target to know this is an object right? Are you saying all of the values here

I'd want the target to know this can be any JSON stuff. The target should be able to handle that in a similar manner to MeltanoLabs/target-postgres#183.

Wdyt @visch?

@edgarrmondragon edgarrmondragon changed the title fix: Use empty object for schema of JSON/JSONB columns fix: Use empty dict for schema of JSON/JSONB columns Sep 18, 2023
@visch
Copy link
Member

visch commented Sep 18, 2023

By just returning "" we're saying that the type can be anything not just an object which isn't true.

I think it's actually true because according to the spec, object represents a map:

Objects are the mapping type in JSON. They map “keys” to “values”. In JSON, the “keys” must always be strings. Each of these pairs is conventionally referred to as a “property”.

But a JSON/JSONB column can have not just objects, but also arrays, strings and numbers.

Hmm but you still want the target to know this is an object right? Are you saying all of the values here

I'd want the target to know this can be any JSON stuff. The target should be able to handle that in a similar manner to MeltanoLabs/target-postgres#183.

Wdyt @visch?

Thanks for the lesson in JSON Schema :D I didn't realize that for objects. Implementation makes sense :D

@edgarrmondragon
Copy link
Member Author

edgarrmondragon commented Sep 18, 2023

I think tests are failing due to meltano/sdk#1969

That was it 🎉

@edgarrmondragon edgarrmondragon force-pushed the 238-generate-fully-variant-json-schema-type-for-columns-of-type-jsonjsonb branch from 9301bf0 to 7281442 Compare September 25, 2023 16:04
@edgarrmondragon edgarrmondragon marked this pull request as ready for review September 25, 2023 16:08
@edgarrmondragon edgarrmondragon requested a review from visch October 12, 2023 22:41
@visch
Copy link
Member

visch commented Oct 16, 2023

Tried testing this locally and hit an issue

create table test_jsonb(jsonbcol jsonb)
insert into test_jsonb (jsonbcol) values ('{}'), ('3.14'), ('["list","o","stuff"]')
Run invocation could not be completed as block failed: Cannot start plugin tap-postgres: Catalog discovery failed: command ['/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/bin/tap-postgres', '--config', '/home/visch/git/tap-postgres/.meltano/run/tap-postgres/tap.f516c931-fbcf-4a46-b66e-080fa5aa7c5c.config.json', '--discover'] returned 1 with stderr:
 Traceback (most recent call last):
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/bin/tap-postgres", line 8, in <module>
    sys.exit(TapPostgres.cli())
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 516, in cli
    tap = cls(  # type: ignore[operator]
  File "/home/visch/git/tap-postgres/tap_postgres/tap.py", line 37, in __init__
    super().__init__(*args, **kwargs)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 568, in __init__
    super().__init__(
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 103, in __init__
    self.mapper.register_raw_streams_from_catalog(self.catalog)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 165, in catalog
    self._catalog = self.input_catalog or self._singer_catalog
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 286, in _singer_catalog
    for stream in self.streams.values()
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 128, in streams
    for stream in self.load_streams():
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 318, in load_streams
    for stream in self.discover_streams():
  File "/home/visch/git/tap-postgres/tap_postgres/tap.py", line 498, in discover_streams
    for catalog_entry in self.catalog_dict["streams"]
  File "/home/visch/git/tap-postgres/tap_postgres/tap.py", line 485, in catalog_dict
    result["streams"].extend(self.connector.discover_catalog_entries())
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/connectors/sql.py", line 494, in discover_catalog_entries
    catalog_entry = self.discover_catalog_entry(
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/connectors/sql.py", line 445, in discover_catalog_entry
    schema = table_schema.to_dict()
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/typing.py", line 254, in to_dict
    return self.type_dict  # type: ignore[no-any-return]
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/typing.py", line 704, in type_dict
    merged_props.update(w.to_dict())
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/typing.py", line 578, in to_dict
    type_dict = append_type(type_dict, "null")
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/helpers/_typing.py", line 59, in append_type
    raise ValueError(
ValueError: Could not append type because the JSON schema for the dictionary `{}` appears to be invalid.

Copy link
Member

@visch visch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on PR

@visch
Copy link
Member

visch commented Oct 16, 2023

Tried testing this locally and hit an issue

create table test_jsonb(jsonbcol jsonb)
insert into test_jsonb (jsonbcol) values ('{}'), ('3.14'), ('["list","o","stuff"]')
Run invocation could not be completed as block failed: Cannot start plugin tap-postgres: Catalog discovery failed: command ['/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/bin/tap-postgres', '--config', '/home/visch/git/tap-postgres/.meltano/run/tap-postgres/tap.f516c931-fbcf-4a46-b66e-080fa5aa7c5c.config.json', '--discover'] returned 1 with stderr:
 Traceback (most recent call last):
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/bin/tap-postgres", line 8, in <module>
    sys.exit(TapPostgres.cli())
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 516, in cli
    tap = cls(  # type: ignore[operator]
  File "/home/visch/git/tap-postgres/tap_postgres/tap.py", line 37, in __init__
    super().__init__(*args, **kwargs)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 568, in __init__
    super().__init__(
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 103, in __init__
    self.mapper.register_raw_streams_from_catalog(self.catalog)
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 165, in catalog
    self._catalog = self.input_catalog or self._singer_catalog
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 286, in _singer_catalog
    for stream in self.streams.values()
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 128, in streams
    for stream in self.load_streams():
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/tap_base.py", line 318, in load_streams
    for stream in self.discover_streams():
  File "/home/visch/git/tap-postgres/tap_postgres/tap.py", line 498, in discover_streams
    for catalog_entry in self.catalog_dict["streams"]
  File "/home/visch/git/tap-postgres/tap_postgres/tap.py", line 485, in catalog_dict
    result["streams"].extend(self.connector.discover_catalog_entries())
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/connectors/sql.py", line 494, in discover_catalog_entries
    catalog_entry = self.discover_catalog_entry(
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/connectors/sql.py", line 445, in discover_catalog_entry
    schema = table_schema.to_dict()
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/typing.py", line 254, in to_dict
    return self.type_dict  # type: ignore[no-any-return]
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/typing.py", line 704, in type_dict
    merged_props.update(w.to_dict())
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/typing.py", line 578, in to_dict
    type_dict = append_type(type_dict, "null")
  File "/home/visch/git/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/helpers/_typing.py", line 59, in append_type
    raise ValueError(
ValueError: Could not append type because the JSON schema for the dictionary `{}` appears to be invalid.

I'm confused though as my local pytest fails with the same error on multiple tests

FAILED tests/test_core.py::test_temporal_datatypes - ValueError: Could not append type because the JSON schema for the dictionary `{}` appears to be invalid.
FAILED tests/test_core.py::test_jsonb_json - ValueError: Could not append type because the JSON schema for the dictionary `{}` appears to be invalid.
FAILED tests/test_core.py::test_decimal - ValueError: Could not append type because the JSON schema for the dictionary `{}` appears to be invalid.
FAILED tests/test_core.py::test_invalid_python_dates - ValueError: Could not append type because the JSON schema for the dictionary `{}` appears to be invalid.
FAILED tests/test_replication_key.py::test_null_replication_key_with_start_date - ValueError: Could not append type because the JSON schema for the dictionary `{}` appears to be invalid.
FAILED tests/test_replication_key.py::test_null_replication_key_without_start_date - ValueError: Could not append type because the JSON schema for the dictionary `{}` appears to be invalid.
FAILED tests/test_ssl.py::test_ssl - ValueError: Could not append type because the JSON schema for the dictionary `{}` appears to be invalid.

Trying to understand this maybe it's a local setup thing hmm

@edgarrmondragon
Copy link
Member Author

Tried testing this locally and hit an issue

@visch You might need to do poetry install

@visch
Copy link
Member

visch commented Nov 1, 2023

Since the tests pass this seems good, merging

@visch visch merged commit 3d4db67 into main Nov 1, 2023
4 checks passed
@visch visch deleted the 238-generate-fully-variant-json-schema-type-for-columns-of-type-jsonjsonb branch November 1, 2023 03:14
@Lawiss
Copy link

Lawiss commented Nov 28, 2023

Hello @edgarrmondragon, I tried the tap with your new changes but I get an error when extracting a table with a JSONB column containing only arrays of objects :

singer_sdk.helpers._typing.EmptySchemaTypeError: Could not detect type from empty type_dict. Did you forget to define a property in the stream schema?
Full stack trace

Traceback (most recent call last):
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/bin/tap-postgres", line 8, in <module>
    sys.exit(TapPostgres.cli())
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 501, in invoke
    tap.sync_all()
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 460, in sync_all
    stream.sync()
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/streams/core.py", line 1194, in sync
    raise ex
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/streams/core.py", line 1187, in sync
    for _ in self._sync_records(context=context):
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/streams/core.py", line 1109, in _sync_records
    self._write_record_message(record)
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/streams/core.py", line 851, in _write_record_message
    for record_message in self._generate_record_messages(record):
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/streams/core.py", line 827, in _generate_record_messages
    record = conform_record_data_types(
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/helpers/_typing.py", line 383, in conform_record_data_types
    rec, unmapped_properties = _conform_record_data_types(record, schema, level, None)
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/helpers/_typing.py", line 423, in _conform_record_data_types
    if isinstance(elem, list) and is_uniform_list(property_schema):
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/helpers/_typing.py", line 128, in is_uniform_list
    is_array_type(property_schema) is True
  File "/Users/luis/projects/entropeak/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.10/site-packages/singer_sdk/helpers/_typing.py", line 242, in is_array_type
    raise EmptySchemaTypeError
singer_sdk.helpers._typing.EmptySchemaTypeError: Could not detect type from empty type_dict. Did you forget to define a property in the stream schema?

My test table can be created and populated with the following SQL script:

CREATE TABLE public.test_meltano (
	id int4 NOT NULL,
	packagings jsonb NOT NULL,
	CONSTRAINT test_meltano_pkey PRIMARY KEY (id)
);

INSERT INTO public.test_meltano (id, packagings) VALUES(1, '[{"type": "DEPOT_BAG", "other": "", "quantity": 1}]'::jsonb);
INSERT INTO public.test_meltano (id, packagings) VALUES(2, '[{"type": "BIG_BAG", "other": "", "quantity": 2}, {"type": "DEPOT_BAG", "other": "", "quantity": 1}]'::jsonb);
INSERT INTO public.test_meltano (id, packagings) VALUES(3, '[]'::jsonb);

Do you have any clue ?

@edgarrmondragon
Copy link
Member Author

@Lawiss Thanks for trying it out. I think we need a guard upstream to end early if the property schema is empty: https://github.com/meltano/sdk/blob/039c06f8b6c07a3cb33028759ca65e806226a6ad/singer_sdk/helpers/_typing.py#L424. I might get to it at some point this week, but feel free to send us a PR is you wish.

Another option is to revert this PR and think of a better approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Generate fully variant JSON schema type for columns of type JSON/JSONB
3 participants