Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: singer_sdk.exceptions.InvalidStreamSortException: Unsorted data detected in stream. Latest value '24676548053040' is smaller than previous max '24676550681472'. #297

Closed
florian-ernst-alan opened this issue Nov 27, 2023 · 8 comments · Fixed by #301

Comments

@florian-ernst-alan
Copy link

florian-ernst-alan commented Nov 27, 2023

When trying to use LOG_BASED replication, the import randomly fails with the following error message:

2023-11-27T16:07:15.433338Z [info     ] 2023-11-27 17:07:15,430 | ERROR    | tap-postgres         | An unhandled error occurred while syncing 'public-metabase_field' cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.433468Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.433572Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/singer_sdk/streams/core.py", line 1187, in sync cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.433669Z [info     ]     for _ in self._sync_records(context=context): cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.433964Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/singer_sdk/streams/core.py", line 1111, in _sync_records cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.434096Z [info     ]     self._increment_stream_state(record, context=current_context) cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.434768Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/tap_postgres/client.py", line 342, in _increment_stream_state cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.434887Z [info     ]     increment_state(           cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.434977Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/singer_sdk/helpers/_state.py", line 228, in increment_state cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.435618Z [info     ]     raise InvalidStreamSortException(msg) cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.435704Z [info     ] singer_sdk.exceptions.InvalidStreamSortException: Unsorted data detected in stream. Latest value '24676548053040' is smaller than previous max '24676550681472'. cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.436494Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.436594Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/bin/tap-postgres", line 8, in <module> cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.438098Z [info     ]     sys.exit(TapPostgres.cli()) cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.438191Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__ cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.438280Z [info     ]     return self.main(*args, **kwargs) cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.438946Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/click/core.py", line 1078, in main cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.439050Z [info     ]     rv = self.invoke(ctx)      cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.439806Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.439911Z [info     ]     return ctx.invoke(self.callback, **ctx.params) cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.440004Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.440687Z [info     ]     return __callback(*args, **kwargs) cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.440774Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 501, in invoke cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.441410Z [info     ]     tap.sync_all()             cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.441491Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/singer_sdk/tap_base.py", line 460, in sync_all cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.442144Z [info     ]     stream.sync()              cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.442223Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/singer_sdk/streams/core.py", line 1194, in sync cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.442301Z [info     ]     raise ex                   cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.443135Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/singer_sdk/streams/core.py", line 1187, in sync cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.443227Z [info     ]     for _ in self._sync_records(context=context): cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.443908Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/singer_sdk/streams/core.py", line 1111, in _sync_records cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.443989Z [info     ]     self._increment_stream_state(record, context=current_context) cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.444068Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/tap_postgres/client.py", line 342, in _increment_stream_state cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.444670Z [info     ]     increment_state(           cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.444749Z [info     ]   File "/Users/florian.ernst/meltano_test/.meltano/extractors/tap-postgres--metabase-base/venv/lib/python3.10/site-packages/singer_sdk/helpers/_state.py", line 228, in increment_state cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.445412Z [info     ]     raise InvalidStreamSortException(msg) cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental
2023-11-27T16:07:15.445492Z [info     ] singer_sdk.exceptions.InvalidStreamSortException: Unsorted data detected in stream. Latest value '24676548053040' is smaller than previous max '24676550681472'. cmd_type=elb consumer=False name=tap-postgres--metabase-incremental producer=True stdio=stderr string_id=tap-postgres--metabase-incremental

Tap configuration:

version: 1
project_id: c35e4650-7eda-4f6e-aae8-f331ae5615a3
default_environment: default
environments:
- name: default
  extractors:
  - name: tap-postgres--metabase-base
    inherit_from: tap-postgres
    config:
      host: redacted.amazonaws.com
      port: 5432
      user: redacted
      database: metabase
    load_schema: meltano__metabase
  - name: tap-postgres--metabase-incremental
    inherit_from: tap-postgres--metabase-base
    select:
    - public-login_history.*
    - public-metabase_field.*
    - public-metabase_fieldvalues.*
    - public-metabase_table.*
    - public-query_execution.*
    - public-report_card.*
    - public-report_dashboardcard.*
    - public-revision.*
    - public-task_history.*
    - public-view_log.*

    metadata:
      "*":
        replication_method: LOG_BASED
        replication_key: _sdc_lsn

      public-query_execution:
        replication_method: INCREMENTAL
        replication_key: started_at

      public-task_history:
        replication_method: INCREMENTAL
        replication_key: started_at

      public-view_log:
        replication_method: INCREMENTAL
        replication_key: timestamp
elt:
  buffer_size: 52428800

Command is: meltano run tap-postgres--metabase-incremental target-snowflake

If I remove the "culprit" stream metabase-field, then the next one in alphabetical order will fail with the same error message.

Postgres version is 12.

It's super annoying as there's 0 reason for this to even happen 😅

@florian-ernst-alan
Copy link
Author

Actually, it looks like I had a config error: my password field was empty in the YAML (password:), but it still managed to connect and fetch some rows (no idea how 😅). Setting the password correctly fixed the issue.

@edgarrmondragon
Copy link
Member

my password field was empty in the YAML (password:), but it still managed to connect and fetch some rows (no idea how 😅)

weird indeed 😅. thanks for reporting your solution!

@florian-ernst-alan
Copy link
Author

Okay I still have the issue in the end 😭 I isolated the command (with all the necessary statefiles/property files) that crashes so I can easily reproduce. Is there something I should do to help debugging?

@florian-ernst-alan
Copy link
Author

Okay, I found the issue.

I used this to make all tables LOG_BASED:

    metadata:
      "*":
        replication_method: LOG_BASED

But then I got the following error:

ValueError: Could not detect replication key for 'public-login_history' stream(replication method=LOG_BASED)

To solve the issue, I manually added the replication key:

    metadata:
      "*":
        replication_method: LOG_BASED
        replication_key: _sdc_lsn

But then it threw the error I mentionned:

singer_sdk.exceptions.InvalidStreamSortException: Unsorted data detected in stream. Latest value '24677943473296' is smaller than previous max '24677943480672'.

I was able to fix the error by removing the "*" metadata, and instead specifying:

config:
  default_replication_method: LOG_BASED

I think there are a few errors that definitely should not happen here, but at least I've got my fix.

@visch
Copy link
Member

visch commented Nov 30, 2023

Nice @florian-ernst-alan , we should dissect this and make issues for the things that shouldn't have happened here. @sebastianswms can you take peek

@visch visch reopened this Nov 30, 2023
@florian-ernst-alan
Copy link
Author

florian-ernst-alan commented Nov 30, 2023

Aaaand actually

config:
  default_replication_method: LOG_BASED

this does not apply the LOG_BASED replication mode to my table (statefile is empty at the end, and subsequent run re-imports everything).

@florian-ernst-alan
Copy link
Author

florian-ernst-alan commented Nov 30, 2023

Okay, looks like Postgres does not guarantee WAL messages are ordered. Indeed, I manually queried my WAL logs and 4 of them are out-of-order, raising the error I see on Meltano.

Can we relax the is_sorted parameter? This would solve the issue, as having ordered WAL messages is not guaranteed.

@edgarrmondragon
Copy link
Member

edgarrmondragon commented Nov 30, 2023

@florian-ernst-alan FYI I opened #301 to patch this behavior.

UPDATE: Ready to be tested on main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants