-
Notifications
You must be signed in to change notification settings - Fork 615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(stream): normalize unmatched updates #20350
Conversation
This reminds me (again) of the discussion in (#12539. In my opinion, if maintaining the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix LGTM, thanks!
Generally agree. We may lose some traceability, but we will gain increased performance and code simplicity. We can now encode ops in a bitmap. |
71d02b8
to
043f697
Compare
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Closes: #20342
The bug is that stream_key and storage_key can mismatch for an MV, and can result in UpdateDelete + UpdateInsert pairs being split up.
Details:
This bug is sort of worked around with https://github.com/risingwavelabs/risingwave/pull/20176/files. Since dispatcher will rewrite U+/U- going to different vnodes into +/-, and dist_key is set to order key. But this PR defensively handles it in backfill as well.
For unmatched update delete, i.e. storage_key <= current_pos, we can rewrite it to a normal delete. Because in the next snapshot read, we should read the new update insert.
For unmatched update insert, we have not backfilled the corresponding update delete key from storage yet. So there won’t be a conflict to just insert it. We can rewrite it to a normal insert.
We also reconstruct the ops lazily, since in most cases it should not be necessary, only when there' a hanging update.
Checklist
Documentation
Release note