[FLINK-36165][source-connector/mysql] Support capturing snapshot data with conditions #3776

Open: uicosp wants to merge 4 commits into master

@uicosp uicosp commented Dec 4, 2024

Hey, this is an implementation designed to capture snapshot data with filtering conditions.

For example, by specifying `scan.snapshot.filters: db.user_table:id > 200;`, we can synchronize only the user data where the id is greater than 200.
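
To make the option format above concrete, here is a minimal sketch of how a value such as `db.user_table:id > 200;` could be parsed and turned into a snapshot query. The class and method names are hypothetical and this is not the PR's actual code; it assumes entries are separated by `;` and that the first `:` separates the table id from the predicate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not taken from the PR. */
final class SnapshotFilterSketch {

    /** Parses "db.table:predicate;db2.table2:predicate2;" into a table -> predicate map. */
    static Map<String, String> parse(String spec) {
        Map<String, String> filters = new LinkedHashMap<>();
        for (String entry : spec.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate empty entries such as the one after a trailing ';'
            }
            // Assumption: the first ':' ends the table id; the rest is the predicate.
            int colon = trimmed.indexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                        "Expected 'db.table:predicate' but got: " + trimmed);
            }
            filters.put(
                    trimmed.substring(0, colon).trim(),
                    trimmed.substring(colon + 1).trim());
        }
        return filters;
    }

    /** Builds a snapshot query for one table, adding WHERE only when a filter is configured. */
    static String snapshotQuery(String tableId, Map<String, String> filters) {
        String base = "SELECT * FROM " + tableId;
        String predicate = filters.get(tableId);
        return predicate == null ? base : base + " WHERE " + predicate;
    }

    public static void main(String[] args) {
        Map<String, String> filters = parse("db.user_table:id > 200;");
        System.out.println(snapshotQuery("db.user_table", filters));
        System.out.println(snapshotQuery("db.orders", filters));
    }
}
```

Splitting on `;` is the simplest choice but would break on a predicate that contains a semicolon inside a string literal. A real implementation would presumably also have to combine the predicate with the chunk-range conditions used by the incremental snapshot.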

issue link: https://issues.apache.org/jira/browse/FLINK-36165

@github-actions bot added the docs (Improvements or additions to documentation) label on Dec 4, 2024
@yuxiqian
Contributor

yuxiqian commented Dec 6, 2024

Thanks for @uicosp's great work! It's indeed a long-awaited feature.

Debezium seems to have a similar option called `snapshot.select.statement.overrides`, which allows users to project out unwanted columns and filter rows based on custom predicates.

As these options aren't available in the incremental framework, it would be nice if we could support both row-level and column-level pushdown with a similar syntax, since both are about tweaking the snapshot query SQL. WDYT?
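
For comparison, here is a rough sketch of how the Debezium option is typically set. To the best of my reading of the Debezium MySQL connector docs, `snapshot.select.statement.overrides` lists the fully qualified tables, and a second key of the form `snapshot.select.statement.overrides.<db>.<table>` carries the full SELECT statement. The helper class and method names are hypothetical.

```java
import java.util.Properties;

/** Hypothetical helper for illustration only. */
final class DebeziumOverrideSketch {

    /**
     * Builds the two Debezium properties for a single table:
     * one naming the table, one holding its replacement snapshot SELECT.
     */
    static Properties overridesFor(String tableId, String selectStatement) {
        Properties props = new Properties();
        // Table(s) whose snapshot SELECT should be overridden.
        props.setProperty("snapshot.select.statement.overrides", tableId);
        // Per-table statement; it can both project columns and filter rows.
        props.setProperty("snapshot.select.statement.overrides." + tableId, selectStatement);
        return props;
    }

    public static void main(String[] args) {
        Properties props = overridesFor(
                "db.user_table",
                "SELECT id, name FROM db.user_table WHERE id > 200");
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Because the override is a whole SELECT statement, a single setting covers both row filtering and column projection, which is the combined behavior suggested above.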

@Thorne-coder
Contributor

> Thanks for @uicosp's great work! It's indeed a long awaited feature.
>
> Seems Debezium has a similar option called snapshot.select.statement.overrides, which allows users to project out unwanted columns and filter rows based on custom predicates.
>
> As these options aren't available in incremental framework, it would be nice if we could support both row and column level pushdown with similar syntax, since they're both related to tweaking snapshot querying SQL. WDYT?

It also supports the Table API and the DataStream API with the incremental framework.

@Thorne-coder
Contributor

Should we also consider this PR together with the snapshot-only startup option and backfill?

@uicosp
Author

uicosp commented Dec 12, 2024

Column pruning is certainly a feature worth considering. In theory, it would only require minor adjustments to the current design, such as implementing a syntax like `scan.snapshot.filters: db.user:id>200:id,name,age`.
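
A minimal sketch of what the extended three-part syntax could translate to, assuming the form `table:predicate:columns` with the columns part optional. The class name is hypothetical, and the proposed syntax itself is only an idea from this thread, not an implemented option.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only. */
final class ProjectedFilterSketch {

    /** Turns "db.user:id>200:id,name,age" into a SELECT with projection and filter. */
    static String snapshotQuery(String entry) {
        // Assumption: neither the predicate nor the column list contains ':'.
        String[] parts = entry.split(":", -1);
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException(
                    "Expected 'table:predicate[:columns]' but got: " + entry);
        }
        String table = parts[0].trim();
        String predicate = parts[1].trim();

        // Fall back to all columns when no projection is given.
        String columns = "*";
        if (parts.length == 3 && !parts[2].trim().isEmpty()) {
            columns = Arrays.stream(parts[2].split(","))
                    .map(String::trim)
                    .collect(Collectors.joining(", "));
        }

        String query = "SELECT " + columns + " FROM " + table;
        return predicate.isEmpty() ? query : query + " WHERE " + predicate;
    }

    public static void main(String[] args) {
        System.out.println(snapshotQuery("db.user:id>200:id,name,age"));
        System.out.println(snapshotQuery("db.user:id>200"));
    }
}
```

One open design question this sketch ignores: the projection would probably need to keep the primary key or chunk key columns, since snapshot splitting is likely to depend on them.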

Taking this a step further, since the pipeline config already includes a `transform` option, we could leverage it to automatically extract filter conditions and projection fields, dynamically generating `scan.snapshot.filters` and thereby simplifying the config.
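
The idea of deriving the option from transform rules could look roughly like the sketch below. `TransformRule` is a hypothetical stand-in for a pipeline transform entry with source table, projection, and filter fields, and the output uses the three-part syntax floated earlier in this comment. It assumes the filter and projection are plain column references that MySQL can evaluate as-is, which would not hold for computed columns or functions that only exist on the pipeline side.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only. */
final class TransformToFilterSketch {

    /** Hypothetical stand-in for one transform rule in the pipeline config. */
    static final class TransformRule {
        final String sourceTable;
        final String projection; // may be null
        final String filter;     // may be null

        TransformRule(String sourceTable, String projection, String filter) {
            this.sourceTable = sourceTable;
            this.projection = projection;
            this.filter = filter;
        }
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /** Emits "table:filter[:projection];" for every rule that has a filter. */
    static String toSnapshotFilters(List<TransformRule> rules) {
        StringBuilder sb = new StringBuilder();
        for (TransformRule rule : rules) {
            if (isBlank(rule.filter)) {
                continue; // nothing to push down for this table
            }
            sb.append(rule.sourceTable).append(':').append(rule.filter.trim());
            if (!isBlank(rule.projection)) {
                sb.append(':').append(rule.projection.trim());
            }
            sb.append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<TransformRule> rules = Arrays.asList(
                new TransformRule("db.user", "id,name,age", "id > 200"),
                new TransformRule("db.orders", null, null));
        System.out.println(toSnapshotFilters(rules));
    }
}
```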

That said, I’m currently focused on other tasks and don't have time to work on this. Once I have time, I’d be happy to revisit and enhance this feature.

Labels: docs (Improvements or additions to documentation), mysql-cdc-connector, mysql-pipeline-connector
3 participants