add changefeed doc #21273
base: release-8.1
Conversation
Summary of Changes
Hello @shiyuhang0, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces comprehensive documentation for the new TiDB Cloud Changefeed feature. It provides an overview of the changefeed's capabilities, how to manage it using the TiDB Cloud CLI, and detailed instructions for configuring data streaming to Apache Kafka, including various filter and Kafka-specific settings.
Highlights
- New Feature Documentation: Added a new overview document for the TiDB Cloud Changefeed feature, explaining its purpose (streaming data from TiDB Cloud to other services like Kafka), its current beta status, and general management operations.
- CLI Command Reference: Documented the TiDB Cloud CLI commands for interacting with changefeeds, covering listing, creating, pausing, resuming, editing, and deleting changefeeds.
- Apache Kafka Integration Guide: Provided a detailed guide on how to configure a changefeed to stream data to Apache Kafka, including network prerequisites, Kafka ACL authorization, and extensive configuration options for data formats (e.g., Canal-JSON, Avro, Open Protocol, Debezium), authentication, and topic partitioning strategies.
- Changefeed States and Limitations: Outlined the various states a changefeed can be in (for example, `CREATING`, `RUNNING`, `PAUSED`, `WARNING`, and `RUNNING_FAILED`) and noted current limitations such as the number of changefeeds per cluster and table filter rules.
Code Review
The code changes add documentation for the Changefeed feature in TiDB Cloud, including an overview and a guide for sinking data to Apache Kafka. The documentation covers restrictions, prerequisites, configuration, and other details. The style guide was followed, with minor suggestions for clarity and grammar.
- `<cluster-id>`: the ID of the TiDB Cloud cluster that you want to create the changefeed for.
- `<changefeed-name>`: the name of the changefeed. It is optional. If you do not specify a name, TiDB Cloud automatically generates a name for the changefeed.
- type: the type of the changefeed, which is `KAFKA` in this case.
- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See []() for more information about the configurations.
There should be a valid link here. Please provide a valid reference or remove the empty link.
Suggested change:
- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See the documentation for more information about the configurations.
- `<changefeed-name>`: the name of the changefeed. It is optional. If you do not specify a name, TiDB Cloud automatically generates a name for the changefeed.
- type: the type of the changefeed, which is `KAFKA` in this case.
- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See []() for more information about the configurations.
- filter: a JSON string that contains the configurations for the changefeed to filter tables and events. See []() for more information about the configurations.
There should be a valid link here. Please provide a valid reference or remove the empty link.
Suggested change:
- filter: a JSON string that contains the configurations for the changefeed to filter tables and events. See the documentation for more information about the configurations.
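To make the `--filter` flag concrete, here is a minimal sketch of assembling a filter JSON string in Python. The key name `rules` and the rule syntax follow TiCDC table filter conventions and are assumptions here, not confirmed TiDB Cloud CLI fields:

```python
import json

# Hypothetical sketch: build a `filter` JSON string for `--filter`.
# The "rules" key and "db.table" glob syntax are modeled on TiCDC
# table filters; the exact fields TiDB Cloud accepts may differ.
filter_config = {
    # Replicate every table in `db1` except `db1.tmp`.
    "rules": ["db1.*", "!db1.tmp"],
}

filter_json = json.dumps(filter_config)
print(filter_json)
```

You would then pass the resulting string as the value of `--filter` on the command line.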
TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. Currently, TiDB Cloud supports streaming data to Apache Kafka.

> **Note:**
>
> - Currently, you can manager changefeeds only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md).
According to the style guide, when addressing users, use the second person ("you"). Consider rephrasing to directly address the user.
Style Guide References
Suggested change:
> - Currently, you can manage changefeeds only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md).
>
> TiDB Cloud currently only allows editing changefeeds in the paused status.

To edit a changefeed sink to kafka, you can pause the changefeed first, and then edit with the TiDB Cloud CLI command:
According to the style guide, when addressing users, use the second person ("you"). Consider rephrasing to directly address the user.
Style Guide References
Suggested change:
To edit a changefeed sink to kafka, you can pause the changefeed first, and then edit it with the TiDB Cloud CLI command:
```
ticloud serverless changefeed template --explain
```

The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `filter` configuration:
The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `kafka` configuration, not a `filter` configuration.
Suggested change:
The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `kafka` configuration:
- `tls_enable`: Whether to enable TLS for the connection.
- `compression`: The compression type for messages. Supported values are `NONE`, `GZIP`, `LZ4`, `SNAPPY`, and `ZSTD`.

"DISABLE", "SASL_PLAIN", "SASL_SCRAM_SHA_256", "SASL_SCRAM_SHA_512"
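Putting the settings above together, a `kafka` JSON string could be sketched as follows. Only `tls_enable`, `compression`, and the authentication values come from the text above; the other field names (`broker_endpoints`, `auth_type`, `user`, `password`) are hypothetical placeholders, not confirmed API fields:

```python
import json

# Values listed in the documentation above.
SUPPORTED_COMPRESSION = {"NONE", "GZIP", "LZ4", "SNAPPY", "ZSTD"}
SUPPORTED_AUTH = {"DISABLE", "SASL_PLAIN", "SASL_SCRAM_SHA_256", "SASL_SCRAM_SHA_512"}

# Hypothetical sketch of a `kafka` configuration; treat field names
# other than tls_enable/compression as assumptions.
kafka_config = {
    "broker_endpoints": "broker1.example.com:9092",
    "tls_enable": True,
    "compression": "ZSTD",
    "auth_type": "SASL_PLAIN",
    "user": "changefeed_user",
    "password": "<password>",
}

# Sanity-check the enumerated values before serializing.
assert kafka_config["compression"] in SUPPORTED_COMPRESSION
assert kafka_config["auth_type"] in SUPPORTED_AUTH
kafka_json = json.dumps(kafka_config)
```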
- **Distribute changelogs by timestamp to Kafka partition**

If you want the changefeed to send Kafka messages to different Kafka partitions randomly, set `partition_type` to `TS`. The commitTs of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures orderliness in each partition. However, multiple changes of a data item might be sent to different partitions and the consumer progress of different consumers might be different, which might cause data inconsistency. Therefore, the consumer needs to sort the data from multiple partitions by commitTs before consuming.
The phrase "is send" should be "is sent".
Suggested change:
If you want the changefeed to send Kafka messages of a table to different partitions, set `partition_type` to `COLUMN` and set the `columns`. The specified column values of a row changelog will determine which partition the changelog is sent to. This distribution method ensures orderliness in each partition and guarantees that the changelog with the same column values is sent to the same partition.
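The `TS` distribution described above leaves cross-partition ordering to the consumer, which must merge events from all partitions by commitTs before applying them. A minimal consumer-side sketch (illustrative only; a real consumer reads from Kafka topics, not in-memory lists):

```python
import heapq

# Each partition delivers (commitTs, payload) events that are already
# ordered within that partition; the consumer merges them globally.
partition_0 = [(100, "insert a"), (130, "update a")]
partition_1 = [(110, "insert b"), (120, "delete c")]

# heapq.merge performs an ordered k-way merge of the sorted streams.
merged = list(heapq.merge(partition_0, partition_1, key=lambda e: e[0]))
print([ts for ts, _ in merged])  # commitTs order: [100, 110, 120, 130]
```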
Co-authored-by: Grace Cai <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED.
@shiyuhang0: The following test failed.
- TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. Currently, TiDB Cloud supports streaming data to Apache Kafka, MySQL, TiDB Cloud and cloud storage.
+ TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. This document provides an overview of the changefeed feature for TiDB Cloud Dedicated. Currently, TiDB Cloud Dedicated supports streaming data to Apache Kafka, MySQL, TiDB Cloud and cloud storage.
Suggested change:
TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. This document provides an overview of the changefeed feature for TiDB Cloud Dedicated. Currently, TiDB Cloud Dedicated supports streaming data to Apache Kafka, Apache Pulsar, MySQL, TiDB Cloud and cloud storage.
@@ -5,18 +5,22 @@ summary: Learn about data streaming concepts for TiDB Cloud.

# Data Streaming

- TiDB Cloud lets you stream data changes from your TiDB Cluster to other systems like Kafka, MySQL, and object storage.
+ TiDB Cloud lets you stream data changes from your TiDB Cluster to other systems such as Apache Kafka, MySQL, and object storage.
Suggested change:
TiDB Cloud lets you stream data changes from your TiDB Cluster to other systems such as Apache Kafka, Apache Pulsar, MySQL, and object storage.
- This document describes how to stream data from TiDB Cloud to MySQL using the **Sink to MySQL** changefeed.
+ This document describes how to stream data from TiDB Cloud Dedicated to MySQL using the **Sink to MySQL** changefeed.
Suggested change:
This document describes how to stream data from TiDB Cloud Dedicated to MySQL using the **Stream Data to MySQL** changefeed.
To create a changefeed, refer to the following document:

- [Sink to Apache Kafka](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md)
Suggested change:
- [Stream Data from TiDB Cloud Serverless to Apache Kafka](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md)
> **Note:**
>
> TiDB Cloud currently only allows editing changefeeds in the paused status.
Suggested change:
> TiDB Cloud currently only allows editing changefeeds in the `PAUSED` status.
> **Note:**
>
> - For [TiDB Cloud Dedicated clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-dedicated), see [Stream Data from TiDB Cloud Dedicated to Apache Kafka](/tidb-cloud/changefeed-sink-to-apache-kafka.md).
Suggested change:
> For [TiDB Cloud Dedicated clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-dedicated), see [Stream Data from TiDB Cloud Dedicated to Apache Kafka](/tidb-cloud/changefeed-sink-to-apache-kafka.md).
### Network

Ensure that your TiDB cluster can connect to the Apache Kafka service. Currently, TiDB Serverless clusters can only connect to Apache Kafka through public IP addresses.
Suggested change:
Ensure that your TiDB cluster can connect to the Apache Kafka service. Currently, TiDB Cloud Serverless clusters can only connect to Apache Kafka through public IP addresses.
### Filter configurations

You can specify `--filter <filter-json>` to filter tables and events that you want to replicate.
>
> TiDB Cloud currently only allows editing changefeeds in the paused status.

To edit a changefeed to kafka, you need to pause the changefeed first, and then edit it with the following TiDB Cloud CLI command:
Suggested change:
To edit a changefeed to Kafka, you need to pause the changefeed first, and then edit it with the following TiDB Cloud CLI command:
- `IGNORE_NOT_SUPPORT_TABLE`: skip tables that do not support replication (for example, tables without primary or unique keys).
- `FORCE_SYNC`: force replication of all tables regardless of support status.
Suggested change:
- `"IGNORE_NOT_SUPPORT_TABLE"`: skips tables that do not support replication (for example, tables without primary or unique keys).
- `"FORCE_SYNC"`: forces replication of all tables regardless of support status.
|
||
- `topic_partition_config.partition_num`: controls how many partitions exist in a topic. The valid value range is `[1, 10 * the total number of Kafka brokers]`. | ||
|
||
- `topic_partition_config.partition_dispatchers`: controls which partition a Kafka message will be sent to. Support values: `INDEX_VALUE`, `TABLE`, `TS` and `COLUMN`. |
Suggested change:
- `topic_partition_config.partition_dispatchers.partition_type`: controls which partition a Kafka message will be sent to. Support values: `INDEX_VALUE`, `TABLE`, `TS` and `COLUMN`.
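As an illustration, a `topic_partition_config` fragment with per-table dispatchers might look like this. The nesting and the `matcher` key are modeled on TiCDC's partition dispatchers and should be treated as assumptions rather than the confirmed TiDB Cloud schema:

```python
import json

# Hypothetical sketch: route tables matching `orders.*` by index value
# and all remaining tables by table name. `partition_num` and
# `partition_type` are the fields described above; the list-of-rules
# nesting is an assumption.
topic_partition_config = {
    "partition_num": 6,
    "partition_dispatchers": [
        {"matcher": ["orders.*"], "partition_type": "INDEX_VALUE"},
        {"matcher": ["*.*"], "partition_type": "TABLE"},
    ],
}

print(json.dumps(topic_partition_config, indent=2))
```

Rules are typically evaluated in order, so the catch-all `*.*` matcher comes last.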
- `matcher`: specifies which tables the column selector applies to. For tables that do not match any rule, all columns are sent.
- `columns`: specifies which columns of the matched tables will be sent to the downstream.

For more information about the matching rules, see [Column selectors](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#column-selectors).
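The `matcher`/`columns` behavior described above can be sketched as follows. The matching logic here is illustrative and simplified (simple glob matching), not TiCDC's exact matching engine:

```python
from fnmatch import fnmatch

# One illustrative selector: for table test.t1, send only id and name.
column_selectors = [
    {"matcher": ["test.t1"], "columns": ["id", "name"]},
]

def columns_to_send(table, row):
    """Return the subset of a row's columns that would be sent downstream."""
    for selector in column_selectors:
        if any(fnmatch(table, pat) for pat in selector["matcher"]):
            return {c: v for c, v in row.items() if c in selector["columns"]}
    return row  # no rule matched: all columns are sent

print(columns_to_send("test.t1", {"id": 1, "name": "a", "secret": "x"}))
# -> {'id': 1, 'name': 'a'}
```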
||
For more information about the matching rules, see [Partition dispatchers](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#partition-dispatchers). | ||
|
||
- `column_selectors`: selects columns from events. TiDB Cloud only sends the data changes related to those columns to the downstream. |
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?