Skip to content

Refine tiflash FAQ and configuration docs #20252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

JaySon-Huang
Copy link
Contributor

@JaySon-Huang JaySon-Huang commented Apr 23, 2025

First-time contributors' checklist

What is changed, added or deleted? (Required)

  • tiflash/create-tiflash-replicas.md
    • Added instructions for adjusting the remove-peer restriction when rebalancing Regions from old to new TiFlash nodes.
  • tiflash/tiflash-configuration.md
    • Removed the PD scheduling parameters section.
    • Clarified that floating-point numbers can be used for profiles.default.max_memory_usage_for_all_queries since v6.6.0.
    • Removed the multi-disk deployment eariler than v4.0.9 because that version reaches EOL
  • tiflash/troubleshoot-tiflash.md
    • Added a check for CPU support for vector extension instruction sets (AVX2 for AMD64, ARMv8 for ARM64).
    • Reorganized the 'TiFlash replica is always unavailable' section, providing a more structured approach to troubleshooting.
    • Removed the 'Data replication gets stuck' section and integrated its content into the 'TiFlash replica is always unavailable' section.

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions (in Chinese).

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.4 (TiDB 8.4 versions)
  • v8.3 (TiDB 8.3 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)

What is the related PR or file link(s)?

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

Signed-off-by: JaySon-Huang <[email protected]>
Copy link

ti-chi-bot bot commented Apr 23, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign oreoxmt for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added missing-translation-status This PR does not have translation status info. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 23, 2025
Comment on lines -11 to -27
## PD 调度参数

可通过 [pd-ctl](/pd-control.md) 调整参数。如果你使用 TiUP 部署,可以用 `tiup ctl:v<CLUSTER_VERSION> pd` 代替 `pd-ctl -u <pd_ip:pd_port>` 命令。

- [`replica-schedule-limit`](/pd-configuration-file.md#replica-schedule-limit):用来控制 replica 相关 operator 的产生速度(涉及到下线、补副本的操作都与该参数有关)

> **注意:**
>
> 不要超过 `region-schedule-limit`,否则会影响正常 TiKV 之间的 Region 调度。

- `store-balance-rate`:用于限制每个 TiKV store 或 TiFlash store 的 Region 调度速度。注意这个参数只对新加入集群的 store 有效,如果想立刻生效请用下面的方式。

> **注意:**
>
> 4.0.2 版本之后(包括 4.0.2 版本)废弃了 `store-balance-rate` 参数且 `store limit` 命令有部分变化。该命令变化的细节请参考 [store-limit 文档](/configure-store-limit.md)。

- 使用 `pd-ctl -u <pd_ip:pd_port> store limit <store_id> <value>` 命令单独设置某个 store 的 Region 调度速度。(`store_id` 可通过 `pd-ctl -u <pd_ip:pd_port> store` 命令获得)如果没有单独设置,则继承 `store-balance-rate` 的设置。你也可以使用 `pd-ctl -u <pd_ip:pd_port> store limit` 命令查看当前设置值。
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These content are somehow outdated. And how to speed up tiflash replication are duplicated with that described in create-tiflash-replicas.md#speed-up-tiflash-replication.


- 使用 `pd-ctl -u <pd_ip:pd_port> store limit <store_id> <value>` 命令单独设置某个 store 的 Region 调度速度。(`store_id` 可通过 `pd-ctl -u <pd_ip:pd_port> store` 命令获得)如果没有单独设置,则继承 `store-balance-rate` 的设置。你也可以使用 `pd-ctl -u <pd_ip:pd_port> store limit` 命令查看当前设置值。

- [`replication.location-labels`](/pd-configuration-file.md#location-labels):用来表示 TiKV 实例的拓扑关系,其中 key 的顺序代表了不同标签的层次关系。在 TiFlash 开启的情况下需要使用 [`pd-ctl config placement-rules`](/pd-control.md#config-show--set-option-value--placement-rules) 来设置默认值,详细可参考 [geo-distributed-deployment-topology](/geo-distributed-deployment-topology.md)。
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicated with the later section "通过拓扑 label 进行副本调度"

Comment on lines -600 to -610
#### TiDB 集群版本低于 v4.0.9

TiDB v4.0.9 之前的版本中,TiFlash 只支持将存储引擎中的主要数据分布在多盘上。通过 `path`(TiUP 中为 `data_dir`)和 `path_realtime_mode` 这两个参数配置多盘部署。

多个数据存储目录在 `path` 中以英文逗号分隔,比如 `/nvme_ssd_a/data/tiflash,/sata_ssd_b/data/tiflash,/sata_ssd_c/data/tiflash`。如果你的节点上有多块硬盘,推荐把性能最好的硬盘目录放在最前面,以更好地利用节点性能。

如果节点上有多块相同规格的硬盘,可以把 `path_realtime_mode` 参数留空(或者把该值明确地设为 `false`)。这表示数据会在所有的存储目录之间进行均衡。但由于最新的数据仍然只会被写入到第一个目录,因此该目录所在的硬盘会较其他硬盘繁忙。

如果节点上有多块规格不一致的硬盘,推荐把 `path_relatime_mode` 参数设置为 `true`,并且把性能最好的硬盘目录放在 `path` 参数内的最前面。这表示第一个目录只会存放最新数据,较旧的数据会在其他目录之间进行均衡。注意此情况下,第一个目录规划的容量大小需要占总容量的约 10%。

#### TiDB 集群版本为 v4.0.9 及以上
Copy link
Contributor Author

@JaySon-Huang JaySon-Huang Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v4.0.x are EOL at 2024-04-02
And user are less likely to deploy a tidb cluster lower than v4.0.9 when checking the latest version of docs.

Signed-off-by: JaySon-Huang <[email protected]>
Copy link

ti-chi-bot bot commented Apr 23, 2025

@Lloyd-Pottiger: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: JaySon-Huang <[email protected]>
Copy link
Member

@CalvinNeo CalvinNeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link

ti-chi-bot bot commented Apr 23, 2025

@CalvinNeo: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 26, 2025
@JaySon-Huang JaySon-Huang changed the title Refine some tiflash docs Refine tiflash FAQ and configuration docs Apr 26, 2025
Comment on lines 248 to 262
## TiFlash 数据同步卡住

如果 TiFlash 数据一开始可以正常同步,过一段时间后全部或者部分数据无法继续同步,你可以通过以下步骤确认或解决问题:

1. 检查磁盘空间。

检查磁盘使用空间比例是否高于 `low-space-ratio` 的值(默认值 0.8,即当节点的空间占用比例超过 80% 时,为避免磁盘空间被耗尽,PD 会尽可能避免往该节点迁移数据)。

- 如果磁盘使用率大于等于 `low-space-ratio`,说明磁盘空间不足。此时,请删除不必要的文件,如 `${data}/flash/` 目录下的 `space_placeholder_file` 文件(必要时可在删除文件后将 `reserve-space` 设置为 0MB)。
- 如果磁盘使用率小于 `low-space-ratio`,说明磁盘空间正常,进入下一步。

2. 检查是否有 `down peer` (`down peer` 没有清理干净可能会导致同步卡住)。

- 执行 `pd-ctl region check-down-peer` 命令检查是否有 `down peer`。
- 如果存在 `down peer`,执行 `pd-ctl operator add remove-peer <region-id> <tiflash-store-id>` 命令将其清除。
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is merged to the "TiFlash 副本始终处于不可用状态"

@@ -34,46 +34,6 @@ aliases: ['/docs-cn/dev/tiflash/troubleshoot-tiflash/','/docs-cn/dev/tiflash/tif

如果遇到上述方法无法解决的问题,可以打包 TiFlash 的 log 文件夹,并在 [AskTUG](http://asktug.com) 社区中提问。

## TiFlash 副本始终处于不可用状态
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is moved to before the "TiFlash 数据不同步" part

Signed-off-by: JaySon-Huang <[email protected]>
Signed-off-by: JaySon-Huang <[email protected]>
Signed-off-by: JaySon-Huang <[email protected]>
Signed-off-by: JaySon-Huang <[email protected]>
Copy link
Member

@CalvinNeo CalvinNeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link

ti-chi-bot bot commented Apr 28, 2025

@CalvinNeo: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing-translation-status This PR does not have translation status info. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants