TiFlash: add missing TiFlash Grafana metrics #21427
hfxsd wants to merge 1 commit into pingcap:master
Expands the TiFlash monitoring doc with many new metrics and sections across the Grafana dashboards, and clarifies that the TiFlash proxy/raft metrics overlap heavily with those of TiKV. Renames existing entries (e.g. Read Index OPS -> Raft Read Index OPS, Wait Index Duration -> Raft Wait Index Duration) and introduces Write & Delta Management Total. New sections include Imbalance read/write, Memory trace, Storage Read Pool & Data Sharing, PageStorage, Rate Limiter, Raft Snapshot / IngestSST, Disaggregated-Write/Compute, S3, Pipeline Model, TiFlash Resource Control, Status Server, and Vector Search. TiFlash-Proxy-Summary and TiFlash-Proxy-Details are extensively expanded (cluster, errors, server, thread CPU, PD, raft IO/process/message/propose/admin, unified read pool, storage, scheduler, snapshot, task, threads, RocksDB, encryption, etc.). These additions improve coverage and clarity for TiFlash cluster monitoring.
- Region: the number of Regions held by each TiFlash instance.
- IO Throughput: the I/O throughput of each TiFlash instance.
- Threads CPU: the CPU usage of each thread.
- SST Import Service: metrics related to the SST import service.
- SST Apply: metrics related to SST apply.
- Region Task: Region task statistics.
- Region Worker: Region worker thread statistics.
- Raft Store: Raft Store status and statistics.
- Apply Worker: Apply worker statistics.
- Storage Background (Small Tasks): statistics of small background tasks in the storage layer.
- Storage Background (Large Tasks): statistics of large background tasks in the storage layer.
- Manual Compaction: manual compaction task statistics.
- GRPC Async Server: gRPC asynchronous server statistics.
- GRPC Async Client: gRPC asynchronous client statistics.
- FAP builder: FAP build statistics.
- Snapshot Sender: snapshot sending statistics.
- Segment Scheduler: Segment scheduler statistics.
- Local Index Pool: local index pool statistics.
- Segment Reader: Segment Reader statistics.
- Threads: thread count statistics.
- Threads state: the distribution of thread states.
- Threads IO: thread I/O statistics.
- Thread Voluntary Context Switches: the number of voluntary context switches of threads.
- Thread Nonvoluntary Context Switches: the number of nonvoluntary context switches of threads.
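- Internal Tasks Duration: the time consumed by all TiFlash instances on internal data sorting tasks.
- Page GC Tasks OPM: the number of times per minute that all TiFlash instances perform Delta data sorting tasks.
- Page GC Tasks Duration: the distribution of time consumed by all TiFlash instances on Delta data sorting tasks.
- FSync Status: fsync status statistics.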