Fix all the /media links to relative, since they tend to break pandoc conversion to PDF #20839

Open · wants to merge 1 commit into master

2 changes: 1 addition & 1 deletion analyze-slow-queries.md
@@ -31,7 +31,7 @@ The procedures above are explained in the following sections.

## Identify the performance bottleneck of the query

First, you need to have a general understanding of the query process. The key stages of the query execution process in TiDB are illustrated in [TiDB performance map](/media/performance-map.png).
First, you need to have a general understanding of the query process. The key stages of the query execution process in TiDB are illustrated in [TiDB performance map](./media/performance-map.png).
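If you want a quick, statement-level view of where time is spent, one option (a minimal sketch; the table name `t` and the query are placeholders, not taken from this document) is `EXPLAIN ANALYZE`, which executes the statement and reports per-operator execution statistics:

```sql
-- Minimal sketch: EXPLAIN ANALYZE actually executes the statement and returns
-- per-operator statistics (execution time, rows processed), which helps locate
-- the stage that dominates the overall duration.
EXPLAIN ANALYZE SELECT * FROM t WHERE id = 1;
```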

You can get the duration information using the following methods:

28 changes: 14 additions & 14 deletions benchmark/online-workloads-and-add-index-operations.md
@@ -112,7 +112,7 @@ sysbench $testname \
| 32 | 54 | 229.2 | 4583 |
| 48 | 57 | 230.1 | 4601 |

![add-index-load-1-b32](/media/add-index-load-1-b32.png)
![add-index-load-1-b32](./media/add-index-load-1-b32.png)

#### `tidb_ddl_reorg_batch_size = 64`

@@ -126,7 +126,7 @@ sysbench $testname \
| 32 | 42 | 185.2 | 3715 |
| 48 | 45 | 189.2 | 3794 |

![add-index-load-1-b64](/media/add-index-load-1-b64.png)
![add-index-load-1-b64](./media/add-index-load-1-b64.png)

#### `tidb_ddl_reorg_batch_size = 128`

@@ -140,7 +140,7 @@ sysbench $testname \
| 32 | 35 | 130.8 | 2629 |
| 48 | 35 | 120.5 | 2425 |

![add-index-load-1-b128](/media/add-index-load-1-b128.png)
![add-index-load-1-b128](./media/add-index-load-1-b128.png)

#### `tidb_ddl_reorg_batch_size = 256`

@@ -154,7 +154,7 @@ sysbench $testname \
| 32 | 36 | 113.5 | 2268 |
| 48 | 33 | 86.2 | 1715 |

![add-index-load-1-b256](/media/add-index-load-1-b256.png)
![add-index-load-1-b256](./media/add-index-load-1-b256.png)

#### `tidb_ddl_reorg_batch_size = 512`

@@ -168,7 +168,7 @@ sysbench $testname \
| 32 | 33 | 72.5 | 1503 |
| 48 | 33 | 54.2 | 1318 |

![add-index-load-1-b512](/media/add-index-load-1-b512.png)
![add-index-load-1-b512](./media/add-index-load-1-b512.png)

#### `tidb_ddl_reorg_batch_size = 1024`

@@ -182,7 +182,7 @@ sysbench $testname \
| 32 | 42 | 93.2 | 1835 |
| 48 | 51 | 115.7 | 2261 |

![add-index-load-1-b1024](/media/add-index-load-1-b1024.png)
![add-index-load-1-b1024](./media/add-index-load-1-b1024.png)

#### `tidb_ddl_reorg_batch_size = 2048`

@@ -196,7 +196,7 @@ sysbench $testname \
| 32 | 1130 | 26.69 | 547 |
| 48 | 893 | 27.5 | 552 |

![add-index-load-1-b2048](/media/add-index-load-1-b2048.png)
![add-index-load-1-b2048](./media/add-index-load-1-b2048.png)

#### `tidb_ddl_reorg_batch_size = 4096`

@@ -210,7 +210,7 @@ sysbench $testname \
| 32 | 942 | 114 | 2267 |
| 48 | 187 | 54.2 | 1416 |

![add-index-load-1-b4096](/media/add-index-load-1-b4096.png)
![add-index-load-1-b4096](./media/add-index-load-1-b4096.png)
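Each subsection above corresponds to a different value of the `tidb_ddl_reorg_batch_size` system variable, and the rows of each table appear to vary `tidb_ddl_reorg_worker_cnt`. A hedged sketch of how such values are typically adjusted between runs (the specific values below are illustrative, not the exact test script):

```sql
-- Illustrative only: set the DDL reorganization batch size and worker count
-- before re-running the ADD INDEX workload; pick the values under test.
SET @@global.tidb_ddl_reorg_batch_size = 1024;
SET @@global.tidb_ddl_reorg_worker_cnt = 16;
```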

### Test conclusion

@@ -247,7 +247,7 @@ When you perform frequent write operations (this test involves `UPDATE`, `INSERT
| 32 | 46 | 533.4 | 8103 |
| 48 | 46 | 532.2 | 8074 |

![add-index-load-2-b32](/media/add-index-load-2-b32.png)
![add-index-load-2-b32](./media/add-index-load-2-b32.png)

#### `tidb_ddl_reorg_batch_size = 1024`

@@ -261,7 +261,7 @@ When you perform frequent write operations (this test involves `UPDATE`, `INSERT
| 32 | 31 | 467.5 | 7516 |
| 48 | 30 | 562.1 | 7442 |

![add-index-load-2-b1024](/media/add-index-load-2-b1024.png)
![add-index-load-2-b1024](./media/add-index-load-2-b1024.png)

#### `tidb_ddl_reorg_batch_size = 4096`

@@ -275,7 +275,7 @@ When you perform frequent write operations (this test involves `UPDATE`, `INSERT
| 32 | 30 | 441.9 | 7057 |
| 48 | 30 | 440.1 | 7004 |

![add-index-load-2-b4096](/media/add-index-load-2-b4096.png)
![add-index-load-2-b4096](./media/add-index-load-2-b4096.png)

### Test conclusion

@@ -309,7 +309,7 @@ When you only perform query operations to the target column of the `ADD INDEX` s
| 32 | 42 | 343.1 | 6695 |
| 48 | 42 | 333.4 | 6454 |

![add-index-load-3-b32](/media/add-index-load-3-b32.png)
![add-index-load-3-b32](./media/add-index-load-3-b32.png)

#### `tidb_ddl_reorg_batch_size = 1024`

@@ -323,7 +323,7 @@ When you only perform query operations to the target column of the `ADD INDEX` s
| 32 | 32 | 300.6 | 6017 |
| 48 | 31 | 279.5 | 5612 |

![add-index-load-3-b1024](/media/add-index-load-3-b1024.png)
![add-index-load-3-b1024](./media/add-index-load-3-b1024.png)

#### `tidb_ddl_reorg_batch_size = 4096`

@@ -337,7 +337,7 @@ When you only perform query operations to the target column of the `ADD INDEX` s
| 32 | 32 | 220.2 | 4924 |
| 48 | 33 | 214.8 | 4544 |

![add-index-load-3-b4096](/media/add-index-load-3-b4096.png)
![add-index-load-3-b4096](./media/add-index-load-3-b4096.png)

### Test conclusion

4 changes: 2 additions & 2 deletions best-practices-for-security-configuration.md
@@ -34,11 +34,11 @@ It is recommended to immediately change the Grafana password to a strong one dur

- Upon first login to Grafana, follow the prompts to change the password.

![Grafana Password Reset Guide](/media/grafana-password-reset1.png)
![Grafana Password Reset Guide](./media/grafana-password-reset1.png)

- Access the Grafana personal configuration center to change the password.

![Grafana Password Reset Guide](/media/grafana-password-reset2.png)
![Grafana Password Reset Guide](./media/grafana-password-reset2.png)

## Enhance TiDB Dashboard security

8 changes: 4 additions & 4 deletions best-practices-on-public-cloud.md
@@ -168,8 +168,8 @@ In a TiDB cluster, a single active Placement Driver (PD) server is used to handl

The following diagrams show the symptoms of a large-scale TiDB cluster consisting of three PD servers, each equipped with 56 CPUs. From these diagrams, it is observed that when the queries per second (QPS) exceed 1 million and the TSO (Timestamp Oracle) requests per second exceed 162,000, the CPU utilization reaches approximately 4,600%. This high CPU utilization indicates that the PD leader is experiencing a significant load and is running out of available CPU resources.

![pd-server-cpu](/media/performance/public-cloud-best-practice/baseline_cpu.png)
![pd-server-metrics](/media/performance/public-cloud-best-practice/baseline_metrics.png)
![pd-server-cpu](./media/performance/public-cloud-best-practice/baseline_cpu.png)
![pd-server-metrics](./media/performance/public-cloud-best-practice/baseline_metrics.png)

### Tune PD performance

@@ -210,5 +210,5 @@ After the tuning, the following effects can be observed:

These improvements indicate that the tuning adjustments have successfully reduced the CPU utilization of the PD server while maintaining stable TSO handling performance.
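The concrete parameters changed during this tuning are collapsed in this diff view. As an illustration only — the variable names below are general TiDB TSO-tuning knobs, not necessarily the ones used in this test — TSO pressure on the PD leader is commonly reduced from the TiDB side like this:

```sql
-- Illustrative sketch of TSO-related tuning, not the exact steps of this test.
-- Batch TSO requests for up to 2 ms before sending them to the PD leader.
SET GLOBAL tidb_tso_client_batch_max_wait_time = 2;
-- Allow PD followers to proxy TSO requests, spreading the RPC load.
SET GLOBAL tidb_enable_tso_follower_proxy = ON;
```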

![pd-server-cpu](/media/performance/public-cloud-best-practice/after_tuning_cpu.png)
![pd-server-metrics](/media/performance/public-cloud-best-practice/after_tuning_metrics.png)
![pd-server-cpu](./media/performance/public-cloud-best-practice/after_tuning_cpu.png)
![pd-server-metrics](./media/performance/public-cloud-best-practice/after_tuning_metrics.png)
32 changes: 16 additions & 16 deletions best-practices/grafana-monitor-best-practices.md
@@ -12,7 +12,7 @@ When you [deploy a TiDB cluster using TiUP](/production-deployment-using-tiup.md

[Prometheus](https://prometheus.io/) is a time series database with a multi-dimensional data model and a flexible query language. [Grafana](https://grafana.com/) is an open source monitoring system for analyzing and visualizing metrics.

![The monitoring architecture in the TiDB cluster](/media/prometheus-in-tidb.png)
![The monitoring architecture in the TiDB cluster](./media/prometheus-in-tidb.png)

For TiDB 2.1.3 or later versions, TiDB monitoring supports the pull method. This adjustment brings the following benefits:

@@ -51,7 +51,7 @@ tidb_executor_statement_total{type="Use"} 466016

The data above is stored in Prometheus and displayed on Grafana. Right-click the panel and then click the **Edit** button (or directly press the <kbd>E</kbd> key) shown in the following figure:

![The Edit entry for the Metrics tab](/media/best-practices/metric-board-edit-entry.png)
![The Edit entry for the Metrics tab](./media/best-practices/metric-board-edit-entry.png)

After clicking the **Edit** button, you can see the query expression with the `tidb_executor_statement_total` metric name on the Metrics tab. The meanings of some items on the panel are as follows:

@@ -63,7 +63,7 @@ After clicking the **Edit** button, you can see the query expression with the `t

The query expression on the **Metrics** tab is as follows:

![The query expression on the Metrics tab](/media/best-practices/metric-board-expression.jpeg)
![The query expression on the Metrics tab](./media/best-practices/metric-board-expression.jpeg)

Prometheus supports many query expressions and functions. For more details, refer to [Prometheus official website](https://prometheus.io/docs/prometheus/latest/querying).

Expand All @@ -75,11 +75,11 @@ This section introduces seven tips for efficiently using Grafana to monitor and

In the example shown in the [source and display of monitoring data](#source-and-display-of-monitoring-data) section, the data is grouped by type. If you want to know whether you can group by other dimensions and quickly check which dimensions are available, you can use the following method: **Only keep the metric name on the query expression, no calculation, and leave the `Legend format` field blank**. In this way, the original metrics are displayed. For example, the following figure shows that there are three dimensions (`instance`, `job` and `type`):

![Edit query expression and check all dimensions](/media/best-practices/edit-expression-check-dimensions.jpg)
![Edit query expression and check all dimensions](./media/best-practices/edit-expression-check-dimensions.jpg)

Then you can modify the query expression by adding the `instance` dimension after `type`, and adding `{{instance}}` to the `Legend format` field. In this way, you can check the QPS of different types of SQL statements that are executed on each TiDB server:

![Add an instance dimension to the query expression](/media/best-practices/add-instance-dimension.jpeg)
![Add an instance dimension to the query expression](./media/best-practices/add-instance-dimension.jpeg)

### Tip 2: Switch the scale of the Y-axis

@@ -89,11 +89,11 @@ Of course, a linear scale is not suitable for all situations. For example, if yo

The Y-axis uses a binary logarithmic scale by default:

![The Y-axis uses a binary logarithmic scale](/media/best-practices/default-axes-scale.jpg)
![The Y-axis uses a binary logarithmic scale](./media/best-practices/default-axes-scale.jpg)

Switch the Y-axis to a linear scale:

![Switch to a linear scale](/media/best-practices/axes-scale-linear.jpg)
![Switch to a linear scale](./media/best-practices/axes-scale-linear.jpg)

> **Tip:**
>
@@ -105,45 +105,45 @@ You might still cannot see the trend after switching to the linear scale. For ex

The baseline defaults to `0`:

![Baseline defaults to 0](/media/best-practices/default-y-min.jpeg)
![Baseline defaults to 0](./media/best-practices/default-y-min.jpeg)

Change the baseline to `auto`:

![Change the baseline to auto](/media/best-practices/y-min-auto.jpg)
![Change the baseline to auto](./media/best-practices/y-min-auto.jpg)

### Tip 4: Use Shared crosshair or Tooltip

In the **Settings** panel, there is a **Graph Tooltip** panel option which defaults to **Default**.

![Graphic presentation tools](/media/best-practices/graph-tooltip.jpeg)
![Graphic presentation tools](./media/best-practices/graph-tooltip.jpeg)

You can test the effect of **Shared crosshair** and **Shared Tooltip** respectively, as shown in the following figures. The crosshair or tooltip is then displayed on all linked panels, which makes it convenient to confirm the correlation between two metrics when diagnosing problems.

Set the graphic presentation tool to **Shared crosshair**:

![Set the graphical presentation tool to Shared crosshair](/media/best-practices/graph-tooltip-shared-crosshair.jpeg)
![Set the graphical presentation tool to Shared crosshair](./media/best-practices/graph-tooltip-shared-crosshair.jpeg)

Set the graphical presentation tool to **Shared Tooltip**:

![Set the graphic presentation tool to Shared Tooltip](/media/best-practices/graph-tooltip-shared-tooltip.jpg)
![Set the graphic presentation tool to Shared Tooltip](./media/best-practices/graph-tooltip-shared-tooltip.jpg)

### Tip 5: Enter `IP address:port number` to check the metrics in history

PD's dashboard only shows the metrics of the current leader. If you want to check the status of a PD leader in history and it no longer exists in the drop-down list of the `instance` field, you can manually enter `IP address:2379` to check the data of the leader.

![Check the metrics in history](/media/best-practices/manually-input-check-metric.jpeg)
![Check the metrics in history](./media/best-practices/manually-input-check-metric.jpeg)

### Tip 6: Use the `Avg` function

Generally, only the `Max` and `Current` functions are available in the legend by default. When the metrics fluctuate greatly, you can add other summary functions such as the `Avg` function to the legend to check the overall trend over the selected time range.

Add summary functions such as the `Avg` function:

![Add summary functions such as Avg](/media/best-practices/add-avg-function.jpeg)
![Add summary functions such as Avg](./media/best-practices/add-avg-function.jpeg)

Then check the overall trend:

![Add Avg function to check the overall trend](/media/best-practices/add-avg-function-check-trend.jpg)
![Add Avg function to check the overall trend](./media/best-practices/add-avg-function-check-trend.jpg)

### Tip 7: Use the API of Prometheus to obtain the result of query expressions

@@ -155,7 +155,7 @@ Grafana obtains data through the API of Prometheus and you can use this API to o

The API of Prometheus is shown as follows:

![The API of Prometheus](/media/best-practices/prometheus-api-interface.jpg)
![The API of Prometheus](./media/best-practices/prometheus-api-interface.jpg)

{{< copyable "shell-regular" >}}

2 changes: 1 addition & 1 deletion best-practices/haproxy-best-practices.md
@@ -8,7 +8,7 @@ aliases: ['/docs/dev/best-practices/haproxy-best-practices/','/docs/dev/referenc

This document describes best practices for configuration and usage of [HAProxy](https://github.com/haproxy/haproxy) in TiDB. HAProxy provides load balancing for TCP-based applications. From TiDB clients, you can manipulate data just by connecting to the floating virtual IP address provided by HAProxy, which helps achieve load balancing in the TiDB server layer.

![HAProxy Best Practices in TiDB](/media/haproxy.jpg)
![HAProxy Best Practices in TiDB](./media/haproxy.jpg)

> **Note:**
>
22 changes: 11 additions & 11 deletions best-practices/high-concurrency-best-practices.md
@@ -37,7 +37,7 @@ To address the above challenges, it is necessary to start with the data segmenta

TiDB splits data into Regions, each representing a range of data with a size limit of 96M by default. Each Region has multiple replicas, and each group of replicas is called a Raft Group. In a Raft Group, the Region Leader executes the read and write tasks (TiDB supports [Follower-Read](/follower-read.md)) within the data range. The Region Leader is automatically scheduled by the Placement Driver (PD) component to different physical nodes evenly to distribute the read and write pressure.

![TiDB Data Overview](/media/best-practices/tidb-data-overview.png)
![TiDB Data Overview](./media/best-practices/tidb-data-overview.png)

In theory, if an application has no write hotspot, TiDB, by virtue of its architecture, can not only linearly scale its read and write capacities, but also make full use of the distributed resources. From this point of view, TiDB is especially suitable for highly concurrent, write-intensive scenarios.

@@ -94,19 +94,19 @@ In theory, the above operation seems to comply with the TiDB best practices, and

For the cluster topology, 2 TiDB nodes, 3 PD nodes, and 6 TiKV nodes are deployed. Ignore the QPS performance, because this test is meant to clarify the principle rather than to benchmark performance.

![QPS1](/media/best-practices/QPS1.png)
![QPS1](./media/best-practices/QPS1.png)

The client sends "intensive" write requests in a short time, which amounts to 3K QPS received by TiDB. In theory, the load pressure should be evenly distributed to 6 TiKV nodes. However, from the CPU usage of each TiKV node, the load distribution is uneven. The `tikv-3` node is the write hotspot.

![QPS2](/media/best-practices/QPS2.png)
![QPS2](./media/best-practices/QPS2.png)

![QPS3](/media/best-practices/QPS3.png)
![QPS3](./media/best-practices/QPS3.png)

[Raft store CPU](/grafana-tikv-dashboard.md) is the CPU usage rate for the `raftstore` thread, usually representing the write load. In this scenario, `tikv-3` is the Leader of this Raft Group; `tikv-0` and `tikv-1` are the followers. The other nodes carry almost no load.

The monitoring metrics of PD also confirm that a hotspot has occurred.

![QPS4](/media/best-practices/QPS4.png)
![QPS4](./media/best-practices/QPS4.png)
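Besides the PD monitoring panels, hot Regions can also be inspected from SQL. A minimal sketch (treat the exact schema of this system table in your version as an assumption):

```sql
-- Minimal sketch: list the Regions that TiDB currently reports as hot,
-- together with the database/table they belong to, to confirm the hotspot.
SELECT * FROM information_schema.tidb_hot_regions;
```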

## Hotspot causes

@@ -118,13 +118,13 @@ In the above test, the operation does not reach the ideal performance expected i

In a short period of time, a huge volume of data is continuously written to the same Region.

![TiKV Region Split](/media/best-practices/tikv-Region-split.png)
![TiKV Region Split](./media/best-practices/tikv-Region-split.png)

The above diagram illustrates the Region splitting process. As data is continuously written into TiKV, TiKV splits a Region into multiple Regions. Because the leader election is started on the original store where the Region Leader to be split is located, the leaders of the two newly split Regions might still be on the same store. This splitting process might also happen on the newly split Region 2 and Region 3. In this way, write pressure is concentrated on TiKV-Node 1.

During the continuous write process, after finding that a hotspot has formed on Node 1, PD evenly distributes the concentrated Leaders to other nodes. If the number of TiKV nodes is more than the number of Region replicas, TiKV will try to migrate these Regions to idle nodes. These two operations during the write process are also reflected in PD's monitoring metrics:

![QPS5](/media/best-practices/QPS5.png)
![QPS5](./media/best-practices/QPS5.png)

After a period of continuous writes, PD automatically schedules the entire TiKV cluster to a state where pressure is evenly distributed. By that time, the capacity of the whole cluster can be fully used.
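To check how a table's Regions and their leaders are distributed at any point, a minimal sketch (the table name `t` is a placeholder):

```sql
-- Minimal sketch: show every Region of table t, including its leader store,
-- so you can verify whether leaders are spread evenly across TiKV nodes.
SHOW TABLE t REGIONS;
```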

Expand All @@ -150,7 +150,7 @@ SPLIT TABLE table_name [INDEX index_name] BY (value_list) [, (value_list)]

However, TiDB does not automatically perform this pre-split operation. The reason is related to the data distribution in TiDB.

![Table Region Range](/media/best-practices/table-Region-range.png)
![Table Region Range](./media/best-practices/table-Region-range.png)

From the diagram above, according to the encoding rule of a row's key, the `rowID` is the only variable part. In TiDB, `rowID` is an `Int64` integer. However, you might not need to evenly split the `Int64` integer range to the desired number of ranges and then to distribute these ranges to different nodes, because Region split must also be based on the actual situation.

@@ -192,11 +192,11 @@ ORDER BY

Then operate the write load again:

![QPS6](/media/best-practices/QPS6.png)
![QPS6](./media/best-practices/QPS6.png)

![QPS7](/media/best-practices/QPS7.png)
![QPS7](./media/best-practices/QPS7.png)

![QPS8](/media/best-practices/QPS8.png)
![QPS8](./media/best-practices/QPS8.png)

You can see that the apparent hotspot problem has been resolved now.
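The exact schema used in this test is collapsed in this diff view. As a hedged sketch of the general technique rather than the test's actual DDL, write hotspots on the implicit row ID can also be avoided at table-creation time:

```sql
-- Hedged sketch, not the schema used in this test: shard the implicit row ID
-- across 2^4 = 16 shards and pre-split 2^3 = 8 Regions when creating the table.
CREATE TABLE t (
    id   BIGINT NOT NULL,
    data VARCHAR(255),
    KEY idx_id (id)
) SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS = 3;
```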
