[skipci] doc: update doc(mds, monitor, developers_guide)
Signed-off-by: Xinlong Chen <[email protected]>
Xinlong-Chen authored and ilixiaocui committed May 11, 2023
1 parent 3aa7435 commit c67b7ea
Showing 6 changed files with 35 additions and 36 deletions.
2 changes: 1 addition & 1 deletion developers_guide.md
@@ -138,7 +138,7 @@ For PR we have the following requirements:

Repushing will trigger CI. If the GitHub page shows no reaction, please wait patiently.

If CI is not stabled, comment ```recheck``` will trigger CI.
If CI is unstable, repeatedly commenting ```cicheck``` will trigger CI again.

## Communication

2 changes: 1 addition & 1 deletion developers_guide_cn.md
@@ -136,7 +136,7 @@ The CI checkpoints are:

Repushing will trigger CI. If there is no response for a while, please be patient; the tests are queued.

If CI is unstable, comment ```recheck``` to trigger CI again.
If CI is unstable, you can repeatedly comment ```cicheck``` to trigger CI again.

## Communication

37 changes: 18 additions & 19 deletions docs/cn/mds.md
@@ -8,16 +8,15 @@

MDS is the center node, responsible for metadata management, cluster status collection and scheduling. MDS consists of the following components:

- Topoloy: Manages the **topo metadata** of the cluster.
- Nameserver: Manages **file metadata**.
- Copyset: Replica placement strategy.

- Topology: Manages the **topo metadata** of the cluster.
- NameServer: Manages **file metadata**.
- CopySet: Replica placement strategy.
- Heartbeat: The heartbeat module. Interacts with chunkservers to collect their load information, copyset information, etc.
- Schedule: The scheduling module, used for automatic fault tolerance and load balancing.

## Topology

The topology module manages and organizes machines, leveraging the placement of the underlying machines and the planning of the network to provide the following functional and non-functional requirements for the business:

1. **Failure domain isolation**: for example, replicas are distributed across different machines, different racks, or under different switches.
2. **Isolation and sharing**: data of different users can be isolated on, or share, fixed physical resources.
@@ -26,13 +25,13 @@ The overall topology of curve is shown in the figure below:

<img src="../images/mds-topology-all.png" alt="mds-topology-all.png" width="900">

**chunkserver** is an abstraction of a physical disk (SSD) on a physical server; a chunkserver takes a single disk as its minimum service unit.
**chunkserver**: an abstraction of a physical disk (SSD) on a physical server; a chunkserver takes a single disk as its minimum service unit.

**server:** an abstraction of a physical server; a chunkserver must belong to a server.
**server**: an abstraction of a physical server; a chunkserver must belong to a server.

**zone:** the basic unit of failure isolation. Generally, machines belonging to different zones are at least deployed on different racks; more strictly, machines belonging to different zones can be deployed under different rack groups (a rack group shares a set of stacked leaf switches). A server must belong to a zone.
**zone**: the basic unit of failure isolation. Generally, machines belonging to different zones are at least deployed on different racks; more strictly, machines belonging to different zones can be deployed under different rack groups (a rack group shares a set of stacked leaf switches). A server must belong to a zone.

**pool:** implements physical isolation of machine resources; servers in a pool interact only with servers inside the same pool. Operationally, a brand-new pool can be planned when a batch of new machines is racked, expanding physical resources in units of pools (expansion within a pool is also supported, but not recommended, because it affects the number of copysets on each chunkserver).
**pool**: implements physical isolation of machine resources; servers in a pool interact only with servers inside the same pool. Operationally, a brand-new pool can be planned when a batch of new machines is racked, expanding physical resources in units of pools (expansion within a pool is also supported, but not recommended, because it affects the number of copysets on each chunkserver).
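As a rough illustration of the containment hierarchy described above (pool > zone > server > chunkserver), here is a minimal sketch; all struct and field names are hypothetical, not curve's actual topology types:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of the containment hierarchy: a chunkserver belongs to
// a server, a server to a zone, a zone to a pool. Illustrative names only.
struct ChunkServer { uint32_t id; uint32_t serverId; std::string diskPath; };
struct Server     { uint32_t id; uint32_t zoneId; };
struct Zone       { uint32_t id; uint32_t poolId; };
struct Pool       { uint32_t id; std::vector<Zone> zones; };

// A chunkserver must belong to exactly one server.
inline bool BelongsTo(const ChunkServer& cs, const Server& s) {
    return cs.serverId == s.id;
}
```

Modeling the hierarchy as plain ownership IDs mirrors the text's rule that each level must be owned by exactly one parent at the level above.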

Learning from the design of ceph, curve introduces the concept of a logical pool on top of the physical pool above, in order to meet the requirement of a unified storage system: within a single storage system, multi-replica PageFile (supporting block devices), three-replica AppendFile (to be developed, supporting online object storage) and AppendECFile (to be developed, supporting nearline object storage) can coexist.

@@ -42,7 +41,7 @@ The overall topology of curve is shown in the figure below:

Combined with curve's user system, a LogicalPool can be configured to restrict usage to specific users, achieving physical isolation of data between multiple tenants (to be developed).

**logicalPool** establishes pools with different characteristics at the logical level, such as the AppendECFile pool, AppendFile pool and PageFile pool above, achieving user-level data isolation and sharing.
**logicalPool**: establishes pools with different characteristics at the logical level, such as the AppendECFile pool, AppendFile pool and PageFile pool above, achieving user-level data isolation and sharing.

## NameServer

@@ -94,7 +93,7 @@ The relationship between ChunkServer, Copyset and Chunk is shown in the figure below:

3. The regular updates of the above information serve as the basis for the schedule module to perform balancing and configuration changes.

4. Chunkservers regularly report the copyset's copyset epoch, detecting differences between the chunkserver's copysets and the mds's and synchronizing the copyset information of the two.
4. Chunkservers regularly report each copyset's epoch, detecting differences between the chunkserver's copysets and the mds's and synchronizing the copyset information of the two.

5. Supports configuration changes: configuration change commands initiated by the mds are delivered in heartbeat response messages, and the progress of the change is obtained in subsequent heartbeats.
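The heartbeat exchange sketched in the items above could be modeled roughly as follows; every message and field name here is an illustrative assumption, not curve's real protobuf schema:

```cpp
#include <cstdint>
#include <vector>

// Hedged sketch of the heartbeat exchange: the request carries per-copyset
// epochs, the response carries configuration-change commands. Hypothetical
// message shapes, not curve's actual definitions.
struct CopysetReport { uint32_t copysetId; uint64_t epoch; };
struct HeartbeatRequest { uint32_t chunkserverId; std::vector<CopysetReport> copysets; };
struct ConfigChangeCmd { uint32_t copysetId; int changeType; };
struct HeartbeatResponse { std::vector<ConfigChangeCmd> commands; };

// The mds compares the reported epoch with its own record; a mismatch marks
// that copyset's info as needing synchronization.
bool NeedsSync(uint64_t reportedEpoch, uint64_t mdsEpoch) {
    return reportedEpoch != mdsEpoch;
}
```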

@@ -106,19 +105,19 @@ The relationship between ChunkServer, Copyset and Chunk is shown in the figure below:

The heartbeat on the mds side consists of three main parts:

*TopoUpdater:* Updates the information in the topology according to the copyset info reported by chunkservers.
*TopoUpdater*: Updates the information in the topology according to the copyset info reported by chunkservers.

*ConfGenerator:* Submits the currently reported copyset info to the scheduling module and fetches the tasks that may need to be executed on that copyset.
*ConfGenerator*: Submits the currently reported copyset info to the scheduling module and fetches the tasks that may need to be executed on that copyset.

*HealthyChecker:* Checks how long ago each chunkserver in the cluster sent its last heartbeat, and updates the chunkserver's status according to this gap.
*HealthyChecker*: Checks how long ago each chunkserver in the cluster sent its last heartbeat, and updates the chunkserver's status according to this gap.
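The HealthyChecker idea — classifying a chunkserver by the gap since its last heartbeat — can be sketched like this; the thresholds and status names are assumptions for illustration only:

```cpp
#include <cstdint>

// Illustrative sketch (not curve's actual code): classify a chunkserver by
// the gap between now and its last heartbeat, using hypothetical thresholds.
enum class ChunkServerStatus { kOnline, kUnstable, kOffline };

ChunkServerStatus CheckStatus(uint64_t nowMs, uint64_t lastHeartbeatMs,
                              uint64_t missTimeoutMs, uint64_t offlineTimeoutMs) {
    uint64_t gap = nowMs - lastHeartbeatMs;
    if (gap < missTimeoutMs) return ChunkServerStatus::kOnline;     // recent heartbeat
    if (gap < offlineTimeoutMs) return ChunkServerStatus::kUnstable; // missed some beats
    return ChunkServerStatus::kOffline;                              // presumed dead
}
```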

##### Chunkserver side

The heartbeat on the chunkserver side consists of two parts:

*ChunkServerInfo/CopySetInfo* Collects the copyset info on the current chunkserver and reports it to the MDS.
*ChunkServerInfo/CopySetInfo*: Collects the copyset info on the current chunkserver and reports it to the MDS.

*Order ConfigChange:* Submits the tasks issued by the MDS to the corresponding copyset replication group.
*Order ConfigChange*: Submits the tasks issued by the MDS to the corresponding copyset replication group.

## Schedule

@@ -130,9 +129,9 @@ The heartbeat on the chunkserver side consists of two parts:



**Coordinator:** the external interface of the scheduling module. The heartbeat module submits the copyset info reported by chunkservers to the Coordinator, which decides, based on this info, whether the current copyset has a configuration change task to execute, and issues the task if there is one.
**Coordinator**: the external interface of the scheduling module. The heartbeat module submits the copyset info reported by chunkservers to the Coordinator, which decides, based on this info, whether the current copyset has a configuration change task to execute, and issues the task if there is one.

**Task calculation** The task calculation module contains several *regular tasks* and *triggered tasks*. Among the *regular tasks*, ``CopySetScheduler`` is the copyset balancing scheduler, generating copyset migration tasks according to the distribution of copysets in the cluster; ``LeaderScheduler`` is the leader balancing scheduler, generating leader change tasks according to the distribution of leaders in the cluster; ``ReplicaScheduler`` is the replica count scheduler, generating replica addition and deletion tasks according to the current number of replicas of a copyset; ``RecoverScheduler`` is the recovery scheduler, generating migration tasks according to the liveness of a copyset's replicas. Among the *triggered tasks*, ``RapidLeaderScheduler`` is the rapid leader balancer: triggered externally, it generates multiple leader change tasks at once so that the cluster's leaders reach a balanced state as quickly as possible. ``TopoAdapter`` fetches the data in Topology needed for scheduling. ``Common Strategy`` contains the general replica addition and removal strategies.
**Task calculation**: The task calculation module contains several *regular tasks* and *triggered tasks*. Among the *regular tasks*, ``CopySetScheduler`` is the copyset balancing scheduler, generating copyset migration tasks according to the distribution of copysets in the cluster; ``LeaderScheduler`` is the leader balancing scheduler, generating leader change tasks according to the distribution of leaders in the cluster; ``ReplicaScheduler`` is the replica count scheduler, generating replica addition and deletion tasks according to the current number of replicas of a copyset; ``RecoverScheduler`` is the recovery scheduler, generating migration tasks according to the liveness of a copyset's replicas. Among the *triggered tasks*, ``RapidLeaderScheduler`` is the rapid leader balancer: triggered externally, it generates multiple leader change tasks at once so that the cluster's leaders reach a balanced state as quickly as possible. ``TopoAdapter`` fetches the data in Topology needed for scheduling. ``Common Strategy`` contains the general replica addition and removal strategies.

**Task management** The task management module manages the tasks generated by the calculation module. ``operatorController`` is the task collection, used to store and fetch tasks; ``operatorStateUpdate`` updates task states according to the reported copyset info; ``Metric`` counts the number of each kind of task.
**Task management**: The task management module manages the tasks generated by the calculation module. ``operatorController`` is the task collection, used to store and fetch tasks; ``operatorStateUpdate`` updates task states according to the reported copyset info; ``Metric`` counts the number of each kind of task.
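The store-and-fetch role of the task collection can be sketched as below; ``operatorController``'s real interface in curve differs, and all types here are hypothetical:

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Hedged sketch of a task collection: at most one pending operator per
// copyset, fetched by the heartbeat path. Hypothetical types only.
struct Operator { uint32_t copysetId; int type; };  // type: e.g. add/remove replica

class OperatorController {
 public:
    // Store a task; reject a second operator on the same copyset.
    bool AddOperator(const Operator& op) {
        return ops_.emplace(op.copysetId, op).second;
    }
    // Fetch the pending task for a copyset, if any.
    std::optional<Operator> GetOperator(uint32_t copysetId) {
        auto it = ops_.find(copysetId);
        if (it == ops_.end()) return std::nullopt;
        return it->second;
    }
 private:
    std::map<uint32_t, Operator> ops_;
};
```

Keying by copyset id enforces the one-pending-change-per-copyset invariant that configuration-change protocols typically require.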

4 changes: 2 additions & 2 deletions docs/cn/monitor.md
@@ -32,9 +32,9 @@ The specific usage of bvar in CURVE can be viewed:

[chunkserver metric](../../src/chunkserver/chunkserver_metrics.h)

[mds topoloy metric](../../src/mds/topology/topology_metric.h)
[mds topology metric](../../src/mds/topology/topology_metric.h)

[mds shedule metric](../../src/mds/schedule/scheduleMetrics.h)
[mds schedule metric](../../src/mds/schedule/scheduleMetrics.h)

## prometheus + grafana

22 changes: 11 additions & 11 deletions docs/en/mds_en.md
@@ -7,8 +7,8 @@
MDS is the center node of the system, responsible for managing metadata, collecting cluster status data and scheduling. MDS consists of the following components:

- Topology: Managing topology metadata of the cluster
- Nameserver: Managing file metadata
- Copyset: Replica placement strategy
- NameServer: Managing file metadata
- CopySet: Replica placement strategy
- Heartbeat: Receiving and replying to heartbeat message from chunkserver, collecting load status and copyset info of chunkserver
- Schedule: Module for fault tolerance and load balance

@@ -26,13 +26,13 @@ Figure 1 shows the topological diagram of CURVE and the explanation of correspondin
<font size=3>Figure 1: Topological diagram of CURVE</font>
</p>

**chunkserver**A chunkserver is an abstraction of a physical disk (SSD in our scenario) in a server (physical), and disk is the service unit of chunkserver.
**chunkserver**: A chunkserver is an abstraction of a physical disk (SSD in our scenario) in a server (physical), and disk is the service unit of chunkserver.

**server:** Server represent an actual physical server, to one of which any chunkservers must belong.
**server**: A server represents an actual physical server, to one of which any chunkserver must belong.

**zone:** Zone is the unit of failure isolation. In common cases, servers (a physical machine) of different zones should at least be deployed under different racks. To become stricter for some scenarios, they should be deployed under different groups of racks (racks that share the same set of leaf switches). A server must be owned by a certain zone.
**zone**: Zone is the unit of failure isolation. In common cases, servers (physical machines) of different zones should at least be deployed under different racks; more strictly, they should be deployed under different groups of racks (racks that share the same set of leaf switches). A server must be owned by a certain zone.

**pool:** Pool is for implementing physical isolation of resources. Servers are not able to communicate across their pool. In the maintenance of the system, we can arrange a pool for a new set of machines, and extend the storage by pools. Extending storage by adding machines inside a pool is supported, but this is not recommended since it will affect the copyset number of every chunkserver.
**pool**: Pool is for implementing physical isolation of resources. Servers are not able to communicate across their pool. In the maintenance of the system, we can arrange a pool for a new set of machines, and extend the storage by pools. Extending storage by adding machines inside a pool is supported, but this is not recommended since it will affect the copyset number of every chunkserver.

Learned from the design of Ceph, CURVE introduced the concept of logical pool on top of a physical pool in order to satisfy the requirement of building a unified storage system. In our design, we support the coexist of block storage (based on multi-replica), online object storage (based on three replicas storage that support appends, to be implemented) and nearline object storage (based on Erasure Code storage that support appends, to be implemented).

@@ -100,7 +100,7 @@ Figure 5 demonstrates the relation between ChunkServer, Copyset and Chunk:

## Heartbeat

Heartbeat is for data exchange between center node and data nodes, and it works in following ways:
Heartbeat is for data exchange between center nodes and data nodes, and it works in following ways:

1. Monitor online status(online/offline) of chunkservers by regular heartbeats from chunkserver.
2. Record status information(disk capacity, disk load, copyset load etc.) reported by chunkservers for Ops tools.
@@ -123,7 +123,7 @@ On MDS side, heartbeat module consists of three parts:

*ConfGenerator*: Forward info reported by copyset to scheduler, and fetch operations for copyset to execute.

*HealthyChecker:* Update chunkserver status by checking the time gap between current time and the last heartbeat of a chunkserver.
*HealthyChecker*: Update chunkserver status by checking the time gap between current time and the last heartbeat of a chunkserver.

##### Chunkserver side

@@ -144,8 +144,8 @@ System scheduling is for implementing auto fault tolerance and load balancing, w

Figure 7 shows the structure of the scheduler module.

**Coordinator:** Coordinator serves as the interface of the scheduler module. After receiving copyset info provided by heartbeats from chunkserver, coordinator will decide whether there's any configuration change for current copyset, and will distribute the change if there is.
**Coordinator**: Coordinator serves as the interface of the scheduler module. After receiving copyset info provided by heartbeats from chunkserver, coordinator will decide whether there's any configuration change for current copyset, and will distribute the change if there is.

**Task calculation:**Task calculation module is for generating tasks by calculating data of corresponding status. This module consists of a few regular tasks and a triggerable task. Regular tasks include CopySetScheduler, LeaderScheduler, ReplicaScheduler and RecoverScheduler. CopySetScheduler is the scheduler for copyset balancing, generating copysets immigration tasks according to their distribution. LeaderScheduler is the scheduler for leader balancing, which responsible for changing leader according to leaders' distribution. ReplicaScheduler is for scheduling replica number, managing the generation and deletion of replica by analysing current replica numbers of a copyset, while RecoverScheduler controls the immigration of copysets according to their liveness. For triggerable task, RapidLeaderScheduler is for quick leader balancing, triggered by external events, and generates multiple leader changing task at a time to make leaders of the cluster balance as quick as possible. Another two modules are TopoAdapter and CommonStrategy. The former one is for fetching data required by topology module, while the later one implements general strategies for adding and removing replica.
**Task calculation**: The task calculation module generates tasks by calculating data of the corresponding status. It consists of a few regular tasks and a triggerable task. Regular tasks include CopySetScheduler, LeaderScheduler, ReplicaScheduler and RecoverScheduler. CopySetScheduler is the scheduler for copyset balancing, generating copyset migration tasks according to their distribution. LeaderScheduler is the scheduler for leader balancing, responsible for changing leaders according to their distribution. ReplicaScheduler schedules replica numbers, managing the creation and deletion of replicas by analysing the current replica number of a copyset, while RecoverScheduler controls the migration of copysets according to their liveness. As for the triggerable task, RapidLeaderScheduler is for quick leader balancing: triggered by external events, it generates multiple leader changing tasks at a time to balance the leaders of the cluster as quickly as possible. Another two modules are TopoAdapter and CommonStrategy: the former fetches the data in Topology required by scheduling, while the latter implements general strategies for adding and removing replicas.

**Task managing**Task managing module manages tasks generated by task calculation module. Inside this module we can see components OperatorController, OperatorStateUpdate and Metric, responsible for fetching and storing tasks, updating status according to copyset info reported and measuring tasks number respectively.
**Task managing**: The task managing module manages tasks generated by the task calculation module. Inside this module are OperatorController, OperatorStateUpdate and Metric, responsible respectively for storing and fetching tasks, updating status according to the reported copyset info, and counting the number of each kind of task.
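The leader-balancing intuition behind LeaderScheduler can be sketched as a spread check; the helper names and threshold are illustrative assumptions, not curve's actual algorithm:

```cpp
#include <algorithm>
#include <vector>

// Illustrative sketch: measure imbalance as the spread between the most- and
// least-loaded chunkservers' leader counts. Hypothetical helper only.
int LeaderSpread(const std::vector<int>& leaderCounts) {
    if (leaderCounts.empty()) return 0;
    auto [mn, mx] = std::minmax_element(leaderCounts.begin(), leaderCounts.end());
    return *mx - *mn;
}

// A scheduler would generate leader-change tasks while the spread exceeds
// some threshold (the threshold value is an assumption here).
bool NeedsRebalance(const std::vector<int>& leaderCounts, int threshold) {
    return LeaderSpread(leaderCounts) > threshold;
}
```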
4 changes: 2 additions & 2 deletions docs/en/monitor_en.md
@@ -32,9 +32,9 @@ The specific usage of bvar in CURVE can be viewed:

[chunkserver metric](../../src/chunkserver/chunkserver_metrics.h)

[mds topoloy metric](../../src/mds/topology/topology_metric.h)
[mds topology metric](../../src/mds/topology/topology_metric.h)

[mds shedule metric](../../src/mds/schedule/scheduleMetrics.h)
[mds schedule metric](../../src/mds/schedule/scheduleMetrics.h)

## prometheus + grafana

