havah-chain-node HA
/etc/init.d/
- havah_active
- havah_backup
- havah_status
/app/havah-chain-node/
- docker-compose.yml
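The three scripts under /etc/init.d/ are what Pacemaker manages below as lsb: resources; their contents are not included in this document. As a rough sketch only (assuming they simply wrap docker-compose in /app/havah-chain-node), an LSB-compliant wrapper could look like the following. What matters to Pacemaker is the exit codes: start/stop return 0 on success, and status returns 0 when the service is running and 3 when it is stopped.

#!/bin/bash
# Hypothetical sketch -- the real havah_active / havah_backup / havah_status scripts are not shown here.
COMPOSE_DIR=/app/havah-chain-node

case "$1" in
  start)
    cd "$COMPOSE_DIR" && docker-compose up -d
    ;;
  stop)
    cd "$COMPOSE_DIR" && docker-compose down
    ;;
  status)
    # LSB convention: exit 0 = running, exit 3 = not running
    if cd "$COMPOSE_DIR" && docker-compose ps --services --filter status=running | grep -q .; then
      exit 0
    else
      exit 3
    fi
    ;;
  restart|force-reload)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart|force-reload}"
    exit 2
    ;;
esac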
/etc/hosts
[root@validator01 ~]$ sudo cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.101 cluster01
10.10.10.102 cluster02
- Bidirectional port access is required between the nodes where Pacemaker is installed (TCP 22, TCP 2224, TCP 9000, UDP 5404 ~ 5412); an example of opening these with firewalld is shown below.
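If firewalld is in use (an assumption; adjust for your own firewall or cloud security-group setup), the ports above can be opened roughly like this:
[root@validator01 ~]$ firewall-cmd --permanent --add-service=ssh --add-port=2224/tcp --add-port=9000/tcp --add-port=5404-5412/udp
[root@validator01 ~]$ firewall-cmd --reload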
# RHEL / CentOS / Amazon Linux 2
[root@validator01 ~]$ yum install pacemaker pcs resource-agents
[root@validator01 ~]$ systemctl enable pcsd.service
# Ubuntu / Debian
[root@validator01 ~]$ apt install pacemaker corosync fence-agents pcs
[root@validator01 ~]$ systemctl enable pcsd.service
[root@validator01 ~]$ passwd hacluster
Changing password for user hacluster.
New password:
BAD PASSWORD: The password contains the user name in some form
Retype new password:
passwd: all authentication tokens updated successfully.
[root@validator01 ~]$
[root@validator01 ~]$ pcs cluster auth cluster01 cluster02 <> /dev/tty
Username: hacluster
Password: P@ssw0rd (enter the hacluster password)
cluster02: Authorized
cluster01: Authorized
[root@validator01 ~]$ pcs cluster setup --name cluster cluster01 cluster02 --transport udpu
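The auth/setup syntax above is for pcs 0.9.x, as shipped with the yum-based install (Amazon Linux 2 / CentOS 7; note the pacemaker 1.1.23-amzn2 version in the status output below). If the apt-based install pulls in pcs 0.10 or later, the equivalent commands are roughly the following (check pcs --version and the man page for your release):
[root@validator01 ~]$ pcs host auth cluster01 cluster02 -u hacluster
[root@validator01 ~]$ pcs cluster setup cluster cluster01 cluster02 transport udpu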
[root@validator01 ~]$ pcs cluster start --all --wait=60
[root@validator01 ~]$ systemctl enable corosync pacemaker pcsd
[root@validator01 ~]$ pcs status
Cluster name: cluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: cluster02 (version 1.1.23-1.amzn2.1-9acf116022) - partition with quorum
Last updated: Fri Jun 9 01:48:40 2023
Last change: Fri Jun 9 01:47:43 2023 by hacluster via crmd on cluster02
2 nodes configured
0 resource instances configured
Online: [ cluster01 cluster02 ]
No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@validator01 ~]$ pcs cluster cib tmp-cib.xml
[root@validator01 ~]$ pcs -f tmp-cib.xml property set stonith-enabled=false
# Register the Active node resource (*havah_active*)
[root@validator01 ~]$ pcs -f tmp-cib.xml resource create havah_active lsb:havah_active \
    op force-reload interval=0s timeout=60s \
       monitor interval=30s timeout=60s \
       restart interval=0s timeout=60s \
       start interval=0s timeout=60s \
       stop interval=0s
# Register the Active node status resource (*havah_status*)
[root@validator01 ~]$ pcs -f tmp-cib.xml resource create havah_status lsb:havah_status \
    op force-reload interval=0s timeout=60s \
       monitor interval=30s timeout=60s \
       restart interval=0s \
       start interval=0s \
       stop interval=0s
# Register the Backup node resource (*havah_backup*)
[root@validator01 ~]$ pcs -f tmp-cib.xml resource create havah_backup lsb:havah_backup \
    op force-reload interval=0s timeout=60s \
       monitor interval=30s timeout=60s \
       restart interval=0s timeout=60s \
       start interval=0s timeout=60s \
       stop interval=0s
# If a setting is needed for an individual resource, specify it together with the op options as above.
# A global default can be set in the form: pcs resource op defaults timeout=XX
# Current global defaults can be checked with pcs resource op defaults. (Not recommended in production environments.)
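For example (the timeout value here is illustrative only):
[root@validator01 ~]$ pcs resource op defaults timeout=60s   # set a global operation default
[root@validator01 ~]$ pcs resource op defaults               # list the current defaults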
# Create the groups
# Create the Active group and add the Active-related resources (havah_active, havah_status)
[root@validator01 ~]$ pcs -f tmp-cib.xml resource group add Active havah_active havah_status
# Create the Backup group and add the Backup-related resource (havah_backup)
[root@validator01 ~]$ pcs -f tmp-cib.xml resource group add Backup havah_backup
# Resource group constraints (intended so that the Active and Backup groups do not run on the same node; see the note below)
[root@validator01 ~]$ pcs -f tmp-cib.xml constraint order havah_active then havah_status
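Note that the order constraint above only makes havah_status start after havah_active; on its own it does not keep the Active and Backup groups on different nodes. If that separation needs to be enforced by Pacemaker itself rather than by the init scripts, a negative colocation constraint along these lines is the usual way (not part of the original configuration, shown as a sketch only):
[root@validator01 ~]$ pcs -f tmp-cib.xml constraint colocation add Backup with Active -INFINITY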
[root@validator01 ~]$ pcs -f tmp-cib.xml resource defaults resource-stickiness=3000
Warning: Defaults do not apply to resources which override them with their own defined values
[root@validator01 ~]$ pcs cluster cib-push tmp-cib.xml
Check the initial status
[root@validator01 ~]$ pcs status
Cluster name: cluster
Stack: corosync
Current DC: cluster02 (version 1.1.23-1.amzn2.1-9acf116022) - partition with quorum
Last updated: Fri Jun 9 16:31:14 2023
Last change: Fri Jun 9 16:23:17 2023 by root via cibadmin on cluster02
2 nodes configured
3 resource instances configured (1 DISABLED)
Online: [ cluster01 cluster02 ]
Full list of resources:
Resource Group: Active
havah_active (lsb:havah_active): Started cluster01
havah_status (lsb:havah_status): Started cluster01
Resource Group: Backup
havah_backup (lsb:havah_backup): Stopped (disabled)
Failed Resource Actions:
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
- The havah_active and havah_status resources in the Active group are running on cluster01 ✅
- The havah_backup resource in the Backup group is Stopped (disabled). ✅
[root@validator01 ~]$ pcs resource enable havah_backup
- Checking with pcs status confirms that the havah_backup resource has started on the cluster02 node. ✅
When the cluster01 node goes offline, the Active group fails over to cluster02, which pcs status shows:
[root@validator01 ~]$ pcs status
2 nodes configured
3 resource instances configured (1 DISABLED) ## number of configured resources, with the disabled count
Online: [ cluster02 ]
OFFLINE: [ cluster01 ] ## cluster nodes that are OFFLINE
Full list of resources:
Resource Group: Active
havah_active (lsb:havah_active): Started cluster02 ## Active failed over to the cluster02 node
havah_status (lsb:havah_status): Started cluster02 ## Active failed over to the cluster02 node
Resource Group: Backup
havah_backup (lsb:havah_backup): Stopped (disabled) ## havah_backup disabled
Failed Resource Actions:
* havah_status_monitor_60000 on cluster02 'unknown error' (1): call=73, status=Timed Out, exitreason='',
last-rc-change='Mon Jun 12 18:28:51 2023', queued=0ms, exec=60000ms
* havah_active_monitor_60000 on cluster02 'not running' (7): call=69, status=complete, exitreason='',
last-rc-change='Mon Jun 12 18:28:51 2023', queued=0ms, exec=0ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
The reason is the FailCount setting. It can be resolved as follows:
- Manually reset the FailCount:
[root@validator01 ~]$ pcs resource failcount show <resource-name>
[root@validator01 ~]$ pcs resource failcount reset <resource-name>
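Alternatively, instead of resetting the fail count by hand every time, a failure-timeout meta attribute can be set so that old failures expire automatically (the value below is illustrative; whether automatic expiry is appropriate for a validator node is an operational decision):
[root@validator01 ~]$ pcs resource meta havah_active failure-timeout=600s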
[root@validator01 ~]$ pcs status
...
2 nodes configured
3 resource instances configured (1 DISABLED)
Online: [ cluster01 cluster02 ] ## both nodes are Online again
Full list of resources:
Resource Group: Active
havah_active (lsb:havah_active): Started cluster02
havah_status (lsb:havah_status): Started cluster02
Resource Group: Backup
havah_backup (lsb:havah_backup): Stopped (disabled)
...
[root@validator01 ~]$ pcs resource enable havah_backup
[root@validator01 ~]$ pcs status
...
2 nodes configured
3 resource instances configured
Online: [ cluster01 cluster02 ]
Full list of resources:
Resource Group: Active
havah_active (lsb:havah_active): Started cluster02
havah_status (lsb:havah_status): Started cluster02
Resource Group: Backup
havah_backup (lsb:havah_backup): Started cluster01 ## confirmed running on the cluster01 node
...
[root@validator01 ~]$ pcs resource move Active cluster01
...
Online: [ cluster01 cluster02 ]
Full list of resources:
Resource Group: Active
havah_active (lsb:havah_active): Started cluster01
havah_status (lsb:havah_status): Started cluster01
Resource Group: Backup
havah_backup (lsb:havah_backup): Stopped (disabled)
...
# When havah_active starts, the script contains logic that disables havah_backup (to avoid interfering with the active node startup).
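As a guess at what that guard might look like inside the start) branch of /etc/init.d/havah_active (the actual script is not shown in this document):
start)
  # make sure the backup resource is stopped everywhere before the active node comes up
  pcs resource disable havah_backup
  cd /app/havah-chain-node && docker-compose up -d
  ;;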
[root@validator01 ~]$ pcs resource enable havah_backup
[root@validator01 ~]$ pcs status
...
Online: [ cluster01 cluster02 ]
Full list of resources:
Resource Group: Active
havah_active (lsb:havah_active): Started cluster01
havah_status (lsb:havah_status): Started cluster01
Resource Group: Backup
havah_backup (lsb:havah_backup): Started cluster02
...
[root@validator01 ~]$ pcs constraint location --full
Location Constraints:
Resource: Active
Enabled on: cluster02 (score:INFINITY) (role: Started) (id:cli-prefer-Active)
Resource: havah_active
Enabled on: cluster01 (score:INFINITY) (role: Started) (id:cli-prefer-havah_active)
[root@validator01 ~]$ pcs constraint location remove cli-prefer-Active ## use the id from the output of the command above
[root@validator01 ~]$ pcs constraint location remove cli-prefer-havah_active ## use the id from the output of the command above
# While such a location constraint exists, the resource will preferentially run on the configured node when it is started.
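These cli-prefer-* constraints are created automatically whenever pcs resource move (or pcs resource ban) is used. Besides removing them by id as above, they can also be cleared per resource (depending on the pcs version):
[root@validator01 ~]$ pcs resource clear Active
[root@validator01 ~]$ pcs resource clear havah_active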
[root@validator01 ~]$ pcs cluster setup --name cluster cluster01 cluster02 --transport udpu
Start
[root@validator01 ~]$ pcs cluster start # Start on one node
[root@validator01 ~]$ pcs cluster start --all # Start on all cluster members
Stop
[root@validator01 ~]$ pcs cluster stop # Stop one node
[root@validator01 ~]$ pcs cluster stop --all # Stop all cluster members
[root@validator01 ~]$ pcs status
[root@validator01 ~]$ pcs resource enable havah_active
[root@validator01 ~]$ pcs resource disable havah_active
[root@validator01 ~]$ pcs resource move havah_active cluster02
[root@validator01 ~]$ pcs resource move Active cluster02
[root@validator01 ~]$ pcs resource failcount show
[root@validator01 ~]$ pcs resource failcount reset havah_active
[root@validator01 ~]$ pcs cluster destroy
- pcs cluster: cluster node related operations
- pcs property: set cluster properties
- pcs resource: resource related operations
- pcs constraint: constraint related operations
- pcs stonith: STONITH related operations
- pcs status: check cluster status
- pcs config: create and manage the cluster configuration file