issue:4173694 adding Standby node health check #283

Open
wants to merge 23 commits into
base: main
Changes from 14 commits
2 changes: 1 addition & 1 deletion .ci/cidemo-init.sh
@@ -7,7 +7,7 @@ cd .ci
changed_files=$(git diff --name-only remotes/origin/$ghprbTargetBranch)

# Check for changes excluding .gitmodules and root .ci directory
-changes_excluding_gitmodules_and_root_ci=$(echo "$changed_files" | grep -v -e '.gitmodules' -e '.gitignore' -e '^\.ci/' -e '^\.github/workflows' -e '\utils' -e '\plugins/ufm_log_analyzer_plugin') #Removing ufm_log_analyzer_plugin as for now it does not need a formal build
+changes_excluding_gitmodules_and_root_ci=$(echo "$changed_files" | grep -v -e '.gitmodules' -e '^scripts/' -e '.gitignore' -e '^\.ci/' -e '^\.github/workflows' -e '\utils' -e '\plugins/ufm_log_analyzer_plugin') #Removing ufm_log_analyzer_plugin as for now it does not need a formal build
kedeme marked this conversation as resolved.

# Check if changes exist and only in a single plugin directory (including its .ci directory)
if [ -n "$changes_excluding_gitmodules_and_root_ci" ] && [ $(echo "$changes_excluding_gitmodules_and_root_ci" | cut -d '/' -f1,2 | uniq | wc -l) -eq 1 ]; then
28 changes: 28 additions & 0 deletions scripts/standby_node_health_check/README.md
@@ -0,0 +1,28 @@
# Standby node health check

## What
This script helps a UFM HA user verify that the standby server is configured correctly and is ready to become the new master in case of failover.

## How to run
1. Use Python 3.6 or above.
2. No prerequisites are needed.
3. The script is meant to run on a standby node only.
4. Place the script in a directory, for example under `/tmp`.
5. Run the command `python3 standby_node_health_check --fabric-interfaces ib0 ib1 --mgmt-interfaces ens192`
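
To illustrate how the two flags in the command above could be read, here is a minimal sketch using `argparse`. The function name `parse_args`, the use of `nargs="+"`, and marking both flags as required are assumptions made for illustration; the actual script may parse its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the two interface flags shown in the README (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Standby node health check (illustrative sketch)"
    )
    # Assumption: each flag accepts one or more interface names.
    parser.add_argument("--fabric-interfaces", nargs="+", required=True,
                        help="fabric interfaces to check, e.g. ib0 ib1")
    parser.add_argument("--mgmt-interfaces", nargs="+", required=True,
                        help="management interface to check, e.g. ens192")
    return parser.parse_args(argv)


# Same arguments as the README example, passed explicitly instead of via sys.argv.
args = parse_args(["--fabric-interfaces", "ib0", "ib1",
                   "--mgmt-interfaces", "ens192"])
print(args.fabric_interfaces)  # ['ib0', 'ib1']
print(args.mgmt_interfaces)    # ['ens192']
```

Passing `argv` explicitly keeps the sketch easy to exercise without a real command line; with `argv=None`, `argparse` falls back to `sys.argv`.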

## What the script is checking
1. Checking if all given fabric interfaces are up.
boazhaim marked this conversation as resolved.
2. Checking if the given management interface is up.
Collaborator:

Suggested change
-2. Checking if all given management interface are up.
+2. Checking if all given management interfaces are up.

Collaborator Author:

No, per Kobi, we are checking only one management interface. I updated a few lines above to reflect it also

3. Checking that UFM HA is configured.
4. If any of the previous validations fails, stop and set the return code to 1.
5. Checking if the node is a standby; if not, stop and set the return code to 1.
6. Checking Pacemaker status.
7. Checking that the corosync service is active.
8. Checking that the pacemaker service is active.
9. Checking that the pcsd service is active.
10. Checking that the DRBD role is Secondary.
11. Checking that the DRBD connectivity state is Connected.
12. Checking that the DRBD disk state is UpToDate.
13. If any of the previous tests fails, stop and set the return code to 1.
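
As a rough illustration of two of the checks listed above, the sketch below tests whether an interface is up by reading its `operstate` file under `/sys/class/net`, and whether a systemd unit is active by calling `systemctl is-active`. The helper names `interface_is_up` and `service_is_active` and the `sysfs_root` parameter are hypothetical, and this is not necessarily how the script in this PR implements the checks.

```python
import subprocess
from pathlib import Path


def interface_is_up(name, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports the interface operstate as 'up'."""
    try:
        state = (Path(sysfs_root) / name / "operstate").read_text().strip()
    except OSError:
        # Missing interface or unreadable file counts as "not up".
        return False
    return state == "up"


def service_is_active(name):
    """Return True if `systemctl is-active` exits with status 0 for the unit."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", name], check=False
        )
    except OSError:
        # systemctl is not available on this host.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for iface in ("ib0", "ib1", "ens192"):
        print(iface, "up" if interface_is_up(iface) else "not up")
    for unit in ("corosync", "pacemaker", "pcsd"):
        print(unit, "active" if service_is_active(unit) else "not active")
```

The `sysfs_root` parameter exists only so the interface check can be pointed at a temporary directory when testing.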

Note: if all tests pass, the return code is 0.
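
The return-code behaviour described above can be sketched as a small staged runner: checks are grouped into stages, and the run stops with return code 1 after the first stage that contains a failure. The name `run_stages`, the grouping into three stages, and the choice to run every check within a stage before stopping are assumptions for illustration; the stub lambdas stand in for the real checks.

```python
from typing import Callable, Sequence, Tuple

Check = Tuple[str, Callable[[], bool]]


def run_stages(stages: Sequence[Sequence[Check]]) -> int:
    """Run stages in order; return 1 after the first stage with a failure, else 0."""
    for stage in stages:
        results = [(name, check()) for name, check in stage]
        for name, ok in results:
            print("{}: {}".format("PASS" if ok else "FAIL", name))
        if not all(ok for _, ok in results):
            return 1  # Later stages are skipped.
    return 0


# Stub checks standing in for the real ones (hypothetical results).
stages = [
    [("fabric interfaces up", lambda: True),
     ("management interface up", lambda: True),
     ("UFM HA configured", lambda: True)],
    [("node is standby", lambda: True)],
    [("corosync active", lambda: True),
     ("DRBD role is Secondary", lambda: False)],
]
exit_code = run_stages(stages)
print("exit code:", exit_code)  # exit code: 1
```

A real script would typically finish with `sys.exit(exit_code)` so that callers, such as a monitoring job, can act on the result.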