Commit 1856045

sandeepknd and claude committed
feat: Add etcd health and performance analysis commands
Add two new OpenShift plugin commands for etcd monitoring and diagnostics:

1. /openshift:etcd-check-health
   - Performs comprehensive health checks on etcd cluster
   - Validates member status, quorum, and connectivity
   - Reports disk space, database size, and fragmentation
   - Detects configuration issues and performance problems

2. /openshift:etcd-analyze-performance
   - Analyzes etcd performance metrics and latency
   - Examines disk I/O, compaction times, and snapshot performance
   - Monitors leader stability and detects frequent changes
   - Identifies slow operations and bottlenecks
   - Provides actionable recommendations for optimization

Both commands support OpenShift clusters with proper authentication and provide detailed analysis with severity-based recommendations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
1 parent 48e260d commit 1856045

7 files changed, +1268 -0 lines changed

.claude-plugin/marketplace.json

Lines changed: 5 additions & 0 deletions
@@ -64,6 +64,11 @@
       "source": "./plugins/openshift",
       "description": "OpenShift development utilities and helpers"
     },
+    {
+      "name": "etcd",
+      "source": "./plugins/etcd",
+      "description": "Etcd cluster health monitoring and performance analysis utilities"
+    },
     {
       "name": "yaml",
       "source": "./plugins/yaml",

PLUGINS.md

Lines changed: 11 additions & 0 deletions
@@ -6,6 +6,7 @@ This document lists all available Claude Code plugins and their commands in the
 - [Ci](#ci-plugin)
 - [Component Health](#component-health-plugin)
 - [Doc](#doc-plugin)
+- [Etcd](#etcd-plugin)
 - [Git](#git-plugin)
 - [Hcp](#hcp-plugin)
 - [Hello World](#hello-world-plugin)
@@ -61,6 +62,16 @@ A plugin for engineering documentation and notes
 
 See [plugins/doc/README.md](plugins/doc/README.md) for detailed documentation.
 
+### Etcd Plugin
+
+Etcd cluster health monitoring and performance analysis utilities
+
+**Commands:**
+- **`/etcd:analyze-performance` `"[--duration <minutes>]"`** - Analyze etcd performance metrics, latency, and identify bottlenecks
+- **`/etcd:health-check` `"[--verbose]"`** - Check etcd cluster health, member status, and identify issues
+
+See [plugins/etcd/README.md](plugins/etcd/README.md) for detailed documentation.
+
 ### Git Plugin
 
 Git workflow automation and utilities

docs/data.json

Lines changed: 21 additions & 0 deletions
@@ -522,6 +522,27 @@
       "skills": [],
       "has_readme": true
     },
+    {
+      "name": "etcd",
+      "description": "Etcd cluster health monitoring and performance analysis utilities",
+      "version": "0.0.1",
+      "commands": [
+        {
+          "name": "analyze-performance",
+          "description": "Analyze etcd performance metrics, latency, and identify bottlenecks",
+          "synopsis": "/etcd:analyze-performance [--duration <minutes>]",
+          "argument_hint": "\"[--duration <minutes>]\""
+        },
+        {
+          "name": "health-check",
+          "description": "Check etcd cluster health, member status, and identify issues",
+          "synopsis": "/etcd:health-check [--verbose]",
+          "argument_hint": "\"[--verbose]\""
+        }
+      ],
+      "skills": [],
+      "has_readme": true
+    },
     {
       "name": "yaml",
       "description": "YAML documentation and utilities",
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{
+  "name": "etcd",
+  "description": "Etcd cluster health monitoring and performance analysis utilities",
+  "version": "0.0.1",
+  "author": {
+    "name": "github.com/openshift-eng"
+  }
+}

plugins/etcd/README.md

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
+# Etcd Plugin
+
+A Claude Code plugin for monitoring etcd cluster health and analyzing performance in OpenShift environments.
+
+## Overview
+
+This plugin provides commands to help diagnose and troubleshoot etcd-related issues in OpenShift clusters. Etcd is the critical distributed key-value store that holds all cluster state for Kubernetes/OpenShift, and maintaining its health and performance is essential for cluster stability.
+
+## Commands
+
+### `/etcd:health-check`
+
+Performs a comprehensive health check of the etcd cluster, examining:
+- Etcd pod status and availability
+- Cluster health and member status
+- Leadership election status
+- Database size and fragmentation
+- Disk space utilization
+- Recent error logs
+- Performance metrics (with `--verbose` flag)
+
+**Usage:**
+```
+/etcd:health-check [--verbose]
+```
+
+**Example:**
+```
+/etcd:health-check
+/etcd:health-check --verbose
+```
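
The member-status part of this health check rests on a quorum question: an etcd cluster with n voting members needs a majority, floor(n/2) + 1, of them healthy to accept writes. The sketch below shows only that arithmetic. The function names are illustrative and not part of the plugin, and the command itself may evaluate member health differently.

```python
def quorum_size(member_count: int) -> int:
    """Smallest number of voting members that forms a majority."""
    if member_count < 1:
        raise ValueError("member_count must be at least 1")
    return member_count // 2 + 1


def failure_tolerance(member_count: int) -> int:
    """How many members can be lost while a majority remains."""
    return member_count - quorum_size(member_count)


def has_quorum(member_count: int, healthy_count: int) -> bool:
    """True when the healthy members still form a majority."""
    return healthy_count >= quorum_size(member_count)
```

For a typical three-member control plane this means two healthy members are required and one failure can be tolerated; a fourth member raises the majority to three without adding any tolerance.
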
+
+### `/etcd:analyze-performance`
+
+Analyzes etcd performance metrics to identify latency issues and bottlenecks, including:
+- Disk I/O performance (commit latency, fsync duration)
+- Network latency between etcd peers
+- Request/response performance by operation type
+- Leader stability and proposal metrics
+- Database size and fragmentation
+- Performance warnings from logs
+
+**Usage:**
+```
+/etcd:analyze-performance [--duration <minutes>]
+```
+
+**Example:**
+```
+/etcd:analyze-performance
+/etcd:analyze-performance --duration 15
+```
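
Latency figures such as fsync duration are exposed by etcd as Prometheus histograms (for example `etcd_disk_wal_fsync_duration_seconds`), so a P99 value has to be estimated from cumulative bucket counts. The sketch below is one way to do that, using linear interpolation in the spirit of PromQL's `histogram_quantile()`. It is an illustration under that assumption, not the plugin's implementation, and the function name is hypothetical.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Target lies beyond the last finite bound; report that bound.
                return prev_bound
            if count == prev_count:
                return upper
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound
```

With cumulative counts of 0, 50, 90, 98, 100 and 100 for upper bounds of 0.001, 0.002, 0.004, 0.008, 0.016 seconds and +Inf, the estimate for `q = 0.99` is about 0.012 s, which would exceed the 10 ms WAL fsync guideline given later in this README.
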
+
+## Prerequisites
+
+All commands require:
+
+1. **OpenShift CLI (oc)** - Install from https://mirror.openshift.com/pub/openshift-v4/clients/ocp/
+2. **Active cluster connection** - Must be authenticated to an OpenShift cluster
+3. **Cluster admin permissions** - Required to access etcd pods and metrics
+4. **Running etcd pods** - At least one etcd pod must be running
+
+## Installation
+
+### From Marketplace
+
+```bash
+# Add the marketplace (if not already added)
+/plugin marketplace add openshift-eng/ai-helpers
+
+# Install the etcd plugin
+/plugin install etcd@ai-helpers
+```
+
+### Manual Installation
+
+```bash
+# Clone the repository
+git clone https://github.com/openshift-eng/ai-helpers.git
+
+# Link to your Claude Code plugins directory
+ln -s $(pwd)/ai-helpers/plugins/etcd ~/.claude/plugins/etcd
+```
+
+## Use Cases
+
+### Troubleshooting Cluster Issues
+
+When experiencing cluster-wide problems:
+1. Run `/etcd:health-check` to verify etcd cluster status
+2. If issues are found, run `/etcd:analyze-performance` to identify bottlenecks
+3. Follow the recommendations provided in the output
+
+### Performance Tuning
+
+For proactive performance monitoring:
+1. Run `/etcd:analyze-performance --duration 30` for comprehensive analysis
+2. Review disk I/O and network latency metrics
+3. Compare against recommended thresholds
+4. Implement suggested optimizations
+
+### Capacity Planning
+
+Before scaling operations:
+1. Check current database size with `/etcd:health-check`
+2. Analyze performance trends with `/etcd:analyze-performance`
+3. Identify if hardware upgrades are needed
+
+## Common Issues and Solutions
+
+### High Disk Latency
+
+**Problem:** Backend commit P99 > 100ms or WAL fsync P99 > 10ms
+
+**Solutions:**
+- Migrate to SSD or NVMe storage
+- Use dedicated disks for etcd (not shared with OS)
+- Check for competing I/O workloads
+
+### Frequent Leader Changes
+
+**Problem:** Leader changes > 5
+
+**Solutions:**
+- Check network connectivity between etcd nodes
+- Ensure nodes are in same datacenter/availability zone
+- Verify no clock skew between nodes
+
+### Large Database Size
+
+**Problem:** Database size > 8GB or high fragmentation
+
+**Solutions:**
+- Run etcd defragmentation
+- Review event retention policies
+- Check for excessive key creation
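
"High fragmentation" is not quantified in this README. A common way to express it is the share of the database file that no longer holds live data, computed from the total size and the in-use size (etcd reports these as `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`). The sketch below assumes that definition; the function names are illustrative and the threshold is left to the caller because the source does not give one.

```python
def fragmentation_ratio(db_size_bytes: int, db_size_in_use_bytes: int) -> float:
    """Fraction of the on-disk database file that holds no live data."""
    if db_size_bytes <= 0:
        return 0.0
    reclaimable = max(db_size_bytes - db_size_in_use_bytes, 0)
    return reclaimable / db_size_bytes


def should_defragment(
    db_size_bytes: int, db_size_in_use_bytes: int, threshold: float
) -> bool:
    """True when the reclaimable share reaches the caller-chosen threshold."""
    return fragmentation_ratio(db_size_bytes, db_size_in_use_bytes) >= threshold
```

A database file of 8000 bytes with 2000 bytes in use has a ratio of 0.75, meaning defragmentation could reclaim roughly three quarters of the file.
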
+
+## Performance Benchmarks
+
+Recommended thresholds for healthy etcd:
+- **Backend commit P99:** < 100ms
+- **WAL fsync P99:** < 10ms
+- **Peer RTT P99:** < 50ms
+- **Leader changes:** < 5 total
+- **Database size:** < 8GB
+- **Disk usage:** < 80%
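
The thresholds above can be read as a simple lookup table: a value is healthy while it stays strictly below its limit. The sketch below encodes the six limits from the list and reports every measurement that reaches or exceeds one. The dictionary keys and the function name are illustrative choices, not names used by the plugin.

```python
# Limits copied from the README list; the key names are illustrative.
THRESHOLDS = {
    "backend_commit_p99_ms": 100,
    "wal_fsync_p99_ms": 10,
    "peer_rtt_p99_ms": 50,
    "leader_changes": 5,
    "db_size_gb": 8,
    "disk_usage_percent": 80,
}


def find_violations(measurements: dict[str, float]) -> list[str]:
    """Return a message for every measurement at or above its limit."""
    violations = []
    for name, limit in THRESHOLDS.items():
        value = measurements.get(name)
        if value is not None and value >= limit:
            violations.append(f"{name}: {value} (limit < {limit})")
    return violations
```

Measurements that are missing from the input are skipped rather than treated as failures, so a partial sample still yields a usable report.
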
+
+## Security Considerations
+
+- Commands require cluster-admin or equivalent permissions
+- Access to etcd allows viewing all cluster secrets
+- Metrics and logs may contain sensitive information
+- Performance data should be treated as confidential
+
+## Resources
+
+- **Etcd Documentation:** https://etcd.io/docs/
+- **OpenShift Etcd Docs:** https://docs.openshift.com/container-platform/latest/backup_and_restore/control_plane_backup_and_restore/
+- **Performance Tuning:** https://etcd.io/docs/latest/tuning/
+
+## Contributing
+
+To contribute improvements or report issues:
+1. Visit https://github.com/openshift-eng/ai-helpers
+2. Open an issue or pull request
+3. Follow the contribution guidelines in the repository
+
+## License
+
+This plugin is part of the ai-helpers project and follows the same license terms.
