Commit d8af83d

Add LVMS plugin for storage troubleshooting
Adds comprehensive LVMS (Logical Volume Manager Storage) troubleshooting plugin for diagnosing storage issues in OpenShift clusters.

Features:
- Analyzes LVMCluster health and readiness
- Volume group status per node with device information
- PVC/PV binding issues and provisioning failures
- LVMS operator and vg-manager pod health
- Pod log analysis with JSON parsing and error deduplication
- TopoLVM CSI driver configuration
- Root cause analysis with remediation recommendations

Works with:
- Live OpenShift clusters (via oc CLI)
- LVMS must-gather data (offline analysis)

Compatibility:
- Supports both openshift-storage (old) and openshift-lvm-storage (new) namespaces
- Automatically detects namespace and must-gather structure
- Component-specific analysis (storage, operator, volumes, logs)

Implementation includes:
- Command definition: /lvms:analyze
- Python analysis script with comprehensive must-gather parsing
- Detailed skill documentation with real-world examples
- Plugin README with common use cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
1 parent abd54a3 commit d8af83d

8 files changed: +2462 -0 lines changed

.claude-plugin/marketplace.json

Lines changed: 5 additions & 0 deletions
@@ -68,6 +68,11 @@
       "name": "must-gather",
       "source": "./plugins/must-gather",
       "description": "A plugin to analyze and report on must-gather data"
+    },
+    {
+      "name": "lvms",
+      "source": "./plugins/lvms",
+      "description": "LVMS (Logical Volume Manager Storage) plugin for troubleshooting and debugging storage issues"
     }
   ]
 }

PLUGINS.md

Lines changed: 10 additions & 0 deletions
@@ -8,6 +8,7 @@ This document lists all available Claude Code plugins and their commands in the
 - [Git](#git-plugin)
 - [Hello World](#hello-world-plugin)
 - [Jira](#jira-plugin)
+- [Lvms](#lvms-plugin)
 - [Must Gather](#must-gather-plugin)
 - [Olm](#olm-plugin)
 - [Openshift](#openshift-plugin)
@@ -82,6 +83,15 @@ A plugin to automate tasks with Jira
 
 See [plugins/jira/README.md](plugins/jira/README.md) for detailed documentation.
 
+### Lvms Plugin
+
+LVMS (Logical Volume Manager Storage) plugin for troubleshooting and debugging storage issues
+
+**Commands:**
+- **`/lvms:analyze` `[must-gather-path|--live] [--component storage|operator|volumes]`** - Comprehensive LVMS troubleshooting - analyzes LVMCluster, volume groups, PVCs, and storage issues on live clusters or must-gather
+
+See [plugins/lvms/README.md](plugins/lvms/README.md) for detailed documentation.
+
 ### Must Gather Plugin
 
 A plugin to analyze and report on must-gather data

docs/data.json

Lines changed: 21 additions & 0 deletions
@@ -445,6 +445,27 @@
         }
       ],
       "has_readme": true
+    },
+    {
+      "name": "lvms",
+      "description": "LVMS (Logical Volume Manager Storage) plugin for troubleshooting and debugging storage issues",
+      "version": "0.1.0",
+      "commands": [
+        {
+          "name": "analyze",
+          "description": "Comprehensive LVMS troubleshooting - analyzes LVMCluster, volume groups, PVCs, and storage issues on live clusters or must-gather",
+          "synopsis": "/lvms:analyze [must-gather-path] [--live] [--component <component>]",
+          "argument_hint": "[must-gather-path|--live] [--component storage|operator|volumes]"
+        }
+      ],
+      "skills": [
+        {
+          "name": "LVMS Analyzer",
+          "id": "lvms-analyzer",
+          "description": "Analyzes LVMS must-gather data to diagnose storage issues"
+        }
+      ],
+      "has_readme": true
     }
   ]
 }
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
{
  "name": "lvms",
  "description": "LVMS (Logical Volume Manager Storage) plugin for troubleshooting and debugging storage issues",
  "version": "0.1.0",
  "author": {
    "name": "github.com/openshift-eng"
  }
}

plugins/lvms/README.md

Lines changed: 251 additions & 0 deletions
@@ -0,0 +1,251 @@
# LVMS Plugin

Comprehensive troubleshooting and debugging plugin for LVMS (Logical Volume Manager Storage).

## Overview

The LVMS plugin provides the `/lvms:analyze` command for diagnosing storage issues in OpenShift clusters that use LVMS. It analyzes LVMCluster resources, volume groups, PVCs, the TopoLVM CSI driver, and node-level storage configuration to identify root causes of storage failures.

## Commands

### `/lvms:analyze`

Comprehensive LVMS troubleshooting that analyzes cluster health and storage resources and identifies common issues.

**Works with:**
- Live OpenShift clusters (via `oc` CLI)
- LVMS must-gather data (offline analysis)

**Features:**
- LVMCluster health and readiness analysis
- Volume group status across all nodes
- PVC/PV binding issues and pending volumes
- LVMS operator and TopoLVM CSI driver health
- Node-level device availability and configuration (live clusters)
- Thin pool capacity and usage
- Pod log analysis with error deduplication
- Root cause analysis with specific remediation steps

**Usage Examples:**

```bash
# Analyze live cluster
/lvms:analyze --live

# Analyze must-gather data
/lvms:analyze ./must-gather/registry-ci-openshift-org-origin-4-18.../

# Focus on specific component
/lvms:analyze --live --component storage
/lvms:analyze ./must-gather/... check pending PVCs

# Analyze pod logs only
/lvms:analyze --live --component logs
/lvms:analyze ./must-gather/... --component logs
```

## Common Use Cases

### 1. PVCs Stuck in Pending State

When PVCs using LVMS storage classes are not binding:

```bash
/lvms:analyze --live check pending PVCs
```

The command will:
- Identify which PVCs are pending
- Check volume group free space
- Verify TopoLVM CSI driver is running
- Check for node affinity issues
- Provide specific remediation steps

### 2. LVMCluster Not Ready

When the LVMCluster resource does not reach the Ready state:

```bash
/lvms:analyze --live analyze operator
```

The command will:
- Check LVMCluster status and conditions
- Identify which nodes have volume group issues
- Verify device availability and configuration
- Check for conflicting filesystems on devices
- Provide steps to clean devices and recreate VGs

### 3. Volume Group Creation Failures

When volume groups are not being created on nodes:

```bash
/lvms:analyze --live --component volumes
```

The command will:
- Show volume group status per node
- Identify missing or failed volume groups
- Check device selector configuration
- Detect devices already in use
- Provide commands to wipe devices and retry

### 4. Must-Gather Analysis

When analyzing a must-gather from a failed cluster:

```bash
/lvms:analyze ./must-gather/path/
```

The command will:
- Perform offline analysis of all LVMS resources
- Generate a comprehensive health report
- Identify critical issues and warnings
- Provide prioritized remediation recommendations
- Suggest which logs to review

## Installation

### From Marketplace

```bash
# Add the marketplace
/plugin marketplace add openshift-eng/ai-helpers

# Install LVMS plugin
/plugin install lvms@ai-helpers

# Use the command
/lvms:analyze --live
```

### Manual Installation

```bash
# Clone the repository
git clone https://github.com/openshift-eng/ai-helpers.git

# Link to Claude Code plugins directory (quoted so paths with spaces work)
ln -s "$(pwd)/ai-helpers/plugins/lvms" ~/.claude/plugins/lvms
```

## Prerequisites

**For Live Cluster Analysis:**
- `oc` CLI installed and configured
- Active cluster connection
- Read access to `openshift-lvm-storage` or older `openshift-storage` namespace
- Ability to read cluster-scoped resources

**For Must-Gather Analysis:**
- Python 3.6+ (for analysis script)
- PyYAML library: `pip install pyyaml`

## What the Plugin Checks

### LVMCluster Resources
- Overall state (Ready, Progressing, Failed, Degraded)
- Status conditions (ResourcesAvailable, VolumeGroupsReady)
- Device class configurations
- Node coverage and readiness

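The state and condition checks listed above can be sketched in Python against an already-parsed resource. This is a minimal sketch, not the plugin's actual code: the function name `summarize_lvmcluster` is hypothetical, and it assumes the LVMCluster has been loaded into a dict that exposes `status.state` and a `status.conditions` list in the usual Kubernetes condition shape (`type`, `status`, `message`).

```python
def summarize_lvmcluster(resource):
    """Return (state, problems) for a parsed LVMCluster dict (hypothetical helper)."""
    status = resource.get("status", {})
    state = status.get("state", "Unknown")
    problems = []
    for cond in status.get("conditions", []):
        # Treat a condition as unhealthy unless its status is the string "True".
        if cond.get("status") != "True":
            problems.append(f"{cond.get('type')}: {cond.get('message', 'no message')}")
    return state, problems


# Illustrative data only; real resources carry many more fields.
example = {
    "status": {
        "state": "Degraded",
        "conditions": [
            {"type": "ResourcesAvailable", "status": "True"},
            {"type": "VolumeGroupsReady", "status": "False",
             "message": "vg1 missing on node-a"},
        ],
    }
}
state, problems = summarize_lvmcluster(example)
print(state, problems)
```

Returning the raw state alongside the failing conditions keeps the two signals separate, since a cluster can report a non-Ready state while individual conditions explain why.
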
### Volume Groups
- Volume group creation status per node
- Physical volume availability
- Free space and capacity
- Thin pool configuration and usage
- Missing or failed volume groups

### Storage (PVCs/PVs)
- PVC binding status
- Pending volume provisioning failures
- Storage class configuration
- Capacity issues
- Node affinity constraints

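As an illustration of the PVC binding check, the sketch below filters parsed PVC manifests by `status.phase`. It is an assumption-laden example rather than the plugin's implementation: `pending_pvcs` is a hypothetical name, and the input is assumed to be a list of dicts such as the `items` of a parsed PVC list.

```python
def pending_pvcs(pvcs):
    """Return 'namespace/name' for every PVC whose phase is Pending (hypothetical helper)."""
    pending = []
    for pvc in pvcs:
        if pvc.get("status", {}).get("phase") == "Pending":
            meta = pvc.get("metadata", {})
            pending.append(
                f"{meta.get('namespace', 'default')}/{meta.get('name', '<unnamed>')}"
            )
    return pending


# Illustrative data only.
pvcs = [
    {"metadata": {"namespace": "app", "name": "data-0"}, "status": {"phase": "Bound"}},
    {"metadata": {"namespace": "app", "name": "data-1"}, "status": {"phase": "Pending"}},
]
print(pending_pvcs(pvcs))
```
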
### Operator Health
- LVMS operator deployment status
- TopoLVM controller readiness
- TopoLVM node daemonset coverage
- VG-manager daemonset status
- Pod crashes and restarts

### Node Devices
- Block device availability
- Existing filesystems on devices
- Device selector matches
- Disk capacity and usage

### Pod Logs
- Error and warning messages from vg-manager pods
- Error and warning messages from lvms-operator pod
- Deduplication of repeated errors from reconciliation loops
- JSON log parsing with timestamps and context

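Deduplicating errors from a reconciliation loop amounts to counting identical messages instead of printing each one. The following is a minimal sketch under stated assumptions, not the plugin's own parser: `dedupe_errors` is a hypothetical name, and each log line is assumed to be a JSON object with `level` and `msg` fields.

```python
import json
from collections import Counter


def dedupe_errors(lines):
    """Count repeated error/warning messages in JSON log lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue  # skip JSON scalars and arrays
        level = entry.get("level")
        if level in ("error", "warn", "warning"):
            counts[(level, entry.get("msg", ""))] += 1
    # Most frequent messages first.
    return counts.most_common()


# Illustrative log lines only.
lines = [
    '{"level":"error","msg":"vg not found"}',
    "plain text line",
    '{"level":"info","msg":"reconciled"}',
    '{"level":"error","msg":"vg not found"}',
    '{"level":"warn","msg":"slow device scan"}',
]
print(dedupe_errors(lines))
```

Keying on the `(level, msg)` pair collapses repeats while keeping errors and warnings with the same text distinct.
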
## Output Format

The plugin provides structured, color-coded output:

- ✓ Green checkmarks for healthy components
- ⚠ Yellow warnings for non-critical issues
- ❌ Red errors for critical problems
- ℹ Blue info for additional context

Reports include:
- Component-by-component health status
- Root cause analysis
- Prioritized recommendations
- Specific remediation commands
- Links to relevant documentation

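One way to produce the symbols and colors listed above is a small lookup table of ANSI escape codes. This is only a sketch: the names `SYMBOLS` and `format_status` and the level keys are hypothetical, and how the plugin actually renders its output is not shown in this README.

```python
# Level -> (symbol, ANSI color code); the keys are illustrative assumptions.
SYMBOLS = {
    "ok": ("✓", "\033[32m"),     # green
    "warn": ("⚠", "\033[33m"),   # yellow
    "error": ("❌", "\033[31m"),  # red
    "info": ("ℹ", "\033[34m"),   # blue
}
RESET = "\033[0m"


def format_status(level, message, color=True):
    """Prefix a message with its status symbol, optionally wrapped in color codes."""
    symbol, code = SYMBOLS[level]
    text = f"{symbol} {message}"
    return f"{code}{text}{RESET}" if color else text


print(format_status("ok", "LVMCluster is Ready"))
print(format_status("error", "Volume group missing", color=False))
```

The `color=False` path is useful when output is redirected to a file, where escape codes would otherwise appear as literal characters.
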
## Troubleshooting the Plugin

**Script not found:**
```bash
# Verify script exists
ls plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py

# Make executable
chmod +x plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py
```

**Cannot connect to cluster:**
```bash
# Verify oc is configured
oc whoami
oc cluster-info

# Check LVMS namespace (older installs use openshift-storage instead)
oc get namespace openshift-lvm-storage
```

**Must-gather path errors:**
```bash
# Use the correct subdirectory (the one with the hash)
ls must-gather/registry-ci-*/namespaces/openshift-lvm-storage

# Not the parent directory
```

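The path rule above, combined with the two supported namespaces, can be checked programmatically. The sketch below is an assumption about how such detection might look, not the analysis script itself: `find_lvms_namespace_dir` is a hypothetical name, and it only inspects the given directory and its immediate subdirectories for a `namespaces/<namespace>` folder.

```python
from pathlib import Path

# Newer namespace first, then the older one.
LVMS_NAMESPACES = ("openshift-lvm-storage", "openshift-storage")


def find_lvms_namespace_dir(must_gather_path):
    """Return the first namespaces/<lvms-namespace> directory found, or None."""
    root = Path(must_gather_path)
    if not root.is_dir():
        return None
    # Accept either the hashed subdirectory itself or its parent directory.
    candidates = [root] + sorted(p for p in root.iterdir() if p.is_dir())
    for base in candidates:
        for namespace in LVMS_NAMESPACES:
            ns_dir = base / "namespaces" / namespace
            if ns_dir.is_dir():
                return ns_dir
    return None
```

Checking the parent directory as a fallback means a user who passes `must-gather/` rather than the hashed subdirectory still gets a result instead of a path error.
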
## Related Resources

- [LVMS GitHub Repository](https://github.com/openshift/lvm-operator)
- [LVMS Troubleshooting Guide](https://github.com/openshift/lvm-operator/blob/main/docs/troubleshooting.md)
- [TopoLVM Documentation](https://github.com/topolvm/topolvm)
- [OpenShift Storage Documentation](https://docs.openshift.com/container-platform/latest/storage/index.html)

## Contributing

Contributions are welcome! Please see the main repository's [CLAUDE.md](../../CLAUDE.md) for guidelines on:
- Adding new commands
- Extending analysis capabilities
- Improving diagnostic checks
- Adding helper scripts

## Support

For issues or feature requests:
- GitHub Issues: https://github.com/openshift-eng/ai-helpers/issues
- Repository: https://github.com/openshift-eng/ai-helpers
