# LVMS Plugin

Comprehensive troubleshooting and debugging plugin for LVMS (Logical Volume Manager Storage).

## Overview

The LVMS plugin provides commands for diagnosing and troubleshooting storage issues in OpenShift clusters that use LVMS. It analyzes LVMCluster resources, volume groups, PVCs, the TopoLVM CSI driver, and node-level storage configuration to identify the root causes of storage failures.

## Commands

### `/lvms:analyze`

Comprehensive LVMS troubleshooting that analyzes cluster health and storage resources and identifies common issues.

**Works with:**
- Live OpenShift clusters (via `oc` CLI)
- LVMS must-gather data (offline analysis)

**Features:**
- LVMCluster health and readiness analysis
- Volume group status across all nodes
- PVC/PV binding issues and pending volumes
- LVMS operator and TopoLVM CSI driver health
- Node-level device availability and configuration (live clusters)
- Thin pool capacity and usage
- Pod log analysis with error deduplication
- Root cause analysis with specific remediation steps

**Usage Examples:**

```bash
# Analyze a live cluster
/lvms:analyze --live

# Analyze must-gather data
/lvms:analyze ./must-gather/registry-ci-openshift-org-origin-4-18.../

# Focus on a specific component
/lvms:analyze --live --component storage
/lvms:analyze ./must-gather/... check pending PVCs

# Analyze pod logs only
/lvms:analyze --live --component logs
/lvms:analyze ./must-gather/... --component logs
```

## Common Use Cases

### 1. PVCs Stuck in Pending State

When PVCs using LVMS storage classes are not binding:

```bash
/lvms:analyze --live check pending PVCs
```

The command will:
- Identify which PVCs are pending
- Check volume group free space
- Verify that the TopoLVM CSI driver is running
- Check for node affinity issues
- Provide specific remediation steps
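
To cross-check these findings by hand, the following sketch lists pending PVCs and prints their events. The function names are hypothetical helpers, not part of the plugin, and the parser assumes the default `oc get pvc -A --no-headers` column order (NAMESPACE, NAME, STATUS, ...).

```bash
# Hypothetical helper (not shipped with the plugin).
# Reads `oc get pvc -A --no-headers` on stdin and prints "namespace/name"
# for every PVC whose STATUS column (field 3) is Pending.
list_pending_pvcs() {
  awk '$3 == "Pending" { print $1 "/" $2 }'
}

# Hypothetical helper: print only the Events section of `oc describe pvc`
# for one PVC given as "namespace/name".
pvc_events() {
  local ns="${1%%/*}" name="${1#*/}"
  oc describe pvc "$name" -n "$ns" | sed -n '/^Events:/,$p'
}
```

For example, `oc get pvc -A --no-headers | list_pending_pvcs` prints one `namespace/name` per pending claim, and each of those can be passed to `pvc_events` to see why provisioning has not happened yet.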

### 2. LVMCluster Not Ready

When the LVMCluster resource does not reach the Ready state:

```bash
/lvms:analyze --live analyze operator
```

The command will:
- Check LVMCluster status and conditions
- Identify which nodes have volume group issues
- Verify device availability and configuration
- Check for conflicting filesystems on devices
- Provide steps to clean devices and recreate VGs
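
The LVMCluster state and conditions that the command inspects can also be read directly with `oc`. This is a minimal sketch with hypothetical helper names; the `status.state` and `status.conditions` field paths are assumptions based on the states and conditions this README lists, so confirm them with `oc explain lvmcluster.status` on your cluster.

```bash
# Hypothetical helpers (not shipped with the plugin).
# Pick the LVMS namespace: openshift-lvm-storage on current releases,
# openshift-storage on older ones.
lvms_namespace() {
  if oc get namespace openshift-lvm-storage >/dev/null 2>&1; then
    echo openshift-lvm-storage
  else
    echo openshift-storage
  fi
}

# Print "<name><TAB><status.state>" for each LVMCluster (field path assumed).
lvmcluster_state() {
  oc get lvmcluster -n "$(lvms_namespace)" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.state}{"\n"}{end}'
}

# Print one "<type>=<status>: <message>" line per LVMCluster condition.
lvmcluster_conditions() {
  oc get lvmcluster -n "$(lvms_namespace)" \
    -o jsonpath='{range .items[*].status.conditions[*]}{.type}{"="}{.status}{": "}{.message}{"\n"}{end}'
}
```

A condition such as `VolumeGroupsReady=False` together with its message usually points at the node or device to look at next.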

### 3. Volume Group Creation Failures

When volume groups are not being created on nodes:

```bash
/lvms:analyze --live --component volumes
```

The command will:
- Show volume group status per node
- Identify missing or failed volume groups
- Check device selector configuration
- Detect devices already in use
- Provide commands to wipe devices and retry
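
For a manual look at per-node volume group status, a sketch along these lines queries the `LVMVolumeGroupNodeStatus` resources that LVMS keeps per node and, if needed, runs read-only LVM commands on a node. The helper names are hypothetical, and the `.spec.nodeStatus` field path is an assumption to verify with `oc explain lvmvolumegroupnodestatus.spec`. The node check needs permission to start debug pods.

```bash
# Hypothetical helper (not shipped with the plugin).
# ASSUMPTION: per-node entries live under .spec.nodeStatus with name,
# status and reason fields; verify with `oc explain`.
vg_node_status() {
  local ns="${1:-openshift-lvm-storage}"
  oc get lvmvolumegroupnodestatus -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{range .spec.nodeStatus[*]}{"  "}{.name}{": "}{.status}{" "}{.reason}{"\n"}{end}{end}'
}

# Read-only: show LVM physical volumes and volume groups on one node
# through a debug pod. Nothing on disk is modified.
node_lvm_state() {
  [ -n "${1:-}" ] || { echo "usage: node_lvm_state <node>" >&2; return 2; }
  oc debug "node/$1" -- chroot /host sh -c 'pvs; vgs'
}
```

A node that is listed in the LVMCluster but shows no LVMS volume group in `vgs` is a good starting point for the device checks the command performs.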

### 4. Must-Gather Analysis

When analyzing a must-gather from a failed cluster:

```bash
/lvms:analyze ./must-gather/path/
```

The command will:
- Perform offline analysis of all LVMS resources
- Generate a comprehensive health report
- Identify critical issues and warnings
- Provide prioritized remediation recommendations
- Suggest which logs to review

## Installation

### From Marketplace

```bash
# Add the marketplace
/plugin marketplace add openshift-eng/ai-helpers

# Install the LVMS plugin
/plugin install lvms@ai-helpers

# Use the command
/lvms:analyze --live
```

### Manual Installation

```bash
# Clone the repository
git clone https://github.com/openshift-eng/ai-helpers.git

# Link the plugin into the Claude Code plugins directory (creating it if needed)
mkdir -p ~/.claude/plugins
ln -s "$(pwd)/ai-helpers/plugins/lvms" ~/.claude/plugins/lvms
```

## Prerequisites

**For Live Cluster Analysis:**
- `oc` CLI installed and configured
- Active cluster connection
- Read access to the `openshift-lvm-storage` namespace (or `openshift-storage` on older LVMS releases)
- Permission to read cluster-scoped resources

**For Must-Gather Analysis:**
- Python 3.6+ (for the analysis script)
- PyYAML library: `pip install pyyaml`
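
One way to confirm these prerequisites before running the command is a preflight check like the one below. It is a hypothetical helper, not part of the plugin; it relies only on `oc whoami` and `python3 -c` and returns the number of failed checks.

```bash
# Hypothetical preflight helper (not shipped with the plugin).
# Prints one line per check and returns the number of failed checks.
lvms_preflight() {
  local failed=0

  # Live analysis: oc must exist and have an active session.
  if ! command -v oc >/dev/null 2>&1; then
    echo "FAIL oc CLI not found in PATH"; failed=$((failed + 1))
  elif ! oc whoami >/dev/null 2>&1; then
    echo "FAIL oc is not logged in to a cluster"; failed=$((failed + 1))
  else
    echo "OK   oc session active"
  fi

  # Must-gather analysis: Python 3.6+ with PyYAML importable.
  if ! python3 -c 'import sys; sys.exit(sys.version_info < (3, 6))' >/dev/null 2>&1; then
    echo "FAIL python3 >= 3.6 not available"; failed=$((failed + 1))
  elif ! python3 -c 'import yaml' >/dev/null 2>&1; then
    echo "FAIL PyYAML missing (pip install pyyaml)"; failed=$((failed + 1))
  else
    echo "OK   python3 and PyYAML available"
  fi

  return "$failed"
}
```

The two groups are independent: a missing `oc` session only blocks live analysis, and a missing PyYAML only blocks must-gather analysis.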

## What the Plugin Checks

### LVMCluster Resources
- Overall state (Ready, Progressing, Failed, Degraded)
- Status conditions (ResourcesAvailable, VolumeGroupsReady)
- Device class configurations
- Node coverage and readiness

### Volume Groups
- Volume group creation status per node
- Physical volume availability
- Free space and capacity
- Thin pool configuration and usage
- Missing or failed volume groups

### Storage (PVCs/PVs)
- PVC binding status
- Pending volume provisioning failures
- Storage class configuration
- Capacity issues
- Node affinity constraints
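
When checking storage class configuration by hand, it helps to know which classes LVMS provisions and how they bind. This sketch assumes LVMS storage classes use the `topolvm.io` provisioner, and the helper name is hypothetical. If a class uses `WaitForFirstConsumer`, its PVCs stay Pending until a pod that mounts them is scheduled, which is expected behavior rather than a failure.

```bash
# Hypothetical helper (not shipped with the plugin).
# Lists storage classes whose provisioner is topolvm.io (an assumption for
# LVMS) together with their volumeBindingMode.
lvms_storageclasses() {
  oc get storageclass \
    -o jsonpath='{range .items[?(@.provisioner=="topolvm.io")]}{.metadata.name}{"\t"}{.volumeBindingMode}{"\n"}{end}'
}
```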

### Operator Health
- LVMS operator deployment status
- TopoLVM controller readiness
- TopoLVM node daemonset coverage
- vg-manager daemonset status
- Pod crashes and restarts
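
A rough manual version of the crash and restart check filters `oc get pods` output for pods that are not fully ready or have restarted. This is a sketch with a hypothetical helper name; it assumes the default column order (NAME, READY, STATUS, RESTARTS, AGE).

```bash
# Hypothetical helper (not shipped with the plugin).
# Reads `oc get pods -n <ns> --no-headers` and prints pods that are not
# fully ready, not Running, or have a non-zero restart count.
unhealthy_pods() {
  awk '
    $3 == "Completed" { next }             # finished job pods are not failures
    {
      split($2, ready, "/")                # READY column, e.g. "1/2"
      restarts = $4 + 0                    # "3 (5m ago)" splits; field 4 is "3"
      if (ready[1] != ready[2] || restarts > 0 || $3 != "Running")
        print $1, $2, $3, "restarts=" restarts
    }'
}
```

For example: `oc get pods -n openshift-lvm-storage --no-headers | unhealthy_pods`.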

### Node Devices
- Block device availability
- Existing filesystems on devices
- Device selector matches
- Disk capacity and usage
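
Existing signatures on candidate devices can be listed from a node with `lsblk`. The sketch below is hypothetical (not the plugin's implementation) and assumes that `oc debug` writes its own status messages to stderr, so only `lsblk` output reaches the parser. An `LVM2_member` entry may simply be a device that LVMS already added to its volume group.

```bash
# Hypothetical helpers (not shipped with the plugin).
# Reads `lsblk -rno NAME,TYPE,FSTYPE` output and prints disks or partitions
# that already carry a signature (filesystem, LVM2_member, swap, ...).
devices_with_signatures() {
  awk '($2 == "disk" || $2 == "part") && $3 != "" { print $1, $3 }'
}

# Collect that listing from one node through a debug pod (read-only).
# ASSUMPTION: oc debug status messages go to stderr, not into the pipe.
node_device_signatures() {
  [ -n "${1:-}" ] || { echo "usage: node_device_signatures <node>" >&2; return 2; }
  oc debug "node/$1" -- chroot /host lsblk -rno NAME,TYPE,FSTYPE | devices_with_signatures
}
```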

### Pod Logs
- Error and warning messages from vg-manager pods
- Error and warning messages from the lvms-operator pod
- Deduplication of repeated errors from reconciliation loops
- JSON log parsing with timestamps and context
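
The deduplication idea can be approximated with standard text tools: keep error and warning lines, reduce each to its level and message, then count repeats. This sketch is not the plugin's parser; it assumes one JSON object per line with a `"level"` key appearing before a `"msg"` key, which is common for zap-style Go loggers but is an assumption about LVMS log output. The helper name is hypothetical.

```bash
# Hypothetical helper (not shipped with the plugin).
# Reads JSON log lines on stdin, keeps error/warn entries, strips
# timestamps and other fields by keeping only level and msg, and prints
# each distinct message once with its repeat count (most frequent first).
dedupe_log_errors() {
  grep -E '"level":"(error|warn)"' \
    | sed -E 's/.*"level":"([a-z]+)".*"msg":"([^"]*)".*/\1: \2/' \
    | sort | uniq -c | sort -rn
}
```

For example: `oc logs -n openshift-lvm-storage <pod-name> | dedupe_log_errors`, with pod names taken from `oc get pods -n openshift-lvm-storage`.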

## Output Format

The plugin provides structured, color-coded output:

- ✓ Green checkmarks for healthy components
- ⚠ Yellow warnings for non-critical issues
- ❌ Red errors for critical problems
- ℹ Blue info for additional context

Reports include:
- Component-by-component health status
- Root cause analysis
- Prioritized recommendations
- Specific remediation commands
- Links to relevant documentation

## Troubleshooting the Plugin

**Script not found:**
```bash
# Verify the script exists (paths are relative to the repository root)
ls plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py

# Make it executable
chmod +x plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py
```

**Cannot connect to cluster:**
```bash
# Verify oc is configured
oc whoami
oc cluster-info

# Check the LVMS namespace (openshift-storage on older releases)
oc get namespace openshift-lvm-storage
```

**Must-gather path errors:**
```bash
# Point the command at the subdirectory with the hash in its name,
# not at the top-level must-gather directory
ls must-gather/registry-ci-*/namespaces/openshift-lvm-storage
```
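
To find the right subdirectory automatically, a small sketch can search the given directory and one level below it for an LVMS namespace directory. The helper name is hypothetical; it checks for both `openshift-lvm-storage` and the older `openshift-storage` namespace.

```bash
# Hypothetical helper (not shipped with the plugin).
# Prints the first directory (the root itself or one of its immediate
# subdirectories) that contains namespaces/openshift-lvm-storage or
# namespaces/openshift-storage; returns 1 if none is found.
find_lvms_mg_dir() {
  local root="${1%/}" d
  for d in "$root" "$root"/*/; do
    d="${d%/}"
    if [ -d "$d/namespaces/openshift-lvm-storage" ] || [ -d "$d/namespaces/openshift-storage" ]; then
      echo "$d"
      return 0
    fi
  done
  echo "no LVMS namespace found under $root" >&2
  return 1
}
```

For example, `find_lvms_mg_dir ./must-gather` prints the path to pass to `/lvms:analyze`.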

## Related Resources

- [LVMS GitHub Repository](https://github.com/openshift/lvm-operator)
- [LVMS Troubleshooting Guide](https://github.com/openshift/lvm-operator/blob/main/docs/troubleshooting.md)
- [TopoLVM Documentation](https://github.com/topolvm/topolvm)
- [OpenShift Storage Documentation](https://docs.openshift.com/container-platform/latest/storage/index.html)

## Contributing

Contributions are welcome! Please see the main repository's [CLAUDE.md](../../CLAUDE.md) for guidelines on:
- Adding new commands
- Extending analysis capabilities
- Improving diagnostic checks
- Adding helper scripts

## Support

For issues or feature requests:
- GitHub Issues: https://github.com/openshift-eng/ai-helpers/issues
- Repository: https://github.com/openshift-eng/ai-helpers