microshift-ci-doctor: add openshift-ci MCP and widen analysis budget#80509
pmtk wants to merge 4 commits
Skipping CI for Draft Pull Request.
Walkthrough
The doctor step is updated to integrate OpenShift CI MCP by adding the […]
Changes
Doctor Step MCP Integration and Execution Enhancements
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: pmtk
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh`:
- Around line 241-247: The unguarded claude mcp add command on line 241 will
cause the entire script to exit immediately when set -e is active if the command
fails, preventing the intended "warn and continue" behavior. Wrap the claude mcp
add command with error handling (such as appending || true or using an if
statement) so that if the command fails, the script continues to execute the
subsequent wait_for_mcp_status check and warning message instead of exiting.
- Around line 237-239: The curl command downloading the openshift-ci-mcp binary
(in the condition starting with "if curl -sL --retry 3") lacks explicit network
timeout settings. This can cause the download to hang indefinitely during
network stalls, exhausting the step budget. Add `--connect-timeout` and
`--max-time` flags to the curl invocation to enforce reasonable timeouts (for
example, 30 seconds for connection and 60 seconds for total operation time),
ensuring the command fails fast when network conditions are poor rather than
consuming excessive budget waiting for a response.
Files selected for processing (2)
- ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh
- ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-ref.yaml
if curl -sL --retry 3 -o "${ocimcp_bin}" \
    "https://github.com/openshift-eng/openshift-ci-mcp/releases/download/${ocimcp_version}/openshift-ci-mcp-linux-amd64" &&
    echo "${ocimcp_sha256} ${ocimcp_bin}" | sha256sum --check --quiet; then
Add explicit network timeouts to the MCP binary download.
The new curl call retries, but without --connect-timeout/--max-time it can hang long enough to burn step budget during network stalls.
Suggested fix
- if curl -sL --retry 3 -o "${ocimcp_bin}" \
+ if curl -sL --retry 3 --retry-delay 2 --connect-timeout 10 --max-time 120 -o "${ocimcp_bin}" \
"https://github.com/openshift-eng/openshift-ci-mcp/releases/download/${ocimcp_version}/openshift-ci-mcp-linux-amd64" &&
  echo "${ocimcp_sha256} ${ocimcp_bin}" | sha256sum --check --quiet; then
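The download-and-verify step discussed in this comment can be sketched as three small functions. This is an illustration only, not the script's actual code: the function names are hypothetical, the timeout values are copied from the suggested fix, and `--fail` is an extra flag (not in the PR) that makes curl return non-zero on HTTP errors.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; timeout values come from the
# suggested fix in the review comment.

# download_with_timeouts URL DEST
# Fetch URL to DEST, bounding connection setup and each transfer attempt.
download_with_timeouts() {
    local url="$1" dest="$2"
    curl --silent --location --fail \
        --retry 3 --retry-delay 2 \
        --connect-timeout 10 --max-time 120 \
        --output "${dest}" "${url}"
}

# verify_sha256 FILE EXPECTED_HEX
# Succeed only if the SHA-256 digest of FILE equals EXPECTED_HEX.
verify_sha256() {
    local file="$1" expected="$2"
    echo "${expected}  ${file}" | sha256sum --check --quiet
}

# fetch_verified URL DEST EXPECTED_HEX
# Download, then verify. On any failure remove DEST and return non-zero,
# so the caller can decide whether the failure is fatal.
fetch_verified() {
    local url="$1" dest="$2" expected="$3"
    if download_with_timeouts "${url}" "${dest}" &&
        verify_sha256 "${dest}" "${expected}"; then
        chmod +x "${dest}"
    else
        rm -f "${dest}"
        return 1
    fi
}
```

Note that `--max-time` limits a single transfer attempt, so with `--retry 3` the worst case is several times that value plus the retry delays. curl's `--retry-max-time` can cap the whole retry sequence if a hard upper bound is needed.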
claude mcp add --scope user --transport stdio openshift-ci -- "${ocimcp_bin}"
echo "Waiting for OpenShift CI MCP to become available..."
if wait_for_mcp_status "openshift-ci" "Connected"; then
    echo "OpenShift CI MCP is available."
else
    echo "WARNING: OpenShift CI MCP did not connect. Job history will not be available."
fi
Non-fatal MCP behavior is currently broken by set -e on registration failure.
Line 241 runs claude mcp add ... unguarded. If it fails, set -e exits the step immediately, so the intended “warn and continue” path never executes.
Suggested fix
- claude mcp add --scope user --transport stdio openshift-ci -- "${ocimcp_bin}"
- echo "Waiting for OpenShift CI MCP to become available..."
- if wait_for_mcp_status "openshift-ci" "Connected"; then
- echo "OpenShift CI MCP is available."
- else
- echo "WARNING: OpenShift CI MCP did not connect. Job history will not be available."
- fi
+ if claude mcp add --scope user --transport stdio openshift-ci -- "${ocimcp_bin}"; then
+ echo "Waiting for OpenShift CI MCP to become available..."
+ if wait_for_mcp_status "openshift-ci" "Connected"; then
+ echo "OpenShift CI MCP is available."
+ else
+ echo "WARNING: OpenShift CI MCP did not connect. Job history will not be available."
+ fi
+ else
+ echo "WARNING: Failed to register OpenShift CI MCP. Job history will not be available."
+ fi
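The reason the `if` wrapper helps is that a command used as an `if` condition is exempt from `set -e`, so its failure selects the `else` branch instead of terminating the script. A minimal sketch of that pattern follows; `try_optional` is a hypothetical helper name, not something defined in the step script.

```shell
#!/usr/bin/env bash
# Sketch only: try_optional is a hypothetical helper, not part of the PR.

# try_optional DESCRIPTION COMMAND [ARGS...]
# Run an optional setup command. Because the command is the condition of an
# `if`, a non-zero exit does not trigger `set -e`; it only selects the
# warning branch, and the function itself still returns 0.
try_optional() {
    local description="$1"
    shift
    if "$@"; then
        echo "${description}: ok"
    else
        echo "WARNING: ${description} failed; continuing without it." >&2
    fi
    return 0
}
```

With this helper the registration from the comment could be written as `try_optional "OpenShift CI MCP registration" claude mcp add --scope user --transport stdio openshift-ci -- "${ocimcp_bin}"`. Appending `|| true` has the same non-fatal effect but discards the chance to log a specific warning.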
/pj-rehearse periodic-ci-openshift-eng-edge-tooling-main-microshift-ci-doctor
@pmtk: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh`:
- Around line 272-278: The workspace setup in the script is cloning from a
personal fork (pmtk/edge-tooling.git) on a feature branch (ci-doctor-rca)
instead of using the pre-installed EDGE_TOOLING_DIR variable. Revert to the
original configuration by uncommenting the three lines that set SRC_DIR to use
EDGE_TOOLING_DIR and PLUGIN_DIR relative to it, then remove the hardcoded
SRC_DIR assignment to /tmp/edge-tooling and the entire git clone command that
references the personal fork and feature branch.
📒 Files selected for processing (1)
ci-operator/step-registry/openshift/edge-tooling/microshift-ci/doctor/openshift-edge-tooling-microshift-ci-doctor-commands.sh
#SRC_DIR="${EDGE_TOOLING_DIR}"
#PLUGIN_DIR="${SRC_DIR}/plugins/microshift-ci"
#cd "${SRC_DIR}"

SRC_DIR=/tmp/edge-tooling
PLUGIN_DIR="${SRC_DIR}/plugins/microshift-ci"
cd "${SRC_DIR}"
git clone https://github.com/pmtk/edge-tooling.git -b ci-doctor-rca "${SRC_DIR}"
Hardcoded personal fork and feature branch must be reverted before merge.
The workspace setup clones from pmtk/edge-tooling.git branch ci-doctor-rca instead of using the pre-installed EDGE_TOOLING_DIR. If merged, production CI would depend on a personal fork and feature branch that may disappear or diverge.
Restore the original configuration or update to use the official repository:
Suggested fix (restore original)
-#SRC_DIR="${EDGE_TOOLING_DIR}"
-#PLUGIN_DIR="${SRC_DIR}/plugins/microshift-ci"
-#cd "${SRC_DIR}"
-
-SRC_DIR=/tmp/edge-tooling
+SRC_DIR="${EDGE_TOOLING_DIR}"
PLUGIN_DIR="${SRC_DIR}/plugins/microshift-ci"
-git clone https://github.com/pmtk/edge-tooling.git -b ci-doctor-rca "${SRC_DIR}"
+cd "${SRC_DIR}"
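One way to make the restored configuration fail loudly, rather than silently falling back to a clone, is to validate the pre-installed checkout before using it. The sketch below assumes only what the comment states: that `EDGE_TOOLING_DIR` points at the checkout and that the plugin lives under `plugins/microshift-ci`. The function name `resolve_plugin_dir` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_plugin_dir is a hypothetical helper, not part of the PR.

# resolve_plugin_dir
# Print the microshift-ci plugin directory inside the pre-installed
# edge-tooling checkout, or fail if the checkout is missing or incomplete.
resolve_plugin_dir() {
    local src_dir="${EDGE_TOOLING_DIR:-}"
    if [ -z "${src_dir}" ]; then
        echo "ERROR: EDGE_TOOLING_DIR is not set." >&2
        return 1
    fi
    if [ ! -d "${src_dir}/plugins/microshift-ci" ]; then
        echo "ERROR: ${src_dir}/plugins/microshift-ci does not exist." >&2
        return 1
    fi
    echo "${src_dir}/plugins/microshift-ci"
}
```

A caller could then write `PLUGIN_DIR="$(resolve_plugin_dir)"` and let `set -e` stop the step if the image does not ship the expected checkout.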
    "https://github.com/openshift-eng/openshift-ci-mcp/releases/download/${ocimcp_version}/openshift-ci-mcp-linux-amd64" &&
    echo "${ocimcp_sha256} ${ocimcp_bin}" | sha256sum --check --quiet; then
    chmod +x "${ocimcp_bin}"
    claude mcp add --scope user --transport stdio openshift-ci -- "${ocimcp_bin}"
I recommend overriding the command so that you can pass flags limiting the available tools. That reduces how much context the tool definitions take up and limits the LLM to only the tools it needs to pull the required data.
The microshift-ci plugin's prow-job skill now performs evidence-cited root cause analysis (sosreport extraction, source correlation, causal chains) and consults job history through the openshift-ci MCP server:

- Download a pinned, checksum-verified openshift-ci-mcp release binary and register it as the "openshift-ci" stdio MCP (read-only Sippy/Release-Controller/Search.CI access, no credentials). Failures are non-fatal: the skills record the absence in their analysis gaps.
- Allow mcp__openshift-ci__* in the permission settings.
- Doctor session budget: 45m/100 turns -> 60m/150 turns; step timeout 1h30m -> 2h15m. The deeper per-job analysis needs more wall clock than the old surface-level scan.
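The permission change in the second bullet can be pictured with a small sketch that writes a settings file containing both allow entries. The two permission strings come from this PR. The `permissions.allow` layout is assumed to follow the Claude Code settings format, and the function name and target path are hypothetical; the real step may generate its settings differently.

```shell
#!/usr/bin/env bash
# Sketch only: write_doctor_settings and its PATH argument are hypothetical.
# The JSON layout (permissions.allow) is an assumption about the settings file.

# write_doctor_settings PATH
# Write a settings file that allows the microshift-ci skills and every tool
# exposed by the "openshift-ci" MCP server.
write_doctor_settings() {
    local path="$1"
    mkdir -p "$(dirname "${path}")"
    cat > "${path}" <<'EOF'
{
  "permissions": {
    "allow": [
      "Skill(microshift-ci:*)",
      "mcp__openshift-ci__*"
    ]
  }
}
EOF
}
```

For example, `write_doctor_settings "${HOME}/.claude/settings.json"` would produce a user-level file, assuming that is where the step keeps its settings.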
@pmtk: The following test failed, say
Summary by CodeRabbit

This PR updates the OpenShift edge-tooling microshift-ci "doctor" step in `ci-operator/step-registry` to enable deeper, evidence-cited per-job root-cause analysis by adding job-history lookups via a new OpenShift CI MCP server.

OpenShift CI MCP server integration (job history & regression context)
- […] `openshift-ci-mcp` (v0.5.0) binary using a SHA-256 checksum, then register it in Claude as a stdio MCP server named `openshift-ci`.
- […] `openshift-ci-mcp` with tool access: `core`, `jobs`, `tests`, `prs`, `search`.
- […] `Connected`; if download/verification/connection fails, it logs a warning and analysis proceeds with the missing capability treated as a gap (graceful degradation).
- […] `mcp__openshift-ci__*`.

Permission configuration
- […] `settings.json` permissions to add `mcp__openshift-ci__*` alongside the existing `Skill(microshift-ci:*)` permission.

Doctor runtime/budget + step timeout
- […] `check_claude_rc` timeout-min value accordingly (60 minutes).
- […] `1h30m0s` → `2h15m0s` (`openshift-edge-tooling-microshift-ci-doctor-ref.yaml`).

Edge-tooling source handling
- […] `git clone` to `/tmp/edge-tooling` (branch `ci-doctor-rca`) and sets `PLUGIN_DIR` from that clone, instead of relying on a pre-installed `EDGE_TOOLING_DIR`.
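The budget figures in the summary can be sanity-checked with a little arithmetic: the step timeout has to exceed the doctor session budget by enough to cover setup and teardown. The sketch below uses a hypothetical helper, `duration_to_seconds`, that handles only whole `h`/`m`/`s` components in that order and without leading zeros. It is a simplification for illustration, not a full duration parser.

```shell
#!/usr/bin/env bash
# Sketch only: duration_to_seconds is a hypothetical helper, not part of the PR.

# duration_to_seconds DURATION
# Convert a duration such as "2h15m0s" or "60m" to seconds.
duration_to_seconds() {
    local rest="$1" h=0 m=0 s=0
    # Peel off each component in order, leaving the remainder in $rest.
    case "${rest}" in *h*) h="${rest%%h*}"; rest="${rest#*h}" ;; esac
    case "${rest}" in *m*) m="${rest%%m*}"; rest="${rest#*m}" ;; esac
    case "${rest}" in *s) s="${rest%s}"; rest="" ;; esac
    # Reject empty input, leftover text, and empty or non-numeric components.
    if [ -z "$1" ] || [ -n "${rest}" ] ||
        [ -z "${h}" ] || [ -z "${m}" ] || [ -z "${s}" ]; then
        echo "unsupported duration: $1" >&2
        return 1
    fi
    case "${h}${m}${s}" in
        *[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
    esac
    echo $(( h * 3600 + m * 60 + s ))
}

step_timeout="$(duration_to_seconds 2h15m0s)"  # new step timeout from the PR
session_budget="$(duration_to_seconds 60m)"    # new doctor session budget
echo "headroom: $(( (step_timeout - session_budget) / 60 )) minutes"
```

With the values from this PR the script prints `headroom: 75 minutes`, which is the time left for the download, MCP registration, and any post-processing outside the analysis session.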