-
Notifications
You must be signed in to change notification settings - Fork 56
Improve the reliability of the system bring-up scripts #1168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release-4.19
Are you sure you want to change the base?
Conversation
WalkthroughSet KUBECONFIG inline in many systemd units, add node- and APIServices-readiness units and scripts, replace EnvironmentFile usage, introduce TAP detection/conditional-disable flow and AWS secret fetcher, rewrite common wait helper and many scripts for stricter shell behavior and argument-driven inputs, and remove growfs unit/script. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor systemd
participant APIServer as crc-wait-apiserver-up.service
participant NodeReady as crc-wait-node-ready.service
participant APISvcs as ocp-wait-apiservices-available.service
participant CRCUnits as crc-custom.target children
systemd->>APIServer: Start (Env KUBECONFIG=/opt/kubeconfig)
APIServer-->>systemd: apiserver ready
systemd->>NodeReady: Start (After=APIServer)
NodeReady-->>systemd: node ready
systemd->>APISvcs: Start (Wants/After=NodeReady)
APISvcs-->>systemd: apiservices available
systemd->>CRCUnits: Start downstream units
sequenceDiagram
autonumber
actor systemd
participant CheckTap as crc-check-tap.service
participant Cond as crc-conditionally-disable-tap.sh
participant Needs as crc-needs-tap.sh
participant Disable as crc-disable-tap.sh
participant GVNet as [email protected]
systemd->>CheckTap: ExecStart
CheckTap->>Cond: run wrapper
Cond->>Needs: check if TAP needed
alt TAP required
Needs-->>Cond: success → do nothing
else TAP not required
Needs-->>Cond: non-zero → exec disable script
Cond->>Disable: exec /usr/local/bin/crc-disable-tap.sh
Disable-->>systemd: removed NM conn & stopped GVNet
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
systemd/ocp-cluster-ca.sh (1)
79-82
: World-writable kubeconfig is a security bug.chmod 0666 exposes cluster credentials to all users. Lock this down and restore SELinux context if applicable.
-rm -rf /opt/kubeconfig -cp /opt/crc/kubeconfig /opt/kubeconfig -chmod 0666 /opt/kubeconfig +rm -f /opt/kubeconfig +cp /opt/crc/kubeconfig /opt/kubeconfig +chown root:root /opt/kubeconfig +chmod 0600 /opt/kubeconfig +command -v restorecon >/dev/null 2>&1 && restorecon -v /opt/kubeconfig
🧹 Nitpick comments (16)
systemd/crc-self-sufficient-env.sh (1)
10-18
: Optional: broaden accepted truthy values and unify wordingIf these flags may be set to "true"/"yes", consider case-insensitive match; also keep “self‑sufficient” hyphenation consistent.
-if [ "${CRC_SELF_SUFFICIENT:-}" = "1" ]; then - echo "Running with a self sufficient bundle" +shopt -s nocasematch +if [[ "${CRC_SELF_SUFFICIENT:-}" =~ ^(1|true|yes)$ ]]; then + echo "Running with a self-sufficient bundle" exit 0 fi -if [ "${CRC_CLOUD:-}" = "1" ]; then +if [[ "${CRC_CLOUD:-}" =~ ^(1|true|yes)$ ]]; then echo "Running in a cloud environment" exit 0 fi +shopt -u nocasematchAlso applies to: 20-22
systemd/ocp-userpasswords.sh (2)
18-19
: Avoid TTY allocation in non-interactive systemd context
-t
is unnecessary and may cause issues. Consider also pinning the image or using a local tool to avoid network pulls.- podman run --rm -ti docker.io/xmartlabs/htpasswd "$1" "$2" + podman run --rm -i docker.io/xmartlabs/htpasswd "$1" "$2"Optional: prefer
htpasswd -Bb
(bcrypt) explicitly if supported by the image.
12-19
: Optional: avoid Docker Hub dependency for reliabilityTo reduce external pull failures/rate limits, consider using a local
htpasswd
(httpd-tools) or a pinned/quay-hosted image. Also ensure bcrypt is used (-Bb
).Also applies to: 33-45
systemd/ocp-growfs.sh (2)
3-6
: Harden against multiple/absent XFS devices under strict modeWith
errexit/nounset
, ambiguousroot_partition
can break the script. Select the root device explicitly and make grow ops idempotent.-root_partition=$(/usr/sbin/blkid -t TYPE=xfs -o device) -/usr/bin/growpart "${root_partition%?}" "${root_partition#/dev/???}" +root_partition=$(/usr/sbin/findmnt -n -o SOURCE /sysroot || true) +if [ -z "${root_partition:-}" ]; then + echo "Failed to determine root device for /sysroot" >&2 + exit 1 +fi +/usr/bin/growpart "${root_partition%?}" "${root_partition#/dev/???}" || trueOptionally gate
xfs_growfs
withxfs_info /sysroot
checks if needed.Also applies to: 9-14
18-21
: Optional: add a final syncA quick
sync
after filesystem growth can help on some environments before subsequent units run.echo "All done" +sync exit 0
systemd/ocp-clusterid.sh (1)
13-19
: LGTM; consider skipping if clusterID already setCurrent flow looks good. Optionally no-op when clusterID exists to avoid unnecessary patch.
-wait_for_resource_or_die clusterversion - -oc patch clusterversion version -p '{"spec":{"clusterID":"'${uuid}'"}}' --type merge +wait_for_resource_or_die "clusterversion/version" +if ! oc get clusterversion version -o jsonpath='{.spec.clusterID}' | grep -q '.'; then + oc patch clusterversion version -p '{"spec":{"clusterID":"'${uuid}'"}}' --type merge +fisystemd/ocp-mco-sshkey.sh (1)
19-20
: Wait on the specific MachineConfig name for determinism.Use the helper to wait for the concrete resource rather than the type:
-wait_for_resource_or_die machineconfig +wait_for_resource_or_die machineconfig/99-master-ssh 20 5systemd/crc-wait-apiserver-up.sh (1)
11-14
: LGTM; consistent with the new wait helper.Optional: consider adding a small
--request-timeout
in wait_for_resource_or_die (systemd/crc-systemd-common.sh) to avoid long hangs if the API is half‑up.systemd/ocp-custom-domain.sh (1)
54-60
: Harden JSON patching for apiserver and registry route (quote-safe).Avoid inline JSON concatenation with unquoted $EIP. Use patch files or jq for robust quoting.
-oc patch apiserver cluster --type=merge \ - -p '{"spec":{"servingCerts": {"namedCertificates":[{"names":["api.'$EIP'.nip.io"],"servingCertificate": {"name": "nip-secret"}}]}}}' +cat > /tmp/apiserver-patch.json <<EOF +{"spec":{"servingCerts":{"namedCertificates":[ + {"names":["api.${EIP}.nip.io"],"servingCertificate":{"name":"nip-secret"}} +]}}} +EOF +oc patch apiserver cluster --type=merge --patch-file=/tmp/apiserver-patch.json-oc patch route default-route -n openshift-image-registry --type=merge \ - -p '{"spec": {"host": "default-route-openshift-image-registry.'$EIP'.nip.io"}}' +cat > /tmp/registry-route-patch.json <<EOF +{"spec":{"host":"default-route-openshift-image-registry.${EIP}.nip.io"}} +EOF +oc patch route default-route -n openshift-image-registry --type=merge --patch-file=/tmp/registry-route-patch.jsonsystemd/crc-pullsecret.sh (3)
14-17
: Remove duplicate xtrace disable and restore tracing afterwards.Keep one
set +x
before secret handling and restore to aid debugging.-set +x # disable the logging to avoid leaking the pull secrets - -set +x # disable the logging to avoid leaking the pull secrets +set +x # disable logging to avoid leaking the pull secrets +# ... secret handling ... +# restore tracing if desired +# set -x
19-25
: Tolerate missing/invalid existing pull-secret without aborting the script.With
set -o pipefail
, a base64 decode error aborts the script. Swallow decode errors and rely on the subsequent jq check.-existingPs=$(oc get secret pull-secret -n openshift-config \ - -o jsonpath="{['data']['\.dockerconfigjson']}" \ - | base64 -d) +existingPs=$(oc get secret pull-secret -n openshift-config \ + -o jsonpath="{['data']['\.dockerconfigjson']}" \ + | base64 -d || true)
33-36
: Build the patch JSON with jq to avoid quoting pitfalls.Inline JSON with embedded $pullSecretB64 is brittle. Use jq to construct the payload.
-pullSecretB64=$(cat /opt/crc/pull-secret | base64 -w0) -oc patch secret pull-secret -n openshift-config --type merge \ - -p '{"data":{".dockerconfigjson": "'${pullSecretB64}'"}}' +pullSecretB64=$(base64 -w0 < /opt/crc/pull-secret) +PATCH=$(jq -n --arg v "$pullSecretB64" '{data:{".dockerconfigjson":$v}}') +oc patch secret pull-secret -n openshift-config --type=merge -p "$PATCH"systemd/ocp-cluster-ca.sh (4)
13-14
: nounset + sourcing env can be brittle; guard it.Avoid failures if the env file references unset vars by temporarily disabling nounset.
-source /etc/sysconfig/crc-env || echo "WARNING: crc-env not found" +set +u +source /etc/sysconfig/crc-env || echo "WARNING: crc-env not found" +set -u
18-18
: Fragile readiness probe: ‘configmap’ is namespaced.This lists the default namespace and can spuriously fail. Probe a cluster-scoped resource instead (e.g., namespaces) to reduce flakiness.
-wait_for_resource_or_die configmap +wait_for_resource_or_die namespaces
61-70
: Clarify KUBECONFIG handoff and avoid rm -rf on files.You export KUBECONFIG twice; use a distinct var for the build artifact and avoid recursive rm on a regular file.
-export KUBECONFIG=/opt/crc/kubeconfig -rm -rf "${KUBECONFIG}" +NEW_KUBECONFIG=/opt/crc/kubeconfig +export KUBECONFIG="${NEW_KUBECONFIG}" +rm -f "${KUBECONFIG}" oc config set-credentials system:admin --client-certificate=${CLIENT_CA_FILE_PATH} --client-key=${CLIENT_CA_KEY_FILE_PATH} \ --embed-certs
71-71
: Stronger readiness check for new client CA acceptance.Listing clusteroperators confirms API availability, not that the new client cert is accepted. Consider additionally waiting until auth works with the new cert.
Example:
wait_for_resource_or_die clusteroperators 90 2 # Ensure the client cert is actually honored for i in {1..60}; do if oc whoami 2>/dev/null | grep -q '^system:admin$'; then break; fi sleep 2 done
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
snc.sh
(1 hunks)systemd/crc-cluster-status.service
(1 hunks)systemd/crc-cluster-status.sh
(1 hunks)systemd/crc-dnsmasq.service
(1 hunks)systemd/crc-needs-tap.sh
(1 hunks)systemd/crc-no-tap.service
(1 hunks)systemd/crc-no-tap.sh
(1 hunks)systemd/crc-pullsecret.service
(1 hunks)systemd/crc-pullsecret.sh
(1 hunks)systemd/crc-routes-controller.service
(1 hunks)systemd/crc-routes-controller.sh
(1 hunks)systemd/crc-self-sufficient-env.sh
(1 hunks)systemd/crc-systemd-common.sh
(1 hunks)systemd/crc-user-mode-networking.sh
(1 hunks)systemd/crc-wait-apiserver-up.service
(0 hunks)systemd/crc-wait-apiserver-up.sh
(1 hunks)systemd/dnsmasq.sh.template
(1 hunks)systemd/ocp-cluster-ca.service
(1 hunks)systemd/ocp-cluster-ca.sh
(3 hunks)systemd/ocp-clusterid.service
(1 hunks)systemd/ocp-clusterid.sh
(1 hunks)systemd/ocp-custom-domain.service
(1 hunks)systemd/ocp-custom-domain.sh
(3 hunks)systemd/ocp-growfs.service
(0 hunks)systemd/ocp-growfs.sh
(2 hunks)systemd/ocp-mco-sshkey.service
(1 hunks)systemd/ocp-mco-sshkey.sh
(1 hunks)systemd/ocp-userpasswords.service
(1 hunks)systemd/ocp-userpasswords.sh
(1 hunks)tools.sh
(1 hunks)
💤 Files with no reviewable changes (2)
- systemd/crc-wait-apiserver-up.service
- systemd/ocp-growfs.service
🧰 Additional context used
🧬 Code graph analysis (9)
systemd/crc-routes-controller.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
systemd/ocp-mco-sshkey.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
systemd/ocp-cluster-ca.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
systemd/crc-wait-apiserver-up.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
systemd/crc-pullsecret.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
systemd/ocp-userpasswords.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
systemd/ocp-clusterid.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
snc.sh (1)
tools.sh (1)
retry
(110-128)
systemd/ocp-custom-domain.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
🪛 Shellcheck (0.11.0)
tools.sh
[warning] 225-225: Declare and assign separately to avoid masking return values.
(SC2155)
systemd/crc-needs-tap.sh
[error] 26-26: Can only exit with status 0-255. Other data should be written to stdout/stderr.
(SC2242)
systemd/ocp-userpasswords.sh
[error] 14-14: Couldn't parse this redirection. Fix to allow more checks.
(SC1073)
[error] 14-14: Fix any mentioned problems and try again.
(SC1072)
🔇 Additional comments (9)
systemd/crc-dnsmasq.service (1)
12-12
: Hardening the ovs prerequisite looks good.Adding the
ExecStartPre
explicit start makes this unit fail fast ifovs-configuration.service
can’t come up, tightening the dependency chain without waiting for ExecStart to stumble over missing state. Nicely targeted reliability win.snc.sh (1)
284-290
: Restore failure handling for kubeadmin login.Inside the subshell we now disable
set -e
, so ifretry ${OC} login …
exhausts its attempts it returns non-zero but the code keeps going, deletes the password file, redacts logs, and the subshell exits 0 becausesed
succeeds. We silently proceed without a login and make later steps fail in confusing ways. Please bail out immediately when the login ultimately fails.- retry ${OC} login -u kubeadmin -p "$kubeadmin_pass" --insecure-skip-tls-verify=true api.${SNC_PRODUCT_NAME}.${BASE_DOMAIN}:6443 - rm -f ${INSTALL_DIR}/auth/kubeadmin-password + if ! retry ${OC} login -u kubeadmin -p "$kubeadmin_pass" --insecure-skip-tls-verify=true api.${SNC_PRODUCT_NAME}.${BASE_DOMAIN}:6443; then + echo "Failed to authenticate as kubeadmin after retries" >&2 + exit 1 + fi + rm -f ${INSTALL_DIR}/auth/kubeadmin-passwordLikely an incorrect or invalid review comment.
systemd/ocp-custom-domain.service (1)
11-15
: LGTM on restart backoff; check env dependencyReducing RestartSec to 10 for a oneshot is reasonable given StartLimit*. With EnvironmentFile removed, please confirm
ocp-custom-domain.sh
doesn’t rely on vars formerly provided there.systemd/crc-pullsecret.service (1)
11-11
: Quicker retry window looks goodDropping
RestartSec
to 10 s should help the oneshot retry faster on transient failures without affecting the existing start-limit settings.systemd/crc-systemd-common.sh (1)
3-15
: Helper rename aligns with behaviorRenaming to
wait_for_resource_or_die
matches the existing fatal-on-timeout semantics, so the intent is clearer and the implementation stays unchanged.systemd/crc-routes-controller.sh (1)
3-20
: Hardened controller bring-upStrict shell flags plus the shared wait helper make the flow more predictable, and the explicit
ROUTE_CONTROLLER
path keeps the apply logic tidy.systemd/crc-no-tap.service (1)
2-11
: Service gating reads wellThe new description and
ExecCondition=!/usr/local/bin/crc-needs-tap.sh
make the unit self-contained, and ordering before[email protected]
ensures we clean up ahead of that instance.systemd/crc-pullsecret.sh (1)
12-13
: LGTM on adopting wait_for_resource_or_die.systemd/crc-needs-tap.sh (1)
11-17
: No swap needed: ExecCondition already inverts exit codes
The!
prefix onExecCondition
correctly inverts the script’s exit status, soNEED_TAP=0
/DONT_NEED_TAP=1
operate as intended. No changes required incrc-needs-tap.sh
.Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
systemd/ocp-cluster-ca.sh (1)
79-82
: Do not world‑writable kubeconfig containing embedded private key.0666 exposes creds; set 0600 (or 0640 with controlled group).
rm -rf /opt/kubeconfig cp /opt/crc/kubeconfig /opt/kubeconfig -chmod 0666 /opt/kubeconfig +chmod 0600 /opt/kubeconfig
🧹 Nitpick comments (15)
systemd/dnsmasq.sh.template (2)
12-12
: Guard empty hostIp and quote nmcli argument.If hostname --all-ip-addresses returns nothing, $hostIp is empty; the config and nmcli call become invalid. Add a check and quote the nmcli value.
hostIp=$(hostname --all-ip-addresses | awk '{print $1}') +[[ -n "$hostIp" ]] || { echo "ERROR: could not determine host IP"; exit 1; }
-/bin/nmcli conn modify --temporary ovs-if-br-ex ipv4.dns $hostIp,1.1.1.1 +/bin/nmcli conn modify --temporary ovs-if-br-ex ipv4.dns "$hostIp,1.1.1.1"
3-6
: errtrace without an ERR trap gives little value—consider adding a simple trap.This improves diagnosability under -e/-u/-o pipefail.
trap 'rc=$?; echo "ERROR: ${BASH_SOURCE[0]}:${BASH_LINENO[0]} command \"${BASH_COMMAND}\" failed with exit $rc" >&2' ERRsystemd/crc-wait-apiserver-up.sh (4)
3-6
: Good shell strictness; add an ERR trap to get actionable logs on failure.
errtrace
only helps if anERR
trap exists. Add a tiny trap so systemd journals capture the failing command/line.set -o pipefail set -o errexit set -o nounset set -o errtrace +trap 'echo "[crc-wait-apiserver-up] error at line ${LINENO}: ${BASH_COMMAND}" >&2' ERR
13-13
: Make retry cadence configurable via env for different environments.Hard‑coding
4,60
limits reuse; allow overrides while keeping current defaults.-wait_for_resource_or_die node 4 60 +wait_for_resource_or_die node "${CRC_WAIT_MAX_RETRY:-4}" "${CRC_WAIT_WAIT_SEC:-60}"
15-17
: Status messages LGTM. Consider “API server” casing if you care about wording consistency.
19-19
:exit 0
is redundant.Bash exits 0 after the last command succeeds; safe to drop for brevity.
-exit 0
systemd/crc-cluster-status.sh (1)
27-44
: Subshell guards secrets; nit: message typoThe
(set +x; set +e; oc login …)
pattern correctly prevents password trace. Consider s/successed/succeeded/ for logs.systemd/ocp-userpasswords.sh (2)
31-32
: Be explicit about the resource being waited on
wait_for_resource_or_die secret
is vague; pass a concrete, namespaced resource.-wait_for_resource_or_die secret +# Wait for the target namespace before creating the secret +wait_for_resource_or_die "ns/openshift-config"Please confirm the helper supports this form; alternatively extend it to accept a namespace flag.
36-37
: Minor: avoid useless cat and quote
Keep consistent quoting; avoid subshell for simple file read.-dev_pass=$(gen_htpasswd developer "$(cat /tmp/crc/pass_developer)") -adm_pass=$(gen_htpasswd kubeadmin "$(cat /tmp/crc/pass_kubeadmin)") +dev_pass="$(gen_htpasswd developer "$(< /tmp/crc/pass_developer)")" +adm_pass="$(gen_htpasswd kubeadmin "$(< /tmp/crc/pass_kubeadmin)")"systemd/ocp-growfs.sh (1)
18-20
: LGTM; exit 0 is fine (optional to drop).
errexit
already ensures non‑zero exits on failure; explicitexit 0
is harmless. You can keep the “All done” message for clarity.If desired, also confirm leaving
/sysroot
rw is intentional (Line 16 shows the ro remount commented out).systemd/crc-routes-controller.sh (2)
15-16
: Make waits robust (slow clusters) by extending retries.Defaults cap at ~100s; CRC bring‑up can exceed this. Pass larger retry/intervals.
-wait_for_resource_or_die pods -wait_for_resource_or_die deployments +wait_for_resource_or_die pods 180 5 +wait_for_resource_or_die deployments 180 5
9-9
: Assert manifest presence before apply.Fail early with a clearer error if the YAML is missing.
-ROUTE_CONTROLLER=/opt/crc/routes-controller.yaml +ROUTE_CONTROLLER=/opt/crc/routes-controller.yaml +[ -s "$ROUTE_CONTROLLER" ] || { echo "routes-controller manifest not found: $ROUTE_CONTROLLER"; exit 1; }Also applies to: 18-18
systemd/ocp-cluster-ca.sh (1)
61-70
: Reduce kubeconfig side effects; consider subshell/local var.Flipping KUBECONFIG mid-script is easy to misuse. Limit scope.
-export KUBECONFIG=/opt/crc/kubeconfig -rm -rf "${KUBECONFIG}" +( + export KUBECONFIG=/opt/crc/kubeconfig + rm -f "$KUBECONFIG" oc config set-credentials system:admin --client-certificate=${CLIENT_CA_FILE_PATH} --client-key=${CLIENT_CA_KEY_FILE_PATH} \ --embed-certs oc config set-context system:admin --cluster="${cluster_name}" --namespace=default --user=system:admin oc config set-cluster "${cluster_name}" --server="${apiserver_url}" --insecure-skip-tls-verify=true oc config use-context system:admin - -wait_for_resource_or_die clusteroperators 90 2 +) + +# Optionally wait for readiness, not mere existence: +# oc wait --for=condition=Available --all clusteroperator --timeout=15msystemd/crc-needs-tap.sh (2)
3-7
: Avoid unconditional xtrace in systemd units; gate behind DEBUG.set -x spams journals and can leak data. Make it opt‑in.
set -o pipefail set -o errexit set -o nounset set -o errtrace -set -x +: "${DEBUG:=0}" +if [[ "$DEBUG" == "1" ]]; then + set -x +fi
11-12
: Mark exit codes as readonly to prevent accidental mutation.Small hardening.
-NEED_TAP=0 -DONT_NEED_TAP=1 +readonly NEED_TAP=0 +readonly DONT_NEED_TAP=1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
systemd/crc-cluster-status.service
(1 hunks)systemd/crc-cluster-status.sh
(1 hunks)systemd/crc-dnsmasq.service
(1 hunks)systemd/crc-needs-tap.sh
(1 hunks)systemd/crc-no-tap.service
(1 hunks)systemd/crc-no-tap.sh
(1 hunks)systemd/crc-pullsecret.service
(1 hunks)systemd/crc-pullsecret.sh
(1 hunks)systemd/crc-routes-controller.service
(1 hunks)systemd/crc-routes-controller.sh
(1 hunks)systemd/crc-self-sufficient-env.sh
(1 hunks)systemd/crc-systemd-common.sh
(1 hunks)systemd/crc-user-mode-networking.sh
(1 hunks)systemd/crc-wait-apiserver-up.service
(0 hunks)systemd/crc-wait-apiserver-up.sh
(1 hunks)systemd/dnsmasq.sh.template
(1 hunks)systemd/ocp-cluster-ca.service
(1 hunks)systemd/ocp-cluster-ca.sh
(3 hunks)systemd/ocp-clusterid.service
(1 hunks)systemd/ocp-clusterid.sh
(1 hunks)systemd/ocp-custom-domain.service
(1 hunks)systemd/ocp-custom-domain.sh
(3 hunks)systemd/ocp-growfs.service
(0 hunks)systemd/ocp-growfs.sh
(2 hunks)systemd/ocp-mco-sshkey.service
(1 hunks)systemd/ocp-mco-sshkey.sh
(1 hunks)systemd/ocp-userpasswords.service
(1 hunks)systemd/ocp-userpasswords.sh
(1 hunks)tools.sh
(1 hunks)
💤 Files with no reviewable changes (2)
- systemd/crc-wait-apiserver-up.service
- systemd/ocp-growfs.service
🚧 Files skipped from review as they are similar to previous changes (11)
- systemd/crc-self-sufficient-env.sh
- systemd/ocp-custom-domain.sh
- systemd/crc-pullsecret.service
- systemd/crc-user-mode-networking.sh
- systemd/crc-dnsmasq.service
- systemd/crc-routes-controller.service
- systemd/crc-pullsecret.sh
- systemd/ocp-userpasswords.service
- systemd/ocp-clusterid.sh
- systemd/crc-no-tap.service
- systemd/crc-systemd-common.sh
🧰 Additional context used
🧬 Code graph analysis (5)
systemd/crc-wait-apiserver-up.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
systemd/crc-routes-controller.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
systemd/ocp-userpasswords.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
systemd/ocp-mco-sshkey.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
systemd/ocp-cluster-ca.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-15)
🪛 Shellcheck (0.11.0)
systemd/ocp-userpasswords.sh
[error] 14-14: Couldn't parse this redirection. Fix to allow more checks.
(SC1073)
[error] 14-14: Fix any mentioned problems and try again.
(SC1072)
systemd/crc-needs-tap.sh
[error] 26-26: Can only exit with status 0-255. Other data should be written to stdout/stderr.
(SC2242)
tools.sh
[warning] 225-225: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (18)
systemd/dnsmasq.sh.template (1)
3-6
: Unbound APPS_DOMAIN under strict mode makes script fail when crc-env is absent.With set -o nounset, using $APPS_DOMAIN later (line 20) exits the script if /etc/sysconfig/crc-env is missing or does not define it. Make the env file required or validate/prime APPS_DOMAIN before use.
Apply this diff to guard sourcing and assert the variable is set:
-source /etc/sysconfig/crc-env || echo "WARNING: crc-env not found" +APPS_DOMAIN="${APPS_DOMAIN-}" +if [[ -r /etc/sysconfig/crc-env ]]; then + # shellcheck disable=SC1091 + source /etc/sysconfig/crc-env +else + echo "WARNING: /etc/sysconfig/crc-env not found; relying on environment" +fi +: "${APPS_DOMAIN:?APPS_DOMAIN must be set (export it or provide it via /etc/sysconfig/crc-env)}"And fail-fast at the point of use in the config:
-address=/${APPS_DOMAIN}/$hostIp +address=/${APPS_DOMAIN:?APPS_DOMAIN must be set}/$hostIpAlso applies to: 9-9, 20-20
systemd/crc-wait-apiserver-up.sh (1)
11-11
: Log message LGTM.systemd/crc-cluster-status.sh (1)
3-6
: Good: strict mode enabled earlypipefail/errexit/nounset/errtrace before logic is a solid hardening step.
systemd/ocp-userpasswords.sh (4)
43-45
: Make sed portable and update idempotently
Use POSIX space class; preferoc apply
overreplace
for create/update.- --from-file=htpasswd=<((echo "$dev_pass"; echo "$adm_pass") | sed '/^\s*$/d') \ + --from-file=htpasswd=<((echo "$dev_pass"; echo "$adm_pass") | sed '/^[[:space:]]*$/d') \ -n openshift-config \ - --dry-run=client -oyaml | oc replace -f- + --dry-run=client -oyaml | oc apply -f-
13-16
: Fix argument validation and stderr redirection
Condition is inverted and redirection is malformed; this can fail at runtime.- if [ -z "${1:-}" ] && [ ! -z "${2:-}" ]; then - echo "gen_htpasswd needs to arguments: username password" 1>& + if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then + echo "gen_htpasswd needs two arguments: username password" >&2 return 1 fiBased on static analysis hints.
26-29
: Correct the error message (kubeadmin, not developer)
Message mismatches the file being checked.- echo "developer password does not exist" + echo "kubeadmin password does not exist"
33-35
: Do not enable tracing while handling secrets
This turns tracing on instead of off and risks leaking hashes/content later in the pipeline.-echo "generating the kubeadmin and developer passwords ..." -set -x # disable the logging to avoid leaking the passwords +echo "generating the kubeadmin and developer passwords ..." +set +x # disable tracing to avoid leaking the passwordsOptionally re‑enable with
set -x
after secret is created.systemd/crc-cluster-status.service (1)
13-13
: Revert RestartSec to preserve retry window (or retune StartLimit).With StartLimitIntervalSec=450 and StartLimitBurst=10, RestartSec=10 will exhaust retries in ~100s and hit start‑limit lockout; previously at 40s it allowed ~400s. Restore 40s or increase StartLimit* accordingly to keep resiliency.
-RestartSec=10 +RestartSec=40systemd/ocp-mco-sshkey.service (1)
10-10
: Keep restart pacing aligned with StartLimit.10s cadence trips the burst in ~100s. Revert to 40s (or proportionally raise StartLimitIntervalSec/Burst).
-RestartSec=10 +RestartSec=40systemd/ocp-clusterid.service (1)
10-10
: Restart pacing regression—restore 40s (or widen StartLimit).10s restarts consume the burst in ~100s vs. ~400s before. Revert or retune StartLimit*.
-RestartSec=10 +RestartSec=40systemd/ocp-cluster-ca.sh (3)
22-25
: Idempotency: drop brittle early-exit; use apply for create-or-update.If CM exists but apiserver isn’t patched yet, the early exit skips patch forever. Use apply and guard on actual apiserver state.
-if oc get configmap client-ca-custom -n openshift-config; then - echo "API Server Client CA already rotated..." - exit 0 -fi +current_client_ca="$(oc get apiserver cluster -o jsonpath='{.spec.clientCA.name}' || true)" +if [[ "${current_client_ca}" == "client-ca-custom" ]]; then + echo "API Server Client CA already rotated..." + exit 0 +fi @@ -oc create configmap client-ca-custom \ - -n openshift-config \ - --from-file=ca-bundle.crt=${CA_FILE_PATH} +oc create configmap client-ca-custom \ + -n openshift-config \ + --from-file=ca-bundle.crt=${CA_FILE_PATH} \ + --dry-run=client -o yaml | oc apply -f - @@ -oc patch apiserver cluster --type=merge \ +oc patch apiserver cluster --type=merge \ -p '{"spec": {"clientCA": {"name": "client-ca-custom"}}}'Also applies to: 47-49, 51-52
7-11
: Add restrictive umask before creating keys/kubeconfig.Prevents group/world-readable secrets.
set -o nounset set -o errtrace +umask 077 set -x
71-76
: First-run failure: replace errors if the CM is absent; use apply.Keep pipeline idempotent.
oc create configmap admin-kubeconfig-client-ca \ -n openshift-config \ --from-file=ca-bundle.crt=${CA_FILE_PATH} \ - --dry-run=client -o yaml | oc replace -f- + --dry-run=client -o yaml | oc apply -f-systemd/crc-no-tap.sh (1)
11-12
: Guard disable when the instance isn’t installed (avoid errexit abort).With errexit, this fails when [email protected] is absent. Gate the call or swallow the “not found” exit.
echo "Disabling the tap0 network configuration ..." rm -f /etc/NetworkManager/system-connections/tap0.nmconnection -systemctl disable --now [email protected] +if systemctl list-unit-files '[email protected]' >/dev/null 2>&1; then + systemctl disable --now [email protected] || true +fisystemd/ocp-mco-sshkey.sh (1)
22-26
: Build JSON safely with jq; fix oc patch type flag.Current string concat breaks with spaces or special chars and is fragile. Also use --type=merge.
-echo "Updating the public key resource for machine config operator" -pub_key=$(cat "${PUB_KEY_PATH}" | tr -d '\n\r') - -MC_PATCH='{"spec": {"config": {"passwd": {"users": [{"name": "core", "sshAuthorizedKeys": ["'${pub_key}'"]}]}}}}' - -if ! oc patch machineconfig 99-master-ssh -p "$MC_PATCH" --type merge; +echo "Updating the public key resource for machine config operator" +pub_key=$(tr -d '\n\r' < "${PUB_KEY_PATH}") +MC_PATCH="$(jq -n --arg key "$pub_key" \ + '{spec:{config:{passwd:{users:[{name:"core",sshAuthorizedKeys:[$key]}]}}}}')" + +if ! oc patch machineconfig 99-master-ssh --type=merge -p "$MC_PATCH"; thensystemd/ocp-cluster-ca.service (1)
11-11
: Avoid exhausting StartLimit too quickly; space retries out or raise limits.At 10s RestartSec with StartLimitBurst=10, the unit locks out after ~100s. Restore 40s (or adjust StartLimit*).
Restart=on-failure -RestartSec=10 +RestartSec=40Alternative: increase StartLimitBurst and/or StartLimitIntervalSec if you prefer faster retries.
systemd/crc-needs-tap.sh (2)
14-17
: Apple virtualization handling is inverted — should not keep tap0.Per prior feedback, Apple virtualization must not use tap0. Flip the exit code here to signal “don’t keep tap”. The message should match.
-if systemd-detect-virt | grep -q '^apple$' ; then - echo "Running with Apple virtualization. Need tap0." - exit $NEED_TAP +if systemd-detect-virt | grep -q '^apple$' ; then + echo "Running with Apple virtualization. Don't keep tap0" + exit "$DONT_NEED_TAP" fiBased on past review comments.
26-26
: Bug: exiting with literal name instead of numeric code.Expand the variable; otherwise it’s treated as a word and returns 255.
-exit DONT_NEED_TAP +exit "$DONT_NEED_TAP"
c92b3c2
to
9bafac1
Compare
this version works 🎉 9bafac1 |
bc4f6d3
to
b2b148e
Compare
Marking this PR as Ready for review, it's discuss how to merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
systemd/ocp-cluster-ca.sh (1)
90-94
: World-readable kubeconfig (0666) is a credential leak; lock it down.Kubeconfig embeds client cert/key; 0666 exposes cluster admin creds. Use 0600.
-# copy the new kubeconfig to /opt/kubeconfig -rm -rf /opt/kubeconfig -cp /opt/crc/kubeconfig /opt/kubeconfig -chmod 0666 /opt/kubeconfig +# copy the new kubeconfig to /opt/kubeconfig with strict perms +rm -f /opt/kubeconfig +cp /opt/crc/kubeconfig /opt/kubeconfig +chmod 0600 /opt/kubeconfig
♻️ Duplicate comments (5)
tools.sh (1)
222-228
: Harden secret handling: add umask 077, fix SC2155, and pass--
to htpasswdRestrict file perms, avoid declare+assign subshell masking (SC2155), and guard option parsing.
- ( - set +x # use a subshell to avoid leaking the password - - local random_password=$(cat "$auth_file_dir/auth/kubeadmin-password") - "${HTPASSWD}" -c -B -i "${pass_file}" developer <<< "developer" # use -c to create the file - "${HTPASSWD}" -B -i "${pass_file}" kubeadmin <<< "${random_password}" # append to the existing password file - ) + ( + set +x # avoid leaking the password + umask 077 + local random_password + random_password="$(<"${auth_file_dir}/auth/kubeadmin-password")" + "${HTPASSWD}" -c -B -i -- "${pass_file}" developer <<<"developer" + "${HTPASSWD}" -B -i -- "${pass_file}" kubeadmin <<<"${random_password}" + )systemd/crc-needs-tap.sh (1)
38-41
: Critical: Apple virtualization should exit with DONT_NEED_TAP, not NEED_TAP.According to the past review discussion, Apple virtualization (vfkit) does not require tap0. The current regex matches
apple
and exits withNEED_TAP
, which is incorrect.Based on the discussion and past review recommendations, the logic should be:
- libvirt (kvm/qemu): NEED_TAP
- Hyper-V (microsoft/hyperv): NEED_TAP
- Apple virtualization: DONT_NEED_TAP
Apply this diff to fix the virtualization detection:
-if systemd-detect-virt | grep -E -q '^(kvm|apple|microsoft)$' ; then - echo "Running with '$(systemd-detect-virt)' virtualization. Need tap0." +# Detect virtualization type safely under set -e +virt="$(systemd-detect-virt || true)" +case "${virt}" in + kvm|qemu|microsoft|hyperv) + echo "Running with '${virt}' virtualization. Need tap0." exit $NEED_TAP -fi + ;; + apple) + echo "Running with Apple virtualization. Don't need tap0." + exit $DONT_NEED_TAP + ;; +esacsystemd/ocp-cluster-ca.sh (2)
7-11
: Set restrictive umask before generating keys/kubeconfigs.Prevents group/world read on newly created files.
set -o errtrace set -x +umask 077
22-27
: Drop early exit or guard by apiserver state; make flow idempotent.Exiting when CM exists can skip the apiserver patch forever after a partial run. Use apply (already used) and only exit if apiserver.spec.clientCA.name is set.
-if oc get configmap client-ca-custom -n openshift-config 2>/dev/null; then - echo "API Server Client CA already rotated..." - exit 0 -fi -echo "API Server Client CA not rotated. Doing it now ..." +current_client_ca="$(oc get apiserver cluster -o jsonpath='{.spec.clientCA.name}' 2>/dev/null || true)" +if [[ "${current_client_ca}" == "client-ca-custom" ]]; then + echo "API Server Client CA already rotated..." + exit 0 +fi +echo "API Server Client CA not rotated. Doing it now ..." @@ -oc create configmap client-ca-custom \ +oc create configmap client-ca-custom \ -n openshift-config \ --from-file=ca-bundle.crt=${CA_FILE_PATH} \ --dry-run=client -o yaml \ | oc apply -f - @@ -jq -n ' +jq -n ' { "spec": { "clientCA": { "name": "client-ca-custom" } } }' | oc patch apiserver cluster --type=merge --patch-file=/dev/stdinAlso applies to: 48-53, 55-63
systemd/ocp-userpasswords.sh (1)
12-19
: Fix error message typo.“needs to arguments” → “needs two arguments”.
- echo "gen_htpasswd needs to arguments: username password" >&2 + echo "gen_htpasswd needs two arguments: username password" >&2
🧹 Nitpick comments (11)
systemd/crc-disable-tap.sh (2)
7-7
: Consider removingset -x
for production.
set -x
traces every command to stderr, which can clutter logs in production environments. Consider removing it or gating it behind a debug flag (e.g.,${DEBUG:-false} && set -x
).
14-14
: Remove redundantexit 0
.With
set -o errexit
, the script will exit 0 automatically if it reaches the end without error. The explicitexit 0
is redundant.-exit 0
systemd/crc-self-sufficient-env.sh (1)
10-17
: Document CRC_CLOUD removal if obsolete.The guard logic now only checks
CRC_SELF_SUFFICIENT
and no longer checksCRC_CLOUD
. IfCRC_CLOUD
is obsolete and being replaced byCRC_SELF_SUFFICIENT
, consider adding a comment documenting this transition for future maintainers.Based on past review comments.
systemd/ocp-clusterid.sh (1)
20-20
: Remove redundant explicit exit.With
set -o errexit
enabled, the script will automatically exit with status 0 upon successful completion. The explicitexit 0
is redundant.echo "All done" - -exit 0systemd/crc-needs-tap.sh (1)
11-28
: Consider adding error handling for sudo failure.The
sudo cat /sys/class/dmi/id/product_name
call on line 21 could fail if sudo isn't available or permissions are denied. Consider wrapping it in a conditional or using|| true
to prevent script termination underset -e
.- elif [[ $(sudo cat /sys/class/dmi/id/product_name) == *"Amazon EC2"* ]]; then + elif [[ $(sudo cat /sys/class/dmi/id/product_name 2>/dev/null || echo "") == *"Amazon EC2"* ]]; thensystemd/crc-systemd-common.sh (1)
33-33
: Redundant return statement.Line 33's
return 0
is unreachable because line 19 already returns on success, and line 25 exits on failure. Consider removing it for clarity.done - return 0 }
systemd/crc-user-mode-networking.sh (1)
13-30
: Harden mode parsing; avoid arithmetic errors and clarify ExecCondition contract.Current integer tests can emit “integer expression expected” if CRC_NETWORK_MODE_USER is non‑numeric. A case switch is simpler and safer; also send the error to stderr.
-USER_MODE=0 -SYSTEM_MODE=1 +USER_MODE=0 +SYSTEM_MODE=1 @@ -if [[ -z "${CRC_NETWORK_MODE_USER:-}" ]]; then - echo "CRC_NETWORK_MODE_USER not set. Assuming user networking." - exit $USER_MODE -fi - -if [[ "$CRC_NETWORK_MODE_USER" -eq 0 ]]; then - echo "network-mode 'system' detected" - exit $SYSTEM_MODE -fi - -if [[ "$CRC_NETWORK_MODE_USER" -eq 1 ]]; then - echo "network-mode 'user' detected" - exit $USER_MODE -fi - -echo "ERROR: unknown network mode: CRC_NETWORK_MODE_USER=$CRC_NETWORK_MODE_USER (expected 0 or 1). Assuming user networking" - -exit $USER_MODE +case "${CRC_NETWORK_MODE_USER:-1}" in + 0) + echo "network-mode 'system' detected" + exit ${SYSTEM_MODE} + ;; + 1) + echo "network-mode 'user' detected" + exit ${USER_MODE} + ;; + *) + echo "ERROR: unknown network mode: CRC_NETWORK_MODE_USER=${CRC_NETWORK_MODE_USER} (expected 0 or 1). Assuming user networking" >&2 + exit ${USER_MODE} + ;; +esacPlease confirm how units consume this (e.g., ExecCondition should succeed only in user‑mode).
systemd/crc-pullsecret.sh (1)
17-26
: Disable xtrace while touching decoded/secret material (defense‑in‑depth).Trace won’t print data, but safest to mute it around secret inspection and patch.
-# check if the .auths field is there -if oc get secret pull-secret \ +set +x +# check if the .auths field is there +if oc get secret pull-secret \ @@ -# Create the JSON patch in memory and pipe it to the oc command +# Create the JSON patch in memory and pipe it to the oc command base64 -w0 < /opt/crc/pull-secret | \ jq -R '{"data": {".dockerconfigjson": .}}' | \ oc patch secret pull-secret \ -n openshift-config \ --type merge \ --patch-file=/dev/stdin +set -xAlso applies to: 35-41
systemd/ocp-custom-domain.sh (1)
3-7
: Protect key material and clean up temp files.Add restrictive umask and trap to remove /tmp/nip.* keys after use.
set -o errtrace set -x +umask 077 +trap 'rm -f /tmp/nip.key /tmp/nip.crt' EXIT @@ openssl req -newkey rsa:2048 -new \ -nodes -x509 -days 3650 \ -keyout /tmp/nip.key -out /tmp/nip.crt \Also applies to: 22-26, 28-32
systemd/ocp-cluster-ca.sh (1)
38-46
: Clean up sensitive temp files.Add a trap to remove generated CA/key/CSR files.
+# cleanup sensitive files on exit +trap 'rm -f "${CA_FILE_PATH}" "${CA_KEY_FILE_PATH}" "${CLIENT_CA_FILE_PATH}" \ + "${CLIENT_CA_KEY_FILE_PATH}" "${CLIENT_CSR_FILE_PATH}"' EXIT @@ export KUBECONFIG=/opt/crc/kubeconfig rm -rf "${KUBECONFIG}"Also applies to: 71-78
systemd/ocp-growfs.service (1)
4-4
: Ensure producer–consumer ordering.After=crc-env-file-exists.path
only waits for the file to appear, not to be fully populated. If another unit or script creates/etc/sysconfig/crc-env
, you must also order on that producer (e.g.After=<writer-service>.service
) or have it write atomically (e.g. via a rename) to prevent this service from reading a partial file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (33)
createdisk.sh
(2 hunks)systemd/crc-check-tap.service
(1 hunks)systemd/crc-cluster-status.service
(1 hunks)systemd/crc-cluster-status.sh
(1 hunks)systemd/crc-disable-tap.sh
(1 hunks)systemd/crc-dnsmasq.service
(1 hunks)systemd/crc-env-file-exists.path
(1 hunks)systemd/crc-needs-tap.sh
(1 hunks)systemd/crc-no-tap.service
(0 hunks)systemd/crc-no-tap.sh
(0 hunks)systemd/crc-pullsecret.service
(1 hunks)systemd/crc-pullsecret.sh
(1 hunks)systemd/crc-routes-controller.service
(2 hunks)systemd/crc-routes-controller.sh
(1 hunks)systemd/crc-self-sufficient-env.sh
(1 hunks)systemd/crc-systemd-common.sh
(1 hunks)systemd/crc-user-mode-networking.sh
(1 hunks)systemd/crc-wait-apiserver-up.service
(1 hunks)systemd/crc-wait-apiserver-up.sh
(1 hunks)systemd/dnsmasq.sh.template
(1 hunks)systemd/ocp-cluster-ca.service
(0 hunks)systemd/ocp-cluster-ca.sh
(3 hunks)systemd/ocp-clusterid.service
(1 hunks)systemd/ocp-clusterid.sh
(1 hunks)systemd/ocp-custom-domain.service
(1 hunks)systemd/ocp-custom-domain.sh
(2 hunks)systemd/ocp-growfs.service
(1 hunks)systemd/ocp-growfs.sh
(2 hunks)systemd/ocp-mco-sshkey.service
(1 hunks)systemd/ocp-mco-sshkey.sh
(1 hunks)systemd/ocp-userpasswords.service
(1 hunks)systemd/ocp-userpasswords.sh
(1 hunks)tools.sh
(1 hunks)
💤 Files with no reviewable changes (3)
- systemd/crc-no-tap.service
- systemd/ocp-cluster-ca.service
- systemd/crc-no-tap.sh
🚧 Files skipped from review as they are similar to previous changes (9)
- systemd/crc-wait-apiserver-up.service
- systemd/crc-cluster-status.service
- systemd/crc-cluster-status.sh
- systemd/dnsmasq.sh.template
- systemd/ocp-growfs.sh
- systemd/ocp-mco-sshkey.service
- systemd/crc-wait-apiserver-up.sh
- systemd/crc-routes-controller.sh
- systemd/crc-dnsmasq.service
🧰 Additional context used
🧬 Code graph analysis (6)
systemd/ocp-mco-sshkey.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-34)
systemd/ocp-cluster-ca.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-34)
systemd/ocp-clusterid.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-34)
systemd/crc-pullsecret.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-34)
systemd/ocp-userpasswords.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-34)
systemd/ocp-custom-domain.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-34)
🪛 Shellcheck (0.11.0)
tools.sh
[warning] 225-225: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (34)
systemd/crc-disable-tap.sh (4)
3-7
: LGTM: Strict shell options improve reliability.The combination of
pipefail
,errexit
,nounset
, anderrtrace
aligns with the PR's goal of improving script reliability.
12-12
: Error suppression is acceptable but less precise than unit existence check.The
|| true
approach addresses the past review concern abouterrexit
aborting when the unit is missing. However, it suppresses all errors, not just "unit does not exist." For more precise error handling, consider checking unit existence first as suggested in the past review.Based on past review comments.
3-7
: LGTM! Strict bash safety options are appropriate.The comprehensive error handling (
pipefail
,errexit
,nounset
,errtrace
) combined with execution tracing (set -x
) ensures script failures are caught early and debuggable.
11-12
: LGTM! Cleanup is properly idempotent.The
rm -f
silently handles missing files, and|| true
on the systemctl command ensures the script succeeds even when the unit doesn't exist—addressing the concern from the past review without added complexity.systemd/crc-env-file-exists.path (2)
1-8
: LGTM: Clean path unit for environment file gating.This path unit provides a clean dependency trigger for services that require
/etc/sysconfig/crc-env
to exist before starting. The implementation is straightforward and aligns with the PR's goal of improving startup reliability by removing directEnvironmentFile
references from multiple services.
1-9
: LGTM! Clean path unit for startup coordination.The path unit correctly gates dependent services on the existence of
/etc/sysconfig/crc-env
, providing a cleaner dependency mechanism than scatteredEnvironmentFile
directives.systemd/crc-self-sufficient-env.sh (7)
3-6
: LGTM: Strict shell options improve reliability.The combination of
pipefail
,errexit
,nounset
, anderrtrace
aligns with the PR's goal of improving script reliability.
8-8
: Good: Graceful fallback when environment file is missing.Using
|| echo "WARNING: crc-env not found"
allows the script to continue even when the environment file is absent, which is appropriate given the subsequent guard logic.
10-13
: LGTM: Safe parameter expansion prevents unset variable errors.Using
${CRC_SELF_SUFFICIENT:-}
with the:-
operator safely handles the case where the variable is unset, preventingnounset
from aborting the script.
15-15
: Typo fixed.The typo "environement" → "environment" has been corrected.
Based on past review comments.
3-6
: LGTM! Consistent strict bash safety.The error handling options match the pattern established across other scripts in this PR.
15-15
: Typo already fixed.The past review comment noted "environement" → "environment"; this has been corrected.
8-8
: Ignore source fallback warning; fallback is effective underset -o errexit
. Bash does not exit on a failing command that’s part of an||
chain, sosource /etc/sysconfig/crc-env || echo…
will catch any non-zero exit (missing file or errors inside) and run the warning. No change needed.Likely an incorrect or invalid review comment.
systemd/ocp-growfs.service (1)
4-4
: LGTM: Consistent dependency pattern.Adding
After=crc-env-file-exists.path
aligns with the PR's broader pattern of gating service startup on environment file presence. Sincecrc-self-sufficient-env.sh
(called in ExecCondition) sources/etc/sysconfig/crc-env
directly, removing the externalEnvironmentFile
reference is safe.systemd/ocp-custom-domain.service (4)
3-3
: LGTM: Consistent dependency pattern.Adding
After=crc-env-file-exists.path
aligns with the PR's broader pattern of gating service startup on environment file presence. Sincecrc-self-sufficient-env.sh
(called in ExecCondition) sources/etc/sysconfig/crc-env
directly, removing the externalEnvironmentFile
reference is safe.
12-12
: RestartSec value is appropriate.The
RestartSec=40
value provides reasonable spacing between restart attempts and avoids the premature start-limit lockout concern raised in past reviews about a 10s value. Combined withStartLimitIntervalSec=450
andStartLimitBurst=10
, the unit has adequate protection against rapid retry exhaustion.Based on past review comments.
3-3
: LGTM! Consistent dependency pattern.Adding
After=crc-env-file-exists.path
aligns this service with the new environment file gating strategy used across other units in this PR.
12-12
: RestartSec already at recommended value.The past review comment raised concerns about a 10s restart interval; the current code shows 40s, which is appropriate given the start limits (10 retries over 450s).
systemd/ocp-clusterid.sh (3)
3-6
: LGTM! Strict mode improves safety.Enabling
pipefail
,errexit
,nounset
, anderrtrace
ensures the script fails fast on errors, unset variables, and pipeline failures, which aligns with the PR's goal to improve reliability.
11-11
: LGTM! Fail-fast behavior improves reliability.Replacing
wait_for_resource
withwait_for_resource_or_die
ensures the script terminates immediately if the clusterversion resource is not available within the retry limit, rather than silently continuing.
15-16
: LGTM! Patch generation via jq is more robust.Generating the patch JSON with jq and piping to
--patch-file=/dev/stdin
is cleaner and safer than inline JSON strings, reducing the risk of quoting or escaping errors.systemd/crc-routes-controller.service (2)
3-3
: Consistent use of crc-env-file-exists.path.This aligns with the pattern in
systemd/ocp-clusterid.service
. Note the inconsistency withsystemd/crc-pullsecret.service
which usescloud-init-crc-env.path
(see comment onsystemd/ocp-clusterid.service
line 3).
12-12
: No issues found in networking mode gating logic
The scriptsystemd/crc-user-mode-networking.sh
exists and correctly returns exit code 0 for user-mode networking (default/empty,CRC_NETWORK_MODE_USER=1
, or unknown) and exit code 1 only for system-mode (CRC_NETWORK_MODE_USER=0
), matching the intended service gating.systemd/ocp-userpasswords.service (1)
4-4
: LGTM! Consistent path unit gating.Adding
After=crc-env-file-exists.path
is consistent with the pattern in other services. Note the inconsistency withsystemd/crc-pullsecret.service
which usescloud-init-crc-env.path
(see comment onsystemd/crc-pullsecret.service
line 3).systemd/ocp-mco-sshkey.sh (4)
3-7
: LGTM! Strict shell options improve reliability.The combination of
pipefail
,errexit
,nounset
, anderrtrace
ensures early detection of errors, which is essential for system bring-up scripts.
14-17
: LGTM! Appropriate pre-condition check.Early exit when the public key file is missing prevents downstream failures.
19-38
: LGTM! Safe JSON construction with jq resolves previous fragility.The wait_for_resource_or_die pre-check and jq-based JSON builder address the previous review concern about brittle string concatenation. Using
--arg
ensures the public key is properly escaped, and piping to--patch-file=/dev/stdin
is clean.
40-42
: LGTM! Explicit success feedback.Clear success messaging and exit code improve observability.
createdisk.sh (1)
112-119
: LGTM! Conditional service startup improves reliability.Adding
After=crc-check-tap.service
ensures proper ordering, and usingExecCondition
to gate startup based on/usr/local/bin/crc-needs-tap.sh
is cleaner than polling inExecStartPre
. This aligns with the PR's goal of improving bring-up script reliability.systemd/crc-needs-tap.sh (2)
3-7
: LGTM! Strict shell options improve reliability.The combination of strict options ensures early error detection, consistent with other scripts in this PR.
43-50
: LGTM! Self-sufficient bundle check and safe default.The self-sufficient bundle check provides appropriate override behavior, and defaulting to
DONT_NEED_TAP
is the safe choice when environment is unknown.systemd/crc-check-tap.service (1)
1-16
: LGTM! Service structure is correct.The unit ordering,
ExecCondition
negation logic, and dependency oncrc-needs-tap.sh
are appropriate. However, ensure the critical issue insystemd/crc-needs-tap.sh
(Apple virtualization returning wrong exit code) is fixed, as this unit depends on that script's correctness.systemd/crc-systemd-common.sh (2)
3-12
: LGTM! Explicit naming and input validation improve reliability.Renaming to
wait_for_resource_or_die
clearly signals the fail-fast behavior, and the input validation with usage message prevents misuse.
14-31
: LGTM! Improved retry loop with progress feedback.The
for
loop with explicit retry counter provides better observability than the previousuntil
loop. Progress messages and error output to stderr are appropriate.
(will be removed in a follow up commit)
* Make more resilient by failing on any error * Improve the logging and argument validation * Use `jq` to to guarantee that the patch file is valid JSON (will be updated in a follow up commit to avoid passing the pub key in the CLI)
* Better use of script constants * Better validation of the arguments * Better logging File will be further updated to prevent leaking passwords in the journal logs.
No need to sleep `5s` here, the SystemD dependencies should enforce the correct ordering.
Define the `KUBECONFIG` in the Systemd unit file, so that the CRC scripts don't have to care about it. ``` Environment=KUBECONFIG=/opt/kubeconfig ``` Gives a better separation of concerns.
Introduce the `ocp-apiservices-available.sh` script, which waits for the `apiservices` to be all available. The APIServices are made up of two groups: - the K8s APIs, which are always available (pods, secrets, configmaps,...) - the OCP APIs, which need OCP Operators and Pods to be ready (routes, projects, ...) This script waits for the second group to finish its initialization.
The `crc-wait-apiserver-up` wait for the K8s APIServer to be up and running. This commit makes the patch try 60 times with 5s delay, instead of 5 times with 60s delay. The script becomes more reactive to the APIServer activation.
This script add a synchronization point on the `ready` status of the CRC node. Before the node is ready, services can interact with the K8s APIServer, but user (and OCP) services won't start their deployment before the CRC node is ready. This synchronization point avoid that other services (like the `ocp-wait-apiservices`) wait in vain while their target didn't start their own deployment.
This script has been broken for a while, will be handled by cloud-init.
…om AWS-CLI This script offloads `mapt` and other AWS deployers from the task of fetching the secrets from AWS IMDS service. This script should be include in the `cloud-init` user-data configuration file, with this kind of invocation: ``` /usr/local/bin/crc-aws-fetch-secrets.sh \ "{{ .SSMPullSecretName }}" \ "{{ .SSMKubeAdminPasswordName }}" \ "{{ .SSMDeveloperPasswordName }}" ``` where the parameters specify the location of the three secrets in the IMDS store.
This commit moves the definition of the secret file locations from the scripts to the SystemD unit. This way, SystemD can enforce that the files exist before launching the relevant services.
A review of the systemd journal logs of the different services highlighted that the SystemD journal captures information about the Podman containers via a Podman-internal logging mechanism. This commit disables the logging mechanism to the containers handling secrets.
To ease the quick glance review of the CRC boot timing, this scripts adds a simple timing measurement, based on Bash's `SECONDS` special variable (automatically tracking time past after its `SECONDS=0` reset). For a stronger time tracking, refer to the journal timestamps of the services.
…onfig-ovewrite directories SystemD allows overwriting the definition of services by writing new properties in the `unit-name.service.d/override.conf` files. This commit allows the CRC image creation script to properly upload these files and directories to the VM image.
This override prevents the `ovs-configuration.service` from logging its xtrace execution into the journal and the console. This service is very verbose, and makes the console impossible to follow in real-time. Instead, its output is logged in a `/var/log` file.
This commit reduces the Restart duration of the service. The SystemD dependencies should already avoid any failure of the script.
Add a dependency on the `cloud-final.service`, to be sure that the pull-secrets have been pulled when the service starts.
Add a dependency on the `cloud-final.service` to be sure that the CRC passwords have been fetched before starting.
Reformulate the dependencies of the `crc-custom.target` to avoid startup deadlocks. Load `crc-custom.target` as a dependency of the `kubelet.service`.
Ensure that the pub key has been fetched before starting the service.
Better use of `jq` to ensure that the public key isn't exposed in the journal logs. Exposing a public key isn't a security leak, but better avoid disclosing it as a good practice.
`AssertPathExists` is checked before the condition is tested. Use a `ExecStartPre` directive instead. ``` ExecCondition=/usr/local/bin/crc-self-sufficient-env.sh ```
…test Not working before the network is established
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (5)
systemd/ocp-custom-domain.sh (1)
64-117
: Wait for the concrete objects before patching.
wait_for_resource_or_die
only checks thatoc get <resource>
exits 0. Commands likeoc get ingresses.config.openshift.io
oroc get route.route.openshift.io
succeed even while the singleton (cluster
) ordefault-route
object is still absent, so we can race straight into the subsequentoc patch …
and abort underset -e
. Gate on the actual targets before patching.-wait_for_resource_or_die ingresses.config.openshift.io +wait_for_resource_or_die ingresses.config.openshift.io/cluster … -wait_for_resource_or_die apiserver.config.openshift.io +wait_for_resource_or_die apiserver.config.openshift.io/cluster … -wait_for_resource_or_die route.route.openshift.io +wait_for_resource_or_die "route/default-route -n openshift-image-registry"systemd/ocp-cluster-ca.sh (1)
21-24
: Don't exit just because the ConfigMap exists.If a previous run created
client-ca-custom
but failed before theoc patch
call, this guard will exit every retry, leaving the API server stuck on the old client CA forever. Query the apiserver (.spec.clientCA.name
) before deciding to skip the rotation, or drop the exit and let the flow proceed so the patch is reapplied when needed.-if oc get configmap client-ca-custom -n openshift-config 2>/dev/null; then - echo "API Server Client CA already rotated..." - exit 0 -fi +current_client_ca="$(oc get apiserver cluster -o jsonpath='{.spec.clientCA.name}' 2>/dev/null || true)" +if [[ "${current_client_ca}" == "client-ca-custom" ]]; then + echo "API Server Client CA already rotated..." + exit 0 +fisystemd/crc-test-vsock.py (1)
30-34
: Fix ioctl argument type and add portable fallback for the constant.Line 32 passes a string
' '
tofcntl.ioctl
, which will raiseTypeError
in Python 3 (requires bytes). Additionally,socket.IOCTL_VM_SOCKETS_GET_LOCAL_CID
may not be available on all Python builds, causingAttributeError
.Apply this diff to fix both issues:
print(f"Looking up the CID in {VSOCK_DEV}...") +IOCTL_VM_SOCKETS_GET_LOCAL_CID = getattr(socket, "IOCTL_VM_SOCKETS_GET_LOCAL_CID", 0x7B9) -with open(VSOCK_DEV, 'rb') as f: - r = fcntl.ioctl(f, socket.IOCTL_VM_SOCKETS_GET_LOCAL_CID, ' ') +with open(VSOCK_DEV, "rb", buffering=0) as f: + r = fcntl.ioctl(f, IOCTL_VM_SOCKETS_GET_LOCAL_CID, bytes(4)) cid = struct.unpack('I', r)[0] -print(f'Our vsock CID is {cid}.') +print(f"Our vsock CID is {cid}.")tools.sh (1)
224-228
: Tighten htpasswd creation: enforce restrictive umaskThe bcrypt hashes we drop into
pass_file
can still leak if the file inherits a permissive default umask (0644 on most hosts). That leak was flagged earlier, but the current revision still creates the file without tightening permissions, so other users on the host can read and reuse the credential store. Please restore the hardening by setting a restrictive umask before the twohtpasswd
calls, and (bonus) move thelocal
out of the subshell and pass--
to guard against paths that start with-
.function generate_htpasswd_file { local auth_file_dir=$1 local pass_file=$2 + local random_password ( set +x # use a subshell to avoid leaking the password - local random_password - random_password=$(cat "$auth_file_dir/auth/kubeadmin-password") - "${HTPASSWD}" -c -B -i "${pass_file}" developer <<< "developer" # use -c to create the file - "${HTPASSWD}" -B -i "${pass_file}" kubeadmin <<< "${random_password}" # append to the existing password file + umask 077 + random_password="$(<"${auth_file_dir}/auth/kubeadmin-password")" + "${HTPASSWD}" -c -B -i -- "${pass_file}" developer <<<"developer" # use -c to create the file + "${HTPASSWD}" -B -i -- "${pass_file}" kubeadmin <<<"${random_password}" # append to the existing password file ) }systemd/ocp-userpasswords.sh (1)
27-53
: Stop leaking passwords through process arguments.
htpasswd -nb "$1" "$2"
exposes the cleartext password in the container’s argv, so anyone able to runps
while the unit executes can read both credentials. Pipe the password via stdin and usehtpasswd -ni
(or equivalent) so it never shows up in the process list, and keep tracing disabled only around the secret-handling block.Here's a concrete fix:
- podman run --log-driver=none --rm "$CRC_HTPASSWD_IMAGE" htpasswd -nb "$1" "$2" + printf '%s' "$2" | podman run --log-driver=none --rm \ + "$CRC_HTPASSWD_IMAGE" htpasswd -ni "$1"Then re-enable
set -x
after bothgen_htpasswd
calls, once secrets are out of scope.
[suggest_s essential? no -> major only? Already given fix; no extra tag]
🧹 Nitpick comments (3)
systemd/ocp-wait-apiservices-available.sh (1)
43-54
: Display all counted unavailable services for complete diagnostics.The counting logic (lines 22-27) correctly treats services without
Available == "True"
as unavailable (includingUnknown
and missing conditions), but the diagnostic output (lines 43-51) only displays services withAvailable == "False"
. This inconsistency means operators see a count of N unavailable services but fewer than N services listed, which can be confusing during troubleshooting.Update the jq query to display all services counted as unavailable:
if ! oc get apiservices -o json | jq -r ' .items[] | . as $item | - ( - $item.status.conditions[]? | - select(.type == "Available" and .status == "False") - ) | - " - \($item.metadata.name): \(.reason) - \(.message)" + select( + (($item.status.conditions // []) + | any(.type=="Available" and .status=="True")) | not + ) | + (($item.status.conditions // []) + | map(select(.type == "Available"))[0] + | " - \($item.metadata.name): \(.status // "Unknown") - \(.reason // "N/A") - \(.message // "No message")" + ) 'systemd/crc-needs-tap.sh (1)
13-13
: Minor: Unused variable definition.
EXIT_ERROR
is defined but never used in any exit path. All exits use eitherEXIT_NEED_TAP
orEXIT_DONT_NEED_TAP
.Consider removing this unused variable:
-EXIT_ERROR=1
systemd/crc-pullsecret.sh (1)
50-63
: Optional: Consider re-enabling xtrace after secret generation.Xtrace is disabled at line 50 to protect passwords but never re-enabled. This is safe but means the final
oc create secret
operations (lines 57-61) won't have verbose logging.If verbose logging is desired for the secret creation phase, consider:
echo "Generating the kubeadmin and developer passwords ..." set +x # disable the logging to avoid leaking the passwords dev_pass=$(gen_htpasswd developer "$(cat "$CRC_PASS_DEVELOPER_PATH")") adm_pass=$(gen_htpasswd kubeadmin "$(cat "$CRC_PASS_KUBEADMIN_PATH")") +set -x # re-enable tracing after passwords are generated echo "creating the password secret ..."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (44)
createdisk-library.sh
(2 hunks)createdisk.sh
(2 hunks)docs/self-sufficient-bundle.md
(0 hunks)systemd/crc-aws-fetch-secrets.sh
(1 hunks)systemd/crc-check-tap.service
(1 hunks)systemd/crc-cluster-status.service
(1 hunks)systemd/crc-cluster-status.sh
(2 hunks)systemd/crc-conditionally-disable-tap.sh
(1 hunks)systemd/crc-custom.target
(1 hunks)systemd/crc-disable-tap.sh
(1 hunks)systemd/crc-dnsmasq.service
(0 hunks)systemd/crc-needs-tap.sh
(1 hunks)systemd/crc-no-tap.service
(0 hunks)systemd/crc-no-tap.sh
(0 hunks)systemd/crc-pullsecret.service
(1 hunks)systemd/crc-pullsecret.sh
(1 hunks)systemd/crc-routes-controller.service
(1 hunks)systemd/crc-routes-controller.sh
(1 hunks)systemd/crc-self-sufficient-env.sh
(1 hunks)systemd/crc-systemd-common.sh
(1 hunks)systemd/crc-test-vsock.py
(1 hunks)systemd/crc-user-mode-networking.sh
(1 hunks)systemd/crc-wait-apiserver-up.service
(1 hunks)systemd/crc-wait-apiserver-up.sh
(1 hunks)systemd/crc-wait-node-ready.service
(1 hunks)systemd/crc-wait-node-ready.sh
(1 hunks)systemd/dnsmasq.sh.template
(1 hunks)systemd/kubelet.service.d/wants-crc-custom.conf
(1 hunks)systemd/ocp-cluster-ca.service
(2 hunks)systemd/ocp-cluster-ca.sh
(2 hunks)systemd/ocp-clusterid.service
(1 hunks)systemd/ocp-clusterid.sh
(1 hunks)systemd/ocp-custom-domain.service
(2 hunks)systemd/ocp-custom-domain.sh
(1 hunks)systemd/ocp-growfs.service
(0 hunks)systemd/ocp-growfs.sh
(0 hunks)systemd/ocp-mco-sshkey.service
(1 hunks)systemd/ocp-mco-sshkey.sh
(1 hunks)systemd/ocp-userpasswords.service
(2 hunks)systemd/ocp-userpasswords.sh
(1 hunks)systemd/ocp-wait-apiservices-available.service
(1 hunks)systemd/ocp-wait-apiservices-available.sh
(1 hunks)systemd/ovs-configuration.service.d/mute-console.conf
(1 hunks)tools.sh
(1 hunks)
💤 Files with no reviewable changes (6)
- systemd/crc-no-tap.service
- systemd/crc-dnsmasq.service
- systemd/ocp-growfs.sh
- docs/self-sufficient-bundle.md
- systemd/crc-no-tap.sh
- systemd/ocp-growfs.service
✅ Files skipped from review due to trivial changes (1)
- systemd/kubelet.service.d/wants-crc-custom.conf
🚧 Files skipped from review as they are similar to previous changes (14)
- systemd/crc-aws-fetch-secrets.sh
- systemd/ocp-cluster-ca.service
- systemd/crc-wait-node-ready.service
- systemd/dnsmasq.sh.template
- systemd/crc-cluster-status.service
- systemd/ocp-userpasswords.service
- createdisk-library.sh
- systemd/ocp-custom-domain.service
- systemd/ocp-wait-apiservices-available.service
- systemd/crc-routes-controller.service
- systemd/crc-routes-controller.sh
- createdisk.sh
- systemd/crc-custom.target
- systemd/ocp-clusterid.sh
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-02T12:58:51.079Z
Learnt from: kpouget
PR: crc-org/snc#1168
File: systemd/crc-pullsecret.sh:11-11
Timestamp: 2025-10-02T12:58:51.079Z
Learning: In systemd/crc-pullsecret.sh, the wait_for_resource_or_die secret call is intentionally checking that the Kubernetes secret API is operational (part of the OCP/K8s boot process), not waiting for a specific secret resource to exist. The script relies on subsequent oc commands and systemd retry logic to handle cases where specific resources aren't available yet.
Applied to files:
systemd/crc-pullsecret.service
systemd/crc-wait-apiserver-up.sh
systemd/ocp-userpasswords.sh
systemd/ocp-custom-domain.sh
systemd/crc-systemd-common.sh
systemd/crc-pullsecret.sh
systemd/ocp-cluster-ca.sh
📚 Learning: 2025-10-07T13:05:57.801Z
Learnt from: kpouget
PR: crc-org/snc#1168
File: systemd/crc-systemd-common.sh:5-7
Timestamp: 2025-10-07T13:05:57.801Z
Learning: In systemd/crc-systemd-common.sh and similar system scripts, extensive input validation (like checking if parameters are numeric or within specific ranges) is not required for internal functions. The codebase prefers simpler, less defensive coding for these utilities.
Applied to files:
systemd/crc-systemd-common.sh
📚 Learning: 2025-10-02T13:13:28.229Z
Learnt from: kpouget
PR: crc-org/snc#1168
File: systemd/dnsmasq.sh.template:3-12
Timestamp: 2025-10-02T13:13:28.229Z
Learning: The file `/etc/sysconfig/crc-env` in the crc-org/snc repository contains configuration variables and file paths (e.g., `CRC_PASS_KUBEADMIN_PATH`, `CRC_PUB_KEY_PATH`, `CRC_NETWORK_MODE_USER`, `APPS_DOMAIN`), not the actual secrets. The secrets themselves are stored in separate files referenced by these paths. Sourcing `crc-env` with xtrace enabled does not leak sensitive information.
Applied to files:
systemd/ocp-mco-sshkey.service
systemd/crc-needs-tap.sh
systemd/crc-self-sufficient-env.sh
📚 Learning: 2025-10-07T07:03:06.884Z
Learnt from: kpouget
PR: crc-org/snc#1168
File: systemd/crc-self-sufficient-env.sh:10-17
Timestamp: 2025-10-07T07:03:06.884Z
Learning: In the crc-org/snc repository, the CRC_CLOUD environment variable has been removed from the codebase. The self-sufficient environment check in systemd/crc-self-sufficient-env.sh now only validates CRC_SELF_SUFFICIENT=1, not CRC_CLOUD.
Applied to files:
systemd/crc-self-sufficient-env.sh
🧬 Code graph analysis (6)
systemd/crc-wait-apiserver-up.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-52)
systemd/ocp-userpasswords.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-52)
systemd/ocp-custom-domain.sh (2)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-52)systemd/ocp-cluster-ca.sh (1)
cleanup
(38-42)
systemd/crc-pullsecret.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-52)
systemd/ocp-cluster-ca.sh (2)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-52)systemd/ocp-custom-domain.sh (1)
cleanup
(42-45)
systemd/ocp-mco-sshkey.sh (1)
systemd/crc-systemd-common.sh (1)
wait_for_resource_or_die
(3-52)
🪛 Ruff (0.14.0)
systemd/crc-test-vsock.py
1-1: Shebang is present but file is not executable
(EXE001)
🪛 Shellcheck (0.11.0)
systemd/crc-needs-tap.sh
[warning] 13-13: EXIT_ERROR appears unused. Verify use (or export if used externally).
(SC2034)
🔇 Additional comments (19)
systemd/ovs-configuration.service.d/mute-console.conf (1)
1-3
: LGTM! Previous logging issue has been addressed.The use of
append:
correctly preserves logs across service restarts, addressing the truncation concern from the previous review.systemd/crc-pullsecret.service (2)
6-6
: Increased retry attempts improve reliability.The combination of
StartLimitBurst=40
andRestartSec=10
maintains the same total retry window (~400 seconds within the 450-second interval) but provides 4× more retry attempts compared to the previous configuration. This improves the service's ability to recover from transient failures during cluster bring-up.Also applies to: 12-12
13-16
: No issues found. The script correctly accepts the path as a positional argument (PULL_SECRETS_FILE="${1:-}"
), does not depend on/etc/sysconfig/crc-env
, and has no CRC_* environment variable dependencies. The systemd service change is compatible with the current script implementation.systemd/crc-disable-tap.sh (1)
11-12
: Properly handles missing units and files.The use of
rm -f
safely removes the NM connection file even if absent, and|| true
on the systemctl command ensures the script succeeds even when the unit doesn't exist. This correctly addresses the past review concern about handling environments where[email protected]
isn't present.systemd/crc-self-sufficient-env.sh (1)
11-30
: Inverted exit logic is correct for ExecCondition semantics.The script intentionally exits 0 when vsock doesn't work (exit 1 from Python) and exits 1 when vsock works (exit 0 from Python). This inverted logic is correct for use as
ExecCondition
in systemd units: the condition succeeds (allowing the unit to run) when the system is self-sufficient (vsock unavailable), and fails (skipping the unit) when vsock is available (not self-sufficient).The return code handling is implemented correctly:
- Timeout (124): propagates error
- vsock not working (1): exits 0 to proceed with self-sufficient setup
- vsock working (0): exits 1 to skip self-sufficient setup
- Other errors: propagated
systemd/crc-conditionally-disable-tap.sh (1)
10-17
: Wrapper logic is clean and efficient.The conditional logic correctly delegates to either an early exit or the disable script based on TAP necessity. Using
exec
to replace the current process withcrc-disable-tap.sh
is efficient and ensures proper signal handling and exit status propagation.systemd/ocp-wait-apiservices-available.sh (1)
14-65
: Robust error handling and retry logic.The script correctly handles transient
oc
failures, uses a sentinel value (999) to keep retrying when the API is unreachable, and employs proper defaults for the unavailable count. The timeout and retry parameters provide reasonable balance between responsiveness and reliability.systemd/crc-wait-apiserver-up.service (1)
10-10
: crc-wait-apiserver-up.service: EnvironmentFile removal is safe; crc-wait-apiserver-up.sh only depends on KUBECONFIG (and built-in SECONDS).systemd/ocp-mco-sshkey.service (1)
12-15
: Scripts have been properly refactored and no longer depend on environment variables.Verification confirms:
ocp-mco-sshkey.sh
accepts the public key path as$1
and does not sourcecrc-env
.crc-self-sufficient-env.sh
contains no references toCRC_SELF_SUFFICIENT
,CRC_CLOUD
, orcrc-env
.The removal of
EnvironmentFile=/etc/sysconfig/crc-env
is correct. The scripts have been properly updated to not depend on environment variables from this file.systemd/ocp-clusterid.service (1)
11-11
: KUBECONFIG is sufficient for the scripts; no additional environment variables are required.Verification shows that:
ocp-clusterid.sh
sourcescrc-systemd-common.sh
and uses onlyoc
anduuidgen
commandscrc-systemd-common.sh
uses onlyoc get
to check cluster resourcescrc-self-sufficient-env.sh
runs only a Python test for vsock connectivity- None of these scripts reference CRC_* variables or other environment variables
Setting
Environment=KUBECONFIG=/opt/kubeconfig
is sufficient for theoc
commands to function correctly. The removal ofEnvironmentFile
is appropriate.systemd/crc-wait-apiserver-up.sh (1)
3-20
: LGTM! Strict shell options and improved observability.The addition of strict shell options (
pipefail
,errexit
,nounset
,errtrace
) improves reliability, and the timing metrics with progress messages provide good visibility into the boot process.systemd/crc-wait-node-ready.sh (1)
1-40
: LGTM! Node readiness polling with proper error handling.The script correctly handles
set -o errexit
by using|| true
on line 16, ensuring the loop continues whenoc get
fails during node initialization. The fallback value on line 17 provides clear status reporting.systemd/crc-systemd-common.sh (1)
3-52
: LGTM! Well-structured wait helper with appropriate validation.The function renaming to
wait_for_resource_or_die
clearly communicates the exit-on-failure behavior. The xtrace handling, timing metrics, and input validation are all appropriate. Based on learnings, the validation level is correct for internal functions in this codebase.systemd/crc-needs-tap.sh (2)
7-9
: Confirm: set -x with crc-env sourcing is safe.The script enables xtrace while sourcing
/etc/sysconfig/crc-env
. Based on learnings, this file contains configuration variables and file paths (not actual secrets), so xtrace does not leak sensitive information.
15-45
: LGTM! Clear TAP detection logic.The detection flow is well-structured:
- Self-sufficient bundles don't need TAP (exits early)
- System-mode networking doesn't need TAP (exits early)
- Virtualization detection determines TAP necessity (apple/none: no, others: yes)
systemd/crc-check-tap.service (1)
1-14
: LGTM! Proper systemd unit ordering and activation.The unit correctly orders TAP checking before NetworkManager and gv-user-network services, and the
WantedBy
directives ensure proper activation.systemd/crc-pullsecret.sh (2)
11-54
: LGTM! Secure argument-driven secret handling.The script correctly:
- Accepts the pull-secrets file path as an argument
- Uses
wait_for_resource_or_die
to verify API readiness (per learnings)- Validates input file existence and content
- Streams secrets securely via stdin to avoid command-line exposure
11-65
: LGTM! Secure argument-driven password handling.The script correctly:
- Accepts password file paths as arguments
- Uses
--log-driver=none
to prevent password leakage in podman logs- Disables xtrace (line 50) before generating passwords to avoid shell trace exposure
- Validates inputs before processing
systemd/crc-user-mode-networking.sh (1)
1-65
: LGTM! Robust user-mode networking detection.The script properly:
- Handles target argument to invert exit codes based on search criteria
- Validates CRC_NETWORK_MODE_USER is set and numeric (0 or 1)
- Handles self-sufficient bundles correctly
- Provides clear error messages
The unreachable error handler (lines 60-65) is defensive but harmless.
/retest |
install process got interrupted :/
|
Improves the reliability of the system bring-up scripts
Builds on top of #1167
Summary by CodeRabbit
New Features
Refactor
Chores