OCPBUGS-62790: Resize /var tmpfs to 10GiB for ABI installations #10055
@zaneb: This pull request references Jira Issue OCPBUGS-62790, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@zaneb: This pull request references Jira Issue OCPBUGS-62790, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Agent-based installations on vSphere with 16GB RAM were failing with "no space left on device" errors during ostree image operations. The live ISO environment uses a tmpfs mounted at /var that is sized at 50% of available RAM. On systems with 16GB RAM, this provides only 8GB of tmpfs space.
During the bootstrap process, node-image-pull.sh creates a temporary ostree repository in /var/ostree-container/repo to pull and apply the node image. This operation has a peak tmpfs usage of approximately 8.5-9GB, exceeding the available 8GB and causing ENOSPC errors.
This fix resizes the /var tmpfs to 10GiB before creating the temporary ostree repository, providing sufficient space for the image operations while maintaining compatibility with the minimum 16GB RAM requirement.
The resize is performed using systemd-run to escape the mount namespace and only affects systems running in the live environment (detected by the presence of /run/ostree-live).
Assisted-by: Claude Code
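A minimal sketch of the guarded resize described above, not the actual node-image-pull.sh change. The function name, the DRY_RUN switch, and the overridable LIVE_MARKER, TMPFS_MOUNT and TMPFS_SIZE variables are assumptions added for illustration. The mount point is left configurable because the description speaks of /var while the reviewed snippet remounts /run/ephemeral_base.

```shell
#!/bin/bash
# Illustrative sketch only; the names below are hypothetical and are not
# taken from node-image-pull.sh.

# Marker whose presence indicates the live environment (per the PR description).
LIVE_MARKER="${LIVE_MARKER:-/run/ostree-live}"
# Mount point to resize; the reviewed snippet uses /run/ephemeral_base.
TMPFS_MOUNT="${TMPFS_MOUNT:-/run/ephemeral_base}"
# Target size passed to the tmpfs size= option.
TMPFS_SIZE="${TMPFS_SIZE:-10G}"
# Defaults to a dry run so the sketch is safe to execute without root.
DRY_RUN="${DRY_RUN:-1}"

maybe_resize_tmpfs() {
    # Outside the live environment there is nothing to resize.
    if [ ! -e "$LIVE_MARKER" ]; then
        echo "not a live environment; skipping resize"
        return 0
    fi
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: systemd-run --wait --service-type=oneshot mount -o remount,size=${TMPFS_SIZE} ${TMPFS_MOUNT}"
        return 0
    fi
    # systemd-run starts the remount as a transient unit, so it does not
    # run inside the calling script's mount namespace.
    systemd-run --wait --service-type=oneshot \
        mount -o "remount,size=${TMPFS_SIZE}" "$TMPFS_MOUNT"
}
```

With the defaults, calling `maybe_resize_tmpfs` only prints what it would do. Setting DRY_RUN=0 and running as root in the live ISO would perform the remount.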
Force-pushed from 99e2a25 to 57892c6.
|
/test ? |
|
@zaneb: The following commands are available to trigger required jobs: The following commands are available to trigger optional jobs: Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test e2e-agent-compact-ipv4 |
|
/test e2e-metal-assisted |
|
@zaneb: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
echo "Resizing /run/ephemeral_base tmpfs to 10GiB for ostree operations..."
# Use systemd-run to avoid inheriting MountFlags
systemd-run --wait --service-type=oneshot mount -o remount,size=10G /run/ephemeral_base
IIUC there isn't a way here to configure the value as a % (rather than a fixed value, currently equivalent to 62.5% of the 16GB minimum), is there? The only downside is that better spec'ed nodes (with >16GB) won't get any benefit from their additional memory during the live setup, which seems a minor point, as the additional memory shouldn't be required.
Yeah, you can do it by %, but the ostree repo is the same size all the time.
And by the time we get here we have already validated the minimum total RAM, so we have at least 16GiB.
I guess the benefit of a % is that if we blew through the 10GiB barrier then you could still work around it by adding more RAM, but with a fixed size there is no workaround.
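To make the fixed-versus-percentage trade-off in this thread concrete, here is a rough arithmetic sketch. The helper names are hypothetical, 16GB is treated as 16 GiB for simplicity, and the 63% figure is only an illustration of rounding 62.5% up to a whole number; it is not something proposed in the PR. It relies on the tmpfs size= option accepting a % suffix relative to physical RAM.

```shell
#!/bin/bash
# Illustrative arithmetic only; helper names are hypothetical.

# Express a fixed size as tenths of a percent of RAM (integer math only).
# Usage: fixed_as_permille SIZE_MIB RAM_MIB
fixed_as_permille() {
    echo $(( $1 * 1000 / $2 ))
}

# Size in MiB of a tmpfs sized as a whole-number percentage of RAM.
# Usage: percent_size_mib PERCENT RAM_MIB
percent_size_mib() {
    echo $(( $2 * $1 / 100 ))
}

# Build the mount options for either sizing mode.
# Usage: remount_opts fixed 10   or   remount_opts percent 63
remount_opts() {
    case "$1" in
        fixed)   echo "remount,size=${2}G" ;;
        percent) echo "remount,size=${2}%" ;;
        *)       return 1 ;;
    esac
}

fixed_mib=$(( 10 * 1024 ))   # the 10GiB chosen by the fix
for ram_gib in 16 32; do
    ram_mib=$(( ram_gib * 1024 ))
    echo "${ram_gib} GiB RAM: default 50% = $(percent_size_mib 50 "$ram_mib") MiB," \
         "fixed = ${fixed_mib} MiB," \
         "63% = $(percent_size_mib 63 "$ram_mib") MiB"
done
```

On a 16 GiB host the fixed 10GiB works out to 625 tenths of a percent, i.e. the 62.5% mentioned above. A percentage setting grows with installed RAM, which is the workaround discussed, while the fixed size stays at 10GiB regardless.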
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andfasano The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
|
@zaneb: This pull request references Jira Issue OCPBUGS-62790, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |