Support cgroups v1 and v2 for memory and CPU limits #732

Draft
wants to merge 21 commits into
base: main
Commits
24d383f
gen-docker-cases.sh
dzuelke Jun 12, 2024
b4e5f8b
Test fixtures for cgroups v1/v2 handling
dzuelke May 28, 2024
ecaadcc
cgroups fixtures readme
dzuelke Jun 11, 2024
c6739a4
Reusable cgroup_util_ group of functions
dzuelke Jun 14, 2024
f029e46
Tests for cgroup bash functions
dzuelke May 30, 2024
c43ced4
test both verbose and regular cgroup util invocations
dzuelke Jun 19, 2024
a744769
Implement cgroupsv2 awareness for boot scripts
dzuelke Feb 22, 2024
f7a35d4
Run cgroup_spec in separate CI job
dzuelke Jun 17, 2024
98566c5
Use nproc to read CPU core count in boot scripts
dzuelke Feb 9, 2024
52fe298
address some extra strict shellchecks
dzuelke Jun 20, 2024
654cc46
Read memory.high, memory.max, memory.low, memory.min in that order fo…
dzuelke Jun 21, 2024
4f6729b
rename find_ functions that take full /proc/self/cgroup or /proc/self…
dzuelke Jun 21, 2024
be65ae2
CGROUP_UTIL_VERBOSE instead of -v option
dzuelke Jun 21, 2024
39ab939
use CGROUP_UTIL_PROCFS_ROOT and CGROUP_UTIL_CGROUPFS_PREFIX instead o…
dzuelke Jun 21, 2024
34429f7
auto-emulate findmnt for cgroup test runs on macOS
dzuelke Jun 21, 2024
a15e4b7
make -m option an argument
dzuelke Jun 21, 2024
931b220
add cgroup_util_read_cgroup_memory_limit_with_fallback
dzuelke Jun 21, 2024
903b6a0
hard-code maximum value that serves as the 'silly' threshold
dzuelke Jun 21, 2024
32bcd54
drop controller_version switch case
dzuelke Jun 24, 2024
d7420ad
test cgroup_util_read_cgroup_memory_limit_with_fallback
dzuelke Jun 25, 2024
4ab7d2b
use 'cgroup_util_read_cgroup_memory_limit_with_fallback' in boot scripts
dzuelke Jun 24, 2024
19 changes: 18 additions & 1 deletion .github/workflows/ci.yml
@@ -26,6 +26,21 @@ env:
S5CMD_HASH: "392c385320cd5ffa435759a95af77c215553d967e4b1c0fffe52e4f14c29cf85 s5cmd_2.2.2_linux_amd64.deb"

jobs:
unit-test:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: ["ubuntu-20.04", "ubuntu-22.04", "ubuntu-24.04"]
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install Ruby and Bundler
uses: ruby/setup-ruby@v1
with:
bundler-cache: true
ruby-version: "3.2"
- name: Execute tests
run: bundle exec rspec test/spec/cgroup_spec.rb
integration-test:
runs-on: ubuntu-22.04
strategy:
@@ -67,7 +82,9 @@ jobs:
- name: Calculate number of parallel_rspec processes (half of num of lines in runtime log)
run: echo "PARALLEL_TEST_PROCESSORS=$(( ($(cat test/var/log/parallel_runtime_rspec.${STACK}.log | wc -l)+2-1)/2 ))" >> "$GITHUB_ENV"
- name: Execute tests
run: bundle exec parallel_rspec --group-by runtime --first-is-1 --unknown-runtime 1 --allowed-missing 100 --runtime-log "test/var/log/parallel_runtime_rspec.${STACK}.log" --verbose-command --combine-stderr --prefix-output-with-test-env-number test/spec/
run: |
shopt -s extglob
bundle exec parallel_rspec --group-by runtime --first-is-1 --unknown-runtime 1 --allowed-missing 100 --runtime-log "test/var/log/parallel_runtime_rspec.${STACK}.log" --verbose-command --combine-stderr --prefix-output-with-test-env-number test/spec/!(cgroup)_spec.rb
- name: Print list of executed examples
run: cat test/var/log/group.*.json | jq -r --slurp '[.[].examples[]] | sort_by(.id) | flatten[] | .full_description'
- name: Print parallel_runtime_rspec.log
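
The `test/spec/!(cgroup)_spec.rb` argument in the new `run` step relies on bash extended globbing, which is why `shopt -s extglob` sits on its own line before it. A minimal sketch of how that pattern leaves out `cgroup_spec.rb`, assuming a throwaway temp directory and made-up spec file names (`list_non_cgroup_specs` is a hypothetical helper, not part of this PR):

```shell
#!/usr/bin/env bash
# Sketch only: the helper, directory and file names are made up for illustration.
shopt -s extglob  # must be executed before bash parses any line that uses !(...)

# list_non_cgroup_specs DIR: prints the basename of every *_spec.rb in DIR except cgroup_spec.rb
list_non_cgroup_specs() {
  local dir=$1 f
  for f in "$dir"/!(cgroup)_spec.rb; do
    if [[ -e "$f" ]]; then
      basename "$f"
    fi
  done
}

demo_dir=$(mktemp -d)
touch "$demo_dir"/{cgroup,apache,nginx}_spec.rb
list_non_cgroup_specs "$demo_dir"  # lists apache_spec.rb and nginx_spec.rb only
rm -rf "$demo_dir"
```

The `unit-test` job then runs `cgroup_spec.rb` on its own, so the two jobs together should still cover every spec file.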
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,10 @@

## [Unreleased]

### ADD

- Read memory limits from cgroups (v1 or v2) in boot scripts (#699) [David Zuelke]
- Read CPU core count from cgroups (v1 or v2) via `nproc` in boot scripts [David Zuelke]

## [v253] - 2024-06-13

28 changes: 20 additions & 8 deletions bin/heroku-php-apache2
@@ -21,6 +21,8 @@ fi
# we very likely got called via a symlink, so we have to realpath $0 first to find the base buildpack directory
bp_dir=$(cd $(dirname $(realpath $0)); cd ..; pwd)

source "${bp_dir}/bin/util/cgroups.sh"

verbose=
conftest=

@@ -387,21 +389,31 @@ echo -e "\n[global]\nlog_level = notice" >> "$fpm_config_tmp"
fpm_command=( php-fpm --pid "$fpm_pidfile" --nodaemonize -y "$fpm_config_tmp" ${php_config:+-c "$php_config"} )
httpd_command=( httpd -D NO_DETACH -c "Include $httpd_config" )

mlib="/sys/fs/cgroup/memory/memory.limit_in_bytes"
if [[ -f "$mlib" ]]; then
[[ $verbose ]] && echo "Reading available RAM from '$mlib'" >&2
ram="$(cat "$mlib")"
else
[[ $verbose ]] && echo "No '$mlib' with RAM info found" >&2
ram=
# attempt to read a "correct" limit from cgroupfs
# this will handle cgroups v1 and v2, and, for v2, prefer memory.high over memory.max over memory.low over memory.min
if [[ -d "/proc" ]]; then # check for /proc, just in case someone is running this on e.g. macOS
[[ $verbose ]] && echo "Checking cgroupfs for memory limits..." >&2
ram=$(CGROUP_UTIL_VERBOSE=${verbose:+1} cgroup_util_read_cgroup_memory_limit_with_fallback) || {
# 99 means the limit was exceeded; in verbose mode, a message was then already printed
if (( $? != 99)) && [[ $verbose ]]; then
echo "No cgroup memory limits found" >&2
fi
}
fi
if [[ -z "$ram" ]]; then
ram="512M"
echo "Assuming RAM to be ${ram} Bytes" >&2
fi

# read number of available processor cores in a portable (Linux, macOS, BSDs) fashion (leading underscore is not always there)
cores=$(getconf -a | grep -E '_?NPROCESSORS_ONLN' | head -n1 | tr -s ' ' | cut -d" " -f2) || {
# we prefer 'nproc', because that applies cgroup limits for us (e.g. when using 'docker run --cpuset-cpus=1-2'), but it's from GNU coreutils
cores=$(set -o pipefail; nproc 2>/dev/null || getconf -a | grep -E '_?NPROCESSORS_ONLN' | head -n1 | tr -s ' ' | cut -d" " -f2) || {
echo "WARNING: failed to determine _NPROCESSORS_ONLN via getconf" >&2
cores=1
echo "Assuming number of CPU cores to be $cores" >&2
echo "Assuming number of CPU cores to be ${cores}" >&2
}

if [[ -z ${WEB_CONCURRENCY:-} ]]; then
if [[ -n ${DYNO:-} && $(ulimit -u) == "32768" && ( "$php_version" == 5.* || "$php_version" == 7.[0-3].* ) ]]; then
# on a Performance-L dyno, limit to 6 GB of RAM for backwards compatibility for PHP versions before 7.4
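
The memory detection above distinguishes exit status 99 (a limit was found but exceeded the threshold, and verbose mode already reported it) from every other failure. A minimal sketch of that control flow, where `fake_read_limit` and `detect_ram` are hypothetical stand-ins so the snippet runs without a cgroup filesystem:

```shell
#!/usr/bin/env bash
# Sketch only: fake_read_limit is a hypothetical stand-in for
# cgroup_util_read_cgroup_memory_limit_with_fallback from bin/util/cgroups.sh.
fake_read_limit() {
  case "$1" in
    ok)       echo "536870912" ;;  # a usable limit was found (512 MiB in bytes)
    exceeded) return 99 ;;         # limit was above the "silly" threshold
    *)        return 3 ;;          # any other failure, e.g. controller not found
  esac
}

# detect_ram SCENARIO: prints the detected limit, or the "512M" default
detect_ram() {
  local ram
  # declare and assign separately, so $? reflects the command substitution
  ram=$(fake_read_limit "$1") || {
    if (( $? != 99 )); then
      echo "No cgroup memory limits found" >&2
    fi
  }
  echo "${ram:-512M}"
}

detect_ram ok        # prints 536870912
detect_ram exceeded  # prints 512M, with no extra message
detect_ram missing   # prints 512M, after a message on stderr
```

Keeping `local ram` on its own line matters here: `local ram=$(...)` would report the status of `local` rather than that of the substitution.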
28 changes: 20 additions & 8 deletions bin/heroku-php-nginx
@@ -21,6 +21,8 @@ fi
# we very likely got called via a symlink, so we have to realpath $0 first to find the base buildpack directory
bp_dir=$(cd $(dirname $(realpath $0)); cd ..; pwd)

source "${bp_dir}/bin/util/cgroups.sh"

verbose=
conftest=

@@ -387,21 +389,31 @@ echo -e "\n[global]\nlog_level = notice" >> "$fpm_config_tmp"
fpm_command=( php-fpm --pid "$fpm_pidfile" --nodaemonize -y "$fpm_config_tmp" ${php_config:+-c "$php_config"} )
nginx_command=( nginx -c "$nginx_main" -g "pid $nginx_pidfile; include $nginx_config;" )

mlib="/sys/fs/cgroup/memory/memory.limit_in_bytes"
if [[ -f "$mlib" ]]; then
[[ $verbose ]] && echo "Reading available RAM from '$mlib'" >&2
ram="$(cat "$mlib")"
else
[[ $verbose ]] && echo "No '$mlib' with RAM info found" >&2
ram=
# attempt to read a "correct" limit from cgroupfs
# this will handle cgroups v1 and v2, and, for v2, prefer memory.high over memory.max over memory.low over memory.min
if [[ -d "/proc" ]]; then # check for /proc, just in case someone is running this on e.g. macOS
[[ $verbose ]] && echo "Checking cgroupfs for memory limits..." >&2
ram=$(CGROUP_UTIL_VERBOSE=${verbose:+1} cgroup_util_read_cgroup_memory_limit_with_fallback) || {
# 99 means the limit was exceeded; in verbose mode, a message was then already printed
if (( $? != 99)) && [[ $verbose ]]; then
echo "No cgroup memory limits found" >&2
fi
}
fi
if [[ -z "$ram" ]]; then
ram="512M"
echo "Assuming RAM to be ${ram} Bytes" >&2
fi

# read number of available processor cores in a portable (Linux, macOS, BSDs) fashion (leading underscore is not always there)
cores=$(getconf -a | grep -E '_?NPROCESSORS_ONLN' | head -n1 | tr -s ' ' | cut -d" " -f2) || {
# we prefer 'nproc', because that applies cgroup limits for us (e.g. when using 'docker run --cpuset-cpus=1-2'), but it's from GNU coreutils
cores=$(set -o pipefail; nproc 2>/dev/null || getconf -a | grep -E '_?NPROCESSORS_ONLN' | head -n1 | tr -s ' ' | cut -d" " -f2) || {
echo "WARNING: failed to determine _NPROCESSORS_ONLN via getconf" >&2
cores=1
echo "Assuming number of CPU cores to be $cores" >&2
echo "Assuming number of CPU cores to be ${cores}" >&2
}

if [[ -z ${WEB_CONCURRENCY:-} ]]; then
if [[ -n ${DYNO:-} && $(ulimit -u) == "32768" && ( "$php_version" == 5.* || "$php_version" == 7.[0-3].* ) ]]; then
# on a Performance-L dyno, limit to 6 GB of RAM for backwards compatibility for PHP versions before 7.4
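
The core count logic prefers `nproc` and only falls back to parsing `getconf -a` when `nproc` is unavailable. A minimal sketch of the same idea as a standalone function (`detect_cores` is a hypothetical name; the boot scripts inline this logic):

```shell
#!/usr/bin/env bash
# Sketch only: detect_cores is a hypothetical helper name.
detect_cores() {
  local cores
  cores=$(
    # pipefail stays confined to this command substitution subshell;
    # it makes a failing getconf or grep fail the whole fallback pipeline
    set -o pipefail
    nproc 2>/dev/null ||
      getconf -a 2>/dev/null | grep -E '_?NPROCESSORS_ONLN' | head -n1 | tr -s ' ' | cut -d' ' -f2
  ) || cores=1  # neither tool produced a value: assume a single core
  echo "${cores:-1}"
}

detect_cores
```

Because a pipeline binds tighter than `||`, the fallback reads as `nproc || (getconf | grep | ...)`, so the `getconf` pipeline only runs when `nproc` fails.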
202 changes: 202 additions & 0 deletions bin/util/cgroups.sh
@@ -0,0 +1,202 @@
#!/usr/bin/env bash

# stdin is the output of e.g. /proc/self/cgroup
cgroup_util_find_controller_from_procfs_cgroup_contents() {
local usage="Usage (stdin is /proc/self/cgroup format): ${FUNCNAME[0]} CONTROLLER"
# there may be an entry for a v1 controller like:
# 7:memory:/someprefix
# if not, then there can be an entry for a v2 unified hierarchy, e.g.:
# 0::/
# we look for the v1 first, as there may be hybrid setups where some controllers are still v1
# so if there is an entry for "memory", a v1 controller is in charge, even if others are v2
(
set -o pipefail
grep -E -e '^[0-9]+:('"${1:?$usage}"')?:/.*' | sort -r -n -k 1 -t ":" | head -n1
)
}

cgroup_util_get_controller_version_from_procfs_cgroup_line() {
readarray -d':' -t line # -t removes trailing delimiter
# with e.g. 'docker run --cgroup-parent foo:bar', the third (relative path) section would contain a colon
if (( ${#line[@]} < 3 )); then
return 1 # 'return', not 'exit', so a direct call does not terminate the calling shell
fi
if [[ ${line[0]} == "0" ]]; then
echo "2"
else
echo "1"
fi
}

cgroup_util_get_controller_path_from_procfs_cgroup_line() {
readarray -d':' line # no -t, we want any trailing delims for concatenation via printf
if (( ${#line[@]} < 3 )); then
return 1 # 'return', not 'exit', so a direct call does not terminate the calling shell
fi
# with e.g. 'docker run --cgroup-parent foo:bar', the third (relative path) section would contain a colon, so we have to output from field 3 until the end
printf "%s" "${line[@]:2}"
}

# stdin is the output of e.g. /proc/self/mountinfo
# $1 is a controller name, which is matched against the mount options using -O (so it could be a comma-separated list, too)
cgroup_util_find_v1_mount_from_procfs_mountinfo_contents() {
local usage="Usage (stdin is /proc/self/mountinfo format): ${FUNCNAME[0]} CONTROLLER"
# must specify --list explicitly or it might output tree parts after all...
findmnt --list --noheadings --first-only -t cgroup -O "${1:?$usage}" -o target -F <(cat)
}

# stdin is the output of e.g. /proc/self/mountinfo
cgroup_util_find_v2_mount_from_procfs_mountinfo_contents() {
# must specify --list explicitly or it might output tree parts after all...
findmnt --list --noheadings --first-only -t cgroup2 -o target -F <(cat)
}

# $1 is the controller name, $2 is the mount root from /proc/self/mountinfo, $3 is the mount relative dir from /proc/self/cgroup
cgroup_util_find_v1_path() {
local usage="Usage: ${FUNCNAME[0]} CONTROLLER MOUNT CGROUP"
local relpath=${3:?$usage}
# strip a trailing slash if present (a bare "/" thus becomes an empty string)
relpath=${relpath%/}
local cur="${2:?$usage}${relpath}"
while true; do
if [[ -d "$cur" ]] && compgen -G "${cur}/${1:?$usage}.*" > /dev/null; then
echo "$cur"
return 0
elif [[ "$cur" == "$2" ]]; then
break # we are at the mount, and it does not exist
fi
cur=$(dirname "$cur")
done
return 1
}

# $1 is the controller name, $2 is the mount root from /proc/self/mountinfo, $3 is the mount relative dir from /proc/self/cgroup
cgroup_util_find_v2_path() {
local usage="Usage: ${FUNCNAME[0]} CONTROLLER MOUNT CGROUP"
local retval=${3:?$usage}
# strip a trailing slash if present (a bare "/" thus becomes an empty string)
retval=${2:?$usage}${retval%/}
if grep -Eqs '(^|\s)'"${1:?$usage}"'($|\s)' "${retval}/cgroup.controllers"; then
echo "$retval"
return 0
else
# a bare 'return' propagates the exit status of grep; without it, the function would return the status of the 'if' statement instead
return
fi
}

# this ignores memory.soft_limit_in_bytes on purpose for the reasons outlined in https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#id1
cgroup_util_read_cgroupv1_memory_limit() {
local usage="Usage: ${FUNCNAME[0]} PATH"
local f="${1:?$usage}/memory.limit_in_bytes"
if [[ -r "$f" ]]; then
[[ -n ${CGROUP_UTIL_VERBOSE-} ]] && echo "Using limit from '${f}'" >&2
cat "$f"
return
else
return 9
fi
}

# this reads memory.high first, then falls back to memory.max, memory.low, or memory.min
cgroup_util_read_cgroupv2_memory_limit() {
local usage="Usage: ${FUNCNAME[0]} PATH"

local f
local limit
# memory.high is the best limit to read ("This is the main mechanism to control memory usage of a cgroup.", https://www.kernel.org/doc/html/v5.15/admin-guide/cgroup-v2.html)
# we fall back to memory.max first (the final "safety net" limit), then memory.low (best-effort memory protection, e.g. OCI memory.reservation or Docker --memory-reservation), then finally memory.min (hard guaranteed minimum)
for f in "${1:?$usage}/memory.high" "${1}/memory.max" "${1}/memory.low" "${1}/memory.min"; do
if [[ -r "$f" ]]; then
limit=$(cat "$f")
if [[ "$limit" != "max" && "$limit" != "0" ]]; then
[[ -n ${CGROUP_UTIL_VERBOSE-} ]] && echo "Using limit from '${f}'" >&2
echo "$limit"
return
fi
fi
done

return 9
}

# reads a cgroup v1 (memory.limit_in_bytes) or v2 (memory.high, fallback to memory.max, fallback to memory.low, fallback to memory.min)
# if env var CGROUP_UTIL_PROCFS_ROOT is passed, it will be used instead of '/proc' to find '/proc/self/cgroup', '/proc/self/mountinfo' etc (useful for testing, defaults to '/proc')
# if env var CGROUP_UTIL_CGROUPFS_PREFIX is passed, it will be prepended to any /sys/fs/cgroup or similar path used (useful for testing, defaults to '')
# pass a value for env var CGROUP_UTIL_VERBOSE to enable verbose mode
cgroup_util_read_cgroup_memory_limit() {
if [[ -z "${CGROUP_UTIL_PROCFS_ROOT-}" ]]; then
local CGROUP_UTIL_PROCFS_ROOT=/proc
fi

# this value is used as a threshold for "silly" maximums returned e.g. by Docker on a cgroups v1 system
local maximum=$((8 * 1024 * 1024 * 1024 * 1024)) # 8 TiB

local controller=memory

local procfs_cgroup_entry
procfs_cgroup_entry=$(cgroup_util_find_controller_from_procfs_cgroup_contents "$controller" < "${CGROUP_UTIL_PROCFS_ROOT}/self/cgroup") || {
[[ -n ${CGROUP_UTIL_VERBOSE-} ]] && echo "Could not find cgroup controller '${controller}' in '${CGROUP_UTIL_PROCFS_ROOT}/self/cgroup'" >&2
return 3
}

local controller_version
controller_version=$(echo "$procfs_cgroup_entry" | cgroup_util_get_controller_version_from_procfs_cgroup_line) || {
[[ -n ${CGROUP_UTIL_VERBOSE-} ]] && echo "Could not determine version for cgroup controller '${controller}' from '${CGROUP_UTIL_PROCFS_ROOT}/self/cgroup'" >&2
return 4
}

local controller_path
controller_path=$(echo "$procfs_cgroup_entry" | cgroup_util_get_controller_path_from_procfs_cgroup_line) || {
[[ -n ${CGROUP_UTIL_VERBOSE-} ]] && echo "Could not determine path for cgroup controller '${controller}' from '${CGROUP_UTIL_PROCFS_ROOT}/self/cgroup'" >&2
return 5
}

local controller_mount
controller_mount=$(cgroup_util_find_v"$controller_version"_mount_from_procfs_mountinfo_contents "$controller" < "${CGROUP_UTIL_PROCFS_ROOT}/self/mountinfo") || {
[[ -n ${CGROUP_UTIL_VERBOSE-} ]] && echo "Could not determine mount point for cgroup controller '${controller}' from '${CGROUP_UTIL_PROCFS_ROOT}/self/mountinfo'" >&2
return 6
}
# for testing purposes, a prefix can be passed to "relocate" the /sys/fs/cgroup/... location we are reading from next
controller_mount="${CGROUP_UTIL_CGROUPFS_PREFIX-}${controller_mount}"

local location
location=$(cgroup_util_find_v"$controller_version"_path "$controller" "$controller_mount" "$controller_path") || {
[[ -n ${CGROUP_UTIL_VERBOSE-} ]] && echo "Could not find a location for cgroup controller '${controller}'" >&2
return 7
}

[[ -n ${CGROUP_UTIL_VERBOSE-} ]] && echo "Reading cgroup v${controller_version} limit from '${location}'" >&2

local limit
limit=$(cgroup_util_read_cgroupv"$controller_version"_memory_limit "$location") || return

if (( maximum > 0 && limit <= maximum )); then
echo "$limit"
return
else
[[ -n ${CGROUP_UTIL_VERBOSE-} ]] && echo "Ignoring cgroup memory limit of ${limit} Bytes (exceeds maximum of ${maximum} Bytes)" >&2
return 99
fi
}

# reads a cgroup v1 (memory.limit_in_bytes) or v2 (memory.high, fallback to memory.max, fallback to memory.low, fallback to memory.min)
# optional argument is a file path to fall back to for reading a default value, useful e.g. when reading on a system that has a "fake" limit info file (defaults to '/sys/fs/cgroup/memory/memory.limit_in_bytes')
# if env var CGROUP_UTIL_PROCFS_ROOT is passed, it will be used instead of '/proc' to find '/proc/self/cgroup', '/proc/self/mountinfo' etc (useful for testing, defaults to '/proc')
# if env var CGROUP_UTIL_CGROUPFS_PREFIX is passed, it will be prepended to any /sys/fs/cgroup or similar path used (useful for testing, defaults to '')
# pass a value for env var CGROUP_UTIL_VERBOSE to enable verbose mode
cgroup_util_read_cgroup_memory_limit_with_fallback() {
local fallback=${1-"${CGROUP_UTIL_CGROUPFS_PREFIX-}/sys/fs/cgroup/memory/memory.limit_in_bytes"}

cgroup_util_read_cgroup_memory_limit || {
local retval=$?

if ((retval != 99)) && [[ -r "$fallback" ]]; then
[[ -n ${CGROUP_UTIL_VERBOSE-} ]] && echo "Reading fallback limit from '${fallback}'" >&2
cat "$fallback"
return
fi

return "$retval"
}
}
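
For reviewers unfamiliar with the `/proc/self/cgroup` line format handled above, here is a minimal sketch of the version and path extraction. It uses parameter expansion rather than the `readarray` approach in this file, and `parse_cgroup_line` is a hypothetical helper, not part of `bin/util/cgroups.sh`:

```shell
#!/usr/bin/env bash
# Sketch only: parse_cgroup_line is a hypothetical helper. It splits a
# "hierarchy-ID:controller-list:path" line with parameter expansion
# instead of readarray.
parse_cgroup_line() {
  local line=$1
  local id=${line%%:*}   # text before the first colon
  local rest=${line#*:}  # text after the first colon
  # fewer than three colon-separated fields: not a valid entry
  [[ "$rest" == *:* ]] || return 1
  local path=${rest#*:}  # everything after the second colon, later colons included
  local version=1
  if [[ "$id" == "0" ]]; then
    version=2            # hierarchy ID 0 is the v2 unified hierarchy
  fi
  printf '%s %s\n' "$version" "$path"
}

parse_cgroup_line "7:memory:/someprefix"  # prints "1 /someprefix"
parse_cgroup_line "0::/"                  # prints "2 /"
parse_cgroup_line "0::/foo:bar/abc"       # prints "2 /foo:bar/abc"
```

The last call shows why the path cannot simply be taken as the third field: a cgroup parent such as `foo:bar` puts a colon inside the path itself.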
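
The v2 precedence (`memory.high`, then `memory.max`, `memory.low`, `memory.min`, skipping `max` and `0`) can be tried out against a fixture directory. A minimal sketch, where `pick_v2_limit` and the temp directory are hypothetical and only the file names and sentinel values come from `cgroup_util_read_cgroupv2_memory_limit`:

```shell
#!/usr/bin/env bash
# Sketch only: pick_v2_limit and the fixture directory are hypothetical; the file
# names and the "max"/"0" sentinels mirror cgroup_util_read_cgroupv2_memory_limit.
pick_v2_limit() {
  local dir=$1 name value
  for name in memory.high memory.max memory.low memory.min; do
    [[ -r "$dir/$name" ]] || continue
    value=$(<"$dir/$name")
    # "max" and "0" carry no usable limit, so move on to the next file
    if [[ "$value" != "max" && "$value" != "0" ]]; then
      echo "$name=$value"
      return 0
    fi
  done
  return 9  # same "nothing usable" status as the function in this PR
}

fixture=$(mktemp -d)
echo "max"        > "$fixture/memory.high"
echo "1073741824" > "$fixture/memory.max"
echo "0"          > "$fixture/memory.low"
pick_v2_limit "$fixture"  # prints "memory.max=1073741824"
rm -rf "$fixture"
```

Unlike the real function, this sketch also prints which file supplied the value, which may help when comparing fixtures by hand.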