@yacovm yacovm commented Nov 15, 2025

Why this should be merged

We currently have metrics for available disk space and we fail the health check if we have less than a predefined amount of space left.

This commit adds metrics for available disk percentage, and fails the health check if we have less than a predefined percentage of space left.

The motivation is that when the same validator configuration is reused across several nodes with different volume sizes, a percentage threshold fits better than an absolute one.

For example, a volume that started with all of its 2TB free and now has 200GB left (10% free) is not in the same situation as a volume that started with 400GB and still has half of its space left (also 200GB, but 50% free).
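To make the arithmetic in this example concrete, here is a minimal Go sketch. The helper `percentFree` is a hypothetical name used only for illustration and is not taken from the PR; it assumes decimal gigabytes and whole-number percentages rounded down.

```go
package main

import "fmt"

// percentFree is a hypothetical helper (not part of this PR). It returns the
// share of a volume that is still free as a whole percentage, rounded down.
// A zero-sized volume reports 0 to avoid dividing by zero.
func percentFree(freeBytes, totalBytes uint64) uint64 {
	if totalBytes == 0 {
		return 0
	}
	return freeBytes * 100 / totalBytes
}

func main() {
	const gb = uint64(1_000_000_000) // decimal gigabyte, assumed for illustration

	// Both volumes have the same 200GB left, so an absolute threshold
	// treats them identically.
	fmt.Println(percentFree(200*gb, 2000*gb)) // 2TB volume: prints 10
	fmt.Println(percentFree(200*gb, 400*gb))  // 400GB volume: prints 50
}
```

An absolute threshold cannot tell these two volumes apart, while a percentage threshold can.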

How this works

Adds a flag --system-tracker-disk-warning-threshold-available-space-percentage that defaults to 3% space left.

Changes AvailableBytes in storage_unix.go and storage_openbsd.go to return the available percentage as well as the absolute number of available bytes.
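As a rough illustration of that change, the sketch below queries a filesystem with the standard library's `syscall.Statfs` and returns both values. It is not the code from this PR: the function name `availableSpace`, the choice of `Bavail` over `Bfree`, and the rounding are assumptions, and the field names are the Linux ones (the OpenBSD `Statfs_t` names its fields differently).

```go
package main

import (
	"fmt"
	"syscall"
)

// availableSpace is a hypothetical stand-in for a function that reports both
// the available bytes and the available percentage of the filesystem that
// contains path. It assumes the Linux layout of syscall.Statfs_t.
func availableSpace(path string) (uint64, uint64, error) {
	var stat syscall.Statfs_t
	if err := syscall.Statfs(path, &stat); err != nil {
		return 0, 0, err
	}

	// Bavail counts the blocks available to unprivileged users.
	availBytes := stat.Bavail * uint64(stat.Bsize)

	// Guard against pseudo-filesystems that report zero total blocks.
	if stat.Blocks == 0 {
		return availBytes, 0, nil
	}
	availPercent := stat.Bavail * 100 / stat.Blocks
	return availBytes, availPercent, nil
}

func main() {
	bytes, percent, err := availableSpace("/")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Printf("available: %d bytes (%d%%)\n", bytes, percent)
}
```

Computing the percentage from block counts rather than byte counts avoids multiplying by the block size twice and keeps the intermediate values small.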

How this was tested

Tested only on Linux. However, I checked the Statfs_t struct on OpenBSD to make sure I picked the right fields.

I took a Fuji node I had, compiled the avalanchego binary from this branch, and relaunched the node. I then created a large file using fallocate -l 400G bigfile and observed the metrics:

image

Soon enough, the node's health check began failing:

curl 'http://localhost:9650/ext/health'  | jq
image

I then deleted the file and observed the metrics:

image

I then checked that the node had returned to a healthy state:

image

Need to be documented in RELEASES.md?

Copilot AI review requested due to automatic review settings November 15, 2025 22:53
@yacovm yacovm marked this pull request as draft November 15, 2025 22:54

Copilot AI left a comment

Pull Request Overview

This PR adds disk space percentage monitoring alongside the existing absolute disk space tracking. The system now tracks both available disk bytes and available disk percentage, allowing for more flexible disk space health checks.

Key Changes:

  • Modified AvailableBytes() to return both available bytes and percentage of free disk space
  • Added new configuration option for percentage-based disk space warning threshold (default 97%)
  • Updated health checks to monitor both absolute and percentage-based disk space thresholds

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| utils/storage/storage_unix.go | Modified to calculate and return percentage of available disk space |
| utils/resource/usage.go | Added percentage tracking fields and methods to resource manager |
| utils/resource/resourcemock/user.go | Added mock implementation for percentage method |
| utils/resource/no_usage.go | Updated no-op implementation return values |
| snow/networking/tracker/resource_tracker.go | Added percentage tracking to disk resource tracker |
| node/node.go | Updated health check to validate percentage threshold |
| config/node/config.go | Added percentage threshold configuration field |
| config/keys.go | Added configuration key for percentage threshold |
| config/flags.go | Added command-line flag for percentage threshold |
| config/config.go | Updated disk space config validation to handle percentage |


@yacovm yacovm force-pushed the fspercentageLimits branch 8 times, most recently from 2ab9bcb to 068325c Compare November 17, 2025 17:35
@yacovm yacovm force-pushed the fspercentageLimits branch from 068325c to 035cc6b Compare November 17, 2025 17:42
@yacovm yacovm marked this pull request as ready for review November 17, 2025 17:49
Signed-off-by: Yacov Manevich <[email protected]>
node/node.go Outdated
}

if availableDiskPercentage < n.Config.WarningThresholdAvailableDiskSpacePercentage {
err = fmt.Errorf("remaining available disk space percentage (%d%%) is below minimum required available space percentage (%d%%)", availableDiskPercentage, n.Config.WarningThresholdAvailableDiskSpacePercentage)

Suggested change
- err = fmt.Errorf("remaining available disk space percentage (%d%%) is below minimum required available space percentage (%d%%)", availableDiskPercentage, n.Config.WarningThresholdAvailableDiskSpacePercentage)
+ err = fmt.Errorf("remaining available disk space percentage (%d%%) is below the warning threshold of disk space percentage (%d%%)", availableDiskPercentage, n.Config.WarningThresholdAvailableDiskSpacePercentage)
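
For context, a health check that applies both the absolute and the percentage threshold could look roughly like the sketch below. This is an illustration only: the function `checkDiskSpace`, its parameters, the wording of the absolute-space message, and the use of `errors.Join` (Go 1.20+) to report both failures together are assumptions, not the code in node/node.go. Only the percentage message follows the suggested wording.

```go
package main

import (
	"errors"
	"fmt"
)

// checkDiskSpace is a hypothetical helper (not the PR's implementation). It
// returns nil when both thresholds are satisfied, and otherwise an error
// describing every threshold that was violated.
func checkDiskSpace(availBytes, minBytes, availPercent, minPercent uint64) error {
	var errs []error
	if availBytes < minBytes {
		// This message wording is an assumption made for the sketch.
		errs = append(errs, fmt.Errorf(
			"remaining available disk space (%d) is below the warning threshold of disk space (%d)",
			availBytes, minBytes))
	}
	if availPercent < minPercent {
		errs = append(errs, fmt.Errorf(
			"remaining available disk space percentage (%d%%) is below the warning threshold of disk space percentage (%d%%)",
			availPercent, minPercent))
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	// Plenty of bytes left in absolute terms, but only 2% of the volume is
	// free, which is below the 3% default mentioned in this PR.
	fmt.Println(checkDiskSpace(200_000_000_000, 10_000_000_000, 2, 3))
}
```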

Signed-off-by: Yacov Manevich <[email protected]>