Monitor and alert xsnap worker memory #10842

Open
mhofman opened this issue Jan 14, 2025 · 0 comments
Labels: enhancement (New feature or request), telemetry

mhofman commented Jan 14, 2025

What is the Problem Being Solved?

#10841 reminded us that xsnap vats will fail if they attempt to allocate more than 2GB of memory (as seen by xsnap's metering). We need to make sure we get alerted if any network we monitor, such as mainnet, has a vat getting anywhere close to this limit.

Description of the Design

The slog currently reports the uncompressed snapshot size (`uncompressedSize`) in `heap-snapshot-save` events, but that doesn't tell us the peak memory usage, since the snapshot is taken after GC. It is, however, already a good indicator and should be monitored.
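A minimal sketch of how the `uncompressedSize` values could be pulled out of a slog, assuming the slog is JSON lines where each entry has a `type` field and `heap-snapshot-save` entries also carry a `vatID` field (that field set is an assumption to verify against a real slog). The helper name `max_snapshot_size_by_vat` is hypothetical; it keeps only the largest size seen per vat:

```python
import json
from typing import Dict, Iterable


def max_snapshot_size_by_vat(slog_lines: Iterable[str]) -> Dict[str, int]:
    """Largest uncompressedSize (bytes) seen per vat in heap-snapshot-save events.

    Assumes one JSON object per line, and that heap-snapshot-save entries
    carry "vatID" and "uncompressedSize" fields.
    """
    sizes: Dict[str, int] = {}
    for line in slog_lines:
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        if entry.get("type") != "heap-snapshot-save":
            continue
        vat_id = entry.get("vatID")
        size = entry.get("uncompressedSize")
        if vat_id is None or size is None:
            continue  # skip entries missing the fields we rely on
        sizes[vat_id] = max(size, sizes.get(vat_id, 0))
    return sizes
```

Because it consumes lines one at a time, the function can be fed an open slog file object directly without loading the whole file into memory.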

Delivery results also contain an `allocate` value, which seems to be the current memory allocation including free slots and chunks (which the snapshot seems to exclude). As such, this value seems to always be higher than the snapshot size and may be the correct value to monitor, but it is not currently observed as a metric.
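The sketch below shows how `allocate` could be tracked per vat from the slog. It assumes `deliver-result` entries carry a `vatID` and a `dr` array shaped like `[status, problem, metering]`, with `allocate` inside the metering object; that shape, and the names `AllocateStats` and `allocate_stats_by_vat`, are hypothetical and should be checked against the actual slog. Both the latest and the peak value are kept, since the peak is what matters relative to the 2GB limit:

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class AllocateStats:
    """Most recent and highest `allocate` value seen for one vat."""
    latest: int = 0
    peak: int = 0


def allocate_stats_by_vat(slog_lines: Iterable[str]) -> Dict[str, AllocateStats]:
    stats: Dict[str, AllocateStats] = {}
    for line in slog_lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("type") != "deliver-result":
            continue
        # Assumed shape: "dr": [status, problem, metering]
        dr = entry.get("dr")
        if not isinstance(dr, list) or len(dr) < 3 or not isinstance(dr[2], dict):
            continue  # no metering data for this delivery
        allocate = dr[2].get("allocate")
        vat_id = entry.get("vatID")
        if vat_id is None or not isinstance(allocate, int):
            continue
        vat_stats = stats.setdefault(vat_id, AllocateStats())
        vat_stats.latest = allocate
        vat_stats.peak = max(vat_stats.peak, allocate)
    return stats
```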

A proxy measurement would be the RSS of the worker process, but I have seen it vary around the time a snapshot is taken.
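For the RSS proxy, a Linux-only sketch could read `VmRSS` from `/proc/<pid>/status` of the xsnap worker. How the worker PID is discovered is left out, and the helper names are hypothetical. Note that `/proc` reports this value in "kB", meaning units of 1024 bytes:

```python
from pathlib import Path
from typing import Optional


def parse_vm_rss_bytes(status_text: str) -> Optional[int]:
    """Extract VmRSS from the text of /proc/<pid>/status, in bytes."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Typical line: "VmRSS:\t  123456 kB"
            fields = line.split()
            if len(fields) < 3 or fields[2] != "kB":
                raise ValueError(f"unexpected VmRSS line: {line!r}")
            return int(fields[1]) * 1024  # /proc's "kB" is 1024 bytes
    return None  # e.g. kernel threads have no VmRSS line


def read_worker_rss_bytes(pid: int) -> Optional[int]:
    """Read the RSS of a (hypothetical) xsnap worker process by PID on Linux."""
    status_text = Path(f"/proc/{pid}/status").read_text()
    return parse_vm_rss_bytes(status_text)
```

Given the variation observed around snapshots, a single RSS reading may be noisy, so this is better treated as a secondary signal.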

Regardless of how we monitor this, we should configure an alert when a vat reaches 1GB. It would also be good to have an alert at 500 MB, since that is the threshold at which state sync stops working.
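The two thresholds could be expressed as a simple classification over whichever per-vat size ends up being monitored. This sketch interprets "500 MB" and "1GB" as binary units (MiB/GiB), which is an assumption; the helper names are hypothetical, and in practice this would more likely live as an alert rule in the monitoring system:

```python
from typing import Dict, Optional

MIB = 1024 * 1024
WARN_BYTES = 500 * MIB        # roughly where state sync stops working
CRITICAL_BYTES = 1024 * MIB   # 1 GiB, half of xsnap's ~2GB allocation limit


def alert_level(size_bytes: int) -> Optional[str]:
    """Map a vat memory measurement to an alert level, or None if below both thresholds."""
    if size_bytes >= CRITICAL_BYTES:
        return "critical"
    if size_bytes >= WARN_BYTES:
        return "warning"
    return None


def vats_to_alert(sizes_by_vat: Dict[str, int]) -> Dict[str, str]:
    """Return only the vats that crossed a threshold, with their alert level."""
    alerts: Dict[str, str] = {}
    for vat_id, size in sizes_by_vat.items():
        level = alert_level(size)
        if level is not None:
            alerts[vat_id] = level
    return alerts
```

Which measurement feeds `sizes_by_vat` (snapshot size, `allocate`, or RSS) is still the open design question in this issue.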

Security Considerations

None

Scaling Considerations

Monitoring this should not introduce undue processing overhead.

Test Plan

TBD

Upgrade Considerations

We need to be able to start monitoring this without requiring chain software changes.
