[core] Change config defaults to enable io_service metrics #57614
+2
−2
After some experimentation, the main culprit for the performance degradation turned out to be the lag probe being too aggressive. The previous default lag probe interval of 250ms caused as much as a 20% degradation in performance when combined with enabling io_context metrics. Setting the default to above 60s seems to mitigate the issue. To come to this conclusion we tested the following configurations:
Trial 1: ~400 actors/s <-- way too slow
- RAY_emit_main_serivce_metrics = 1
Trial 2: ~500+ actors/s <-- where we want to be
- RAY_emit_main_serivce_metrics = -1
Trial 3: ~500+ actors/s
- RAY_emit_main_serivce_metrics = 1
- RAY_io_context_event_loop_lag_collection_interval_ms = -1 <-- disabled
Trial 4: ~500+ actors/s <-- bingo!
- RAY_emit_main_serivce_metrics = 1
- RAY_io_context_event_loop_lag_collection_interval_ms = 6000
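For anyone who wants to rerun the comparison, here is a rough Python sketch of the four trial configurations as environment overrides, plus the arithmetic behind the ~20% figure. The variable names and values are copied verbatim from the trials; `TRIALS`, `trial_env`, and `relative_degradation` are hypothetical helper names for illustration and are not part of Ray. The 500 actors/s baseline is an assumption taken as the lower bound of "~500+".

```python
import os

# Overrides used in each trial, copied from the list above.
# A variable that is absent from a trial was left at its default.
TRIALS = {
    1: {"RAY_emit_main_serivce_metrics": "1"},
    2: {"RAY_emit_main_serivce_metrics": "-1"},
    3: {
        "RAY_emit_main_serivce_metrics": "1",
        "RAY_io_context_event_loop_lag_collection_interval_ms": "-1",
    },
    4: {
        "RAY_emit_main_serivce_metrics": "1",
        "RAY_io_context_event_loop_lag_collection_interval_ms": "6000",
    },
}


def trial_env(trial, base_env=None):
    """Return a copy of the environment with one trial's overrides applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(TRIALS[trial])
    return env


def relative_degradation(baseline, observed):
    """Fractional throughput loss of `observed` relative to `baseline`."""
    return (baseline - observed) / baseline


# Approximate rates from the trials: ~400 actors/s against ~500 actors/s.
print(f"{relative_degradation(500, 400):.0%}")  # 20%
```

The dictionary returned by `trial_env` could be passed as the `env` argument of `subprocess.run` when launching whatever benchmark is used; the benchmark itself is not shown here.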
The default value of 250ms, combined with the increased use of lag probes when the metrics are enabled, causes enough degradation to be noticeable. Increasing the interval sufficiently seems to be the way to avoid this and still have our metrics.
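To make the cost of a short interval concrete, here is a minimal sketch of what an event loop lag probe does, written with Python's `asyncio`. This is not Ray's C++ implementation; `lag_probe` and `probes_per_minute` are hypothetical names, and treating a non-positive interval as "disabled" is an assumption based on the `-1` setting in Trial 3. The idea is that each probe posts a callback on the loop and measures how long it waits before running, so every probe is extra work queued on the loop being measured.

```python
import asyncio
import time


async def lag_probe(interval_s, samples):
    """Collect `samples` event loop lag measurements, one per interval."""
    loop = asyncio.get_running_loop()
    lags = []
    for _ in range(samples):
        await asyncio.sleep(interval_s)
        posted = time.perf_counter()
        done = loop.create_future()
        # call_soon queues the callback behind whatever is already waiting
        # on the loop, so the delay until it runs approximates the lag.
        loop.call_soon(done.set_result, None)
        await done
        lags.append(time.perf_counter() - posted)
    return lags


def probes_per_minute(interval_ms):
    """Probes posted per minute on one loop; non-positive means disabled."""
    if interval_ms <= 0:
        return 0.0
    return 60_000 / interval_ms


if __name__ == "__main__":
    measured = asyncio.run(lag_probe(0.01, 5))
    print(f"max lag: {max(measured) * 1e3:.3f} ms")
    print(probes_per_minute(250), probes_per_minute(6000))
```

Under these assumptions a 250ms interval posts 240 probes per minute on each instrumented loop, while a 6000ms interval posts 10, which is consistent with the longer interval recovering the throughput seen in Trial 4.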