feat: per-worker active/idle timeline + IFB size logging #1534
base: main
Conversation
force-pushed from 28ac270 to e419d91 (Compare)
📝 Walkthrough

Adds optional vLLM metrics logging: config flags, a background metrics logger in vLLM workers, RPCs to fetch/clear collected metrics, propagation of those metrics through GRPO training, and visualization support in performance output.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GRPO as GRPO Training
    participant Gen as VllmGeneration
    participant Worker as VllmGenerationWorker
    participant vLLM as vLLM Engine
    rect rgb(240,248,255)
        GRPO->>Gen: request generation (start step)
        Gen->>Worker: RPC clear_vllm_logger_metrics()
        Worker-->>Gen: ack
    end
    rect rgb(245,255,240)
        Gen->>vLLM: perform generation (async/sync)
        vLLM-->>Worker: metrics exposed via Gauge
        Worker->>Worker: background thread polls metrics (interval)
    end
    rect rgb(255,250,240)
        GRPO->>Gen: generation finished
        Gen->>Worker: RPC get_vllm_logger_metrics()
        Worker-->>Gen: {inflight_batch_sizes, num_pending_samples}
        Gen-->>GRPO: return generation results + vllm_logger_metrics
        GRPO->>GRPO: attach metrics to training step metrics
    end
```
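To make the worker-side mechanics concrete, here is a minimal sketch of the background poller implied by the diagram. All names here (`VllmMetricsLogger`, `read_gauges`) are illustrative stand-ins, not the PR's actual classes; the real implementation lives in `vllm_worker.py` and reads vLLM's Prometheus gauges:

```python
import threading
import time
from typing import Callable


class VllmMetricsLogger:
    """Illustrative background poller. `read_gauges` stands in for however
    the worker reads the in-flight batch size and pending-sample gauges."""

    def __init__(self, read_gauges: Callable[[], tuple[int, int]], interval_s: float = 0.5):
        self._read_gauges = read_gauges
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._inflight_batch_sizes: list[int] = []
        self._num_pending_samples: list[int] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _loop(self) -> None:
        time.sleep(min(2.0, self._interval_s))  # let the engine settle first
        while not self._stop.wait(self._interval_s):
            inflight, pending = self._read_gauges()
            with self._lock:
                self._inflight_batch_sizes.append(inflight)
                self._num_pending_samples.append(pending)

    # RPC targets named after the PR's get/clear endpoints
    def get_vllm_logger_metrics(self) -> dict[str, list[int]]:
        with self._lock:
            return {
                "inflight_batch_sizes": list(self._inflight_batch_sizes),
                "num_pending_samples": list(self._num_pending_samples),
            }

    def clear_vllm_logger_metrics(self) -> None:
        with self._lock:
            self._inflight_batch_sizes.clear()
            self._num_pending_samples.clear()
```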
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nemo_rl/algorithms/utils.py (1)
396-518: Guard vLLM metrics visualization against disabled logger and empty data; address Ruff hints

Two functional issues plus a few style nits here:

- KeyError when the vLLM metrics logger is disabled
  `VllmGeneration.get_vllm_logger_metrics` returns `{}` when `enable_vllm_metrics_logger` is false. GRPO unconditionally sets `metrics["vllm_logger_metrics"] = vllm_logger_metrics`, so you'll see `{}` in `metrics`. The check `if vllm_logger_metrics is not None:` will pass for `{}`, and then `vllm_metrics_logger_interval = master_config["policy"]["generation"]["vllm_cfg"]["vllm_metrics_logger_interval"]` will raise `KeyError` for configs that don't define `vllm_metrics_logger_interval` (all existing recipes except the one you updated).
- `visualize_per_worker_timeline` can crash on empty metrics
  If `metric_dict` is `{}` or some lists inside it are empty, the line `max_value = max(max(v) for v in metric_dict.values())` will raise `ValueError`.
- Ruff hints (F841/E731/B007)
  - `dp_ranks` is unused.
  - `count_zeros` is a lambda assigned to a name.
  - The `i` loop index is unused.

A minimal, robust fix that also satisfies Ruff:

```diff
@@
-    def resize_timeline(data, new_size):
+    def resize_timeline(data, new_size):
         old_size = len(data)
         x_old = np.linspace(0, 1, old_size)
         x_new = np.linspace(0, 1, new_size)
         return np.interp(x_new, x_old, data)
@@
-    def visualize_per_worker_timeline(
-        metric_dict: dict[int, list[int]],
-        metric_name: str,
-        timeline_interval: float | None,
-    ) -> None:
-        dp_ranks = list(metric_dict.keys())
+    def visualize_per_worker_timeline(
+        metric_dict: dict[int, list[int]],
+        metric_name: str,
+        timeline_interval: float | None,
+    ) -> None:
         max_timeline_length = 50
         marker = {0: "□", 1: "⧅", 2: "⛝", 3: "■"}
-        max_value = max(max(v) for v in metric_dict.values())
+        # Skip visualization cleanly if there is no data yet
+        non_empty_values = [v for v in metric_dict.values() if v]
+        if not non_empty_values:
+            print(f" - {metric_name}: no data to display yet")
+            return
+
+        max_value = max(max(v) for v in non_empty_values)
         bin_width = (max_value + 1) / len(marker)
@@
-        print(f" - {metric_name}:")
+        print(f" - {metric_name}:")
         print(f" - Max value: {max_value}")
         print(" - Timeline:")
         for dp_idx, metric_values in metric_dict.items():
             timeline = []
             length = len(metric_values)
             if timeline_interval is not None:
-                count_zeros = lambda x: sum(v == 0 for v in x)
-                idle = count_zeros(metric_values) * timeline_interval
+                def count_zeros(values: list[int]) -> int:
+                    return sum(v == 0 for v in values)
+
+                idle = count_zeros(metric_values) * timeline_interval
                 active = length * timeline_interval - idle
@@
-            else:
-                resized_metric_values = metric_values
-
-            for i, value in enumerate(resized_metric_values):
+            else:
+                resized_metric_values = metric_values
+
+            for value in resized_metric_values:
                 timeline.append(marker[min(int(value // bin_width), len(marker) - 1)])
@@
-    if "vllm_logger_metrics" in metrics:
+    if "vllm_logger_metrics" in metrics:
@@
-        vllm_logger_metrics = metrics["vllm_logger_metrics"]
-
-        if vllm_logger_metrics is not None:
-            vllm_metrics_logger_interval = master_config["policy"]["generation"][
-                "vllm_cfg"
-            ]["vllm_metrics_logger_interval"]
-            print(" • vLLM Logger Metrics:")
-            # Visualize the inflight batch sizes timeline
-            visualize_per_worker_timeline(
-                vllm_logger_metrics["inflight_batch_sizes"],
-                "Inflight Batch Sizes",
-                vllm_metrics_logger_interval,
-            )
-            max_num_pending_samples = max(
-                max(v) for v in vllm_logger_metrics["num_pending_samples"].values()
-            )
-            # If there is at least one pending sample, visualize the timeline
-            if max_num_pending_samples > 0:
-                visualize_per_worker_timeline(
-                    vllm_logger_metrics["num_pending_samples"],
-                    "Num Pending Samples",
-                    None,
-                )
+        vllm_logger_metrics = metrics["vllm_logger_metrics"]
+
+        # Treat {}, None, or missing metrics as "no data" and skip visualization
+        if vllm_logger_metrics:
+            vllm_cfg = master_config["policy"]["generation"].get("vllm_cfg", {})
+            vllm_metrics_logger_interval = vllm_cfg.get(
+                "vllm_metrics_logger_interval", 1.0
+            )
+            print(" • vLLM Logger Metrics:")
+
+            inflight_metrics = vllm_logger_metrics.get("inflight_batch_sizes", {})
+            if inflight_metrics:
+                visualize_per_worker_timeline(
+                    inflight_metrics,
+                    "Inflight Batch Sizes",
+                    vllm_metrics_logger_interval,
+                )
+
+            pending_metrics = vllm_logger_metrics.get("num_pending_samples", {})
+            if pending_metrics:
+                max_num_pending_samples = max(
+                    max(v) for v in pending_metrics.values()
+                )
+                if max_num_pending_samples > 0:
+                    visualize_per_worker_timeline(
+                        pending_metrics,
+                        "Num Pending Samples",
+                        None,
+                    )
```

This keeps the visualization logic intact but prevents crashes when the logger is disabled or has not yet produced data, and it resolves the Ruff warnings.
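As a quick standalone sanity check of the `resize_timeline` helper shown in the diff above (pure NumPy, runnable as-is):

```python
import numpy as np

def resize_timeline(data, new_size):
    # Linearly resample a timeline to a fixed display width, as in the diff above
    x_old = np.linspace(0, 1, len(data))
    x_new = np.linspace(0, 1, new_size)
    return np.interp(x_new, x_old, data)

print(resize_timeline([0, 4, 8], 5))  # [0. 2. 4. 6. 8.]
```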
nemo_rl/algorithms/grpo.py (1)
1068-1122: Guard vLLM-specific metric calls in `grpo_train` to avoid crashes on non-vLLM backends

In `grpo_train`, the new calls:

```python
policy_generation.clear_vllm_logger_metrics()
...
vllm_logger_metrics = policy_generation.get_vllm_logger_metrics()
...
metrics["vllm_logger_metrics"] = vllm_logger_metrics
```

are unconditionally invoked on `policy_generation`. However, earlier in the function you explicitly allow `policy_generation` to be `policy` (megatron backend):

```python
if policy_generation is None:
    policy_generation = policy  # type: ignore
    NEED_REFIT = False
```

`Policy` does not implement `clear_vllm_logger_metrics` / `get_vllm_logger_metrics`, so with a non-vLLM backend this will raise `AttributeError` on every step.

A minimal fix is to detect support for these methods once and gate usage accordingly, while only attaching non-empty metrics:

```diff
@@
-    if policy_generation is None:
-        policy_generation = policy  # type: ignore
-        NEED_REFIT = False
-        POLICY_GENERATION_STALE = True  # tracks if generation needs a refit before running
-    assert policy_generation is not None  # for mypy type check
+    if policy_generation is None:
+        policy_generation = policy  # type: ignore
+        NEED_REFIT = False
+        POLICY_GENERATION_STALE = True  # tracks if generation needs a refit before running
+    assert policy_generation is not None  # for mypy type check
+
+    # vLLM metrics are only available on vLLM-based generation backends
+    supports_vllm_logger_metrics = hasattr(
+        policy_generation, "get_vllm_logger_metrics"
+    ) and hasattr(policy_generation, "clear_vllm_logger_metrics")
@@
-        with timer.time("generation"):
-            # Clear vLLM logger metrics for each generation step
-            policy_generation.clear_vllm_logger_metrics()
+        with timer.time("generation"):
+            # Clear vLLM logger metrics for each generation step (if supported)
+            if supports_vllm_logger_metrics:
+                policy_generation.clear_vllm_logger_metrics()
@@
-            policy_generation.finish_generation()
-            # Collect vLLM logger metrics for performance reporting after each generation step
-            # inflight batch sizes and num pending samples are collected from each vLLM worker
-            vllm_logger_metrics = policy_generation.get_vllm_logger_metrics()
+            policy_generation.finish_generation()
+            # Collect vLLM logger metrics for performance reporting (if supported)
+            # inflight batch sizes and num pending samples are collected from each vLLM worker
+            vllm_logger_metrics = (
+                policy_generation.get_vllm_logger_metrics()
+                if supports_vllm_logger_metrics
+                else None
+            )
@@
-            metrics.update(rollout_metrics)
-            metrics["vllm_logger_metrics"] = vllm_logger_metrics
+            metrics.update(rollout_metrics)
+            if vllm_logger_metrics:
+                metrics["vllm_logger_metrics"] = vllm_logger_metrics
```

This keeps vLLM metrics for vLLM backends, while leaving Megatron (and any other non-vLLM backend) unaffected.
Also applies to: 1340-1341
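To see why the `hasattr` guard is safe across backends, here is a toy, runnable demo; the class names are stand-ins for illustration, not the real `VllmGeneration`/`Policy` classes:

```python
class FakeVllmGeneration:
    """Stand-in for a vLLM backend that implements the metrics RPCs."""
    def clear_vllm_logger_metrics(self) -> None: ...
    def get_vllm_logger_metrics(self) -> dict:
        return {"inflight_batch_sizes": {0: [4, 8, 8]}}

class FakeMegatronPolicy:
    """Stand-in for a backend without vLLM metrics RPCs."""

for policy_generation in (FakeVllmGeneration(), FakeMegatronPolicy()):
    supports = hasattr(policy_generation, "get_vllm_logger_metrics") and hasattr(
        policy_generation, "clear_vllm_logger_metrics"
    )
    metrics = policy_generation.get_vllm_logger_metrics() if supports else None
    print(type(policy_generation).__name__, "->", metrics)
    # FakeVllmGeneration -> {'inflight_batch_sizes': {0: [4, 8, 8]}}
    # FakeMegatronPolicy -> None
```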
🧹 Nitpick comments (4)
nemo_rl/algorithms/grpo.py (1)
1927-1929: Async GRPO vLLM metrics wiring is reasonable; consider the same `hasattr` guard for robustness

In `async_grpo_train`:

- You clear metrics once at the start and after each refit:

```python
policy_generation.clear_vllm_logger_metrics()
...
policy_generation.clear_vllm_logger_metrics()
```

- You collect them once per training step and attach them to metrics:

```python
vllm_logger_metrics = policy_generation.get_vllm_logger_metrics()
...
metrics["vllm_logger_metrics"] = vllm_logger_metrics
```

Given the function starts with `assert _should_use_async_rollouts(master_config)` and async rollouts require vLLM, this is logically consistent and should work as intended.

For symmetry with `grpo_train`, and to make the code more robust to future changes in how `async_grpo_train` is invoked, you could add the same capability check and only attach non-empty metrics:

```diff
@@
-    POLICY_GENERATION_STALE = True
-    assert policy_generation is not None
+    POLICY_GENERATION_STALE = True
+    assert policy_generation is not None
+
+    supports_vllm_logger_metrics = hasattr(
+        policy_generation, "get_vllm_logger_metrics"
+    ) and hasattr(policy_generation, "clear_vllm_logger_metrics")
@@
-    # Clear vLLM logger metrics after at start of training
-    policy_generation.clear_vllm_logger_metrics()
+    # Clear vLLM logger metrics at start of training (if supported)
+    if supports_vllm_logger_metrics:
+        policy_generation.clear_vllm_logger_metrics()
@@
-            # Collect vLLM logger metrics for performance reporting
-            # inflight batch sizes and num pending samples are collected from each vLLM worker
-            vllm_logger_metrics = policy_generation.get_vllm_logger_metrics()
+            # Collect vLLM logger metrics for performance reporting (if supported)
+            # inflight batch sizes and num pending samples are collected from each vLLM worker
+            vllm_logger_metrics = (
+                policy_generation.get_vllm_logger_metrics()
+                if supports_vllm_logger_metrics
+                else None
+            )
@@
-            # Clear vLLM logger metrics after each refit (weight sync), starting a new logging cycle
-            policy_generation.clear_vllm_logger_metrics()
+            # Clear vLLM logger metrics after each refit (weight sync), starting a new logging cycle
+            if supports_vllm_logger_metrics:
+                policy_generation.clear_vllm_logger_metrics()
@@
-            metrics.update(rollout_metrics)
-            metrics["vllm_logger_metrics"] = vllm_logger_metrics
+            metrics.update(rollout_metrics)
+            if vllm_logger_metrics:
+                metrics["vllm_logger_metrics"] = vllm_logger_metrics
```

This is mostly defensive, but keeps behavior clear and aligned with the sync GRPO path.
Also applies to: 2148-2151, 2172-2174, 2252-2253
nemo_rl/models/generation/vllm/vllm_generation.py (1)
822-865: vLLM logger metrics collection is well-structured; tighten `zip` with `strict=True`

The DP-leader fan-out (`get_dp_leader_worker_idx` + `run_single_worker_single_data`) and aggregation into `"inflight_batch_sizes"` / `"num_pending_samples"` per `dp_idx` look correct and are properly gated on `enable_vllm_metrics_logger`.

The project supports Python 3.10 as the minimum version, so you can safely use `zip(..., strict=True)` to catch any future mismatch between `dp_indices` and `results`, satisfying Ruff (B905):

```diff
-        for dp_idx, stats in zip(dp_indices, results):
+        for dp_idx, stats in zip(dp_indices, results, strict=True):
```
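For illustration, here is how `strict=True` changes `zip`'s behavior on mismatched lengths (plain Python 3.10+; the data is made up):

```python
dp_indices = [0, 1, 2]
results = [{"inflight_batch_sizes": [4]}, {"inflight_batch_sizes": [7]}]

# Default zip silently drops the unmatched dp_idx 2:
print(list(zip(dp_indices, results)))  # two pairs, dp_idx 2 lost

# strict=True surfaces the bug instead:
try:
    list(zip(dp_indices, results, strict=True))
except ValueError as exc:
    print(exc)  # zip() argument 2 is shorter than argument 1
```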
nemo_rl/models/generation/vllm/vllm_worker.py (2)

374-374: Extract magic number to a named constant.

The initial 2.0-second delay is hardcoded. Consider making it a configuration parameter or at minimum a named constant for clarity.

```diff
+    # Initial delay to allow vLLM engine to stabilize before collecting metrics
+    METRICS_LOGGER_INITIAL_DELAY_S = 2.0
+
     def _logger_loop():
         # Delay a little to let engine settle
-        time.sleep(min(2.0, interval_s))
+        time.sleep(min(METRICS_LOGGER_INITIAL_DELAY_S, interval_s))
```
380-382: Use proper logging framework instead of print statements.

The code uses `print()` for logging, which is inconsistent with the rest of the codebase (e.g., line 165 uses `init_logger`). Print statements also lack log levels, timestamps, and integration with log aggregation systems.

Consider initializing a logger at the module level or class level:

```python
from vllm.logger import init_logger

logger = init_logger(__name__)
```

Then replace print statements with appropriate log levels:

- `logger.warning()` for the exception handlers
- `logger.info()` for the startup messages

Note: the emoji decorators may not render well in all environments; consider removing them or making them optional.
Also applies to: 397-400, 404-407, 411-413, 420-422
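As a concrete sketch of that substitution, here is a hypothetical logger loop with the warning-level handler; `poll_metrics_once`, `interval_s`, and the stdlib logger are stand-ins, not the PR's actual code (the PR would use vLLM's `init_logger`):

```python
import logging
import threading
import time

logger = logging.getLogger(__name__)  # stand-in for vllm's init_logger(__name__)

interval_s = 0.5
shutdown = threading.Event()

def poll_metrics_once() -> None:
    """Hypothetical helper; stands in for reading the vLLM gauges."""

def _logger_loop() -> None:
    time.sleep(min(2.0, interval_s))  # let the engine settle before polling
    while not shutdown.is_set():
        try:
            poll_metrics_once()
        except Exception as exc:  # noqa: BLE001 - never kill the logger thread
            logger.warning("vLLM metrics poll failed; will retry: %s", exc)
        time.sleep(interval_s)
```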
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- examples/configs/grpo_math_1B.yaml (1 hunks)
- nemo_rl/algorithms/grpo.py (7 hunks)
- nemo_rl/algorithms/utils.py (2 hunks)
- nemo_rl/models/generation/vllm/vllm_generation.py (1 hunks)
- nemo_rl/models/generation/vllm/vllm_worker.py (2 hunks)
- nemo_rl/models/generation/vllm/vllm_worker_async.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T14:20:36.297Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml:113-120
Timestamp: 2025-09-18T14:20:36.297Z
Learning: In distillation workflows, the teacher policy does not perform generation - it only does inference/logprob computation on sequences generated by the student policy. Therefore, teacher generation configuration mismatches (like vLLM tensor parallelism settings) and colocation concerns are not relevant.
Applied to files:
nemo_rl/algorithms/grpo.py
🧬 Code graph analysis (4)
nemo_rl/models/generation/vllm/vllm_generation.py (2)
- nemo_rl/models/generation/vllm/vllm_worker.py (2)
  - get_vllm_logger_metrics (425-432)
  - clear_vllm_logger_metrics (434-438)
- nemo_rl/distributed/worker_groups.py (4)
  - dp_size (627-629)
  - get_dp_leader_worker_idx (404-411)
  - run_single_worker_single_data (631-656)
  - run_all_workers_single_data (755-799)

nemo_rl/algorithms/utils.py (1)
- tests/check_metrics.py (2)
  - min (25-27)
  - max (30-32)

nemo_rl/algorithms/grpo.py (2)
- nemo_rl/models/generation/vllm/vllm_generation.py (2)
  - clear_vllm_logger_metrics (858-865)
  - get_vllm_logger_metrics (822-856)
- nemo_rl/models/generation/vllm/vllm_worker.py (2)
  - clear_vllm_logger_metrics (434-438)
  - get_vllm_logger_metrics (425-432)

nemo_rl/models/generation/vllm/vllm_worker.py (1)
- nemo_rl/models/generation/vllm/vllm_generation.py (2)
  - get_vllm_logger_metrics (822-856)
  - clear_vllm_logger_metrics (858-865)
🪛 GitHub Actions: CICD NeMo RL
nemo_rl/models/generation/vllm/vllm_worker_async.py
[error] 1-1: Pre-commit hook 'ruff' failed. No such file or directory (os error 2). 1 file was modified by this hook. Command: 'pre-commit run --all-files --show-diff-on-failure --color=always'.
🪛 Ruff (0.14.5)
nemo_rl/models/generation/vllm/vllm_generation.py
844-844: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
nemo_rl/algorithms/utils.py
459-459: Local variable dp_ranks is assigned to but never used
Remove assignment to unused variable dp_ranks
(F841)
473-473: Do not assign a lambda expression, use a def
Rewrite count_zeros as a def
(E731)
483-483: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
nemo_rl/models/generation/vllm/vllm_worker.py
378-378: Do not catch blind exception: Exception
(BLE001)
396-396: Do not catch blind exception: Exception
(BLE001)
402-402: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-container / main
- GitHub Check: sphinx-build / Build docs
- GitHub Check: Lint check
🔇 Additional comments (4)
examples/configs/grpo_math_1B.yaml (1)
232-233: vLLM metrics logger config looks reasonable

The new `enable_vllm_metrics_logger` flag and `vllm_metrics_logger_interval` value are consistent with the runtime wiring in the vLLM generation/worker code and provide a sensible default (disabled, 0.5s interval).

nemo_rl/models/generation/vllm/vllm_worker_async.py (1)

129-148: AsyncLLM stat_loggers integration is correct and compatible

vLLM v1 exposes `PrometheusStatLogger` at `vllm.v1.metrics.loggers.PrometheusStatLogger`, and `AsyncLLM.from_engine_args` accepts a `stat_loggers` parameter. The implementation correctly gates the logger on the `enable_vllm_metrics_logger` config flag and passes it to `AsyncLLM.from_engine_args`, keeping the async path clean when disabled.
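A rough sketch of that gating, going only by the review's description above: `vllm_cfg` and `engine_args` are assumed to exist in the worker's setup code, and the fragment assumes `stat_loggers` accepts a list of logger factories with `PrometheusStatLogger` usable as one; verify against your vLLM version before relying on it:

```python
# Sketch only; not verified against a specific vLLM release.
from vllm.v1.engine.async_llm import AsyncLLM
from vllm.v1.metrics.loggers import PrometheusStatLogger

stat_loggers = (
    [PrometheusStatLogger] if vllm_cfg.get("enable_vllm_metrics_logger") else None
)
engine = AsyncLLM.from_engine_args(engine_args, stat_loggers=stat_loggers)
```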
nemo_rl/models/generation/vllm/vllm_worker.py (2)

19-20: LGTM!

The threading and time imports are appropriate for the background metrics logger functionality.

332-338: LGTM!

The conditional initialization correctly gates the metrics logger feature on both the enable flag and the async engine requirement.
Signed-off-by: Youngeun Kwon <[email protected]>
force-pushed from 9b72c7b to f7ca986 (Compare)
Signed-off-by: Youngeun Kwon <[email protected]>
force-pushed from 7b53c9a to 39947a2 (Compare)
Signed-off-by: Youngeun Kwon <[email protected]>
force-pushed from 09e77d5 to 11a5a02 (Compare)
Signed-off-by: Youngeun Kwon <[email protected]>
Hi @parthchadha and @terrykong, this PR is now ready for review and merge. It provides a critical visibility feature for vLLM performance: the per-worker "inflight batch size" timeline and the pending-queue status.
Thanks for this feature @youngeunkwon0405, it's super cool to be able to see the rollout workload distributed across the cluster!
```yaml
    use_deep_gemm: False
    num_last_layers_in_bf16: 0
    num_first_layers_in_bf16: 0
    enable_vllm_metrics_logger: true  # Set to true to enable vLLM internal metrics logger, turn off for better performance
```
Do we know how much of a performance hit we take with metrics enabled?
It wasn't noticeable to me, but theoretically it can have some overhead. I will run some iso-config runs in our perf tracker, which exposes generation time, and see how much perf impact there is.
I checked the perf difference on the Qwen3 30B model; there was no noticeable difference.
Signed-off-by: Youngeun Kwon <[email protected]>
Hi @parthchadha, all your initial comments are addressed. Can I ask for your final review, and potentially a merge if everything looks good to you?
What does this PR do?
This feature enables visualization of important per-generation-worker performance metrics (in-flight batch (IFB) size and pending requests, for now).
We can expand this to visualize other metrics, such as KV-cache-related or speculative-decoding-related ones (if you employ that). Theoretically, it could cover any of the detailed performance metrics mentioned in https://docs.vllm.ai/en/v0.7.2/serving/metrics.html.
Here is a visualization example:
Issues
List issues that this PR closes (syntax):
Usage
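A minimal sketch of how to turn the feature on, based on the config keys this PR adds to `examples/configs/grpo_math_1B.yaml`; the nesting follows the `master_config["policy"]["generation"]["vllm_cfg"]` path referenced in the review, and the 0.5 s interval is the default mentioned there:

```yaml
policy:
  generation:
    vllm_cfg:
      enable_vllm_metrics_logger: true   # disabled by default; small theoretical overhead when on
      vllm_metrics_logger_interval: 0.5  # polling interval in seconds
```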
Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit