
Conversation

@youngeunkwon0405
Contributor

@youngeunkwon0405 youngeunkwon0405 commented Nov 18, 2025

What does this PR do?

This feature enables visualization of important per-generation-worker performance metrics (currently the in-flight batch (IFB) size and the number of pending requests).
We can expand it to visualize other metrics, such as KV-cache or speculative-decoding statistics (if you employ those). In principle, any of the detailed performance metrics listed in https://docs.vllm.ai/en/v0.7.2/serving/metrics.html could be surfaced this way.

Here is a visualization example:

[Image: example visualization of per-worker inflight batch size timelines]

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
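For illustration, a hypothetical usage sketch (not part of the PR itself): enable the logger through the generation config keys introduced here (enable_vllm_metrics_logger, vllm_metrics_logger_interval in examples/configs/grpo_math_1B.yaml) and read the collected metrics after a generation step via the new VllmGeneration methods. Exact wiring may differ from the final implementation.

def report_vllm_logger_metrics(policy_generation) -> None:
    """Collect and print per-DP-worker vLLM metrics after one generation step.

    ``policy_generation`` is assumed to be a VllmGeneration instance exposing
    the get/clear methods added in this PR.
    """
    metrics = policy_generation.get_vllm_logger_metrics()
    # Each entry maps dp_rank -> list of values sampled every
    # vllm_metrics_logger_interval seconds.
    print(metrics.get("inflight_batch_sizes", {}))
    print(metrics.get("num_pending_samples", {}))
    # Start a fresh logging cycle before the next generation step.
    policy_generation.clear_vllm_logger_metrics()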

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features
    • Added vLLM metrics collection and injection into training metrics so performance stats are captured per generation step.
    • New configurable settings to enable vLLM metrics logging and adjust collection interval.
    • Background metrics logging on model-owner workers with optional Prometheus-compatible hooks.
    • Enhanced performance visualization to display vLLM metrics (inflight batch sizes, pending samples) per data-parallel shard.

@coderabbitai
Contributor

coderabbitai bot commented Nov 18, 2025

📝 Walkthrough


Adds optional vLLM metrics logging: config flags, a background metrics logger in vLLM workers, RPCs to fetch/clear collected metrics, propagation of those metrics through GRPO training, and visualization support in performance output.

Changes

  • Configuration (examples/configs/grpo_math_1B.yaml): Added enable_vllm_metrics_logger: false and vllm_metrics_logger_interval: 0.5.
  • Generation API (nemo_rl/models/generation/vllm/vllm_generation.py): Added get_vllm_logger_metrics() and clear_vllm_logger_metrics() to collect/reset per-DP vLLM metrics via worker RPCs.
  • vLLM Worker, sync (nemo_rl/models/generation/vllm/vllm_worker.py): Spawns a background metrics logger thread when enabled; implements _start_vllm_metrics_logger(), storage for inflight_batch_sizes and num_pending_samples, and public get_/clear_vllm_logger_metrics() methods (see the sketch after this list).
  • vLLM Worker, async (nemo_rl/models/generation/vllm/vllm_worker_async.py): Conditionally constructs stat_loggers (using PrometheusStatLogger) and passes them into AsyncLLM.from_engine_args(...).
  • Training integration (nemo_rl/algorithms/grpo.py): Clears logger metrics at generation start/cycle boundaries, collects vllm_logger_metrics after generation, and injects them into per-step/batch training metrics under "vllm_logger_metrics".
  • Visualization (nemo_rl/algorithms/utils.py): Extends the visualization to render vLLM metrics when present: dynamic bar length, timelines for inflight_batch_sizes and num_pending_samples, and helper timeline-rendering functions; uses the configured interval for time estimates.
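The sync-worker change above boils down to a small polling thread. Below is a minimal sketch of that pattern, assuming the engine exposes callables for the current in-flight batch size and pending-request count; this is illustrative only and not the PR's actual _start_vllm_metrics_logger implementation.

import threading
import time


class MetricsLoggerSketch:
    """Illustrative background poller, not the PR's actual worker code.

    ``read_inflight`` / ``read_pending`` are assumed callables returning the
    engine's current in-flight batch size and pending-request count.
    """

    def __init__(self, read_inflight, read_pending, interval_s: float = 0.5):
        self.inflight_batch_sizes: list[int] = []
        self.num_pending_samples: list[int] = []
        self._read_inflight = read_inflight
        self._read_pending = read_pending
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _loop(self) -> None:
        while not self._stop.is_set():
            try:
                self.inflight_batch_sizes.append(self._read_inflight())
                self.num_pending_samples.append(self._read_pending())
            except Exception:
                # Engine may not be ready yet; skip this sample.
                pass
            time.sleep(self._interval_s)

    def snapshot(self) -> dict[str, list[int]]:
        return {
            "inflight_batch_sizes": list(self.inflight_batch_sizes),
            "num_pending_samples": list(self.num_pending_samples),
        }

    def clear(self) -> None:
        self.inflight_batch_sizes.clear()
        self.num_pending_samples.clear()

    def stop(self) -> None:
        self._stop.set()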

Sequence Diagram(s)

sequenceDiagram
    participant GRPO as GRPO Training
    participant Gen as VllmGeneration
    participant Worker as VllmGenerationWorker
    participant vLLM as vLLM Engine

    rect rgb(240,248,255)
    GRPO->>Gen: request generation (start step)
    Gen->>Worker: RPC clear_vllm_logger_metrics()
    Worker-->>Gen: ack
    end

    rect rgb(245,255,240)
    Gen->>vLLM: perform generation (async/sync)
    vLLM-->>Worker: metrics exposed via Gauge
    Worker->>Worker: background thread polls metrics (interval)
    end

    rect rgb(255,250,240)
    GRPO->>Gen: generation finished
    Gen->>Worker: RPC get_vllm_logger_metrics()
    Worker-->>Gen: {inflight_batch_sizes, num_pending_samples}
    Gen-->>GRPO: return generation results + vllm_logger_metrics
    GRPO->>GRPO: attach metrics to training step metrics
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay extra attention to:
    • Thread lifecycle, stop conditions, and exception handling in vllm_worker.py.
    • Correctness of RPC usage and serialization of metric snapshots in vllm_generation.py.
    • Integration point in grpo.py for when metrics are cleared/collected relative to generation and refit/sync.
    • The AsyncLLM.from_engine_args call-site change in vllm_worker_async.py and compatibility with the async engine API.

Possibly related PRs

  • feat: Adding perf metrics #1183 — Modifies GRPO training flows and runtime metrics collection; overlaps with metrics collection and propagation changes in this PR.

Suggested labels

Performance

Suggested reviewers

  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 44.44%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Test Results For Major Changes ⚠️ Warning: The PR introduces major features (metrics logging, background thread processing, visualization) without test results, performance measurements, or testing documentation. Resolution: add comprehensive unit tests for the metrics methods and visualization functions, provide performance benchmarks for the background thread overhead, validate training convergence with metrics enabled, resolve the thread-synchronization issues identified in review, and update the PR description with actual test results.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title accurately summarizes the main feature additions: per-worker active/idle timeline visualization and IFB (in-flight batch) size logging.
  • Description check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.


@youngeunkwon0405 youngeunkwon0405 added the CI:L1 Run doctests, unit tests, and functional tests label Nov 18, 2025
@youngeunkwon0405 youngeunkwon0405 marked this pull request as draft November 18, 2025 06:56
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
nemo_rl/algorithms/utils.py (1)

396-518: Guard vLLM metrics visualization against disabled logger and empty data; address Ruff hints

Two functional issues plus a few style nits here:

  1. KeyError when vLLM metrics logger is disabled
  • VllmGeneration.get_vllm_logger_metrics returns {} when enable_vllm_metrics_logger is false.

  • GRPO unconditionally sets metrics["vllm_logger_metrics"] = vllm_logger_metrics, so you’ll see {} in metrics.

  • The check if vllm_logger_metrics is not None: will pass for {}, and then:

    vllm_metrics_logger_interval = master_config["policy"]["generation"]["vllm_cfg"]["vllm_metrics_logger_interval"]

    will raise KeyError for configs that don’t define vllm_metrics_logger_interval (all existing recipes except the one you updated).

  2. visualize_per_worker_timeline can crash on empty metrics
  • If metric_dict is {} or some lists inside it are empty, this line:

    max_value = max(max(v) for v in metric_dict.values())

    will raise ValueError.

  3. Ruff hints (F841/E731/B007)
  • dp_ranks is unused.
  • count_zeros is a lambda assigned to a name.
  • The i loop index is unused.

A minimal, robust fix that also satisfies Ruff:

@@
-    def resize_timeline(data, new_size):
+    def resize_timeline(data, new_size):
         old_size = len(data)
         x_old = np.linspace(0, 1, old_size)
         x_new = np.linspace(0, 1, new_size)
         return np.interp(x_new, x_old, data)
@@
-    def visualize_per_worker_timeline(
-        metric_dict: dict[int, list[int]],
-        metric_name: str,
-        timeline_interval: float | None,
-    ) -> None:
-        dp_ranks = list(metric_dict.keys())
+    def visualize_per_worker_timeline(
+        metric_dict: dict[int, list[int]],
+        metric_name: str,
+        timeline_interval: float | None,
+    ) -> None:
         max_timeline_length = 50
         marker = {0: "□", 1: "⧅", 2: "⛝", 3: "■"}
 
-        max_value = max(max(v) for v in metric_dict.values())
+        # Skip visualization cleanly if there is no data yet
+        non_empty_values = [v for v in metric_dict.values() if v]
+        if not non_empty_values:
+            print(f"  - {metric_name}: no data to display yet")
+            return
+
+        max_value = max(max(v) for v in non_empty_values)
         bin_width = (max_value + 1) / len(marker)
@@
-        print(f"  - {metric_name}:")
+        print(f"  - {metric_name}:")
         print(f"    - Max value: {max_value}")
         print("    - Timeline:")
         for dp_idx, metric_values in metric_dict.items():
             timeline = []
             length = len(metric_values)
             if timeline_interval is not None:
-                count_zeros = lambda x: sum(v == 0 for v in x)
-                idle = count_zeros(metric_values) * timeline_interval
+                def count_zeros(values: list[int]) -> int:
+                    return sum(v == 0 for v in values)
+
+                idle = count_zeros(metric_values) * timeline_interval
                 active = length * timeline_interval - idle
@@
-            else:
-                resized_metric_values = metric_values
-
-            for i, value in enumerate(resized_metric_values):
+            else:
+                resized_metric_values = metric_values
+
+            for value in resized_metric_values:
                 timeline.append(marker[min(int(value // bin_width), len(marker) - 1)])
@@
-    if "vllm_logger_metrics" in metrics:
+    if "vllm_logger_metrics" in metrics:
@@
-        vllm_logger_metrics = metrics["vllm_logger_metrics"]
-
-        if vllm_logger_metrics is not None:
-            vllm_metrics_logger_interval = master_config["policy"]["generation"][
-                "vllm_cfg"
-            ]["vllm_metrics_logger_interval"]
-            print("  • vLLM Logger Metrics:")
-            # Visualize the inflight batch sizes timeline
-            visualize_per_worker_timeline(
-                vllm_logger_metrics["inflight_batch_sizes"],
-                "Inflight Batch Sizes",
-                vllm_metrics_logger_interval,
-            )
-            max_num_pending_samples = max(
-                max(v) for v in vllm_logger_metrics["num_pending_samples"].values()
-            )
-            # If there is at least one pending sample, visualize the timeline
-            if max_num_pending_samples > 0:
-                visualize_per_worker_timeline(
-                    vllm_logger_metrics["num_pending_samples"],
-                    "Num Pending Samples",
-                    None,
-                )
+        vllm_logger_metrics = metrics["vllm_logger_metrics"]
+
+        # Treat {}, None, or missing metrics as "no data" and skip visualization
+        if vllm_logger_metrics:
+            vllm_cfg = master_config["policy"]["generation"].get("vllm_cfg", {})
+            vllm_metrics_logger_interval = vllm_cfg.get(
+                "vllm_metrics_logger_interval", 1.0
+            )
+            print("  • vLLM Logger Metrics:")
+
+            inflight_metrics = vllm_logger_metrics.get("inflight_batch_sizes", {})
+            if inflight_metrics:
+                visualize_per_worker_timeline(
+                    inflight_metrics,
+                    "Inflight Batch Sizes",
+                    vllm_metrics_logger_interval,
+                )
+
+            pending_metrics = vllm_logger_metrics.get("num_pending_samples", {})
+            if pending_metrics:
+                max_num_pending_samples = max(
+                    max(v) for v in pending_metrics.values()
+                )
+                if max_num_pending_samples > 0:
+                    visualize_per_worker_timeline(
+                        pending_metrics,
+                        "Num Pending Samples",
+                        None,
+                    )

This keeps the visualization logic intact but prevents crashes when the logger is disabled or has not yet produced data, and it resolves the Ruff warnings.

nemo_rl/algorithms/grpo.py (1)

1068-1122: Guard vLLM‑specific metric calls in grpo_train to avoid crashes on non‑vLLM backends

In grpo_train, the new calls:

policy_generation.clear_vllm_logger_metrics()
...
vllm_logger_metrics = policy_generation.get_vllm_logger_metrics()
...
metrics["vllm_logger_metrics"] = vllm_logger_metrics

are unconditionally invoked on policy_generation.

However, earlier in the function you explicitly allow policy_generation to be policy (megatron backend):

if policy_generation is None:
    policy_generation = policy  # type: ignore
    NEED_REFIT = False

Policy does not implement clear_vllm_logger_metrics / get_vllm_logger_metrics, so with a non‑vLLM backend this will raise AttributeError on every step.

A minimal fix is to detect support for these methods once and gate usage accordingly, while only attaching non‑empty metrics:

@@
-    if policy_generation is None:
-        policy_generation = policy  # type: ignore
-        NEED_REFIT = False
-    POLICY_GENERATION_STALE = True  # tracks if generation needs a refit before running
-    assert policy_generation is not None  # for mypy type check
+    if policy_generation is None:
+        policy_generation = policy  # type: ignore
+        NEED_REFIT = False
+    POLICY_GENERATION_STALE = True  # tracks if generation needs a refit before running
+    assert policy_generation is not None  # for mypy type check
+
+    # vLLM metrics are only available on vLLM-based generation backends
+    supports_vllm_logger_metrics = hasattr(
+        policy_generation, "get_vllm_logger_metrics"
+    ) and hasattr(policy_generation, "clear_vllm_logger_metrics")
@@
-                with timer.time("generation"):
-                    # Clear vLLM logger metrics for each generation step
-                    policy_generation.clear_vllm_logger_metrics()
+                with timer.time("generation"):
+                    # Clear vLLM logger metrics for each generation step (if supported)
+                    if supports_vllm_logger_metrics:
+                        policy_generation.clear_vllm_logger_metrics()
@@
-                    policy_generation.finish_generation()
-                    # Collect vLLM logger metrics for performance reporting after each generation step
-                    # inflight batch sizes and num pending samples are collected from each vLLM worker
-                    vllm_logger_metrics = policy_generation.get_vllm_logger_metrics()
+                    policy_generation.finish_generation()
+                    # Collect vLLM logger metrics for performance reporting (if supported)
+                    # inflight batch sizes and num pending samples are collected from each vLLM worker
+                    vllm_logger_metrics = (
+                        policy_generation.get_vllm_logger_metrics()
+                        if supports_vllm_logger_metrics
+                        else None
+                    )
@@
-                metrics.update(rollout_metrics)
-                metrics["vllm_logger_metrics"] = vllm_logger_metrics
+                metrics.update(rollout_metrics)
+                if vllm_logger_metrics:
+                    metrics["vllm_logger_metrics"] = vllm_logger_metrics

This keeps vLLM metrics for vLLM backends, while leaving Megatron (and any other non‑vLLM backend) unaffected.

Also applies to: 1340-1341

🧹 Nitpick comments (4)
nemo_rl/algorithms/grpo.py (1)

1927-1929: Async GRPO vLLM metrics wiring is reasonable; consider the same hasattr guard for robustness

In async_grpo_train:

  • You clear metrics once at the start and after each refit:
    policy_generation.clear_vllm_logger_metrics()
    ...
    policy_generation.clear_vllm_logger_metrics()
  • You collect them once per training step and attach to metrics:
    vllm_logger_metrics = policy_generation.get_vllm_logger_metrics()
    ...
    metrics["vllm_logger_metrics"] = vllm_logger_metrics

Given the function starts with:

assert _should_use_async_rollouts(master_config)

and async rollouts require vLLM, this is logically consistent and should work as intended.

For symmetry with grpo_train and to make the code more robust to future changes in how async_grpo_train is invoked, you could add the same capability check and only attach non‑empty metrics:

@@
-    POLICY_GENERATION_STALE = True
-    assert policy_generation is not None
+    POLICY_GENERATION_STALE = True
+    assert policy_generation is not None
+
+    supports_vllm_logger_metrics = hasattr(
+        policy_generation, "get_vllm_logger_metrics"
+    ) and hasattr(policy_generation, "clear_vllm_logger_metrics")
@@
-    # Clear vLLM logger metrics after at start of training
-    policy_generation.clear_vllm_logger_metrics()
+    # Clear vLLM logger metrics at start of training (if supported)
+    if supports_vllm_logger_metrics:
+        policy_generation.clear_vllm_logger_metrics()
@@
-                # Collect vLLM logger metrics for performance reporting
-                # inflight batch sizes and num pending samples are collected from each vLLM worker
-                vllm_logger_metrics = policy_generation.get_vllm_logger_metrics()
+                # Collect vLLM logger metrics for performance reporting (if supported)
+                # inflight batch sizes and num pending samples are collected from each vLLM worker
+                vllm_logger_metrics = (
+                    policy_generation.get_vllm_logger_metrics()
+                    if supports_vllm_logger_metrics
+                    else None
+                )
@@
-                # Clear vLLM logger metrics after each refit (weight sync), starting a new logging cycle
-                policy_generation.clear_vllm_logger_metrics()
+                # Clear vLLM logger metrics after each refit (weight sync), starting a new logging cycle
+                if supports_vllm_logger_metrics:
+                    policy_generation.clear_vllm_logger_metrics()
@@
-                metrics.update(rollout_metrics)
-                metrics["vllm_logger_metrics"] = vllm_logger_metrics
+                metrics.update(rollout_metrics)
+                if vllm_logger_metrics:
+                    metrics["vllm_logger_metrics"] = vllm_logger_metrics

This is mostly defensive, but keeps behavior clear and aligned with the sync GRPO path.

Also applies to: 2148-2151, 2172-2174, 2252-2253

nemo_rl/models/generation/vllm/vllm_generation.py (1)

822-865: vLLM logger metrics collection is well-structured; tighten zip with strict=True

The DP-leader fan-out (get_dp_leader_worker_idx + run_single_worker_single_data) and aggregation into "inflight_batch_sizes" / "num_pending_samples" per dp_idx look correct and are properly gated on enable_vllm_metrics_logger.

The project supports Python 3.10 as the minimum version, so you can safely use zip(..., strict=True) to catch any future mismatch between dp_indices and results, satisfying Ruff (B905):

-        for dp_idx, stats in zip(dp_indices, results):
+        for dp_idx, stats in zip(dp_indices, results, strict=True):
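For context, a rough sketch of the DP-leader fan-out and aggregation pattern described above. The worker-group call signatures here are assumptions for illustration; the PR's actual code in vllm_generation.py may differ.

def collect_vllm_logger_metrics(worker_group, dp_size: int) -> dict[str, dict[int, list[int]]]:
    """Illustrative aggregation; helper signatures are assumptions."""
    aggregated: dict[str, dict[int, list[int]]] = {
        "inflight_batch_sizes": {},
        "num_pending_samples": {},
    }
    dp_indices = list(range(dp_size))
    # Ask each DP leader worker for its collected metrics snapshot.
    results = [
        worker_group.run_single_worker_single_data(
            "get_vllm_logger_metrics",
            worker_group.get_dp_leader_worker_idx(dp_idx),
        )
        for dp_idx in dp_indices
    ]
    for dp_idx, stats in zip(dp_indices, results, strict=True):
        aggregated["inflight_batch_sizes"][dp_idx] = stats["inflight_batch_sizes"]
        aggregated["num_pending_samples"][dp_idx] = stats["num_pending_samples"]
    return aggregated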
nemo_rl/models/generation/vllm/vllm_worker.py (2)

374-374: Extract magic number to a named constant.

The initial 2.0-second delay is hardcoded. Consider making it a configuration parameter or at minimum a named constant for clarity.

+        # Initial delay to allow vLLM engine to stabilize before collecting metrics
+        METRICS_LOGGER_INITIAL_DELAY_S = 2.0
+
         def _logger_loop():
             # Delay a little to let engine settle
-            time.sleep(min(2.0, interval_s))
+            time.sleep(min(METRICS_LOGGER_INITIAL_DELAY_S, interval_s))

380-382: Use proper logging framework instead of print statements.

The code uses print() for logging, which is inconsistent with the rest of the codebase (e.g., line 165 uses init_logger). Print statements also lack log levels, timestamps, and integration with log aggregation systems.

Consider initializing a logger at the module level or class level:

from vllm.logger import init_logger
logger = init_logger(__name__)

Then replace print statements with appropriate log levels:

  • logger.warning() for the exception handlers
  • logger.info() for the startup messages

Note: The emoji decorators may not render well in all environments; consider removing them or making them optional.

Also applies to: 397-400, 404-407, 411-413, 420-422
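For illustration, a small sketch of what the suggested logger-based loop might look like. Only init_logger is taken from vLLM as cited above; the loop shape and the poll_once callable are illustrative, not the PR's actual code.

import threading
import time

from vllm.logger import init_logger

logger = init_logger(__name__)


def metrics_logger_loop(
    poll_once, interval_s: float, stop_event: threading.Event
) -> None:
    """Background polling loop using log levels instead of print()."""
    logger.info("vLLM metrics logger started (interval=%ss)", interval_s)
    while not stop_event.is_set():
        try:
            poll_once()
        except Exception as exc:  # keep the background thread alive on errors
            logger.warning("vLLM metrics logger poll failed: %s", exc)
        time.sleep(interval_s)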

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c32778d and e419d91.

📒 Files selected for processing (6)
  • examples/configs/grpo_math_1B.yaml (1 hunks)
  • nemo_rl/algorithms/grpo.py (7 hunks)
  • nemo_rl/algorithms/utils.py (2 hunks)
  • nemo_rl/models/generation/vllm/vllm_generation.py (1 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker.py (2 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker_async.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T14:20:36.297Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml:113-120
Timestamp: 2025-09-18T14:20:36.297Z
Learning: In distillation workflows, the teacher policy does not perform generation - it only does inference/logprob computation on sequences generated by the student policy. Therefore, teacher generation configuration mismatches (like vLLM tensor parallelism settings) and colocation concerns are not relevant.

Applied to files:

  • nemo_rl/algorithms/grpo.py
🧬 Code graph analysis (4)
nemo_rl/models/generation/vllm/vllm_generation.py (2)
nemo_rl/models/generation/vllm/vllm_worker.py (2)
  • get_vllm_logger_metrics (425-432)
  • clear_vllm_logger_metrics (434-438)
nemo_rl/distributed/worker_groups.py (4)
  • dp_size (627-629)
  • get_dp_leader_worker_idx (404-411)
  • run_single_worker_single_data (631-656)
  • run_all_workers_single_data (755-799)
nemo_rl/algorithms/utils.py (1)
tests/check_metrics.py (2)
  • min (25-27)
  • max (30-32)
nemo_rl/algorithms/grpo.py (2)
nemo_rl/models/generation/vllm/vllm_generation.py (2)
  • clear_vllm_logger_metrics (858-865)
  • get_vllm_logger_metrics (822-856)
nemo_rl/models/generation/vllm/vllm_worker.py (2)
  • clear_vllm_logger_metrics (434-438)
  • get_vllm_logger_metrics (425-432)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
nemo_rl/models/generation/vllm/vllm_generation.py (2)
  • get_vllm_logger_metrics (822-856)
  • clear_vllm_logger_metrics (858-865)
🪛 GitHub Actions: CICD NeMo RL
nemo_rl/models/generation/vllm/vllm_worker_async.py

[error] 1-1: Pre-commit hook 'ruff' failed. No such file or directory (os error 2). 1 file was modified by this hook. Command: 'pre-commit run --all-files --show-diff-on-failure --color=always'.

🪛 Ruff (0.14.5)
nemo_rl/models/generation/vllm/vllm_generation.py

844-844: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

nemo_rl/algorithms/utils.py

459-459: Local variable dp_ranks is assigned to but never used

Remove assignment to unused variable dp_ranks

(F841)


473-473: Do not assign a lambda expression, use a def

Rewrite count_zeros as a def

(E731)


483-483: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

nemo_rl/models/generation/vllm/vllm_worker.py

378-378: Do not catch blind exception: Exception

(BLE001)


396-396: Do not catch blind exception: Exception

(BLE001)


402-402: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-container / main
  • GitHub Check: sphinx-build / Build docs
  • GitHub Check: Lint check
🔇 Additional comments (4)
examples/configs/grpo_math_1B.yaml (1)

232-233: vLLM metrics logger config looks reasonable

The new enable_vllm_metrics_logger flag and vllm_metrics_logger_interval value are consistent with the runtime wiring in the vLLM generation/worker code and provide a sensible default (disabled, 0.5s interval).
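For illustration, a hedged sketch of reading these two keys with the same defaults as the example config; the nested-dict access is an assumption and the PR's actual plumbing may differ.

def read_vllm_metrics_logger_cfg(master_config: dict) -> tuple[bool, float]:
    """Sketch: read the new keys with the defaults shown in grpo_math_1B.yaml."""
    vllm_cfg = (
        master_config.get("policy", {}).get("generation", {}).get("vllm_cfg", {})
    )
    return (
        vllm_cfg.get("enable_vllm_metrics_logger", False),
        vllm_cfg.get("vllm_metrics_logger_interval", 0.5),
    )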

nemo_rl/models/generation/vllm/vllm_worker_async.py (1)

129-148: AsyncLLM stat_loggers integration is correct and compatible

vLLM v1 exposes PrometheusStatLogger at vllm.v1.metrics.loggers.PrometheusStatLogger and AsyncLLM.from_engine_args accepts a stat_loggers parameter. The implementation correctly gates the logger on the enable_vllm_metrics_logger config flag and passes it to AsyncLLM.from_engine_args, keeping the async path clean when disabled.
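A minimal sketch of that conditional wiring is below. The AsyncLLM import path and passing the PrometheusStatLogger class directly as a stat-logger factory are assumptions; vLLM's v1 API may expect wrapped factories, and the PR's actual call site may differ.

from vllm.v1.engine.async_llm import AsyncLLM  # import path is an assumption
from vllm.v1.metrics.loggers import PrometheusStatLogger


def build_async_llm(engine_args, vllm_cfg: dict):
    stat_loggers = None
    if vllm_cfg.get("enable_vllm_metrics_logger", False):
        # Only attach the Prometheus logger when the feature is enabled so the
        # disabled path stays untouched.
        stat_loggers = [PrometheusStatLogger]
    return AsyncLLM.from_engine_args(engine_args, stat_loggers=stat_loggers)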

nemo_rl/models/generation/vllm/vllm_worker.py (2)

19-20: LGTM!

The threading and time imports are appropriate for the background metrics logger functionality.


332-338: LGTM!

The conditional initialization correctly gates the metrics logger feature on both the enable flag and async engine requirement.

@youngeunkwon0405 youngeunkwon0405 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 18, 2025
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
@youngeunkwon0405 youngeunkwon0405 force-pushed the youngeunk/vllm-info branch 2 times, most recently from 7b53c9a to 39947a2 on November 18, 2025 22:46
@youngeunkwon0405 youngeunkwon0405 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 18, 2025
@youngeunkwon0405 youngeunkwon0405 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 18, 2025
Signed-off-by: Youngeun Kwon <[email protected]>
@youngeunkwon0405 youngeunkwon0405 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 19, 2025
Signed-off-by: Youngeun Kwon <[email protected]>
@youngeunkwon0405 youngeunkwon0405 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 19, 2025
@youngeunkwon0405 youngeunkwon0405 marked this pull request as ready for review November 20, 2025 17:35
@youngeunkwon0405
Contributor Author

Hi @parthchadha and @terrykong, this PR is now ready for review and merge.
Can I ask for your review please?

This PR provides the most critical visibility feature for vLLM performance: the per-worker "inflight batch size" timeline and the pending-queue status.

@parthchadha
Contributor

Thanks for this feature @youngeunkwon0405, it's super cool to be able to see how the rollout workload is distributed across the cluster!

use_deep_gemm: False
num_last_layers_in_bf16: 0
num_first_layers_in_bf16: 0
enable_vllm_metrics_logger: true # Set to true to enable vLLM internal metrics logger, turn off for better performance
Contributor

Do we know how much of a performance hit we take with metrics enabled?

Contributor Author

It wasn't noticeable to me, but theoretically it can add some overhead. I will run some iso-config runs in our perf tracker, which exposes generation time, and see how much perf impact there is.

Contributor Author

I checked the perf difference on the Qwen3 30B model; there was no noticeable difference.

Signed-off-by: Youngeun Kwon <[email protected]>
@youngeunkwon0405 youngeunkwon0405 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 20, 2025
@youngeunkwon0405
Contributor Author

Hi @parthchadha, all of your initial comments have been addressed. Could I ask for your final review, and a merge if everything looks good to you?
