1 change: 1 addition & 0 deletions .github/CODEOWNERS

@@ -71,6 +71,7 @@
 /python/ray/data/llm.py @ray-project/ray-llm
 /python/ray/dashboard/modules/metrics/dashboards/serve_llm_dashboard_panels.py @ray-project/ray-llm
 /python/ray/dashboard/modules/metrics/dashboards/serve_llm_grafana_dashboard_base.json @ray-project/ray-llm
+/doc/source/serve/llm/ @ray-project/ray-llm

 # Ray Serve
 /python/ray/serve/ @ray-project/ray-serve
6 changes: 3 additions & 3 deletions doc/source/serve/llm/quick-start.rst

@@ -298,7 +298,7 @@ Engine Metrics
 ---------------------
 All engine metrics, including vLLM, are available through the Ray metrics export endpoint and are queryable using Prometheus. See `vLLM metrics <https://docs.vllm.ai/en/stable/usage/metrics.html>`_ for a complete list. These are also visualized by the Serve LLM Grafana dashboard. Dashboard panels include: time per output token (TPOT), time to first token (TTFT), and GPU cache utilization.

-Engine metric logging is off by default, and must be manually enabled. In addition, you must enable the vLLM V1 engine to use engine metrics. To enable engine-level metric logging, set `log_engine_metrics: True` when configuring the LLM deployment. For example:
+Engine metric logging is on by default as of Ray 2.51. To disable engine-level metric logging, set `log_engine_metrics: False` when configuring the LLM deployment. For example:

 .. tab-set::

@@ -320,7 +320,7 @@
             min_replicas=1, max_replicas=2,
         )
     ),
-    log_engine_metrics=True
+    log_engine_metrics=False
 )

 app = build_openai_app({"llm_configs": [llm_config]})
@@ -343,7 +343,7 @@
       autoscaling_config:
         min_replicas: 1
         max_replicas: 2
-  log_engine_metrics: true
+  log_engine_metrics: false
 import_path: ray.serve.llm:build_openai_app
 name: llm_app
 route_prefix: "/"
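The hunk above notes that engine metrics (TPOT, TTFT, cache utilization, and the rest) are exposed through the Ray metrics export endpoint in Prometheus text format. As a rough, standalone illustration of what "queryable using Prometheus" means at the payload level, a scrape body can be filtered down to just the engine series by metric-name prefix. The sample payload below is invented for illustration; vLLM's `vllm:` prefix is its documented convention, but the exact metric names and labels vary by version.

```python
def filter_engine_metrics(exposition: str, prefix: str = "vllm:") -> list[str]:
    """Return only the sample lines whose metric name starts with `prefix`.

    Skips blank lines and Prometheus comment lines (# HELP / # TYPE).
    """
    samples = []
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefix):
            samples.append(line)
    return samples


# Invented sample scrape body, mixing an engine metric with a Serve metric.
sample = """\
# HELP vllm:time_to_first_token_seconds TTFT histogram
vllm:time_to_first_token_seconds_count{model="m"} 42
ray_serve_num_http_requests{route="/"} 7
"""

print(filter_engine_metrics(sample))
# prints: ['vllm:time_to_first_token_seconds_count{model="m"} 42']
```

In practice you would let Prometheus scrape the Ray metrics endpoint and select these series with a query rather than filtering text by hand; this sketch only shows the shape of the data the Grafana panels are built on.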
2 changes: 1 addition & 1 deletion python/ray/llm/_internal/serve/configs/server_models.py

@@ -208,7 +208,7 @@ class LLMConfig(BaseModelExtended):

     log_engine_metrics: Optional[bool] = Field(
         default=True,
-        description="Enable additional engine metrics via Ray Prometheus port. Default is True.",
+        description="Enable additional engine metrics via Ray Prometheus port.",
     )

     _supports_vision: bool = PrivateAttr(False)
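The field above can be exercised in isolation to confirm the default. This is a minimal sketch using plain pydantic, not the real `LLMConfig` (the class name `LLMConfigSketch` is invented here, and `BaseModelExtended` is replaced with `BaseModel`):

```python
from typing import Optional

from pydantic import BaseModel, Field


class LLMConfigSketch(BaseModel):
    # Same field definition as the hunk above, hosted in a throwaway model.
    log_engine_metrics: Optional[bool] = Field(
        default=True,
        description="Enable additional engine metrics via Ray Prometheus port.",
    )


cfg = LLMConfigSketch()  # field left unset: engine metrics are on
quiet = LLMConfigSketch(log_engine_metrics=False)  # explicitly disabled
print(cfg.log_engine_metrics, quiet.log_engine_metrics)
# prints: True False
```

This mirrors the documentation change in quick-start.rst: with no explicit setting, metric logging is enabled, so the docs now show `log_engine_metrics: False` as the opt-out example rather than `True` as an opt-in.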