Skip to content

Conversation

@Harshit28j
Copy link
Collaborator

Relevant issues

Fixes a bug where StopAsyncIteration (and other control flow exceptions) were incorrectly counted as deployment failures in Prometheus metrics. Also fixes NameError and mypy issues in prometheus.py.

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

image

@vercel
Copy link

vercel bot commented Feb 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Feb 8, 2026 3:55pm

Request Review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

The PR updates PrometheusLogger to avoid counting StopAsyncIteration/StopIteration as deployment failures in the litellm_deployment_failure_responses Prometheus metric, and tightens imports/type-checking in litellm/integrations/prometheus.py. It also adds an enterprise callback test to ensure StopAsyncIteration does not increment the failure metric.

Main gap: the implementation only exempts StopAsyncIteration/StopIteration, but the PR description indicates other control-flow exceptions should be excluded as well; termination exceptions like GeneratorExit are still counted as failures unless added to the control-flow allowlist.

Confidence Score: 3/5

  • Moderately safe to merge, but metrics behavior may still be incorrect for some control-flow terminations.
  • Core change is small and targeted, and a regression test was added for StopAsyncIteration. However, the control-flow exception allowlist appears incomplete relative to the PR’s stated intent (e.g., GeneratorExit still counts as a failure), and the new test doesn’t lock in expected behavior for total-request metrics in the same scenario.
  • litellm/integrations/prometheus.py; tests/enterprise/litellm_enterprise/enterprise_callbacks/test_prometheus_logging_callbacks.py

Important Files Changed

Filename Overview
litellm/integrations/prometheus.py Pending: need to inspect diff to verify StopAsyncIteration handling and metrics behavior.
tests/enterprise/litellm_enterprise/enterprise_callbacks/test_prometheus_logging_callbacks.py Pending: need to inspect diff to verify test coverage and correctness.

Sequence Diagram

sequenceDiagram
    participant App as LiteLLM
    participant PL as PrometheusLogger
    participant PF as prometheus_label_factory
    participant PM as litellm_deployment_failure_responses

    App->>PL: set_llm_deployment_failure_metrics(kwargs)
    Note over App,PL: kwargs may include exception

    alt exception != None AND not control-flow
        PL->>PL: _is_control_flow_exception(exception) == false
        PL->>PF: build labels (deployment_id, api_base, provider,...)
        PF-->>PL: _labels
        PL->>PM: labels(**_labels).inc()
    else exception is None OR control-flow (StopAsyncIteration/StopIteration)
        PL->>PL: _is_control_flow_exception(exception) == true/exception None
        Note over PL,PM: Skip incrementing failure metric
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 8, 2026

Additional Comments (1)

litellm/integrations/prometheus.py
Control-flow list incomplete

_CONTROL_FLOW_EXCEPTIONS only includes StopAsyncIteration/StopIteration, but the PR description claims "other control flow exceptions" too. As written, GeneratorExit will still be counted as a deployment failure (it’s also a normal termination signal for generators/streams) and will keep inflating litellm_deployment_failure_responses in those cases. Consider adding GeneratorExit (and any other known control-flow termination exceptions used by your streaming stack) to _CONTROL_FLOW_EXCEPTIONS so the metrics behavior matches the stated intent.

@Harshit28j
Copy link
Collaborator Author

Harshit28j commented Feb 8, 2026

Working as expected counted only when its a real failure

image

@krrishdholakia
Copy link
Member

krrishdholakia commented Feb 8, 2026

@Harshit28j can you please work to have this be a 4/5+ on greptile, the feedback seems legitimate, unless i'm missing something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants