Open Telemetry V2 #581

Draft
wants to merge 3 commits into
base: main

teocns
Contributor

@teocns teocns commented Dec 13, 2024

Currently, AgentOps uses OTEL mainly for internal telemetry, and most of that work is handled by SessionExporter. You can find more details on the implementation here.

V1 Architecture

graph TD
    A[Agent Code] -->|Instrumentation| B[AgentOps SDK]
    B -->|Creates| C[Session]
    C -->|Initializes| D[TracerProvider]
    C -->|Creates| E[SessionExporter]
    D -->|Generates| F[Spans]
    F -->|Processed by| G[BatchSpanProcessor]
    G -->|Exports via| E
    E -->|Sends to| H[AgentOps Backend]

    subgraph "OTEL Implementation"
        D
        F
        G
        E
    end

This setup is fairly limited and does not take full advantage of OpenTelemetry's capabilities.


What a standard OTEL implementation looks like

from opentelemetry import trace, metrics
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource, SERVICE_NAME

def configure_customer_telemetry(
    service_name='customer-service', 
    endpoint='localhost:4317'
):
    # Create a resource with service name
    resource = Resource(attributes={
        SERVICE_NAME: service_name
    })

    # Configure Trace Exporter
    trace_exporter = OTLPSpanExporter(endpoint=endpoint)
    trace_provider = TracerProvider(resource=resource)
    trace_processor = BatchSpanProcessor(trace_exporter)
    trace_provider.add_span_processor(trace_processor)
    trace.set_tracer_provider(trace_provider)

    # Configure Metric Exporter
    metric_exporter = OTLPMetricExporter(endpoint=endpoint)
    meter_provider = MeterProvider(
        resource=resource,
        metric_readers=[PeriodicExportingMetricReader(metric_exporter)]
    )
    metrics.set_meter_provider(meter_provider)


Clients might want to:

  • Use Their Own OTEL Setup
    Many organizations already have OTEL infrastructure and might want to:

    • Send data to multiple backends (their existing + AgentOps)
    • Use their own sampling/batching configurations
    • Add custom attributes/resources
  • Track Custom Metrics

    • LLM-specific metrics (token usage, latency, costs)
    • Agent performance metrics (success rates, completion times)
    • Custom business metrics

To provide simple but extensible configuration, here are some potential approaches we could take:

  1. Accepting additional exporters via the init function:

    agentops.init(
        api_key="xxx", 
        exporters = [ ... ]
    )
  2. Environment Variable Setup
    OTLP_EXPORTER_ENDPOINT=https://api.customer.com/v1/telemetry

  3. [TBD] Metric Configuration

    • LLM-specific metrics
    • Performance metrics
    • Business metrics
  4. [TBD] Context Propagation

    • Distributed tracing
    • Cross-service correlation

Higher-level picture: AgentOps components mapping to OpenTelemetry concepts

graph LR
    subgraph AgentOps
        A[Session] --> B[Events]
        B --> C[LLMEvent]
        B --> D[ActionEvent]
        B --> E[ToolEvent]
    end

    subgraph OpenTelemetry
        F[Trace] --> G[Spans]
        G --> H[LLM Spans]
        G --> I[Action Spans]
        G --> J[Tool Spans]
        K[Metrics] --> L[LLM Metrics]
    end

    A -.->|Maps to| F
    C -.->|Maps to| H
    D -.->|Maps to| I
    E -.->|Maps to| J
  1. Session → Trace

    • Each session represents a complete interaction/workflow
    • Contains all related events
    • Has a unique session_id (that becomes the trace_id)
  2. Events → Spans

    Each Event naturally maps to a span because:

    • Events have start/end times (like spans)
    • Events have unique IDs (like spans)
    • Events have parameters/metadata (like span attributes)
    • Events are hierarchical (like spans can be)
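The two mappings above can be sketched with the standard library alone. The sketch assumes the session_id is a UUID string, which fits the 128-bit width of an OpenTelemetry trace_id; `EventSketch`, its field names, and the `event.id` attribute key are hypothetical, while the span names come from the diagrams in this description.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict

# Span names as used in the diagrams of this proposal.
SPAN_NAMES = {
    "LLMEvent": "llm.completion",
    "ActionEvent": "agent.action",
    "ToolEvent": "agent.tool",
}


@dataclass
class EventSketch:
    """Hypothetical, simplified event; not the real AgentOps Event class."""

    event_id: str
    event_type: str
    start_ns: int
    end_ns: int
    params: Dict[str, Any] = field(default_factory=dict)


def session_id_to_trace_id(session_id: str) -> int:
    # A UUID is 128 bits wide, the same width as an OTEL trace_id.
    trace_id = uuid.UUID(session_id).int
    if trace_id == 0:
        raise ValueError("an all-zero trace_id is invalid in OpenTelemetry")
    return trace_id


def event_to_span_fields(event: EventSketch, session_id: str) -> Dict[str, Any]:
    return {
        "name": SPAN_NAMES[event.event_type],
        "trace_id": format(session_id_to_trace_id(session_id), "032x"),
        "start_time": event.start_ns,
        "end_time": event.end_ns,
        "attributes": {"event.id": event.event_id, **event.params},
    }


session_id = "12345678-1234-5678-1234-567812345678"
event = EventSketch("evt-1", "LLMEvent", 1_000, 2_500, {"model": "example-model"})
span_fields = event_to_span_fields(event, session_id)
```

Making the SDK actually use this trace_id would probably require a custom ID generator on the TracerProvider; that part is left out here.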

Session / Event Tracing

graph TB
    subgraph Session/Trace
        A[Session Start] -->|Parent Span| B[Events]
        B --> C[LLMEvent<br/>span: llm.completion]
        B --> D[ActionEvent<br/>span: agent.action]
        B --> E[ToolEvent<br/>span: agent.tool]
        
        C --> C1[API Call<br/>span: llm.api.call]
        D --> D1[Function Execution<br/>span: action.execution]
        E --> E1[Tool Execution<br/>span: tool.execution]
    end



Standardized architectures

Exporter Behavior

graph LR
    A[AgentOps Events] --> B[OTEL SDK]
    B --> C{Sampler}
    C -->|Sampled| D[Batch Processor]
    C -->|Not Sampled| E[Dropped]
    D --> F[OTLP Exporter]
    F -->|HTTP/gRPC| G[OTEL Collector]
    G --> H1[Jaeger]
    G --> H2[Prometheus]
    G --> H3[Other Backends]

@AgentOps-AI AgentOps-AI deleted a comment from codecov bot Dec 14, 2024
@AgentOps-AI AgentOps-AI deleted a comment from codecov bot Dec 14, 2024