Support OpenAI Responses API and enrich OpenAI v2 GenAI telemetry

## Summary

LoongSuite currently provides OpenAI v2 instrumentation for Chat Completions and Embeddings, but the latest OpenAI Python SDK exposes additional API surfaces that are important for modern GenAI workloads. The largest gap is the OpenAI Responses API, which is now the primary model interaction API in the OpenAI SDK and is also used by agentic workflows.

This issue tracks adding OpenAI Responses API instrumentation and enriching OpenAI v2 telemetry while keeping the implementation aligned with LoongSuite's existing `opentelemetry.util.genai` helpers and GenAI semantic-convention behavior.

## Current coverage

The current `opentelemetry-instrumentation-openai-v2` instrumentation wraps:

- `openai.resources.chat.completions.Completions.create`
- `openai.resources.chat.completions.AsyncCompletions.create`
- `openai.resources.embeddings.Embeddings.create`
- `openai.resources.embeddings.AsyncEmbeddings.create`

Chat Completions already has sync, async, streaming, raw-response, tool-call, content-capture, metrics, and error-path test coverage. Embeddings is covered for sync/async calls, token metrics, dimensions, encoding format, and error paths, but it still uses a direct tracer/metrics path instead of the newer `TelemetryHandler` flow used by Chat Completions in experimental semconv mode.

## Gaps

### P0: Responses API is not instrumented

The OpenAI Python SDK exposes `client.responses.create`, `client.responses.stream`, async variants, and related streaming events. LoongSuite currently does not wrap `openai.resources.responses`, so calls through the Responses API do not produce OpenAI GenAI spans, events, or token metrics.

The new instrumentation should cover:

- sync `Responses.create`
- async `AsyncResponses.create`
- `stream=True`
- `Responses.stream` / async streaming helpers
- raw-response parsing and context-manager usage
- success, incomplete, failed, cancelled, and exception paths
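
As a rough illustration of the last bullet, the terminal paths could be normalized into one outcome before the span is ended. The status strings follow the `status` field on the Responses API response object. The `Outcome` type and `classify_outcome` helper are hypothetical names, and treating `incomplete` as a non-error is an assumption that still needs to be decided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    """Hypothetical summary of how a Responses call ended."""

    is_error: bool
    error_type: Optional[str] = None


def classify_outcome(
    status: Optional[str], exc: Optional[BaseException] = None
) -> Outcome:
    """Map a response status or a raised exception to a span outcome."""
    if exc is not None:
        # Exception path: report the exception class as the error type.
        return Outcome(True, type(exc).__qualname__)
    if status in ("failed", "cancelled"):
        # The call returned a response object, but the model run did not succeed.
        return Outcome(True, status)
    # Assumption: "completed" and "incomplete" both count as non-error outcomes.
    return Outcome(False)
```

With this shape, the sync, async, and streaming paths could all call the same helper just before ending the span.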

### P0: Responses streaming needs a dedicated accumulator

Responses streaming emits different events from Chat Completions streaming. The instrumentation should aggregate final response state from completion/done events, preserve context for sync and async iteration, and end spans reliably when streams are exhausted, closed, or fail.
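
A minimal sketch of such an accumulator for the sync case is shown below. It assumes terminal events carry the final response on a `response` attribute and use the event types `response.completed`, `response.incomplete`, and `response.failed`. The `ResponseStreamWrapper` class and its `on_end` callback are hypothetical and stand in for whatever span-ending hook the shared GenAI helpers provide.

```python
from typing import Any, Callable, Iterable, Optional

EndCallback = Callable[[Optional[Any], Optional[BaseException]], None]


class ResponseStreamWrapper:
    """Hypothetical sync stream wrapper that reports the end exactly once."""

    # Assumed event types whose payload carries the final response object.
    _TERMINAL = {"response.completed", "response.incomplete", "response.failed"}

    def __init__(self, stream: Iterable[Any], on_end: EndCallback):
        self._stream = iter(stream)
        self._on_end = on_end
        self._final: Optional[Any] = None
        self._ended = False

    def _finish(self, exc: Optional[BaseException] = None) -> None:
        # Guard so exhaustion, close(), and errors cannot end the span twice.
        if not self._ended:
            self._ended = True
            self._on_end(self._final, exc)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            event = next(self._stream)
        except StopIteration:
            self._finish()
            raise
        except Exception as exc:
            self._finish(exc)
            raise
        if getattr(event, "type", None) in self._TERMINAL:
            self._final = getattr(event, "response", None)
        return event

    def close(self) -> None:
        # Close the underlying stream if it supports it, then end telemetry.
        closer = getattr(self._stream, "close", None)
        if callable(closer):
            closer()
        self._finish()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc is not None:
            self._finish(exc)
        self.close()
        return False
```

An async variant would mirror this with `__aiter__` / `__anext__` and `StopAsyncIteration`. Context propagation across iteration steps is not shown here.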

### P1: Reuse `opentelemetry.util.genai` consistently

The Responses API implementation should reuse the shared GenAI utilities for:

- span lifecycle
- semantic-convention attribute mapping
- content-capture mode handling
- message/tool content serialization
- metrics emission
- error handling

Embeddings should also be evaluated for migration to the shared `opentelemetry.util.genai` path. If the existing `LLMInvocation` is not a good fit, `opentelemetry.util.genai` should be extended with a reusable embedding invocation shape.

### P1: Enrich token and response metadata

Responses and newer OpenAI models expose useful metadata that is not fully represented today, including token detail fields such as cached tokens, reasoning tokens, and audio token details. The implementation should capture stable semantic-convention fields where available and use clearly documented LoongSuite extension attributes only when no stable semconv field exists.
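
To make the split concrete, here is a hedged sketch of usage extraction. It assumes the Responses usage object exposes `input_tokens`, `output_tokens`, `input_tokens_details.cached_tokens`, and `output_tokens_details.reasoning_tokens`. The `gen_ai.usage.*` keys come from the OpenTelemetry GenAI semantic conventions, while the `loongsuite.*` keys are placeholder extension names invented for this example, not an agreed naming scheme.

```python
from typing import Any, Dict


def usage_attributes(usage: Any) -> Dict[str, int]:
    """Collect token counts from a usage object, skipping missing fields."""
    attrs: Dict[str, int] = {}

    def put(key: str, value: Any) -> None:
        # Only record real integer counts; absent details stay unset.
        if isinstance(value, int) and not isinstance(value, bool):
            attrs[key] = value

    put("gen_ai.usage.input_tokens", getattr(usage, "input_tokens", None))
    put("gen_ai.usage.output_tokens", getattr(usage, "output_tokens", None))

    in_details = getattr(usage, "input_tokens_details", None)
    out_details = getattr(usage, "output_tokens_details", None)
    # Hypothetical extension attribute names, used only for illustration.
    put(
        "loongsuite.gen_ai.usage.cached_tokens",
        getattr(in_details, "cached_tokens", None),
    )
    put(
        "loongsuite.gen_ai.usage.reasoning_tokens",
        getattr(out_details, "reasoning_tokens", None),
    )
    return attrs
```

Reading every field through `getattr` with a default keeps the helper tolerant of older SDK versions and of responses that omit the detail objects.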

### P1: Capture tool calls and structured output safely

The instrumentation should support function tools and built-in Responses API tools such as web search, file search, code interpreter, and computer-use-style outputs where the SDK exposes them. Structured output schemas and tool arguments should be captured only when content capture is enabled.
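
The gating rule could look roughly like the sketch below. It assumes output items have already been converted to plain dicts and that function calls appear as items with `type == "function_call"` carrying `call_id`, `name`, and a JSON-string `arguments` field. The `tool_call_records` helper and the record layout are hypothetical.

```python
import json
from typing import Any, Dict, List


def tool_call_records(
    output_items: List[Dict[str, Any]], capture_content: bool
) -> List[Dict[str, Any]]:
    """Build tool-call records, adding arguments only when capture is on."""
    records: List[Dict[str, Any]] = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue
        # Identity metadata is always recorded.
        record: Dict[str, Any] = {
            "id": item.get("call_id"),
            "name": item.get("name"),
            "type": "function",
        }
        if capture_content:
            raw = item.get("arguments") or "{}"
            try:
                record["arguments"] = json.loads(raw)
            except json.JSONDecodeError:
                # Keep the raw string when the model produced invalid JSON.
                record["arguments"] = raw
        records.append(record)
    return records
```

Built-in tool outputs would need their own item types handled in the same loop, with the same rule that payloads are dropped when capture is off.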

## Proposed telemetry shape

For Responses API model calls:

- span kind: `CLIENT`
- provider/system: OpenAI
- operation: `chat` unless the semantic convention adds a more specific Responses operation
- span name: `chat <model>` or equivalent existing GenAI naming pattern
- request attributes: model, instructions, input shape, tools/tool_choice, parallel_tool_calls, max_output_tokens, temperature, top_p, reasoning config, service tier, previous_response_id, conversation/background/store indicators when present
- response attributes: response id, response model, status, finish reasons, usage input/output tokens, service tier, and relevant tool-call metadata
- metrics: operation duration and token usage with the same common dimensions as existing OpenAI v2 instrumentation
- events/content: input/output messages, tool call requests, tool call responses, reasoning/text parts, and multimodal references only according to the configured content-capture mode
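
As a sketch of the request side of this shape, the helper below derives a span name and a few non-content request attributes from the keyword arguments passed to `responses.create`. The attribute keys follow the OpenTelemetry GenAI semantic conventions as best understood here, and the exact provider key (`gen_ai.provider.name` versus the older `gen_ai.system`) depends on the semconv version in use. The `request_telemetry` helper is hypothetical, and mapping `max_output_tokens` to `gen_ai.request.max_tokens` is an assumption.

```python
from typing import Any, Dict, Mapping, Tuple

# Assumed mapping from Responses request parameters to GenAI attribute keys.
_REQUEST_ATTRS = {
    "model": "gen_ai.request.model",
    "temperature": "gen_ai.request.temperature",
    "top_p": "gen_ai.request.top_p",
    "max_output_tokens": "gen_ai.request.max_tokens",
}


def request_telemetry(kwargs: Mapping[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (span_name, attributes) for a Responses API model call."""
    attrs: Dict[str, Any] = {
        "gen_ai.operation.name": "chat",
        "gen_ai.provider.name": "openai",
    }
    for param, key in _REQUEST_ATTRS.items():
        value = kwargs.get(param)
        if value is not None:
            attrs[key] = value
    # Content-bearing parameters such as `input` and `instructions` are left
    # out here; they belong to the content-capture path.
    model = attrs.get("gen_ai.request.model")
    span_name = f"chat {model}" if model else "chat"
    return span_name, attrs
```

For example, `request_telemetry({"model": "gpt-4o-mini", "temperature": 0.2})` yields the span name `chat gpt-4o-mini` plus the model and temperature attributes.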

## Test plan

Add focused tests for:

- sync `responses.create`
- async `responses.create`
- `stream=True`
- `responses.stream` and async stream helpers
- raw response parse and context-manager behavior
- tool calls and built-in tool outputs
- multimodal input and output mapping
- reasoning/token detail extraction
- incomplete/failed/cancelled/error paths
- content capture on/off
- unsampled spans
- metrics for duration and token usage

Existing Chat Completions and Embeddings tests should continue to pass.

## Documentation

Update the OpenAI GenAI instrumentation docs to describe:

- supported OpenAI API surfaces
- Responses API support and streaming behavior
- content-capture privacy behavior
- token detail / extension attribute behavior
- API surfaces intentionally not mapped to GenAI spans, such as management or CRUD APIs where plain HTTP/client telemetry is more appropriate

