Summary
LoongSuite currently provides OpenAI v2 instrumentation for Chat Completions and Embeddings, but the latest OpenAI Python SDK exposes additional API surfaces that are important for modern GenAI workloads. The largest gap is the OpenAI Responses API, which is now the primary model interaction API in the OpenAI SDK and is also used by agentic workflows.
This issue tracks adding OpenAI Responses API instrumentation and enriching OpenAI v2 telemetry while keeping the implementation aligned with LoongSuite's existing opentelemetry.util.genai helpers and GenAI semantic-convention behavior.
Current coverage
The current opentelemetry-instrumentation-openai-v2 instrumentation wraps:
- openai.resources.chat.completions.Completions.create
- openai.resources.chat.completions.AsyncCompletions.create
- openai.resources.embeddings.Embeddings.create
- openai.resources.embeddings.AsyncEmbeddings.create
Chat Completions already has sync, async, streaming, raw-response, tool-call, content-capture, metrics, and error-path test coverage. Embeddings is covered for sync/async calls, token metrics, dimensions, encoding format, and error paths, but it still uses a direct tracer/metrics path instead of the newer TelemetryHandler flow used by Chat Completions in experimental semconv mode.
Gaps
P0: Responses API is not instrumented
The OpenAI Python SDK exposes client.responses.create, client.responses.stream, async variants, and related streaming events. LoongSuite currently does not wrap openai.resources.responses, so calls through the Responses API do not produce OpenAI GenAI spans, events, or token metrics.
The new instrumentation should cover:
- sync Responses.create
- async AsyncResponses.create
- stream=True
- Responses.stream / async streaming helpers
- raw-response parsing and context-manager usage
- success, incomplete, failed, cancelled, and exception paths
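As a rough illustration of the sync/async coverage and the status paths listed above, here is a minimal sketch of a wrapper that records one entry per call. It assumes nothing about the real LoongSuite code: RECORDED, instrument_create, fake_create, and fake_async_create are hypothetical stand-ins, and treating "failed" and "cancelled" as error statuses is an assumption for illustration, not a settled mapping.

```python
import asyncio
import functools
import inspect
from types import SimpleNamespace

# Hypothetical in-memory recorder standing in for a real span/telemetry handler.
RECORDED = []

# Response statuses treated as errors in this sketch (an assumption).
ERROR_STATUSES = {"failed", "cancelled"}


def _record(kwargs, response=None, error=None):
    """Store the model, final status, and error type for one call."""
    status = getattr(response, "status", None)
    if error is not None:
        error_type = type(error).__name__
    elif status in ERROR_STATUSES:
        error_type = status
    else:
        error_type = None
    RECORDED.append(
        {"model": kwargs.get("model"), "status": status, "error": error_type}
    )


def instrument_create(func):
    """Wrap a sync or async create callable; always record, then re-raise."""
    if inspect.iscoroutinefunction(func):

        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            try:
                response = await func(*args, **kwargs)
            except Exception as exc:
                _record(kwargs, error=exc)
                raise
            _record(kwargs, response=response)
            return response

        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        try:
            response = func(*args, **kwargs)
        except Exception as exc:
            _record(kwargs, error=exc)
            raise
        _record(kwargs, response=response)
        return response

    return sync_wrapper


# Stand-ins for Responses.create / AsyncResponses.create (not the real SDK).
def fake_create(*, model, status="completed"):
    if model == "boom":
        raise RuntimeError("simulated API failure")
    return SimpleNamespace(id="resp_1", status=status)


async def fake_async_create(**kwargs):
    return fake_create(**kwargs)


create = instrument_create(fake_create)
create(model="demo-model")
asyncio.run(instrument_create(fake_async_create)(model="demo-model", status="failed"))
```

In a real implementation the recorder would be replaced by the shared span lifecycle helpers, and the wrapper would be attached to the SDK methods rather than to local stand-ins.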
P0: Responses streaming needs a dedicated accumulator
Responses streaming emits different events from Chat Completions streaming. The instrumentation should aggregate final response state from completion/done events, preserve context for sync and async iteration, and end spans reliably when streams are exhausted, closed, or fail.
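The accumulator behavior described here could look roughly like the sketch below. It is a sync-only illustration under stated assumptions: ResponseStreamWrapper and its on_end callback are hypothetical names, the terminal event type strings are assumed to match what the SDK emits, and events are modeled as simple objects with a type attribute and an optional response attribute. An async version would mirror it with __aiter__ and __anext__.

```python
from types import SimpleNamespace

# Event types assumed to carry the final response object in this sketch.
TERMINAL_EVENTS = {"response.completed", "response.incomplete", "response.failed"}


class ResponseStreamWrapper:
    """Pass events through unchanged and call on_end exactly once."""

    def __init__(self, stream, on_end):
        self._stream = iter(stream)
        self._on_end = on_end  # hypothetical callback: on_end(final_response, error)
        self._final = None
        self._ended = False

    def _finish(self, error=None):
        # Idempotent so exhaustion, close(), and __exit__ cannot double-end the span.
        if self._ended:
            return
        self._ended = True
        self._on_end(self._final, error)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            event = next(self._stream)
        except StopIteration:
            self._finish()
            raise
        except Exception as exc:
            self._finish(error=exc)
            raise
        if getattr(event, "type", None) in TERMINAL_EVENTS:
            self._final = getattr(event, "response", None)
        return event

    def close(self):
        close = getattr(self._stream, "close", None)
        if callable(close):
            close()
        self._finish()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc is not None:
            self._finish(error=exc)
        self.close()
        return False


# Usage with fabricated events; a real stream would come from the SDK.
ended = []
events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="hi"),
    SimpleNamespace(
        type="response.completed",
        response=SimpleNamespace(id="resp_1", status="completed"),
    ),
]
seen = list(ResponseStreamWrapper(events, lambda final, err: ended.append((final, err))))
```

The single idempotent _finish method is the main design point: every exit path funnels through it, so the span ends once whether the stream is exhausted, closed early, or fails mid-iteration.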
P1: Reuse opentelemetry.util.genai consistently
The Responses API implementation should reuse the shared GenAI utilities for:
- span lifecycle
- semantic-convention attribute mapping
- content-capture mode handling
- message/tool content serialization
- metrics emission
- error handling
Embeddings should also be evaluated for migration to a shared util/genai path, or util/genai should be extended with a reusable embedding invocation shape if the existing LLMInvocation is not a good fit.
P1: Enrich token and response metadata
Responses and newer OpenAI models expose useful metadata that is not fully represented today, including token detail fields such as cached tokens, reasoning tokens, and audio token details. The implementation should capture stable semantic-convention fields where available and use clearly documented LoongSuite extension attributes only when no stable semconv field exists.
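One possible shape for the token detail extraction is sketched below. The usage field names (input_tokens, output_tokens, input_tokens_details.cached_tokens, output_tokens_details.reasoning_tokens) are assumed to match the Responses usage object, and the two loongsuite.* attribute names are purely hypothetical placeholders for whatever extension attributes end up documented.

```python
from types import SimpleNamespace


def extract_usage_attributes(usage):
    """Map a usage object to span attributes, skipping missing fields."""
    attrs = {}
    if usage is None:
        return attrs

    def put(key, value):
        # Only record real integer counts; ignore None and other types.
        if isinstance(value, int) and not isinstance(value, bool):
            attrs[key] = value

    put("gen_ai.usage.input_tokens", getattr(usage, "input_tokens", None))
    put("gen_ai.usage.output_tokens", getattr(usage, "output_tokens", None))

    in_details = getattr(usage, "input_tokens_details", None)
    out_details = getattr(usage, "output_tokens_details", None)
    # Hypothetical extension attribute names, used only for illustration.
    put(
        "loongsuite.gen_ai.usage.cached_tokens",
        getattr(in_details, "cached_tokens", None),
    )
    put(
        "loongsuite.gen_ai.usage.reasoning_tokens",
        getattr(out_details, "reasoning_tokens", None),
    )
    return attrs


# Fabricated usage object for demonstration.
usage = SimpleNamespace(
    input_tokens=120,
    output_tokens=45,
    input_tokens_details=SimpleNamespace(cached_tokens=100),
    output_tokens_details=SimpleNamespace(reasoning_tokens=30),
)
attrs = extract_usage_attributes(usage)
```

Skipping absent fields rather than emitting zeros keeps telemetry from older models, which may not report detail fields, distinguishable from a genuine zero count.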
P1: Capture tool calls and structured output safely
The instrumentation should support function tools and built-in Responses API tools such as web search, file search, code interpreter, and computer-use style outputs where the SDK exposes them. Structured output schemas and tool arguments should only be captured when content capture is enabled.
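The content-capture rule for tool calls can be sketched as follows. This is an illustration only: summarize_tool_call is a hypothetical helper, output items are modeled as plain dicts, and the item fields (type, name, call_id, arguments) are assumed to resemble what the SDK returns for function calls.

```python
import json


def summarize_tool_call(item, capture_content):
    """Return tool-call metadata; include arguments only when capture is on."""
    summary = {
        "type": item.get("type"),
        "id": item.get("call_id") or item.get("id"),
    }
    if item.get("name"):
        summary["name"] = item["name"]

    if capture_content and "arguments" in item:
        raw = item["arguments"]
        try:
            # Arguments usually arrive as a JSON string; fall back to the raw value.
            summary["arguments"] = json.loads(raw) if isinstance(raw, str) else raw
        except json.JSONDecodeError:
            summary["arguments"] = raw
    return summary


# Fabricated output item for demonstration.
call = {
    "type": "function_call",
    "name": "get_weather",
    "call_id": "call_1",
    "arguments": '{"city": "Paris"}',
}
without_content = summarize_tool_call(call, capture_content=False)
with_content = summarize_tool_call(call, capture_content=True)
```

Tool type, name, and call id are treated as metadata here and always recorded, while arguments are treated as content; whether tool names should also be gated is a policy decision this sketch does not settle.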
Proposed telemetry shape
For Responses API model calls:
- span kind: CLIENT
- provider/system: OpenAI
- operation: chat unless the semantic convention adds a more specific Responses operation
- span name: chat <model> or equivalent existing GenAI naming pattern
- request attributes: model, instructions, input shape, tools/tool_choice, parallel_tool_calls, max_output_tokens, temperature, top_p, reasoning config, service tier, previous_response_id, conversation/background/store indicators when present
- response attributes: response id, response model, status, finish reasons, usage input/output tokens, service tier, and relevant tool-call metadata
- metrics: operation duration and token usage with the same common dimensions as existing OpenAI v2 instrumentation
- events/content: input/output messages, tool call requests, tool call responses, reasoning/text parts, and multimodal references only according to the configured content-capture mode
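To make the proposed shape concrete, the sketch below derives a span name and request attributes from Responses.create keyword arguments. build_request_attributes is a hypothetical helper. The gen_ai.* keys follow the GenAI semantic conventions as I understand them (older semconv versions use gen_ai.system instead of gen_ai.provider.name), and mapping max_output_tokens to gen_ai.request.max_tokens is an assumption. Content-bearing fields such as input and instructions are deliberately left out because they belong to the content-capture path.

```python
# Request kwargs copied directly onto the span, keyed by assumed attribute name.
SIMPLE_REQUEST_FIELDS = {
    "temperature": "gen_ai.request.temperature",
    "top_p": "gen_ai.request.top_p",
    "max_output_tokens": "gen_ai.request.max_tokens",
}


def build_request_attributes(kwargs):
    """Return (span_name, attributes) for a Responses API model call."""
    model = kwargs.get("model")
    attrs = {
        "gen_ai.operation.name": "chat",
        "gen_ai.provider.name": "openai",
    }
    if model:
        attrs["gen_ai.request.model"] = model

    for field, attr_name in SIMPLE_REQUEST_FIELDS.items():
        value = kwargs.get(field)
        if value is not None:
            attrs[attr_name] = value

    span_name = f"chat {model}" if model else "chat"
    return span_name, attrs


# Fabricated request for demonstration.
span_name, attrs = build_request_attributes(
    {
        "model": "gpt-4o-mini",
        "input": "user text that must not leak into attributes",
        "temperature": 0.2,
        "max_output_tokens": 64,
    }
)
```

Fields without an obvious semconv home, such as previous_response_id or the store and background indicators, would need either extension attributes or an explicit decision to omit them.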
Test plan
Add focused tests for:
- sync responses.create
- async responses.create
- stream=True
- responses.stream and async stream helpers
- raw response parse and context-manager behavior
- tool calls and built-in tool outputs
- multimodal input and output mapping
- reasoning/token detail extraction
- incomplete/failed/cancelled/error paths
- content capture on/off
- unsampled spans
- metrics for duration and token usage
Existing Chat Completions and Embeddings tests should continue to pass.
Documentation
Update the OpenAI GenAI instrumentation docs to describe:
- supported OpenAI API surfaces
- Responses API support and streaming behavior
- content-capture privacy behavior
- token detail / extension attribute behavior
- API surfaces intentionally not mapped to GenAI spans, such as management or CRUD APIs where plain HTTP/client telemetry is more appropriate