Summary
Observability in @tanstack/ai is currently chat-only. otelMiddleware() returns a ChatMiddleware, and the media activities (generateImage, generateVideo, generateAudio, generateSpeech, generateTranscription) accept no middleware and carry no OTel instrumentation — their options are limited to adapter / prompt / modelOptions / stream / debug. There is no first-class way to emit gen_ai.* spans for non-chat activities.
Proposal
Add a thin, activity-agnostic observability primitive: a small lifecycle hook (onStart / onFinish / onError) whose payload is discriminated by an activity field (chat | image | video | audio | speech | transcription), registerable on any activity — ideally via the config/debug surface those activities already share. Ship one otelObserver() that implements it once and maps each activity to the correct gen_ai.operation.name (image_generation, text_to_speech, audio_generation, …).
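A rough sketch of what the hook surface could look like, to make the proposal concrete. Every type and function name here (Activity, ActivityObserver, operationNameFor) is hypothetical and does not exist in @tanstack/ai today. The operation names for image, speech, and audio are the ones listed above; the values for video and transcription are placeholders that would need to be checked against the OTel GenAI semantic conventions.

```typescript
// Hypothetical types: nothing below is part of the current @tanstack/ai API.
type Activity = "chat" | "image" | "video" | "audio" | "speech" | "transcription";

// Every event carries the `activity` discriminant so one observer can serve all activities.
interface StartEvent {
  activity: Activity;
  model?: string;
}

interface FinishEvent {
  activity: Activity;
  model?: string;
  durationMs: number;
}

interface ErrorEvent {
  activity: Activity;
  model?: string;
  error: unknown;
}

// All hooks are optional, so an observer only implements what it needs.
interface ActivityObserver {
  onStart?(event: StartEvent): void;
  onFinish?(event: FinishEvent): void;
  onError?(event: ErrorEvent): void;
}

// Maps each activity to a gen_ai.operation.name value.
const OPERATION_NAME: Record<Activity, string> = {
  chat: "chat",
  image: "image_generation",
  speech: "text_to_speech",
  audio: "audio_generation",
  video: "video_generation", // assumed placeholder, not confirmed by the issue
  transcription: "transcription", // assumed placeholder, not confirmed by the issue
};

function operationNameFor(activity: Activity): string {
  return OPERATION_NAME[activity];
}
```

Typing the map as Record<Activity, string> means that adding a new activity to the union without an operation name fails at compile time.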
Why not reuse otelMiddleware
The chat middleware pipeline is intentionally heavy — onConfig, per-iteration and per-tool spans, structured-output hooks — and it hard-codes gen_ai.operation.name: "chat". Media activities are single request→response (or submit→poll for video); routing them through a chat-shaped pipeline mis-shapes the span (e.g. an image generation tagged as a chat operation). Keep chat's rich middleware as-is; add a smaller observer that the OTel exporter implements uniformly for all activities, chat included.
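To illustrate how small the observer path can be next to the chat middleware pipeline, here is a minimal sketch of wrapping a single request→response activity. The observe helper, recordingObserver, and SpanRecord are all hypothetical names, and the in-memory sink stands in for a real OTel tracer so the example stays self-contained.

```typescript
// Hypothetical sketch: none of these names exist in @tanstack/ai.
type Activity = "chat" | "image" | "video" | "audio" | "speech" | "transcription";

interface ActivityObserver {
  onStart?(event: { activity: Activity }): void;
  onFinish?(event: { activity: Activity; durationMs: number }): void;
  onError?(event: { activity: Activity; error: unknown }): void;
}

// Span-like record written by the in-memory observer below.
interface SpanRecord {
  activity: Activity;
  status: "ok" | "error";
  durationMs?: number;
  errorMessage?: string;
}

// Runs one activity call and reports its lifecycle to the observer.
// The original result or error is always passed through unchanged.
async function observe<T>(
  activity: Activity,
  observer: ActivityObserver,
  run: () => Promise<T>,
): Promise<T> {
  const startedAt = Date.now();
  observer.onStart?.({ activity });
  try {
    const result = await run();
    observer.onFinish?.({ activity, durationMs: Date.now() - startedAt });
    return result;
  } catch (error) {
    observer.onError?.({ activity, error });
    throw error;
  }
}

// Stand-in for an OTel-backed observer: records one entry per finished or failed call.
function recordingObserver(sink: SpanRecord[]): ActivityObserver {
  return {
    onFinish({ activity, durationMs }) {
      sink.push({ activity, status: "ok", durationMs });
    },
    onError({ activity, error }) {
      const errorMessage = error instanceof Error ? error.message : String(error);
      sink.push({ activity, status: "error", errorMessage });
    },
  };
}
```

For video, the same wrapper could enclose the whole submit→poll sequence so that it still produces one span per generation; whether polling deserves child spans is left open here.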
Note
TokenUsage is already unified across activities (per the deprecation note retiring the separate image/audio usage shape), so the observer has a consistent usage payload to emit for every activity. This proposal pairs with the usage/cost emission gap, which is tracked separately.
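As a sketch of what emitting that payload might involve, the helper below turns a usage object into span attributes. The AssumedTokenUsage shape and its field names are assumptions made for illustration and may not match the real TokenUsage type; the attribute keys gen_ai.usage.input_tokens and gen_ai.usage.output_tokens come from the OTel GenAI semantic conventions.

```typescript
// Assumed shape for illustration only; the real TokenUsage fields may differ.
interface AssumedTokenUsage {
  inputTokens?: number;
  outputTokens?: number;
}

// Builds gen_ai.usage.* attributes, skipping any count the adapter did not report.
function usageAttributes(usage: AssumedTokenUsage | undefined): Record<string, number> {
  const attrs: Record<string, number> = {};
  if (usage?.inputTokens !== undefined) {
    attrs["gen_ai.usage.input_tokens"] = usage.inputTokens;
  }
  if (usage?.outputTokens !== undefined) {
    attrs["gen_ai.usage.output_tokens"] = usage.outputTokens;
  }
  return attrs;
}
```

Because the shape is shared, the same helper would serve chat and media activities alike, which is the main benefit of the unified usage type for an observer.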
Observed on @tanstack/ai@0.26.1.