RealtimeModel: "Only model output audio messages can be truncated" when interrupted before first audio frame #6157

Description

@areebkhan-tech

Bug Description

When using the OpenAI Realtime model, if the user interrupts (barge-in) in the
narrow window after the model has declared an audio response but before the
first audio frame is actually played, the plugin emits a
conversation.item.truncate with audio_end_ms=0. The Realtime API rejects it:
APIError('OpenAI Realtime API returned an error',
body=RealtimeError(
message='Only model output audio messages can be truncated',
type='invalid_request_error',
code='unsupported_content_type'),
retryable=True) recoverable=True

Root cause
AgentActivity calls truncate() on interruption with
audio_end_ms = int(entry.out.playback_position * 1000), which is 0 when no
frame has played yet
(livekit-agents/livekit/agents/voice/agent_activity.py:3609).
RealtimeSession.truncate() then unconditionally sends a
ConversationItemTruncateEvent whenever "audio" is in modalities
(livekit-plugins/livekit-plugins-openai/.../realtime/realtime_model.py:1609).
Because the item has no committed model-output audio, the server rejects it.

Event sequence

  1. response.created
  2. response.output_item.added — message_id assigned
  3. response.content_part.added — modalities resolve to ["audio", "text"]
  4. (user interrupts here, VAD fires) response.audio.delta has NOT happened yet
  5. interruption path calls truncate(..., audio_end_ms=0) → server error
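
To make the timing window concrete, here is a minimal sketch that replays the sequence above through a tiny tracker. `ItemAudioTracker` is a hypothetical helper written for this illustration and is not part of livekit-agents or the OpenAI plugin; only the event type strings come from the sequence listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ItemAudioTracker:
    """Hypothetical helper: remembers which message items have produced audio."""

    items_with_audio: set = field(default_factory=set)

    def on_event(self, event_type: str, item_id: str) -> None:
        # Only an audio delta means the item actually carries model-output audio.
        if event_type == "response.audio.delta":
            self.items_with_audio.add(item_id)

    def has_audio(self, item_id: str) -> bool:
        return item_id in self.items_with_audio


tracker = ItemAudioTracker()
# Steps 2 and 3: the item exists and declares an audio modality...
tracker.on_event("response.output_item.added", "msg_1")
tracker.on_event("response.content_part.added", "msg_1")
# Step 4: the interruption arrives here, before any response.audio.delta.
interrupted_before_audio = not tracker.has_audio("msg_1")
```

At step 4 `interrupted_before_audio` is true, which is exactly the state in which a truncate request has nothing to act on.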

Expected Behavior

Interrupting before any audio has played should be a no-op for truncation —
there is no committed audio to truncate, so the plugin should not send a
conversation.item.truncate with audio_end_ms=0. No error should be raised
and the session should continue cleanly.
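
One possible shape for that no-op is sketched below. This is an assumption about how a guard could look, not the plugin's actual code: `build_truncate_event` is a hypothetical helper, and the returned dict only mirrors the fields of the `conversation.item.truncate` client event (`type`, `item_id`, `content_index`, `audio_end_ms`).

```python
from typing import Any, Optional


def build_truncate_event(
    item_id: str, modalities: list, audio_end_ms: int
) -> Optional[dict]:
    """Hypothetical guard: return a truncate payload, or None to skip sending."""
    if "audio" not in modalities:
        # Nothing audio-related to truncate for text-only responses.
        return None
    if audio_end_ms <= 0:
        # No audio has played yet, so there is no committed audio to truncate.
        return None
    payload: dict[str, Any] = {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": 0,
        "audio_end_ms": audio_end_ms,
    }
    return payload


# Interrupted before the first frame: skip the request entirely.
skipped = build_truncate_event("msg_1", ["audio", "text"], audio_end_ms=0)
# Interrupted after 420 ms of playback: a normal truncate payload.
sent = build_truncate_event("msg_1", ["audio", "text"], audio_end_ms=420)
```

Whether the check belongs in `AgentActivity` (before calling `truncate()`) or inside `RealtimeSession.truncate()` is a design choice this sketch does not settle; the second placement would protect every caller.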

Reproduction Steps

1. Start an AgentSession with `openai.realtime.RealtimeModel` (audio modality).
2. Trigger an initial agent reply (e.g. a welcome message).
3. Speak / send input that interrupts within the first few hundred ms,
   before the first audio frame plays.
4. Observe the error in logs.

Operating System

macOS

Models Used

No response

Package Versions

livekit-agents == 1.5.4
livekit-plugins-openai == 1.5.4
OpenAI Realtime model (e.g. gpt-4o-realtime / gpt-realtime), server VAD turn detection

Session/Room/Call IDs

No response

Proposed Solution

Additional Context

No response

Screenshots and Recordings

No response

Labels

bug (Something isn't working)