Bug Description
When using the OpenAI Realtime model, if the user interrupts (barge-in) in the
narrow window after the model has declared an audio response but before the
first audio frame is actually played, the plugin emits a
conversation.item.truncate with audio_end_ms=0. The Realtime API rejects it:
APIError('OpenAI Realtime API returned an error',
body=RealtimeError(
message='Only model output audio messages can be truncated',
type='invalid_request_error',
code='unsupported_content_type'),
retryable=True) recoverable=True
Root cause
1. AgentActivity calls truncate() on interruption with audio_end_ms = int(entry.out.playback_position * 1000), which is 0 when no frame has played yet (livekit-agents/livekit/agents/voice/agent_activity.py:3609).
2. RealtimeSession.truncate() then unconditionally sends a ConversationItemTruncateEvent whenever "audio" is in modalities (livekit-plugins/livekit-plugins-openai/.../realtime/realtime_model.py:1609).
3. Because the item has no committed model-output audio, the server rejects it.
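To make the zero value concrete, here is a minimal sketch of the conversion in isolation. It is not the library code: `playback_position_to_ms` is a hypothetical helper that only mirrors the `int(entry.out.playback_position * 1000)` expression quoted above, and it assumes `playback_position` is measured in seconds.

```python
def playback_position_to_ms(playback_position: float) -> int:
    """Hypothetical helper mirroring int(entry.out.playback_position * 1000).

    Assumes playback_position is in seconds; int() drops the fractional part.
    """
    return int(playback_position * 1000)


# No frame has played yet, so the position is still 0.0 seconds.
before_first_frame = playback_position_to_ms(0.0)

# A position below one millisecond also collapses to 0.
sub_millisecond = playback_position_to_ms(0.0004)

# After 1.25 seconds of playback the value is a usable offset.
mid_playback = playback_position_to_ms(1.25)
```

Any interruption that lands before playback starts therefore reaches truncate() with audio_end_ms=0, which is the value the server rejects in this report.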
Event sequence
1. response.created
2. response.output_item.added: message_id assigned
3. response.content_part.added: modalities resolve to ["audio", "text"]
4. User interrupts here (VAD fires); response.audio.delta has NOT happened yet
5. Interruption path calls truncate(..., audio_end_ms=0) → server error
Expected Behavior
Interrupting before any audio has played should be a no-op for truncation —
there is no committed audio to truncate, so the plugin should not send a
conversation.item.truncate with audio_end_ms=0. No error should be raised
and the session should continue cleanly.
Reproduction Steps
1. Start an AgentSession with `openai.realtime.RealtimeModel` (audio modality).
2. Trigger an initial agent reply (e.g. a welcome message).
3. Speak / send input that interrupts within the first few hundred ms,
before the first audio frame plays.
4. Observe the error in logs.
Operating System
macOS
Models Used
No response
Package Versions
livekit-agents == 1.5.4
livekit-plugins-openai == 1.5.4
OpenAI Realtime model (e.g. gpt-4o-realtime / gpt-realtime), server VAD turn detection
Session/Room/Call IDs
No response
Proposed Solution
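One possible direction, sketched under assumptions rather than as a tested patch: skip the truncate event when audio_end_ms is 0 (or negative). `FakeRealtimeSession` below is a hypothetical stand-in that records events in a list instead of sending them over the websocket. Its truncate() signature and boolean return value are assumptions for illustration and may not match the real RealtimeSession.truncate().

```python
from dataclasses import dataclass, field


@dataclass
class FakeRealtimeSession:
    """Hypothetical stand-in for the plugin session; records events instead of sending."""

    sent_events: list = field(default_factory=list)

    def truncate(self, *, message_id: str, modalities: list, audio_end_ms: int) -> bool:
        """Return True if a truncate event was queued, False if it was skipped."""
        if "audio" not in modalities:
            return False
        if audio_end_ms <= 0:
            # Nothing has been played, so there is no model-output audio to cut.
            # Skipping here is the proposed guard.
            return False
        self.sent_events.append(
            {
                "type": "conversation.item.truncate",
                "item_id": message_id,
                "content_index": 0,
                "audio_end_ms": audio_end_ms,
            }
        )
        return True


session = FakeRealtimeSession()

# Barge-in before the first frame: no event is queued.
skipped = session.truncate(
    message_id="item_1", modalities=["audio", "text"], audio_end_ms=0
)

# Barge-in after 480 ms of playback: the event is queued as before.
sent = session.truncate(
    message_id="item_1", modalities=["audio", "text"], audio_end_ms=480
)
```

The same check could instead live at the call site in AgentActivity. This sketch only covers the truncate event itself and does not address whether the in-flight response needs any other cleanup.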
Additional Context
No response
Screenshots and Recordings
No response