Releases: livekit/agents

livekit-agents@1.5.1

23 Mar 22:52
d167306

Note

livekit-agents 1.5 introduced many new features; see the livekit-agents@1.5.0 release notes below for the full changelog.

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.5.0...livekit-agents@1.5.1

livekit-agents@1.5.0

19 Mar 17:01
760504e

Highlights

Adaptive Interruption Handling

The headline feature of v1.5.0: an audio-based ML model that distinguishes genuine user interruptions from incidental sounds such as backchannels ("mm-hmm"), coughs, sighs, or background noise. It is enabled by default; no configuration is needed.

Key stats:

  • 86% precision and 100% recall at 500 ms of overlapping speech
  • Rejects 51% of traditional VAD false positives
  • Detects true interruptions 64% faster than VAD alone
  • Inference completes in 30ms or less
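To unpack the precision and recall figures: they follow the standard confusion-matrix definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)). A quick sanity check with illustrative counts, not LiveKit's evaluation data:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision: share of flagged interruptions that were genuine.
    Recall: share of genuine interruptions that were flagged."""
    return tp / (tp + fp), tp / (tp + fn)

# e.g. 86 genuine interruptions caught, 14 false alarms, none missed:
p, r = precision_recall(tp=86, fp=14, fn=0)  # → (0.86, 1.0)
```

In other words, 100% recall means no genuine interruption is ignored, while 86% precision means a small fraction of flagged events are still false alarms, which the automatic playback resume (below) is designed to recover from.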

When a false interruption is detected, the agent automatically resumes playback from where it left off; no re-generation is needed.

To opt out and use VAD-only interruption:

session = AgentSession(
    ...
    turn_handling=TurnHandlingOptions(
        interruption={
            "mode": "vad",
        },
    ),
)

Blog post: https://livekit.com/blog/adaptive-interruption-handling

Dynamic Endpointing

Endpointing delays now adapt to each conversation's natural rhythm. Instead of a fixed silence threshold, the agent uses an exponential moving average of pause durations to dynamically adjust when it considers the user's turn complete.

session = AgentSession(
    ...
    turn_handling=TurnHandlingOptions(
        endpointing={
            "mode": "dynamic",
            "min_delay": 0.3,
            "max_delay": 3.0,
        },
    ),
)
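The adaptation behind dynamic mode can be pictured as follows. This is a minimal sketch under stated assumptions: the smoothing factor, update rule, and clamping are illustrative, not the library's actual implementation.

```python
def update_delay(avg_pause: float, new_pause: float,
                 min_delay: float = 0.3, max_delay: float = 3.0,
                 alpha: float = 0.3) -> float:
    """Fold the latest user pause into an exponential moving average,
    then clamp the result to the configured endpointing bounds."""
    ema = alpha * new_pause + (1 - alpha) * avg_pause
    return min(max(ema, min_delay), max_delay)

# A speaker who pauses long and often gradually raises the delay:
delay = 0.5
for pause in (1.2, 1.5, 2.0):
    delay = update_delay(delay, pause)
```

Under this model, a deliberate speaker earns a longer grace period before the agent takes its turn, while a rapid-fire speaker gets snappier turn completion.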

New TurnHandlingOptions API

Endpointing and interruption settings are now consolidated into a single TurnHandlingOptions dict passed to AgentSession. Old keyword arguments (min_endpointing_delay, allow_interruptions, etc.) still work but are deprecated and will emit warnings.

session = AgentSession(
    turn_handling={
        "turn_detection": "vad",
        "endpointing": {"min_delay": 0.5, "max_delay": 3.0},
        "interruption": {"enabled": True, "mode": "adaptive"},
    },
)

Session Usage Tracking

New SessionUsageUpdatedEvent provides structured, per-model usage data (token counts, character counts, and audio durations) broken down by provider and model:

@session.on("session_usage_updated")
def on_usage(ev: SessionUsageUpdatedEvent):
    for usage in ev.usage.model_usage:
        print(f"{usage.provider}/{usage.model}: {usage}")

Usage types: LLMModelUsage, TTSModelUsage, STTModelUsage, InterruptionModelUsage.

You can also access aggregated usage at any time via the session.usage property:

usage = session.usage
for model_usage in usage.model_usage:
    print(model_usage)

Usage data is also included in SessionReport (via model_usage), so it's available in post-session telemetry and reporting out of the box.
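As one example of what the structured data enables, the sketch below folds per-model usage into a cost estimate. The ModelUsage dataclass, its fields, and the rate table are hypothetical stand-ins for illustration; the actual typed classes are LLMModelUsage and friends, whose exact fields may differ:

```python
from dataclasses import dataclass

@dataclass
class ModelUsage:
    # hypothetical stand-in for the typed per-model usage classes
    provider: str
    model: str
    input_tokens: int
    output_tokens: int

# hypothetical USD rates per million tokens, keyed by (provider, model)
RATES = {("openai", "gpt-4o-mini"): (0.15, 0.60)}

def estimate_cost(usages: list[ModelUsage]) -> float:
    """Sum input/output token costs across every model used in a session."""
    total = 0.0
    for u in usages:
        rate_in, rate_out = RATES.get((u.provider, u.model), (0.0, 0.0))
        total += (u.input_tokens * rate_in + u.output_tokens * rate_out) / 1e6
    return total
```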

Per-Turn Latency on ChatMessage.metrics

Each ChatMessage now carries a metrics field (MetricsReport) with per-turn latency data:

  • transcription_delay: time to obtain the transcript after end of speech
  • end_of_turn_delay: time between end of speech and the turn decision
  • on_user_turn_completed_delay: time spent in the developer callback
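Taken together, these components give a rough per-turn responsiveness figure. A minimal sketch, assuming the fields are plain float attributes in seconds (the SimpleNamespace below is a stand-in for a real MetricsReport):

```python
from types import SimpleNamespace

def turn_latency(metrics) -> float:
    """Rough end-to-end latency for one turn, summing the components
    listed above. Attribute names follow the fields in this release."""
    return (metrics.transcription_delay
            + metrics.end_of_turn_delay
            + metrics.on_user_turn_completed_delay)

# stand-in for a ChatMessage.metrics report:
m = SimpleNamespace(transcription_delay=0.125,
                    end_of_turn_delay=0.25,
                    on_user_turn_completed_delay=0.5)
turn_latency(m)  # → 0.875
```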

Action-Aware Chat Context Summarization

Context summarization now includes function calls and their outputs when building summaries, preserving tool-use context across the conversation window.

Configurable Log Level

Set the agent log level via LIVEKIT_LOG_LEVEL environment variable or through ServerOptions, without touching your code.
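For example, via the environment (the "debug" level name is an assumption; check the docs for the accepted values):

```shell
# Raise agent log verbosity without touching code:
export LIVEKIT_LOG_LEVEL=debug
```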

Deprecations

| Deprecated | Replacement | Notes |
| --- | --- | --- |
| metrics_collected event | session_usage_updated event + ChatMessage.metrics | Usage/cost data moves to session_usage_updated; per-turn latency moves to ChatMessage.metrics. Old listeners still work with a deprecation warning. |
| UsageCollector | ModelUsageCollector | New collector supports a per-model/provider breakdown. |
| UsageSummary | LLMModelUsage, TTSModelUsage, STTModelUsage | Typed per-service usage classes. |
| RealtimeModelBeta | RealtimeModel | Beta API removed. |
| AgentFalseInterruptionEvent.message / .extra_instructions | Automatic resume via adaptive interruption | Accessing these fields logs a deprecation warning. |
| AgentSession kwargs: min_endpointing_delay, max_endpointing_delay, allow_interruptions, discard_audio_if_uninterruptible, min_interruption_duration, min_interruption_words, turn_detection, false_interruption_timeout, resume_false_interruption | turn_handling=TurnHandlingOptions(...) | Old kwargs still work but emit deprecation warnings. Will be removed in v2.0. |
| Agent / AgentTask kwargs: turn_detection, min_endpointing_delay, max_endpointing_delay, allow_interruptions | turn_handling=TurnHandlingOptions(...) | Same migration path as AgentSession. Will be removed in future versions. |

Complete changelog

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.4.6...livekit-agents@1.5.0

livekit-agents@1.4.6

16 Mar 19:09
29b71d4

What's Changed

  • fix(types): replace TypeGuard with TypeIs in is_given for bidirectional narrowing by @longcw in #5079
  • [inworld] websocket _recv_loop to flush the audio immediately by @ianbbqzy in #5071
  • fix: include null in enum array for nullable enum schemas by @MSameerAbbas in #5080
  • (openai chat completions): drop reasoning_effort when function tools are present by @tinalenguyen in #5088
  • (google realtime): replace deprecated mediaChunks by @tinalenguyen in #5089
  • fix: omit required field in tool schema when function has no parameters by @longcw in #5082
  • fix(sarvam-tts): correct mime_type from audio/mp3 to audio/wav by @shmundada93 in #5086
  • add trunk_config to WarmTransferTask for SIP endpoint transfers by @longcw in #5016
  • healthcare example by @tinalenguyen in #5031
  • fix(openai): only reuse previous_response_id when pending tool calls are completed by @longcw in #5094
  • feat(assemblyai): add speaker diarization support by @dlange-aai in #5074
  • fix: prevent _cancel_speech_pause from poisoning subsequent user turns by @giulio-leone in #5101
  • feat(google): support universal credential types in STT and TTS credentials_file by @rafallezanko in #5056
  • Add Murf AI - TTS Plugin Support by @gaurav-murf in #3000
  • feat(voice): add callable TextTransforms support with built-in replace transform by @longcw in #5104
  • fix(eou): only reset speech/speaking time when no new speech by @chenghao-mou in #5083
  • (xai): add tts by @tinalenguyen in #5120
  • (xai tts): add language parameter by @tinalenguyen in #5122
  • livekit-agents 1.4.6 by @theomonnom in #5123

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.4.5...livekit-agents@1.4.6

livekit-agents@1.4.5

11 Mar 06:45
56f7182

What's Changed

  • Pass through additional params to LemonSlice when using the LemonSlice Avatar by @jp-lemon in #4984
  • fix(anthropic): add dummy user message for Claude 4.6+ trailing assistant turns by @giulio-leone in #4973
  • (keyframe): remove whitespace from py.typed by @tinalenguyen in #4990
  • Add Phonic Plugin to LiveKit agents by @qionghuang6 in #4980
  • Fixed E2EE encryption of content in data tracks by @zelidrag-arbo in #4992
  • fix: resync tool context when tools are mutated inside llm_node by @longcw in #4994
  • [🤖 readme-manager] Update README by @ladvoc in #4996
  • fix(google): prevent function_call text from leaking to TTS output by @BkSouX in #4999
  • (openai responses): add websocket connection pool by @tinalenguyen in #4985
  • (openai tts): close openai client by @tinalenguyen in #5012
  • nvidia stt: add speaker diarization support by @longcw in #4997
  • update error message when TTS is not set by @longcw in #4998
  • initialize interval future in init by @tinalenguyen in #5013
  • Fix/elevenlabs update default voice non expiring by @yusuf-eren in #5010
  • [Inworld] Flush to drain decoder on every audio chunk from server by @ianbbqzy in #4983
  • (google): support passing credentials through realtime and llm by @tinalenguyen in #5015
  • use default voice accessible to free tier users by @tmshapland in #5020
  • make commit_user_turn() return a Future with the audio transcript by @longcw in #5019
  • Add GPT-5.4 to OpenAI plugin by @Topherhindman in #5022
  • Generate and upload markdown docs by @Topherhindman in #4993
  • Add GPT-5.4 and GPT-5.3 Chat Latest support by @Topherhindman in #5030
  • Improve Audio Generation Quality for Cartesia TTS Plugin by @tycartesia in #5032
  • fix(elevenlabs): handle empty words in _to_timed_words by @MonkeyLeeT in #5036
  • fix(deepgram): include word confidence for stt v2 alternatives by @inickt in #5034
  • fix: generate final LLM response when max_tool_steps is reached by @IanSteno in #4747
  • fix: guard against negative sleep duration in voice agent scheduling by @jnMetaCode in #5040
  • add modality-aware Instructions with audio/text variants by @longcw in #4987
  • fix(core): move callbacks to the caller by @chenghao-mou in #5039
  • Added raw logging of API errors via the LiveKit plugins for both STT and TTS. by @dhruvladia-sarvam in #5025
  • Log LemonSlice API error + new agent_idle_prompt arg by @jp-lemon in #5052
  • Sarvam v3 tts addns by @dhruvladia-sarvam in #4976
  • fix(google): avoid session restart on update_instructions, use mid-session client content by @D-zigi in #5049
  • (responses llm): override provider property and set use_websocket to False for wrappers by @tinalenguyen in #5055
  • feat(mcp): add MCPToolResultResolver callback for customizing tool call results by @longcw in #5046
  • docs: add development instructions to README and example READMEs by @bcherry in #2636
  • Improve plugin READMEs with installation, pre-requisites, and docs links by @bcherry in #3025
  • Add generate_reply and update_chat_ctx support to Phonic Plugin by @qionghuang6 in #5058
  • feat: enhance worker load management with reserved slots and effective load calculation by @ProblematicToucan in #4911
  • fix(core): render error message with full details in traceback by @chenghao-mou in #5047
  • feat(core): allow skip_reply when calling commit_user_turn by @chenghao-mou in #5066
  • fix(mcp): replace deprecated streamablehttp_client with streamable_http_client by @longcw in #5048
  • fix: disable aec warmup timer when audio is disabled by @longcw in #5065
  • feat(openai): add transcript_confidence from OpenAI realtime logprobs by @theomonnom in #5070
  • Enhance LK Inference STT and TTS options with new parameters and models by @russellmartin-livekit in #4949
  • Move Instructions to beta exports by @theomonnom in #5075
  • livekit-agents 1.4.5 by @theomonnom in #5076

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.4.4...livekit-agents@1.4.5

livekit-agents@1.4.4

03 Mar 01:13
597c4fe

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.4.3...livekit-agents@1.4.4

livekit-agents@1.4.3

23 Feb 04:07
cee3d40

Full Changelog: https://github.com/livekit/agents/compare/browser-v0.1.4...livekit-agents@1.4.3

livekit-agents@1.4.2

17 Feb 03:17
baaf655

Stability-focused release with significant reliability improvements:

  • Fixed multiple memory leaks in the process pool: job counter leaks on cancellation, pending assignment leaks on timeout, socket leaks on startup failure, and orphaned executors on send failure.
  • Improved IPC pipeline reliability and resolved several edge-case hangs (participant never joining, Ctrl+C propagation to child processes).
  • Made STT/TTS fallback more robust: STT fallback correctly skips the main stream during recovery, and TTS fallback no longer shares resamplers across streams.
  • Other fixes: ChatContext.truncate no longer drops developer messages, cgroups v2 CPU quotas are parsed correctly, on_session_end callbacks run in the proper order, and logs are uploaded even when sessions fail to start.
  • Workers now automatically reject jobs when draining or full, and the proc pool correctly spawns processes under high load.

New RecordingOptions API

The record parameter on AgentSession.start() now accepts granular options in addition to bool. All keys default to True when omitted.

# record everything (default)
await session.start(agent, record=True)

# record nothing
await session.start(agent, record=False)

# granular: record audio but disable traces, logs, and transcript
await session.start(agent, record={"audio": True, "traces": False, "logs": False, "transcript": False})

Full Changelog: https://github.com/livekit/agents/compare/livekit-agents@1.4.0...livekit-agents@1.4.2

Browser v0.1.3

16 Feb 22:41
bb20f4f

Pre-release

CEF native binaries for livekit-browser v0.1.3. Supports Python 3.12-3.14 on macOS arm64, Linux x64, and Linux arm64.

Browser v0.1.2

16 Feb 05:40
6317935

Pre-release

CEF native binaries for livekit-browser v0.1.2. Supports Python 3.12-3.14 on macOS arm64, Linux x64, and Linux arm64.

livekit-agents@1.4.0

06 Feb 21:10
30a91f5

Python 3.14 Support & Python 3.9 Dropped

This release adds Python 3.14 support and drops Python 3.9. The minimum supported version is now Python 3.10.

Tool Improvements

Tools and toolsets now have stable unique IDs, making it possible to reference and filter tools programmatically. Changes to agent configuration (instructions, tools) are now tracked in conversation history via AgentConfigUpdate.

LLMStream.collect() API

A new LLMStream.collect() API makes it significantly easier to use LLMs outside of AgentSession. You can now call an LLM, collect the full response, and execute tool calls with a straightforward API, which is useful for background tasks, pre-processing, or any workflow that needs LLM capabilities without the full voice agent pipeline.

from livekit.agents import llm

response = await my_llm.chat(chat_ctx=ctx, tools=tools).collect()

for tc in response.tool_calls:
    result = await llm.execute_function_call(tc, tool_ctx)
    ctx.insert(result.fnc_call)
    if result.fnc_call_out:
        ctx.insert(result.fnc_call_out)

Manual Turn Detection for Realtime Models

Realtime models now support commit_user_turn, enabling turn_detection="manual" mode. This gives you full control over when user turns are committed, which is useful for push-to-talk interfaces or scenarios where automatic VAD-based turn detection isn't ideal.

@ctx.room.local_participant.register_rpc_method("end_turn")
async def end_turn(data: rtc.RpcInvocationData):
    session.input.set_audio_enabled(False)
    session.commit_user_turn(
        transcript_timeout=10.0,
        stt_flush_duration=2.0,
    )

Job Migration on Reconnection

When the agent server temporarily loses connection and reconnects, active jobs are now automatically migrated rather than being dropped. This significantly improves reliability during transient network issues.

False Interruption Fix

Fixed a bug where late end-of-speech events could trigger duplicate false interruption timers, causing the agent to incorrectly stop speaking. The agent now properly deduplicates these events and tracks STT completion state more reliably.

New Providers & Plugins

  • xAI Responses LLM: use xAI's Responses API via xai.responses.LLM()
  • Azure OpenAI Responses: Azure-hosted Responses API via azure.responses.LLM(), with support for deployments and Azure auth
  • Camb.ai TTS: new TTS plugin powered by the MARS model family (mars-flash, mars-pro, mars-instruct), with voice selection, language control, and style instructions
  • Avatario Avatar: virtual avatar plugin with session management and an API client
