A suite of libraries for interacting with various Large Language Model (LLM) providers through a unified API and building agentic AI applications.
Two libraries are provided:
- LLM SDK: Unified SDKs to interact with various LLM providers.
- LLM Agent: An abstraction to build agentic AI applications using the LLM SDK.
Check out the Console Application for a demo that showcases the capabilities of the libraries.
Note: The Agent library is v0, and the API may change in a future version. The SDK library is also v0, but its API is more stable. Please open an issue or a PR if you have suggestions on API ergonomics or features.
- Supports multiple LLM providers with a unified API.
- Handles multiple modalities: Text, Image, and Audio.
- Supports streaming, including for image and audio.
- Supports citations and reasoning for supported models.
- Reports token usage and calculates the cost of a request when provided with the model's pricing information.
- Unified serialization across programming languages (systems in different languages can work together).
- Integrates OpenTelemetry for tracing.
The specification serves as the foundation for implementing the unified LLM SDK in various programming languages. It is expressed in TypeScript in `schemas/sdk.ts`.
Implementations in different programming languages must strictly adhere to this specification. In particular, properties in data structures must retain the same names and types when serialized to JSON (either through field naming or serialization attributes).
Each implementation may provide additional features.
We provide SDKs to interact with various LLM providers in the following programming languages:
| Provider | Sampling Params | Function Calling | Structured Output | Text Input | Image Input | Audio Input | Citation[^1] | Text Output | Image Output | Audio Output | Reasoning |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OpenAI (Responses) | ✅ except `top_k`, `frequency_penalty`, `presence_penalty`, `seed` | ✅ | ✅ | ✅ | ✅ | ✅ | ➖ | ✅ | ✅ | ➖ | ✅ |
| OpenAI (Chat Completion) | ✅ except `top_k` | ✅ | ✅ | ✅ | ✅ | ✅ | ➖ | ✅ | ➖ | ✅ | ➖ |
| Anthropic | ✅ except `frequency_penalty`, `presence_penalty`, `seed` | ✅ | ➖ | ✅ | ✅ | ➖ | ✅ | ✅ | ➖ | ➖ | ✅ |
| Google | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ➖ | ✅ | ✅ | ✅ | ✅ |
| Cohere | ✅ | ✅ | ✅ | ✅ | ✅ | ➖ | ✅ | ✅ | ➖ | ➖ | ✅ |
| Mistral | ✅ except `top_k` | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 | ✅ | ➖ | ➖ | ✅ |
Keys:
- ✅: Supported
- 🚧: Not yet implemented
- ➖: Not available from provider
A language model instance satisfies the `LanguageModel` interface, which includes the following:

- `provider`: The LLM provider name.
- `model_id`: The model identifier.
- `metadata`: Metadata about the model, such as pricing information or capabilities.
- `generate(LanguageModelInput) -> ModelResponse`: Generate a non-streaming response from the model.
- `stream(LanguageModelInput) -> AsyncIterable<PartialModelResponse>`: Generate a streaming response from the model.
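A minimal TypeScript sketch of this interface (simplified for illustration; the authoritative definitions live in `schemas/sdk.ts`):

```typescript
// Simplified sketch of the LanguageModel interface — the authoritative
// definitions live in schemas/sdk.ts. Placeholder aliases stand in for
// the full schema types.
type LanguageModelInput = Record<string, unknown>;
type ModelResponse = { content: unknown[]; usage?: unknown; cost?: number };
type PartialModelResponse = { delta?: unknown; usage?: unknown };

interface LanguageModel {
  provider: string; // the LLM provider name, e.g. "openai"
  model_id: string; // the model identifier, e.g. "gpt-4o"
  metadata?: Record<string, unknown>; // pricing information, capabilities, ...
  generate(input: LanguageModelInput): Promise<ModelResponse>;
  stream(input: LanguageModelInput): AsyncIterable<PartialModelResponse>;
}
```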
`LanguageModelInput` is a unified format representing the input for generating responses from the language model, applicable to both non-streaming and streaming requests. The library converts these inputs into the corresponding properties for each LLM provider, where applicable. This allows specifying:

- The conversation history, which includes `UserMessage`, `AssistantMessage`, and `ToolMessage`.
- Sampling parameters: `max_tokens`, `temperature`, `top_p`, `top_k`, `presence_penalty`, `frequency_penalty`, and `seed`.
- Tool definitions and tool selection.
- The response format, to force the model to return structured objects instead of plain text.
- `modalities` for the model to generate, such as text, images, or audio.
- Part-specific output options such as `audio` and `reasoning`.
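For example, a non-streaming call might look like the following sketch (assuming the conversation history is carried in a `messages` field; exact construction varies per implementation):

```typescript
// Sketch of a non-streaming call. `model` is any LanguageModel instance;
// the `messages` field name is an assumption for illustration.
async function ask(model: LanguageModel): Promise<void> {
  const response = await model.generate({
    messages: [
      {
        role: "user",
        content: [{ type: "text", text: "What is the capital of France?" }],
      },
    ],
    temperature: 0.7,
    max_tokens: 256,
  });
  console.log(response.content); // an array of Parts, e.g. [{ type: "text", ... }]
}
```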
`Message`s are primitives that make up the conversation history, and `Part`s are the building blocks of each message. The library converts them into a format suitable for the underlying LLM provider and maps those from different providers back to the unified format.

Three message types are defined in the SDK: `UserMessage`, `AssistantMessage`, and `ToolMessage`.
Note: Tool calls are implemented as a `Part` instead of being a property of the `AssistantMessage`.
Note: The `ToolResultPart` content is an array of `Part` instead of a string or an object. This enables non-text results to be returned for LLM providers that support them (e.g., Anthropic Function Calling supports images in tool results).
The following `Part` types are implemented in the SDK: `TextPart`, `ImagePart`, `AudioPart`, `SourcePart` (for citations), `ToolCallPart`, `ToolResultPart`, and `ReasoningPart`.

For streaming calls, there are also corresponding `PartDelta` types.
The response from the language model is represented as a `ModelResponse` that includes:

- `content`: An array of `Part` representing the generated content, which usually becomes the content of the `AssistantMessage`.
- `usage`: Token usage information, if available.
- `cost`: The estimated cost of the request, if the model's pricing information is provided.
For streaming calls, the response is represented as a series of `PartialModelResponse` objects that include:

- `delta`: A `PartDelta` and its index in the eventual `content` array.
- `usage`: Token usage information, if available.
All SDKs provide the `StreamAccumulator` utility to help build the final `ModelResponse` from a stream of `PartialModelResponse` objects.
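A typical accumulation loop might look like this (the `addPartial`/`computeResponse` method names are assumptions for illustration; check the `StreamAccumulator` in your SDK for the exact API):

```typescript
// Sketch: build the final ModelResponse from a stream of
// PartialModelResponse objects. Method names are illustrative.
async function collect(
  model: LanguageModel,
  input: LanguageModelInput,
): Promise<ModelResponse> {
  const accumulator = new StreamAccumulator();
  for await (const partial of model.stream(input)) {
    // partial.delta can be rendered incrementally (e.g. text deltas)
    accumulator.addPartial(partial); // merges each PartDelta by its index
  }
  return accumulator.computeResponse();
}
```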
Agents enable the development of agentic AI applications that can generate responses and execute tasks autonomously. Agents use the LLM SDK to interact with different language models and support defining instructions, tools, and other language model parameters.
We provide Agent implementations in the following programming languages:
The agent is constructed with the following parameters:

- `name`: The identifier of the agent.
- `model`: The language model instance from the LLM SDK.
- `instructions`: A list of instructions injected into the system prompt to guide the agent's behavior.
- `tools`: A list of executable tools that the agent can call during its execution.
- `response_format`: The expected response format from the agent. The default is plain text, but it can be customized to return structured output.
- `max_turns`: The maximum number of turns the agent can take to complete a request.
- Other sampling parameters: `temperature`, `top_p`, `max_tokens`, etc.
In addition, the agent is defined with a `context` generic type that can be accessed in the instructions (for dynamic instructions) and tools.
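A hypothetical construction sketch (the constructor shape varies by language; parameter names mirror the list above, and the function-valued instruction shows how the typed context can drive dynamic instructions):

```typescript
// Hypothetical agent construction — the exact constructor shape differs
// per language implementation.
interface MyContext {
  userName: string;
}

const agent = new Agent<MyContext>({
  name: "assistant",
  model, // a LanguageModel instance from the LLM SDK
  instructions: [
    "You are a helpful assistant.",
    // dynamic instruction: reads the typed context at run time
    (ctx: MyContext) => `Address the user as ${ctx.userName}.`,
  ],
  tools: [],
  max_turns: 10,
  temperature: 0.5,
});
```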
An agent tool is defined with the following properties:

- `name`: The identifier of the tool.
- `description`: A description of the tool to instruct the model how and when to use it.
- `parameters`: The JSON schema of the parameters that the tool accepts. The type must be `"object"`.
- `execute(args, context, state)`: The function that will be called to execute the tool with the given parameters and context.
The `execute` function must always return an `AgentToolResult`, which includes:

- `content`: The content generated by the tool, an array of `Part`, allowing multi-modal outputs for language models that support them.
- `is_error`: A boolean indicating whether the tool execution resulted in an error. Some language models use this property to guide their behavior.
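A sketch of a tool definition following these properties (the registration shape differs per language implementation, and `fetchWeather` is an invented helper):

```typescript
// Hypothetical helper, declared only so the sketch is self-contained.
declare function fetchWeather(city: string): Promise<string>;

const getWeatherTool = {
  name: "get_weather",
  description: "Look up the current weather for a given city.",
  parameters: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
  async execute(args: { city: string }, context: unknown, state: unknown) {
    try {
      const report = await fetchWeather(args.city);
      return { content: [{ type: "text", text: report }], is_error: false };
    } catch (err) {
      // is_error tells the model the call failed so it can react accordingly
      return {
        content: [{ type: "text", text: `Weather lookup failed: ${err}` }],
        is_error: true,
      };
    }
  },
};
```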
An agent run is initiated by calling the `run` method with an `AgentRequest`, which includes the following:

- `input`: The list of input `AgentItem`s for the agent, such as `Message`s or `ModelResponse`s.
- `context`: A user-provided value that can be accessed in instructions and tools.
Note: Each agent run is stateless, so it is recommended to implement a strategy to persist the conversation history if needed.
Each run will continuously generate LLM completions, parse responses to check for tool calls, execute any tools, and feed the tool results back to the model until one of the following conditions is met:
- The model generates a final response (i.e., no tool call).
- The maximum number of turns is reached.
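In rough TypeScript, this loop looks something like the following sketch (not the actual implementation; `model`, `tools`, `state`, and the `toModelInput`/`toToolMessage` helpers are simplified stand-ins):

```typescript
// Illustration of the run loop only — not the real implementation.
type Part = { type: string; tool_name?: string; args?: unknown };
type AgentItem = unknown;

async function runLoop(
  input: AgentItem[],
  context: unknown,
  maxTurns: number,
): Promise<Part[]> {
  const items = [...input];
  for (let turn = 0; turn < maxTurns; turn++) {
    // 1. Generate an LLM completion from the accumulated items.
    const response = await model.generate(toModelInput(items));
    items.push(response);

    // 2. No tool calls means the model produced its final response.
    const toolCalls = response.content.filter((p: Part) => p.type === "tool-call");
    if (toolCalls.length === 0) return response.content;

    // 3. Otherwise execute each tool and feed the results back.
    for (const call of toolCalls) {
      const tool = tools.find((t) => t.name === call.tool_name);
      const result = await tool.execute(call.args, context, state);
      items.push(toToolMessage(call, result));
    }
  }
  throw new Error("Maximum number of turns reached");
}
```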
`AgentResponse` includes the final response with the following properties:

- `output`: A list of output `AgentItem`s, such as `ToolMessage`s and `AssistantMessage`s, that were generated during the run. This can be appended to the `input` of the next run.
- `content`: The final content generated by the agent, usually the content of the last `AssistantMessage`.
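A hypothetical usage sketch (since runs are stateless, `savedItems` stands in for whatever persistence strategy you use):

```typescript
// savedItems: previously persisted AgentItems from earlier runs.
const response = await agent.run({
  input: [
    ...savedItems,
    { role: "user", content: [{ type: "text", text: "Weather in Paris?" }] },
  ],
  context: { userName: "Ada" },
});

console.log(response.content);       // content of the final AssistantMessage
savedItems.push(...response.output); // persist so the next run can continue
```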
The library also provides a streaming interface, similar to streaming LLM completions, to stream the agent run progress, including part deltas (e.g., text deltas, audio deltas) and intermediate tool calls. Each event can be one of:

- `AgentStreamEventPartial`: Contains the `PartialModelResponse` generated by the LLM SDK, which includes part deltas.
- `AgentStreamItemEvent`: Contains an `AgentItem` that was generated during the run.
- `AgentStreamResponseEvent`: The final response of the agent run, which includes the `AgentResponse`.
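Consuming the stream might look like this sketch (the `runStream` method name and the event discriminators are assumptions; check your implementation for the exact API):

```typescript
// Hypothetical streaming loop — event shapes are illustrative.
for await (const event of agent.runStream(request)) {
  if (event.event === "partial") {
    // AgentStreamEventPartial: render part deltas as they arrive
  } else if (event.event === "item") {
    // AgentStreamItemEvent: e.g. persist intermediate ToolMessages
  } else if (event.event === "response") {
    // AgentStreamResponseEvent: the final AgentResponse
    console.log(event.response.content);
  }
}
```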
This agent library (not framework) is designed for transparency and control. Unlike many “agentic” frameworks, it ships with no hidden prompt templates or secret parsing rules—and that’s on purpose:
- Nothing hidden – What you write is what runs. No secret prompts or “special sauce” behind the scenes, so your instructions aren’t quietly overridden.
- Works in any setting – Many frameworks bake in English-only prompts. Here, the model sees only your words, in whatever language or format you choose.
- Easy to tweak – Change prompts, parsing, or flow without fighting built-in defaults.
- Less to debug – Fewer layers mean you can trace exactly where things break.
- No complex abstraction – Don't waste time learning new concepts or APIs (e.g., “chains”, “graphs”, syntax with special meanings, etc.). Just plain functions and data structures.
LLMs in the past were not as powerful as today's, so frameworks had to do a lot of heavy lifting to get decent results. With modern LLMs, much of that complexity is no longer necessary.
Because we keep the core minimal (500 LOC!) and do not want to introduce such hidden magic, the library doesn’t bundle heavy agent patterns like hand-off, memory, or planners.
Instead, the `examples/` folder shows clean, working references you can copy or adapt, demonstrating that the library can still be used to build complex use cases.
The idea is inspired by this blog post.
The initial version of `llm-sdk` was developed internally at my company, before similar libraries like the Vercel AI SDK or OpenAI Swarm existed or were known to us. As a result, it was never intended to compete with or address the limitations of those libraries. As those libraries matured, `llm-sdk` continued to evolve independently, focusing on its own features and use cases, which were designed to be sufficient for its intended applications.
This section outlines the differences for those considering migration to or from `llm-sdk`, or to assess compatibility.
TBD.
[^1]: Source Input (citation) is not supported by all providers and may be converted to compatible inputs instead.