
Litellm dev 08 16 2025 p3 #13694


Merged
merged 5 commits into from Aug 17, 2025
11 changes: 9 additions & 2 deletions docs/my-website/docs/response_api.md
@@ -803,11 +803,18 @@ LiteLLM Proxy supports session management for non-OpenAI models. This allows you

1. Enable storing request / response content in the database

Set `store_prompts_in_spend_logs: true` in your proxy config.yaml. When this is enabled, LiteLLM will store the request and response content in the database.
Set `store_prompts_in_cold_storage: true` in your proxy config.yaml. When this is enabled, LiteLLM will store the request and response content in the s3 bucket you specify.

```yaml
litellm_settings:
callbacks: ["s3_v2"]
s3_callback_params: # learn more https://docs.litellm.ai/docs/proxy/logging#s3-buckets
s3_bucket_name: litellm-logs # AWS Bucket Name for S3
s3_region_name: us-west-2

general_settings:
store_prompts_in_spend_logs: true
cold_storage_custom_logger: s3_v2
store_prompts_in_cold_storage: true
```

2. Make request 1 with no `previous_response_id` (new session)
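
For illustration, a minimal sketch of request 1 (new session) and a follow-up request that reuses the returned response id, using the OpenAI SDK pointed at the LiteLLM proxy. It assumes the proxy's default port (4000) and a hypothetical model alias `claude-3-5-sonnet`; substitute your own alias and API key.

```python
# Minimal sketch: Responses API session management through the LiteLLM proxy.
# Assumes the proxy runs locally on its default port (4000) and exposes a
# model alias named "claude-3-5-sonnet" (hypothetical).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# Request 1: no previous_response_id, so a new session is created.
first = client.responses.create(
    model="claude-3-5-sonnet",
    input="Tell me a three-sentence story about a lighthouse keeper.",
)

# Request 2: pass the first response's id so LiteLLM loads the stored
# session content and the model sees the full conversation history.
follow_up = client.responses.create(
    model="claude-3-5-sonnet",
    input="Now retell it from the keeper's point of view.",
    previous_response_id=first.id,
)
print(follow_up.output_text)
```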
43 changes: 42 additions & 1 deletion docs/my-website/release_notes/v1.75.5-stable/index.md
@@ -1,5 +1,5 @@
---
title: "[PRE-RELEASE]v1.75.5-stable"
title: "v1.75.5-stable - Redis latency improvements"
slug: "v1-75-5"
date: 2025-08-10T10:00:00
authors:
@@ -43,8 +43,49 @@ pip install litellm==1.75.5.post1

---

## Key Highlights

- **Redis - Latency Improvements** - Reduces P99 latency by 50% with Redis enabled.
- **Responses API Session Management** - Support for managing Responses API sessions that include images.
- **Oracle Cloud Infrastructure** - New LLM provider for calling models on Oracle Cloud Infrastructure.
- **Digital Ocean's Gradient AI** - New LLM provider for calling models on Digital Ocean's Gradient AI platform.


### Risk of Upgrade

If you run the proxy from the pip package, you should hold off on upgrading. This version makes `prisma migrate deploy` the default for managing the DB. This is safer, since it doesn't reset the DB, but it requires a manual `prisma generate` step.

Users of our Docker image are **not** affected by this change.

---

## Redis Latency Improvements

<Image
img={require('../../img/release_notes/faster_caching_calls.png')}
style={{width: '100%', display: 'block', margin: '2rem auto'}}
/>

<br/>

This release adds in-memory caching for Redis requests, enabling faster response times under high traffic. LiteLLM instances now check their in-memory cache for a hit before checking Redis. On cache hits, this reduces caching-related latency for LLM API calls from 100ms to sub-1ms.
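
As a rough illustration of that lookup order, here is a conceptual sketch (not LiteLLM's internal implementation): a local dict is consulted first, and Redis is only queried on a local miss, with Redis hits backfilled into memory.

```python
# Conceptual sketch of a two-tier cache: in-memory first, then Redis.
import redis  # assumes a reachable Redis instance

class TwoTierCache:
    def __init__(self, redis_client: redis.Redis):
        self._memory: dict[str, str] = {}  # per-instance, sub-1ms lookups
        self._redis = redis_client         # shared across instances, network latency

    def get(self, key: str) -> str | None:
        # 1. In-memory hit: no network round trip at all.
        if key in self._memory:
            return self._memory[key]
        # 2. Redis hit: pay the network cost once, then cache locally.
        value = self._redis.get(key)
        if value is not None:
            self._memory[key] = value.decode()
            return self._memory[key]
        return None  # cache miss: caller makes the LLM API call

    def set(self, key: str, value: str, ttl_seconds: int = 60) -> None:
        self._memory[key] = value
        self._redis.set(key, value, ex=ttl_seconds)

cache = TwoTierCache(redis.Redis(host="localhost", port=6379))
```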

---

## Responses API Session Management w/ Images

<Image
img={require('../../img/release_notes/responses_api_session_mgt_images.jpg')}
style={{width: '100%', display: 'block', margin: '2rem auto'}}
/>

<br/>

LiteLLM now supports session management for Responses API requests with images. This is great for use cases like chatbots that use the Responses API to track the state of a conversation. LiteLLM session management works across **ALL** LLM APIs (including Anthropic, Bedrock, OpenAI, etc.), and it works by storing the request and response content in an S3 bucket you specify.
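
For example, here is a minimal sketch of a session whose first request contains an image, using the OpenAI SDK against a LiteLLM proxy on its default port with a hypothetical `claude-3-5-sonnet` model alias; the image input shape follows OpenAI's Responses API conventions.

```python
# Minimal sketch: a Responses API session that includes an image, routed
# through the LiteLLM proxy (default port 4000, hypothetical model alias).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# Request 1 contains an image; LiteLLM stores the full request/response
# content in the configured S3 bucket.
first = client.responses.create(
    model="claude-3-5-sonnet",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What is shown in this image?"},
            {"type": "input_image", "image_url": "https://example.com/photo.jpg"},
        ],
    }],
)

# Request 2 references the stored session, so the follow-up question is
# answered with the image from request 1 still in context.
follow_up = client.responses.create(
    model="claude-3-5-sonnet",
    input="What colors stand out the most?",
    previous_response_id=first.id,
)
print(follow_up.output_text)
```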

---


## New Models / Updated Models

#### New Model Support