
Prompt caching for Claude #234


Open · wants to merge 13 commits into main

Conversation


@tpaulshippy (Contributor) commented Jun 9, 2025

What this does

Adds prompt caching support to both the Anthropic and Bedrock providers, for Claude models that support it.

Caching system prompts:

chat = RubyLLM.chat
chat.with_instructions("You are a helpful assistant.")
chat.cache_prompts(system: true)

Caching user prompts:

chat = RubyLLM.chat
chat.with_instructions("You are a helpful assistant.")
chat.cache_prompts(system: true, user: true)
chat.ask("What is the capital of France?")

Caching tool definitions:

chat = RubyLLM.chat
chat.with_instructions("You are a helpful assistant.")
chat.with_tool(MyTool)
chat.cache_prompts(tools: true)
chat.ask("What is the capital of France?")
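For Anthropic, the calls above amount to attaching a `cache_control` marker to the relevant content blocks. The `{ type: 'ephemeral' }` marker is Anthropic's documented opt-in for prompt caching; the exact request body RubyLLM builds may differ, and the tool definition below is illustrative only:

```ruby
# Sketch of the Anthropic Messages API body that cached system and tool
# blocks translate into. The cache_control marker is Anthropic's documented
# opt-in; everything else here (model, tool definition) is illustrative.
request_body = {
  model: 'claude-3-5-sonnet-20241022',
  system: [
    { type: 'text',
      text: 'You are a helpful assistant.',
      cache_control: { type: 'ephemeral' } }
  ],
  tools: [
    { name: 'my_tool',
      description: 'Illustrative tool definition',
      input_schema: { type: 'object', properties: {} },
      cache_control: { type: 'ephemeral' } }
  ],
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ]
}
```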

Type of change

  • New feature

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • New public methods/classes

Related issues

Resolves #13

@tpaulshippy tpaulshippy changed the title Prompt caching Prompt caching for Claude Jun 9, 2025
@tpaulshippy tpaulshippy marked this pull request as ready for review June 9, 2025 21:44
@tpaulshippy (Contributor Author):

@crmne As I don't have an Anthropic key, I'll need you to generate the VCR cassettes for that provider. Hoping everything just works, but let me know if not.

@crmne (Owner) commented Jun 11, 2025

@tpaulshippy this would be great to have! Would you be willing to enable it on all providers?

I'll do a proper review when I can.

@tpaulshippy (Contributor Author):

My five minutes of research indicates that at least OpenAI and Gemini cache automatically based on the size and structure of the request. So for those two, the only support we'd really need is to populate the cached token counts on the response messages. The exception would be explicit caching via the Gemini API, but that looks complex and not as commonly needed.

Do you know of other providers that require payload changes for prompt caching?
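Populating those counts would mean reading each provider's usage metadata. The field names below follow each provider's documented response format (Anthropic's `cache_read_input_tokens`, OpenAI's `prompt_tokens_details.cached_tokens`, Gemini's `cachedContentTokenCount`), but the normalizing helper itself is a sketch, not RubyLLM code:

```ruby
# Illustrative helper normalizing cached-token counts across providers.
# Field names follow each provider's documented usage metadata; the
# helper and its signature are assumptions for this sketch.
def cached_tokens(provider, usage)
  case provider
  when :anthropic
    usage.fetch('cache_read_input_tokens', 0)
  when :openai
    usage.dig('prompt_tokens_details', 'cached_tokens') || 0
  when :gemini
    usage.fetch('cachedContentTokenCount', 0)
  else
    0
  end
end
```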

def with_cache_control(hash, cache: false)
  return hash unless cache

  hash.merge(cache_control: { type: 'ephemeral' })
end

@tpaulshippy (Contributor Author):
Realizing this might cause errors on older models that do not support caching. If it does, we could raise here, or just let the API validation handle it. I'm torn on whether the complexity of a capabilities check is worth it, since these models are probably so rarely used.
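A sketch of what the raise-early option could look like. The error class and the model pattern are hypothetical, purely for illustration; the alternative is to skip the check and let the API's own validation reject `cache_control` on unsupported models:

```ruby
# Hypothetical capability guard: fail fast with a clear error instead of
# letting the API reject cache_control on older models. The error class
# and CACHING_CAPABLE pattern are assumptions, not RubyLLM internals.
class UnsupportedFeatureError < StandardError; end

CACHING_CAPABLE = /claude-3/ # illustrative pattern only

def with_cache_control(hash, model:, cache: false)
  return hash unless cache

  unless model.match?(CACHING_CAPABLE)
    raise UnsupportedFeatureError, "#{model} does not support prompt caching"
  end

  hash.merge(cache_control: { type: 'ephemeral' })
end
```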

@tpaulshippy (Contributor Author):

@crmne As I don't have an Anthropic key, I'll need you to generate the VCR cassettes for that provider. Hoping everything just works, but let me know if not.

Scratch that. I decided to stop being a cheapskate and just pay Anthropic their $5.
