feat: meter Gemini thinking tokens and grounding requests #3178

Open
Aaryan-Dadu wants to merge 1 commit into HeyPuter:main from Aaryan-Dadu:feat/3132
Conversation

@Aaryan-Dadu Aaryan-Dadu commented May 28, 2026

Summary

  • Thinking tokens: Extracted from standard completion tokens to ensure they are billed accurately at the correct model-specific rate.
  • Grounding requests: Added flat-fee metering for Google Search by tracking grounding_metadata across both streaming and non-streaming responses.
  • Pricing updates: Corrected stale rates for Gemini 2.5 Flash output, cached tokens, thinking tokens, and grounding requests.

Closes #3132

Test

  • All pre-existing tests pass.
  • 5 unit tests have been added for the corresponding changes.
  • 4 pre-existing test assertions have been updated to include thinking_tokens: 0 and grounding_requests: 0 in the expected usage shapes.
[Screenshot from 2026-05-28 15-12-15]

CLAassistant commented May 28, 2026

CLA assistant check
All committers have signed the CLA.

@ProgrammerIn-wonderland

Collaborator

are thinking tokens not already included in output tokens in the usage object?

Aaryan-Dadu commented May 29, 2026

Author

> are thinking tokens not already included in output tokens in the usage object?

Yes, they are already included, but we split them out because they are billed at a different rate. The output cost is computed as: thinking_rate * thinking_tokens + standard_rate * (completion_tokens - thinking_tokens)
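
To make the formula concrete, here is a minimal sketch of the split. The `Rates` shape and the `outputCost` name are hypothetical, chosen only for illustration; they are not the identifiers used in this PR. It assumes, as stated above, that `completion_tokens` already includes the thinking tokens.

```typescript
// Hypothetical rate table; field names are illustrative, not the PR's actual schema.
interface Rates {
    completionRate: number; // cost per standard (non-thinking) output token
    thinkingRate: number;   // cost per thinking token
}

// Assumption: completionTokens already includes thinkingTokens.
function outputCost(completionTokens: number, thinkingTokens: number, rates: Rates): number {
    // Clamp so a malformed usage object can never produce a negative standard count.
    const thinking = Math.min(Math.max(thinkingTokens, 0), completionTokens);
    const standard = completionTokens - thinking;
    return rates.thinkingRate * thinking + rates.completionRate * standard;
}
```

With 100 completion tokens of which 40 are thinking tokens, 40 are charged at the thinking rate and the remaining 60 at the standard rate, so no token is billed twice.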

Copilot AI left a comment

Contributor

Pull request overview

This PR updates Gemini metering in the AI chat driver to correctly account for Gemini “thinking” tokens (billed at a distinct rate) and to add flat-fee metering for grounded Google Search requests by detecting grounding_metadata in both streaming and non-streaming responses.

Changes:

  • Split Gemini reasoning_tokens (“thinking tokens”) out of completion_tokens and meter each at its own model-specific rate.
  • Add grounding_requests usage metering (1 per response when grounding_metadata is present) for streaming and non-streaming Gemini completions.
  • Update Gemini model pricing entries to include thinking_tokens and grounding_requests rates (and refresh some existing token rates).
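
The first two changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `RawUsage` field layout follows the OpenAI-compatible usage object (whether Gemini reports `reasoning_tokens` under `completion_tokens_details` is assumed), the `cached_tokens` output key and the names `shapeUsage` and `hasGroundingMetadata` are hypothetical, and the exact path to `grounding_metadata` is not known from this page, so the helper searches for the key at any depth.

```typescript
// Assumed OpenAI-compatible usage object; every field may be absent.
interface RawUsage {
    prompt_tokens?: number;
    completion_tokens?: number;
    prompt_tokens_details?: { cached_tokens?: number };
    completion_tokens_details?: { reasoning_tokens?: number };
}

// Hypothetical metered shape; thinking_tokens and grounding_requests are named in the PR.
interface MeteredUsage {
    prompt_tokens: number;
    cached_tokens: number;
    completion_tokens: number;
    thinking_tokens: number;
    grounding_requests: number;
}

// Returns true if a non-null "grounding_metadata" key exists anywhere in the value.
// Assumes plain, acyclic JSON data such as a parsed API response.
function hasGroundingMetadata(value: unknown): boolean {
    if (value === null || typeof value !== 'object') return false;
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
        if (key === 'grounding_metadata' && child != null) return true;
        if (hasGroundingMetadata(child)) return true;
    }
    return false;
}

function shapeUsage(usage: RawUsage, extraContent?: unknown): MeteredUsage {
    const promptTotal = usage.prompt_tokens ?? 0;
    const cached = Math.min(usage.prompt_tokens_details?.cached_tokens ?? 0, promptTotal);
    const completionTotal = usage.completion_tokens ?? 0;
    const thinking = Math.min(
        usage.completion_tokens_details?.reasoning_tokens ?? 0,
        completionTotal,
    );
    return {
        prompt_tokens: promptTotal - cached,        // cached tokens are metered separately
        cached_tokens: cached,
        completion_tokens: completionTotal - thinking, // thinking tokens split out
        thinking_tokens: thinking,
        grounding_requests: hasGroundingMetadata(extraContent) ? 1 : 0, // flat fee, 1 per response
    };
}
```

Because each bucket is subtracted from its parent total before metering, the sum of the metered token counts always equals what the provider reported.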

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/backend/drivers/ai-chat/utils/OpenAIUtil.js | Captures streamed extra_content and forwards it into the usage calculator for provider-specific metering. |
| src/backend/drivers/ai-chat/providers/gemini/models.ts | Adds/updates Gemini cost keys for thinking_tokens and grounding_requests (and adjusts some stale token rates). |
| src/backend/drivers/ai-chat/providers/gemini/GeminiChatProvider.ts | Implements Gemini-specific usage shaping: cached token exclusion, thinking token split, and grounding request detection. |
| src/backend/drivers/ai-chat/providers/gemini/GeminiChatProvider.test.ts | Updates expected usage shapes and adds unit tests for thinking-token and grounding-request metering (streaming + non-streaming). |
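
As a rough illustration of the streaming side described for OpenAIUtil.js, the sketch below remembers the most recent `extra_content` seen on any delta so it can be handed to the usage calculator once the stream ends. The `StreamChunk` shape and the `collectStream` name are assumptions for this example, and a plain array stands in for the real asynchronous stream.

```typescript
// Assumed OpenAI-style streaming chunk; only the fields used here are modelled.
interface StreamChunk {
    choices?: { delta?: { content?: string; extra_content?: unknown } }[];
}

// Concatenates text deltas and keeps the last non-null extra_content.
function collectStream(chunks: readonly StreamChunk[]): { text: string; extraContent: unknown } {
    let text = '';
    let lastExtraContent: unknown = undefined;
    for (const chunk of chunks) {
        for (const choice of chunk.choices ?? []) {
            if (choice.delta?.content) text += choice.delta.content;
            // Later chunks without extra_content must not erase an earlier value.
            if (choice.delta?.extra_content != null) {
                lastExtraContent = choice.delta.extra_content;
            }
        }
    }
    return { text, extraContent: lastExtraContent };
}
```

Keeping only the last value matches the `last_extra_content` assignment quoted in the review comments; whether metadata spread over several chunks needs merging instead is not something this page confirms.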

Comment on lines +106 to +109
// Cast to access Gemini-specific extras passed alongside usage:
// - choices: non-stream grounding metadata lives in choices[0].message.extra_content
// - extra_content: streaming grounding metadata accumulated by the stream handler
const { usage, choices, extra_content } = args as {
Comment on lines 285 to +289
// Gemini specific thing for metadata, we will basically be appending onto the current message by abusing .addText a little
// Apps have to choose to handle extra_content themselves, it doesn't seem like theres a way we can do it in a backwards
// compatible fashion since most streaming apps will handle chat history by continuously updating content themselves
// This doesn't present us a chance to add in an extra object for gemini's chat continuing features
last_extra_content = choice.delta.extra_content;
@Salazareo

Member

@ProgrammerIn-wonderland is this mergeable?

Development

Successfully merging this pull request may close these issues.

Investigate & possible fix metering for gemini models search and caching

5 participants