feat: meter Gemini thinking tokens and grounding requests by Aaryan-Dadu · Pull Request #3178 · HeyPuter/puter

Aaryan-Dadu · 2026-05-28T09:53:11Z

Summary

Thinking tokens: Extracted from standard completion tokens to ensure they are billed accurately at the correct model-specific rate.
Grounding requests: Added flat-fee metering for Google Search by tracking grounding_metadata across both streaming and non-streaming responses.
Pricing updates: Corrected stale rates for Gemini 2.5 Flash output, cached tokens, thinking tokens, and grounding requests.

Test

All pre-existing tests pass.
5 unit tests have been added for the corresponding changes.
4 pre-existing test assertions were updated to include thinking_tokens: 0 and grounding_requests: 0 in the expected usage shapes.


CLAassistant · 2026-05-28T09:53:20Z

All committers have signed the CLA.

ProgrammerIn-wonderland · 2026-05-29T18:48:34Z

are thinking tokens not already included in output tokens in the usage object?

Aaryan-Dadu · 2026-05-29T19:02:54Z

> are thinking tokens not already included in output tokens in the usage object?

Yes, they are already included, but we split them out because they are billed at different rates: thinking_rate * thinking_tokens + standard_rate * (completion_tokens - thinking_tokens)
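For readers following the thread, the formula in this reply can be sketched as a small function. This is only an illustration of the arithmetic, not the code in the PR: the `GeminiRates` type, the `completionCost` name, the clamping step, and the rate values are all hypothetical.

```typescript
// Hypothetical rate table; units are arbitrary (for example, cost units per token).
interface GeminiRates {
    completion_tokens: number;
    thinking_tokens: number;
}

// Thinking tokens are already counted inside completion_tokens, so they are
// subtracted out before the standard rate is applied.
function completionCost(
    completionTokens: number,
    thinkingTokens: number,
    rates: GeminiRates,
): number {
    // Assumption: clamp so a malformed usage object cannot yield a negative count.
    const thinking = Math.min(Math.max(thinkingTokens, 0), completionTokens);
    const standard = completionTokens - thinking;
    return rates.thinking_tokens * thinking + rates.completion_tokens * standard;
}

// Made-up rates: 1000 output tokens, 400 of which are thinking tokens.
const exampleCost = completionCost(1000, 400, { completion_tokens: 2, thinking_tokens: 5 });
```

With these made-up rates the 400 thinking tokens are charged at 5 and the remaining 600 at 2, so no token is billed twice.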

Copilot

Pull request overview

This PR updates Gemini metering in the AI chat driver to correctly account for Gemini “thinking” tokens (billed at a distinct rate) and to add flat-fee metering for grounded Google Search requests by detecting grounding_metadata in both streaming and non-streaming responses.

Changes:

Split Gemini reasoning_tokens (“thinking tokens”) out of completion_tokens and meter each at its own model-specific rate.
Add grounding_requests usage metering (1 per response when grounding_metadata is present) for streaming and non-streaming Gemini completions.
Update Gemini model pricing entries to include thinking_tokens and grounding_requests rates (and refresh some existing token rates).
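The "1 per response" rule in the second item could look roughly like the sketch below. The helper names are hypothetical, and because the exact nesting of `grounding_metadata` inside `extra_content` is not shown on this page, the sketch searches for the key at any depth rather than assuming a path.

```typescript
// Returns true if `value` contains a non-null `grounding_metadata` key at any depth.
// Assumes plain JSON-like data (no cyclic references).
function hasGroundingMetadata(value: unknown): boolean {
    if (value === null || typeof value !== 'object') return false;
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
        const child = obj[key];
        if (key === 'grounding_metadata' && child != null) return true;
        if (hasGroundingMetadata(child)) return true;
    }
    return false;
}

// Hypothetical helper: a grounded response counts as exactly one request,
// whether the metadata arrived in the non-streaming message or in the
// extra_content accumulated from a stream.
function countGroundingRequests(
    choices?: Array<{ message?: { extra_content?: unknown } }>,
    streamedExtraContent?: unknown,
): number {
    const nonStream = choices?.[0]?.message?.extra_content;
    return hasGroundingMetadata(nonStream) || hasGroundingMetadata(streamedExtraContent) ? 1 : 0;
}
```

Returning 0 or 1 rather than counting occurrences matches the flat-fee wording above: a response is either grounded or it is not.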

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/backend/drivers/ai-chat/utils/OpenAIUtil.js` | Captures streamed `extra_content` and forwards it into the usage calculator for provider-specific metering. |
| `src/backend/drivers/ai-chat/providers/gemini/models.ts` | Adds/updates Gemini cost keys for `thinking_tokens` and `grounding_requests` (and adjusts some stale token rates). |
| `src/backend/drivers/ai-chat/providers/gemini/GeminiChatProvider.ts` | Implements Gemini-specific usage shaping: cached token exclusion, thinking token split, and grounding request detection. |
| `src/backend/drivers/ai-chat/providers/gemini/GeminiChatProvider.test.ts` | Updates expected usage shapes and adds unit tests for thinking-token and grounding-request metering (streaming + non-streaming). |
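As a rough picture of the usage shaping described for `GeminiChatProvider.ts`, the sketch below assembles one metered usage object. It is not the PR's implementation. The input field names follow the OpenAI-style usage object and are an assumption, as are the `shapeGeminiUsage` name, the `cached_tokens` output key, and the reading of "cached token exclusion" as subtracting cached tokens from the prompt count; only `thinking_tokens` and `grounding_requests` are named on this page.

```typescript
// Assumed OpenAI-style usage payload; every field is optional.
interface RawUsage {
    prompt_tokens?: number;
    completion_tokens?: number;
    prompt_tokens_details?: { cached_tokens?: number };
    completion_tokens_details?: { reasoning_tokens?: number };
}

// Hypothetical metered shape: each key is billed at its own rate.
interface ShapedUsage {
    prompt_tokens: number;
    completion_tokens: number;
    cached_tokens: number;
    thinking_tokens: number;
    grounding_requests: number;
}

function shapeGeminiUsage(usage: RawUsage, grounded: boolean): ShapedUsage {
    const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
    const thinking = usage.completion_tokens_details?.reasoning_tokens ?? 0;
    return {
        // Cached and thinking tokens are removed from the totals that contain them.
        prompt_tokens: Math.max((usage.prompt_tokens ?? 0) - cached, 0),
        completion_tokens: Math.max((usage.completion_tokens ?? 0) - thinking, 0),
        cached_tokens: cached,
        thinking_tokens: thinking,
        grounding_requests: grounded ? 1 : 0,
    };
}
```

A response with no thinking and no grounding still carries `thinking_tokens: 0` and `grounding_requests: 0`, which is consistent with the updated test assertions mentioned in the PR description.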


+                // Cast to access Gemini-specific extras passed alongside usage:
+                // - choices: non-stream grounding metadata lives in choices[0].message.extra_content
+                // - extra_content: streaming grounding metadata accumulated by the stream handler
+                const { usage, choices, extra_content } = args as {


                // Gemini specific thing for metadata, we will basically be appending onto the current message by abusing .addText a little
                // Apps have to choose to handle extra_content themselves, it doesn't seem like theres a way we can do it in a backwards
                // compatible fashion since most streaming apps will handle chat history by continuously updating content themselves
                // This doesn't present us a chance to add in an extra object for gemini's chat continuing features
+                last_extra_content = choice.delta.extra_content;
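To illustrate what the added line does in context, here is a minimal, self-contained sketch of a stream consumer that remembers the most recent `extra_content` so it can be handed to the usage calculator once the stream ends. The `StreamChunk` type and `collectStream` function are hypothetical, the real handler in `OpenAIUtil.js` is asynchronous and does more, and the `!== undefined` guard is an assumption rather than something visible in the diff.

```typescript
// Simplified chunk shape for illustration only.
interface StreamChunk {
    choices: Array<{ delta: { content?: string; extra_content?: unknown } }>;
}

function collectStream(chunks: StreamChunk[]): { text: string; extra_content: unknown } {
    let text = '';
    let last_extra_content: unknown = undefined;
    for (const chunk of chunks) {
        for (const choice of chunk.choices) {
            if (choice.delta.content) text += choice.delta.content;
            // Keep the latest extra_content seen; later chunks without one
            // must not erase it.
            if (choice.delta.extra_content !== undefined) {
                last_extra_content = choice.delta.extra_content;
            }
        }
    }
    return { text, extra_content: last_extra_content };
}
```

Holding on to only the last value keeps the handler cheap, at the cost of discarding any earlier `extra_content` payloads in the same stream.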


Salazareo · 2026-06-10T00:04:58Z

@ProgrammerIn-wonderland is this mergeable?

Aaryan-Dadu mentioned this pull request May 28, 2026

Investigate & possible fix metering for gemini models search and caching #3132

Open

ProgrammerIn-wonderland self-assigned this May 28, 2026

Salazareo requested a review from Copilot June 7, 2026 01:33

Copilot started reviewing on behalf of Salazareo June 7, 2026 01:33

Copilot AI reviewed Jun 7, 2026

Aaryan-Dadu wants to merge 1 commit into HeyPuter:main from Aaryan-Dadu:feat/3132


5 participants
