feat: meter Gemini thinking tokens and grounding requests #3178
Aaryan-Dadu wants to merge 1 commit into
- Thinking tokens: Extracted from standard completion tokens to ensure they are billed accurately at the correct model-specific rate.
- Grounding requests: Added flat-fee metering for Google Search by tracking `grounding_metadata` across both streaming and non-streaming responses.
- Pricing updates: Corrected stale rates for Gemini 2.5 Flash output, cached tokens, thinking tokens, and grounding requests.
Are thinking tokens not already included in output tokens in the usage object?

Yes, they are already included, but we split them out because they are billed at different rates.
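To make the split concrete, here is a minimal sketch rather than the PR's actual code. It assumes an OpenAI-style usage object in which `completion_tokens` already includes the thinking tokens and the thinking count is reported under `completion_tokens_details.reasoning_tokens`. The interface names and the `splitThinkingTokens` helper are hypothetical.

```typescript
// Hypothetical usage shape, modeled on an OpenAI-style usage object.
interface RawUsage {
  prompt_tokens: number;
  // Assumption: this total already includes the thinking tokens.
  completion_tokens: number;
  // Assumption: the thinking count is reported in this nested field.
  completion_tokens_details?: { reasoning_tokens?: number };
}

interface MeteredUsage {
  prompt_tokens: number;
  completion_tokens: number; // output tokens excluding thinking
  thinking_tokens: number;   // metered at its own rate
}

// Hypothetical helper: move thinking tokens out of the completion total
// so that no token is metered twice.
function splitThinkingTokens(usage: RawUsage): MeteredUsage {
  const reported = usage.completion_tokens_details?.reasoning_tokens ?? 0;
  // Clamp so a malformed response can never yield a negative count.
  const thinking = Math.min(Math.max(reported, 0), usage.completion_tokens);
  return {
    prompt_tokens: usage.prompt_tokens,
    completion_tokens: usage.completion_tokens - thinking,
    thinking_tokens: thinking,
  };
}
```

With 100 reported completion tokens of which 40 are thinking tokens, this yields 60 completion tokens and 40 thinking tokens, so the two counts still sum to the original total.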
There was a problem hiding this comment.
Pull request overview
This PR updates Gemini metering in the AI chat driver to correctly account for Gemini “thinking” tokens (billed at a distinct rate) and to add flat-fee metering for grounded Google Search requests by detecting grounding_metadata in both streaming and non-streaming responses.
Changes:
- Split Gemini `reasoning_tokens` (“thinking tokens”) out of `completion_tokens` and meter each at its own model-specific rate.
- Add `grounding_requests` usage metering (1 per response when `grounding_metadata` is present) for streaming and non-streaming Gemini completions.
- Update Gemini model pricing entries to include `thinking_tokens` and `grounding_requests` rates (and refresh some existing token rates).
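The "1 per response" rule for grounding could look roughly like the sketch below. It is an illustration under assumptions, not the provider's implementation: the exact nesting of `grounding_metadata` inside `extra_content` is not shown in this PR, so the hypothetical `hasGroundingMetadata` helper checks both the top level and a `google` sub-object. `countGroundingRequests` and the `UsageArgs` shape are likewise made up for the example.

```typescript
// Hypothetical helper. Assumption: grounding_metadata sits either directly
// on extra_content or under a "google" key; the real nesting may differ.
function hasGroundingMetadata(extra: unknown): boolean {
  if (typeof extra !== "object" || extra === null) return false;
  const obj = extra as Record<string, unknown>;
  if (obj["grounding_metadata"] != null) return true;
  const google = obj["google"];
  return (
    typeof google === "object" &&
    google !== null &&
    (google as Record<string, unknown>)["grounding_metadata"] != null
  );
}

// Hypothetical argument shape for the usage calculator.
interface UsageArgs {
  // Non-streaming: metadata on the first choice's message.
  choices?: Array<{ message?: { extra_content?: unknown } }>;
  // Streaming: metadata accumulated by the stream handler.
  extra_content?: unknown;
}

// Flat fee: at most one grounding request per response, regardless of
// how many sources the metadata lists.
function countGroundingRequests(args: UsageArgs): 0 | 1 {
  const nonStream = args.choices?.[0]?.message?.extra_content;
  return hasGroundingMetadata(nonStream) || hasGroundingMetadata(args.extra_content)
    ? 1
    : 0;
}
```

Returning `0 | 1` rather than a running count reflects the flat-fee model described above: the presence of grounding metadata is what gets billed, not its size.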
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/backend/drivers/ai-chat/utils/OpenAIUtil.js` | Captures streamed `extra_content` and forwards it into the usage calculator for provider-specific metering. |
| `src/backend/drivers/ai-chat/providers/gemini/models.ts` | Adds/updates Gemini cost keys for `thinking_tokens` and `grounding_requests` (and adjusts some stale token rates). |
| `src/backend/drivers/ai-chat/providers/gemini/GeminiChatProvider.ts` | Implements Gemini-specific usage shaping: cached token exclusion, thinking token split, and grounding request detection. |
| `src/backend/drivers/ai-chat/providers/gemini/GeminiChatProvider.test.ts` | Updates expected usage shapes and adds unit tests for thinking-token and grounding-request metering (streaming + non-streaming). |
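As a rough illustration of how per-key rates and a shaped usage object might combine into a charge, consider the sketch below. The `CostKey`, `Rates`, and `computeCost` names are hypothetical, the rate values are placeholders rather than real Gemini prices, and the real `models.ts` entries may use different keys or units.

```typescript
// Hypothetical cost keys, mirroring the usage fields named in this PR.
type CostKey =
  | "prompt_tokens"
  | "completion_tokens"
  | "thinking_tokens"
  | "grounding_requests";

type Rates = Record<CostKey, number>;
type Usage = Record<CostKey, number>;

// Hypothetical helper: multiply each metered quantity by its own rate.
// Token keys are per-token rates; grounding_requests is a flat fee per request.
function computeCost(usage: Usage, rates: Rates): number {
  let total = 0;
  for (const key of Object.keys(rates) as CostKey[]) {
    total += usage[key] * rates[key];
  }
  return total;
}

// Placeholder rates in arbitrary integer units (NOT real Gemini pricing).
const exampleRates: Rates = {
  prompt_tokens: 1,
  completion_tokens: 2,
  thinking_tokens: 3,
  grounding_requests: 1000,
};

const exampleUsage: Usage = {
  prompt_tokens: 10,
  completion_tokens: 60,
  thinking_tokens: 40,
  grounding_requests: 1,
};

console.log(computeCost(exampleUsage, exampleRates)); // 10 + 120 + 120 + 1000 = 1250
```

Keeping `thinking_tokens` as a separate key is what lets it carry a different rate from `completion_tokens`; if both shared one key, the split in the usage object would have no effect on the bill.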
// Cast to access Gemini-specific extras passed alongside usage:
// - choices: non-stream grounding metadata lives in choices[0].message.extra_content
// - extra_content: streaming grounding metadata accumulated by the stream handler
const { usage, choices, extra_content } = args as {

// Gemini specific thing for metadata, we will basically be appending onto the current message by abusing .addText a little
// Apps have to choose to handle extra_content themselves, it doesn't seem like theres a way we can do it in a backwards
// compatible fashion since most streaming apps will handle chat history by continuously updating content themselves
// This doesn't present us a chance to add in an extra object for gemini's chat continuing features
last_extra_content = choice.delta.extra_content;
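The idea of remembering the most recent `extra_content` while a stream is consumed can be sketched as follows. This is a simplified, synchronous illustration under assumptions: the `StreamChunk` shape and `collectStream` function are hypothetical, a real handler would iterate an async stream, and the guard against `undefined` is an assumption about the desired behavior rather than something the quoted line shows.

```typescript
// Hypothetical chunk shape, loosely modeled on OpenAI-style stream deltas.
interface StreamChunk {
  choices: Array<{ delta: { content?: string; extra_content?: unknown } }>;
}

// Hypothetical handler: concatenate text and keep the last extra_content
// seen, so it can be forwarded to the usage calculator when the stream ends.
function collectStream(chunks: Iterable<StreamChunk>): {
  text: string;
  extra_content: unknown;
} {
  let text = "";
  let last_extra_content: unknown = undefined;
  for (const chunk of chunks) {
    for (const choice of chunk.choices) {
      if (choice.delta.content) text += choice.delta.content;
      // Assumption: only overwrite when a chunk actually carries extras,
      // so later text-only chunks do not erase earlier metadata.
      if (choice.delta.extra_content !== undefined) {
        last_extra_content = choice.delta.extra_content;
      }
    }
  }
  return { text, extra_content: last_extra_content };
}
```

The returned `extra_content` can then be passed alongside `usage` to the metering step, which is how a streaming response would reach the same grounding check as a non-streaming one.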
@ProgrammerIn-wonderland is this mergeable?
Summary
`grounding_metadata` across both streaming and non-streaming responses.

Closes #3132
Test
`thinking_tokens: 0` and `grounding_requests: 0` in the expected usage shapes