
Conversation

MaxonZhao (Collaborator)

Fixes #3026

This PR migrates the token counting system from manual tokenization to native usage data provided by LLM providers. This fundamental change addresses accuracy and reliability issues with manual token counting while also enabling proper streaming support. The implementation adds native usage data extraction capabilities to the BaseTokenCounter system, providing accurate token tracking across all major LLM providers.

Key Changes Made:

  1. Enhanced BaseTokenCounter with Streaming Methods (a sketch of these methods follows this list):

    • Added extract_usage_from_streaming_response() for synchronous streams
    • Added extract_usage_from_async_streaming_response() for asynchronous streams
    • Added extract_usage_from_response() abstract method for all providers
    • Added deprecation warnings to the count_tokens_from_messages() method
  2. Provider-Specific Native Usage Extraction:

    • OpenAI: Extracts usage from response.usage when stream_options: {"include_usage": true} is enabled
    • Anthropic: Extracts usage from response.usage (input_tokens, output_tokens)
    • LiteLLM: Extracts usage from response.usage with provider-agnostic handling
    • Mistral: Extracts usage from response.usage (prompt_tokens, completion_tokens)
  3. Comprehensive Documentation:

    • STREAMING_TOKEN_COUNTING.md: Complete guide for streaming token counting
    • STREAMING_API_REFERENCE.md: API reference documentation
    • OFFICIAL_STREAMING_API_REFERENCES.md: Links to official provider documentation
    • architectural_analysis.md: Technical analysis and design rationale
  4. Example Implementation:

    • streaming_token_counting_example.py: Comprehensive examples showing:
      • Basic streaming usage extraction
      • Async streaming patterns
      • Multi-provider configuration
      • Error handling and fallback strategies
      • Integration with existing CAMEL workflows
    • streaming_token_counting_utils.py: Utility functions for configuration and validation
  5. Robust Error Handling:

    • Graceful handling of missing usage data
    • Provider-specific error patterns
    • Fallback mechanisms for unsupported configurations
    • Debug logging for troubleshooting
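
As referenced in item 1, a rough sketch of how the new BaseTokenCounter surface could fit together is shown below. The method names come from this PR; the bodies, type hints, and fallback behaviour are illustrative assumptions rather than the merged implementation:

from abc import ABC, abstractmethod
from typing import Any, AsyncIterable, Dict, Iterable, Optional

class BaseTokenCounter(ABC):
    @abstractmethod
    def extract_usage_from_response(self, response: Any) -> Optional[Dict[str, int]]:
        """Return native usage data (prompt/completion/total token counts),
        or None when the provider response carries no usage field."""

    def extract_usage_from_streaming_response(
        self, stream: Iterable[Any]
    ) -> Optional[Dict[str, int]]:
        # Illustrative default: scan the chunks and keep the last usage seen,
        # since providers typically attach usage to the final chunk.
        usage = None
        for chunk in stream:
            extracted = self.extract_usage_from_response(chunk)
            if extracted is not None:
                usage = extracted
        return usage  # None signals the caller to fall back and log for debugging

    async def extract_usage_from_async_streaming_response(
        self, stream: AsyncIterable[Any]
    ) -> Optional[Dict[str, int]]:
        # Async counterpart of the synchronous helper above.
        usage = None
        async for chunk in stream:
            extracted = self.extract_usage_from_response(chunk)
            if extracted is not None:
                usage = extracted
        return usage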

Benefits Delivered:

  • Migration to Native Counting: Replaces error-prone manual tokenization with accurate provider-supplied usage data
  • Provider-Reported Accuracy: Usage data comes directly from the model providers, eliminating manual counting discrepancies
  • Streaming Support: Enables proper token counting in streaming mode, previously problematic with manual approaches
  • Universal Compatibility: Works with OpenAI, Anthropic, LiteLLM, and Mistral providers using their native response formats
  • Future-Proof: No need to update when providers change tokenization rules or add new models
  • Simplified Maintenance: Eliminates roughly 500 lines of complex manual tokenization logic
  • Backward Compatibility: Existing code continues to work with deprecation warnings guiding migration

Technical Implementation Details:

Streaming Usage Extraction Pattern:

# Configure streaming with usage tracking.
# (enable_streaming_usage_for_openai is a helper from the new
# streaming_token_counting_utils.py added in this PR.)
config = enable_streaming_usage_for_openai({"stream": True})
response = model.run(messages, **config)

# Extract accurate usage data once the stream has completed
counter = model.token_counter
usage = counter.extract_usage_from_streaming_response(response)
# Returns: {'prompt_tokens': X, 'completion_tokens': Y, 'total_tokens': Z}
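
For asynchronous streams the same pattern applies through the async extraction method added in this PR. A minimal sketch, assuming a hypothetical async entry point arun() used purely for illustration:

import asyncio

async def run_with_usage(model, messages):
    config = enable_streaming_usage_for_openai({"stream": True})
    response = await model.arun(messages, **config)  # hypothetical async call

    counter = model.token_counter
    usage = await counter.extract_usage_from_async_streaming_response(response)
    return usage  # same dict shape as the synchronous case

# usage = asyncio.run(run_with_usage(model, messages))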

Provider-Specific Configurations:

  • OpenAI: Requires stream_options: {"include_usage": true} (see the SDK sketch after this list)
  • Anthropic: Usage included by default in streaming responses
  • Mistral: Usage included by default in final chunk
  • LiteLLM: Proxies provider-specific usage data
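
For reference, the OpenAI requirement above looks roughly like this when calling the openai Python SDK directly. The model name and prompt are placeholders; the stream_options field follows OpenAI's documented streaming API:

from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    # Without this option OpenAI omits usage data from streaming responses;
    # with it, a final extra chunk carries a populated `usage` object.
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    if chunk.usage is not None:  # only the final chunk has usage populated
        usage = chunk.usage

if usage is not None:
    print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)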

Migration Path:
The implementation provides a clear deprecation path while maintaining full backward compatibility:

  1. Existing count_tokens_from_messages() continues to work with deprecation warnings (sketched below)
  2. New extract_usage_from_response() methods provide accurate native data
  3. Comprehensive examples demonstrate migration patterns
  4. Documentation explains benefits and usage patterns
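
For step 1, a minimal sketch of how the legacy method can be flagged using the newly added deprecation dependency (the version string and class body are placeholders, not the actual CAMEL code):

import deprecation

class BaseTokenCounter:
    @deprecation.deprecated(
        deprecated_in="0.x",  # placeholder version
        details="Use extract_usage_from_response() with native provider usage data instead.",
    )
    def count_tokens_from_messages(self, messages):
        ...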

Future Work

This migration to native token counting opens up several opportunities for further enhancements:

Short-term Improvements

  • Complete Deprecation: Remove legacy BaseTokenCounter manual counting methods in a future major version
  • Standardize Observability Integration: Update all model implementations (OpenAI, Anthropic, Mistral) to use extract_usage_from_response() for consistent Langfuse integration, following the pattern already established in litellm_model.py
  • Enhanced Error Handling: Improve fallback strategies when native usage data is unavailable
  • Additional Provider Support: Extend native usage extraction to emerging LLM providers

Medium-term Enhancements

  • Langfuse Integration: Leverage existing Langfuse observability features for advanced usage analytics and cost tracking
  • Real-time Usage Monitoring: Live token usage tracking during streaming responses
  • Cost Estimation: Integrate with provider pricing APIs for real-time cost calculations
  • Partial Usage Data: Support for incremental usage updates during long streaming responses

Long-term Vision

  • Universal Usage Interface: Standardized usage data format across all providers
  • Predictive Usage: ML-based token usage prediction for request planning
  • Usage Optimization: Automatic prompt optimization based on token efficiency

Migration Roadmap

  1. Phase 1 (Current): Native usage extraction with deprecation warnings
  2. Phase 2 (Next release): Integrate with existing observability systems (Langfuse) and update model implementations
  3. Phase 3 (Future major version): Complete removal of manual counting methods
  4. Phase 4 (Long-term): Advanced usage analytics and optimization features

Dependencies Added:
    • deprecation>=2.1.0,<3 - Added to main dependencies for deprecation warnings on legacy token counting methods
    • python-dotenv>=1.0.0 - Already included in optional dependency groups (owl, eigent) for example environment loading

This migration from manual to native token counting significantly improves the accuracy and reliability of token usage tracking in CAMEL. The implementation provides a clear deprecation path for existing manual counting methods while enabling proper streaming support and comprehensive documentation for adopting the new native approach.

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have read the CONTRIBUTION guide (required)
  • I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • I have checked if any dependencies need to be added or updated in pyproject.toml and uv lock
  • I have updated the tests accordingly (required for a bug fix or a new feature)
  • I have updated the documentation if needed:
  • I have added examples if this is a new feature

coderabbitai bot (Contributor) commented Sep 10, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@Wendong-Fan (Member) left a comment


Thanks for @MaxonZhao's contribution; please double check the function we implemented.

# 3. Check if slicing is necessary
try:
-    current_tokens = token_counter.count_tokens_from_messages(
+    current_tokens = token_counter.extract_usage_from_response(

Member

I think extract_usage_from_response would return a dict instead of an int?

@fengju0213 (Collaborator) left a comment


Thanks for the contribution!

@fengju0213 (Collaborator) left a comment


I think we can use the existing method directly; the methods below already record the original usage, so we can just use it and then synchronize this usage to token_counter.

cc @Wendong-Fan @MaxonZhao @waleedalzarooni

[two screenshots of the existing usage-recording methods]

Collaborator

an empty file

        int: Number of tokens in the messages.
    """
-    return self.token_counter.count_tokens_from_messages(messages)
+    return self.token_counter.extract_usage_from_response(messages)

Collaborator

this should return an int here as well

@waleedalzarooni (Collaborator) left a comment


Great work @MaxonZhao, very comprehensive updates!

Left a few comments; it may seem like a lot, but they only require slight tweaks to the implementation design. On the whole, good job!

# 3. Check if slicing is necessary
try:
-    current_tokens = token_counter.count_tokens_from_messages(
+    current_tokens = token_counter.extract_usage_from_response(

Collaborator

extract_usage_from_response expects a usage parameter to find the token count; however, I believe the output of [message.to_openai_message(role)] will not include a usage field. Try passing a response object instead, as this will include the usage field the counting mechanism requires.
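
A hedged sketch of the suggested change; the response variable here is a hypothetical placeholder for whatever provider object carries the usage field:

# Instead of passing plain messages, which have no usage field:
#     current_tokens = token_counter.extract_usage_from_response(
#         [message.to_openai_message(role)]
#     )
# pass the provider response and read the counts from its usage data:
usage = token_counter.extract_usage_from_response(response)
current_tokens = usage["total_tokens"] if usage else 0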

"""
counter = self.model_backend.token_counter
content_token_count = counter.count_tokens_from_messages(
content_token_count = counter.extract_usage_from_response(

Collaborator

I believe this will have the same issue as the previous comment.


-    token_count = self.token_counter.count_tokens_from_messages(
+    token_count = self.token_counter.extract_usage_from_response(
        [record.memory_record.to_openai_message()]

Collaborator

same as above, also returns a list of messages with no usage field


    message = first_record.memory_record.to_openai_message()
-    tokens = self.token_counter.count_tokens_from_messages([message])
+    tokens = self.token_counter.extract_usage_from_response([message])

Collaborator

same issue here


    original_count_tokens = (
-        AnthropicTokenCounter.count_tokens_from_messages
+        AnthropicTokenCounter.extract_usage_from_response

Collaborator

Since extract_usage_from_response processes responses directly from the API, it doesn't require the whitespace patching that count_tokens_from_messages required. If we are still keeping the old method, I would revert this back to the original implementation, as it is not necessary for the new response-based method.

        return self._completion_cost

    def extract_usage_from_response(self, response: Any) -> Optional[Dict[str, int]]:
        r"""Extract native usage data from LiteLLM response.

Collaborator

Since these fields are identical to the OpenAI method, simplify by creating an inheritance-based version (see the sketch below):
class LiteLLMTokenCounter(OpenAITokenCounter):
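
A minimal sketch of that inheritance-based design (illustrative only; it assumes OpenAITokenCounter already provides the response-based extraction added in this PR):

class LiteLLMTokenCounter(OpenAITokenCounter):
    # LiteLLM proxies provider responses in the OpenAI format, so the
    # inherited extract_usage_from_response() can be reused unchanged.
    pass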

        return stream_options.get("include_usage", False)
    elif provider.lower() in ["anthropic", "mistral"]:
        return True
    elif provider.lower() == "litellm":

Collaborator

May need to change this depending on whether the LiteLLM approach is changed to the inheritance-based design.

print(f"Model: {model.model_type}")
print(f"Messages: {messages}")

traditional_count = model.count_tokens_from_messages(messages)

Collaborator

I'm getting a problem here; I think it's from the faulty count_tokens_from_messages(self, messages: List[OpenAIMessage]) -> int: in base_model.py. Should be fixed once that comment is implemented!

    pass

try:
    from camel.utils import MistralTokenCounter

Collaborator

I don't think mistral_common is currently present in camel's dependencies; please add it to pyproject.toml and follow the instructions on adding dependencies in CONTRIBUTING.md!

    ANTHROPIC_AVAILABLE = False

try:
    from camel.utils import MistralTokenCounter

Collaborator

Same mistral_common dependencies update as above

Development

Successfully merging this pull request may close these issues:

[Feature Request] Deprecate current token usage calculation