fix: Migrate token usage extraction from legacy manual counting to native responses #3121
base: master
Conversation
CodeRabbit: Review skipped. Auto reviews are disabled on this repository.
Thanks for @MaxonZhao's contribution; please double-check the function we implemented.
  # 3. Check if slicing is necessary
  try:
-     current_tokens = token_counter.count_tokens_from_messages(
+     current_tokens = token_counter.extract_usage_from_response(
I think extract_usage_from_response would return a dict instead of an int?
Thanks for the contribution!
I think we can use the existing method directly; the methods below already record the original usage, so we can just use it and then synchronize this usage to the token_counter.
cc @Wendong-Fan @MaxonZhao @waleedalzarooni


An empty file.
          int: Number of tokens in the messages.
      """
-     return self.token_counter.count_tokens_from_messages(messages)
+     return self.token_counter.extract_usage_from_response(messages)
this should return an int here as well
Great work @MaxonZhao, very comprehensive updates!
Left a few comments; it may seem like a lot, but they only require slight tweaks to the implementation design. On the whole, good job!
  # 3. Check if slicing is necessary
  try:
-     current_tokens = token_counter.count_tokens_from_messages(
+     current_tokens = token_counter.extract_usage_from_response(
extract_usage_from_response expects a usage parameter to find the token count; however, I believe the output of [message.to_openai_message(role)] will not include a usage field. Try passing a response object instead, as this will include the usage field the counting mechanism requires.
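For illustration, a minimal sketch of what passing the response could look like (the helper name and the fallback behaviour here are assumptions, not part of this PR):

    from typing import Any, Dict, List, Optional

    def resolve_current_tokens(
        token_counter: Any,
        response: Any,
        fallback_messages: List[dict],
    ) -> int:
        # Prefer the provider-reported usage attached to the response object.
        usage: Optional[Dict[str, int]] = (
            token_counter.extract_usage_from_response(response)
        )
        if usage:
            return usage.get("total_tokens", 0)
        # Fall back to legacy manual counting when no usage data is available.
        return token_counter.count_tokens_from_messages(fallback_messages)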
""" | ||
counter = self.model_backend.token_counter | ||
content_token_count = counter.count_tokens_from_messages( | ||
content_token_count = counter.extract_usage_from_response( |
I believe this will have the same issue as the previous comment.
-     token_count = self.token_counter.count_tokens_from_messages(
+     token_count = self.token_counter.extract_usage_from_response(
          [record.memory_record.to_openai_message()]
Same as above; this also produces a list of messages with no usage field.
      message = first_record.memory_record.to_openai_message()
-     tokens = self.token_counter.count_tokens_from_messages([message])
+     tokens = self.token_counter.extract_usage_from_response([message])
same issue here
      original_count_tokens = (
-         AnthropicTokenCounter.count_tokens_from_messages
+         AnthropicTokenCounter.extract_usage_from_response
Since extract_usage_from_response processes responses directly from the API, it doesn't require the whitespace patching that count_tokens_from_messages required. If we are still keeping the old method, I would revert this back to the original implementation, as the patching is not necessary for the new response-based method.
      return self._completion_cost

+ def extract_usage_from_response(self, response: Any) -> Optional[Dict[str, int]]:
+     r"""Extract native usage data from LiteLLM response.
Since these fields are identical to the OpenAI method, simplify by creating an inheritance-based version:

class LiteLLMTokenCounter(OpenAITokenCounter):
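A rough sketch of that inheritance-based design (the usage field names mirror the OpenAI-style usage block; the real counters in camel.utils may differ in detail):

    from typing import Any, Dict, Optional

    class OpenAITokenCounter:
        def extract_usage_from_response(
            self, response: Any
        ) -> Optional[Dict[str, int]]:
            # Read the provider-reported usage block if present.
            usage = getattr(response, "usage", None)
            if usage is None:
                return None
            return {
                "prompt_tokens": getattr(usage, "prompt_tokens", 0),
                "completion_tokens": getattr(usage, "completion_tokens", 0),
                "total_tokens": getattr(usage, "total_tokens", 0),
            }

    class LiteLLMTokenCounter(OpenAITokenCounter):
        # LiteLLM responses expose the same prompt_tokens / completion_tokens /
        # total_tokens fields, so the OpenAI extraction logic can be reused as-is.
        pass

This would remove the duplicated field-mapping code and keep any LiteLLM-specific behaviour (such as cost tracking) in the subclass.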
        return stream_options.get("include_usage", False)
    elif provider.lower() in ["anthropic", "mistral"]:
        return True
    elif provider.lower() == "litellm":
May need to change this depending on whether the LiteLLM approach is changed to the inheritance-based design.
print(f"Model: {model.model_type}") | ||
print(f"Messages: {messages}") | ||
|
||
traditional_count = model.count_tokens_from_messages(messages) |
I'm getting a problem here; I think it's from the faulty count_tokens_from_messages(self, messages: List[OpenAIMessage]) -> int: in base_model.py. It should be fixed once that comment is implemented!
    pass

try:
    from camel.utils import MistralTokenCounter
I don't think mistral_common is currently present in CAMEL's dependencies; please add it to pyproject.toml and follow the instructions on adding dependencies in CONTRIBUTING.md!
    ANTHROPIC_AVAILABLE = False

try:
    from camel.utils import MistralTokenCounter
Same mistral_common dependency update as above.
Fixes #3026
This PR migrates the token counting system from manual tokenization to the native usage data provided by LLM providers. This fundamental change addresses accuracy and reliability issues with manual token counting while also enabling proper streaming support. The implementation adds native usage data extraction capabilities to the BaseTokenCounter system, providing accurate token tracking across all major LLM providers.

Key Changes Made:
Enhanced BaseTokenCounter with Streaming Methods (see the sketch after this list):
- extract_usage_from_streaming_response() for synchronous streams
- extract_usage_from_async_streaming_response() for asynchronous streams
- extract_usage_from_response() abstract method for all providers
- Deprecated count_tokens_from_messages() method
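As a rough illustration of the new surface (signatures are inferred from the method names above and are not necessarily the exact ones in the PR):

    from abc import ABC, abstractmethod
    from typing import Any, AsyncIterator, Dict, Iterator, Optional

    class BaseTokenCounter(ABC):
        @abstractmethod
        def extract_usage_from_response(
            self, response: Any
        ) -> Optional[Dict[str, int]]:
            """Return provider-reported usage (prompt/completion/total tokens)."""

        def extract_usage_from_streaming_response(
            self, stream: Iterator[Any]
        ) -> Optional[Dict[str, int]]:
            # Providers typically attach usage to the final chunk of a stream.
            usage = None
            for chunk in stream:
                usage = self.extract_usage_from_response(chunk) or usage
            return usage

        async def extract_usage_from_async_streaming_response(
            self, stream: AsyncIterator[Any]
        ) -> Optional[Dict[str, int]]:
            usage = None
            async for chunk in stream:
                usage = self.extract_usage_from_response(chunk) or usage
            return usage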
Provider-Specific Native Usage Extraction:
- OpenAI: response.usage when stream_options: {"include_usage": true} is enabled
- Anthropic: response.usage (input_tokens, output_tokens; see the mapping sketch after this list)
- LiteLLM: response.usage with provider-agnostic handling
- Mistral: response.usage (prompt_tokens, completion_tokens)
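As an example of the field mapping this implies, a hedged sketch of normalizing Anthropic-style usage into the common shape (field names follow Anthropic's public API; the PR's actual implementation may differ):

    from typing import Any, Dict, Optional

    def normalize_anthropic_usage(response: Any) -> Optional[Dict[str, int]]:
        # Anthropic reports input_tokens / output_tokens rather than
        # prompt_tokens / completion_tokens, so map them onto the common keys.
        usage = getattr(response, "usage", None)
        if usage is None:
            return None
        input_tokens = getattr(usage, "input_tokens", 0)
        output_tokens = getattr(usage, "output_tokens", 0)
        return {
            "prompt_tokens": input_tokens,
            "completion_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
        }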
Comprehensive Documentation:
- STREAMING_TOKEN_COUNTING.md: Complete guide for streaming token counting
- STREAMING_API_REFERENCE.md: API reference documentation
- OFFICIAL_STREAMING_API_REFERENCES.md: Links to official provider documentation
- architectural_analysis.md: Technical analysis and design rationale

Example Implementation:
- streaming_token_counting_example.py: Comprehensive examples
- streaming_token_counting_utils.py: Utility functions for configuration and validation

Robust Error Handling:
Benefits Delivered:
Technical Implementation Details:
Streaming Usage Extraction Pattern:
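A minimal sketch of this pattern with the OpenAI SDK (the model name is a placeholder; stream_options={"include_usage": True} makes the final chunk carry a usage block):

    from openai import OpenAI

    client = OpenAI()
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
        stream_options={"include_usage": True},
    )

    usage = None
    for chunk in stream:
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="")
        if chunk.usage is not None:  # only the final chunk carries usage
            usage = chunk.usage

    if usage:
        print(f"\nprompt={usage.prompt_tokens} completion={usage.completion_tokens}")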
Provider-Specific Configurations:
- stream_options: {"include_usage": true}
Migration Path:
The implementation provides a clear deprecation path while maintaining full backward compatibility:
- count_tokens_from_messages() continues to work with deprecation warnings
- extract_usage_from_response() methods provide accurate native data

Future Work
This migration to native token counting opens up several opportunities for further enhancements:
Short-term Improvements
- Remove the legacy BaseTokenCounter manual counting methods in a future major version
- Use extract_usage_from_response() for consistent Langfuse integration, following the pattern already established in litellm_model.py
Medium-term Enhancements
Long-term Vision
Migration Roadmap
- deprecation>=2.1.0,<3: Added to main dependencies for deprecation warnings on legacy token counting methods (see the sketch below)
- python-dotenv>=1.0.0: Already included in optional dependency groups (owl, eigent) for example environment loading
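A sketch of how the deprecation package is typically applied to a legacy method (the version string and wording are placeholders, not the PR's actual values):

    import deprecation

    class ExampleTokenCounter:
        @deprecation.deprecated(
            deprecated_in="0.2.0",  # placeholder version
            details="Use extract_usage_from_response() instead.",
        )
        def count_tokens_from_messages(self, messages) -> int:
            # Legacy manual counting retained for backward compatibility.
            return 0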
This migration from manual to native token counting significantly improves the accuracy and reliability of token usage tracking in CAMEL. The implementation provides a clear deprecation path for existing manual counting methods while enabling proper streaming support and comprehensive documentation for adopting the new native approach.
Checklist
Go over all the following points, and put an x in all the boxes that apply.
- Mentioned Fixes #issue-number in the PR description (required)
- Updated pyproject.toml and uv lock