Is this package capable of calculating tokens for OpenAI assistant mode and more advanced chats? #58
Comments
Thank you very much for your response; this feature is already great. As for your point that GPT-4o does not yet have a dedicated tokenizer, could we change the model to cl100k as follows?

```python
import tiktoken

encoding = tiktoken.encoding_for_model("cl100k")
token_contents = len(encoding.encode(contents))
```
Can you explain what you mean? We currently use cl100k as a fallback; see Line 101 in e1d52db.
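For illustration, a minimal sketch of that kind of fallback (not the package's actual code; `resolve_encoding` and `model` are placeholders):

```python
import tiktoken

def resolve_encoding(model: str) -> tiktoken.Encoding:
    # Try the model-specific tokenizer first; fall back to
    # cl100k_base for models tiktoken does not yet know about.
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        return tiktoken.get_encoding("cl100k_base")
```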
Yes, I later changed the model name in that code segment to cl100k_base, but I encountered an error when running the program, and I'm not sure where the issue in my code is. Thank you.

```python
import tiktoken
```

The error message is:
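As an aside, the likely cause: `cl100k_base` is an encoding name, not a model name, so `tiktoken.encoding_for_model("cl100k_base")` raises a `KeyError`. Encoding names are loaded with `get_encoding` instead. A minimal sketch:

```python
import tiktoken

# "cl100k_base" names an encoding, so load it directly:
encoding = tiktoken.get_encoding("cl100k_base")

# encoding_for_model expects a model name instead, e.g.:
# encoding = tiktoken.encoding_for_model("gpt-4-1106-preview")

contents = "Hello world"  # hypothetical input
token_contents = len(encoding.encode(contents))
```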
Hello,
I noticed that the package you wrote is very impressive. However, is it only capable of counting tokens for regular, simple chats? I saw that your code requires the input prompt to include the "role", "user", and "content" strings:

```python
message_prompt = [{"role": "user", "content": "Hello world"}]
```
If using assistant mode with instructions, file search, and files uploaded to vector stores for RAG, the calculation could be considerably more complex.
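As an illustration of why a bare `encode()` undercounts for chat, each message also carries formatting overhead. Below is a sketch of per-message counting along the lines of OpenAI's cookbook example, assuming cl100k-family chat models (the 3-token constants are the cookbook's values for those models and may differ for others):

```python
import tiktoken

def num_tokens_from_messages(messages, model="gpt-4-1106-preview"):
    # Approximate prompt tokens for a chat completion request,
    # following OpenAI's cookbook accounting for cl100k-family models.
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # per-message overhead for gpt-4-style models
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += 1  # a name field adds one extra token
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

message_prompt = [{"role": "user", "content": "Hello world"}]
print(num_tokens_from_messages(message_prompt))
```

Instructions and retrieved RAG context ultimately enter the model as additional message text, so to the extent that text is visible to you, it can be counted the same way.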
Are the token calculation methods for gpt-4-1106-preview and gpt-4o the same? I checked the tokenizer on the official website, but a tokenizer for gpt-4o is not yet available there:
https://platform.openai.com/tokenizer
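For what it's worth, recent tiktoken releases do map gpt-4o, and to a different encoding than gpt-4-1106-preview, so the two counts are not interchangeable (this assumes a tiktoken version new enough, roughly 0.7.0+, to know about gpt-4o):

```python
import tiktoken

# gpt-4-1106-preview maps to cl100k_base, while gpt-4o maps to
# o200k_base in tiktoken versions that include it.
print(tiktoken.encoding_for_model("gpt-4-1106-preview").name)  # cl100k_base
print(tiktoken.encoding_for_model("gpt-4o").name)              # o200k_base
```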
Currently, my code for calculating tokens is as follows. Is this correct?
Thank you.
```python
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4-1106-preview")
token_contents = len(encoding.encode(contents))
```