High Open AI Request Count & Token usage #1103

codedash07 · 2024-09-16T09:00:35Z

Hello There,

This may not be specifically a bug, but there seems to be some issue.
For a single chat request sent from the bot UI, the metrics report that 3 OpenAI requests were made (3x the number of chat requests). Token usage is also high, about 3 times the expected token count.
I am using an AI Search index as the data source for the chatbot.

I am mostly using default settings. Below is the configuration JSON sent to the model at invocation time.

{'messages': [{'role': 'user', 'content': 'what can you answer?'}], 'temperature': 0.0, 'max_tokens': 4096, 'top_p': 1.0, 'stop': None, 'stream': True, 'model': 'gpt4o', 'user': '{"EndUserId": "00000000-0000-0000-0000-000000000000", "EndUserIdType": "EntraId", "SourceIp": "127.0.0.1"}', 'extra_body': {'data_sources': [{'type': 'azure_search', 'parameters': {'top_n_documents': 1, 'strictness': 1, 'in_scope': True, 'index_name': 'vector-1718086437222', 'semantic_configuration': 'vector-1718086437222-semantic-configuration', 'query_type': 'vector_simple_hybrid', 'endpoint': 'https://instance.search.windows.net', 'authentication': {'type': 'api_key', 'key': 'abc21321abc'}, 'embedding_dependency': {'type': 'deployment_name', 'deployment_name': 'text-embedding-ada-002'}, 'fields_mapping': {'content_fields': ['chunk'], 'title_field': 'chunk_id', 'url_field': 'metadata_storage_path', 'filepath_field': 'title', 'vector_fields': ['text_vector']}, 'allow_partial_result': False, 'include_contexts': ['citations', 'intent'], 'role_information': "You are a professional AI support assistant."}}]}}

What could be a possible fix for this problem?
Thanks in advance!

filipafcastro · 2024-10-19T16:44:22Z

I'm having exactly the same issue: https://learn.microsoft.com/en-us/answers/questions/2103832/high-token-consumption-in-azure-openai-with-your-d. And I also found this post reporting the same: https://stackoverflow.com/questions/78779006/why-is-the-consumption-of-openai-tokens-in-azure-hybrid-search-100x-higher-in-co.

@codedash07 were you able to understand and/or solve the issue?

master-fury · 2024-11-14T04:18:29Z

Is there any fix for this issue? I'm getting frequent rate limit errors.

reminegrier · 2024-12-04T10:47:24Z

Hello, I'm facing the same issue. According to the threads @filipafcastro linked, the only solution seems to be reducing the chunk size or top_n_documents, which controls how many documents are sent to the LLM to perform the RAG step.

That makes sense given how RAG works: the retrieved chunks are inserted into the prompt, so they count as input tokens.

The main challenge therefore seems to be minimizing these parameters while keeping the responses accurate.
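
To make that trade-off concrete, here is a back-of-the-envelope sketch in Python. The function name, the per-chunk token count, and the example numbers are assumptions for illustration only; they are not values reported by Azure OpenAI, and the estimate ignores any extra calls the service may make internally.

```python
def estimate_prompt_tokens(top_n_documents: int,
                           chunk_tokens: int,
                           question_tokens: int,
                           system_tokens: int = 0) -> int:
    """Rough estimate of input tokens for one generation call.

    Assumes every retrieved chunk is sent in full (hypothetical model,
    not the service's actual accounting).
    """
    retrieved = top_n_documents * chunk_tokens  # tokens from retrieved chunks
    return retrieved + question_tokens + system_tokens


# Hypothetical numbers: 1024-token chunks, 20-token question, 30-token system message.
with_five_docs = estimate_prompt_tokens(5, 1024, 20, 30)
with_one_doc = estimate_prompt_tokens(1, 1024, 20, 30)
print(with_five_docs, with_one_doc)
```

Under these assumptions the retrieved chunks dominate the prompt, so lowering top_n_documents or the chunk size is the main lever on input tokens.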

codedash07 added the bug label on Sep 16, 2024
