estimate token use before sending openai completions #1112
+71
−1
When setting `max_tokens` for services compliant with the OpenAI Python client, the value passed to the client needs to be reduced so that it stays within the model's supported context length, inclusive of the tokens in the prompt request. This revision validates the available context space before attempting to request inference.
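To illustrate the clamping idea, here is a minimal sketch assuming a `tiktoken`-based token count; the helper name `clamp_max_tokens` and its parameters are hypothetical and not the code in this PR:

```python
import tiktoken


def clamp_max_tokens(prompt: str, requested_max_tokens: int,
                     model_name: str, context_len: int) -> int:
    """Reduce max_tokens so prompt + completion fit in the model's context window.

    Hypothetical helper for illustration; the actual implementation may differ.
    """
    try:
        encoding = tiktoken.encoding_for_model(model_name)
    except KeyError:
        # unknown model name: fall back to a common encoding
        encoding = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = len(encoding.encode(prompt))
    available = context_len - prompt_tokens
    if available <= 0:
        # the prompt alone already exceeds the context window
        raise ValueError(
            f"prompt ({prompt_tokens} tokens) exceeds context length {context_len}"
        )
    return min(requested_max_tokens, available)
```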
Please review with an eye to the desired runtime behavior: should the run be terminated if a prompt from a probe exceeds the context length of the target model, or should the run continue and simply log the skipped `Attempt`?

The error is reported as a 400 response when the context length of the model is exceeded.
Test example:
`high_tokens_config.yaml`:
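The original config contents are not reproduced here; a hypothetical sketch of a run config that requests an oversized `max_tokens` might look like the following (the generator namespace and option names are assumptions, not copied from the PR):

```yaml
---
plugins:
  model_type: openai
  model_name: gpt-3.5-turbo
  generators:
    openai:
      # deliberately larger than the model's context window to provoke the 400 error
      max_tokens: 1000000
```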
Logged error: