Chunk tolerance annotation in streaming completion docs #5190
Why are these changes needed?
Minor documentation changes requested by @ekzhu in #5078.
The user guide recommends that users of the `create_stream` method (in the chat completion client classes) set the `include_usage` option so the server sends final token usage when completion streaming finishes. However, the final chunk sent from the server containing the usage information is empty, and this fails a check for empty chunks in `_openai_client.py`. To retrieve usage without triggering this error, users can raise the chunk tolerance (`max_consecutive_empty_chunk_tolerance`) when invoking the streaming completion, as in the sketch below.
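A minimal sketch of the documented usage, assuming the `OpenAIChatCompletionClient` API from `autogen_ext`; the model name and tolerance value are illustrative, and import paths may differ across versions:

```python
import asyncio

from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    client = OpenAIChatCompletionClient(model="gpt-4o")  # model name is illustrative

    stream = client.create_stream(
        messages=[UserMessage(content="Hello!", source="user")],
        # Ask the server to report final token usage in a trailing chunk.
        extra_create_args={"stream_options": {"include_usage": True}},
        # The trailing usage chunk is empty; allow it instead of raising.
        max_consecutive_empty_chunk_tolerance=2,
    )

    async for item in stream:
        if isinstance(item, str):
            print(item, end="", flush=True)
        else:
            # The final yielded item is a CreateResult carrying the usage info.
            print("\nUsage:", item.usage)


asyncio.run(main())
```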
This PR adjusts the API docs and includes a suggested revision for the user guide as well.

Related issue number
#5078
Checks