Add Streaming of Function Call Arguments to Chat Completions #999
Conversation
Hey @seratch @rm-openai! 👋 Would you mind taking a look at this PR? I've implemented real-time streaming for function call arguments (fixes #834). Instead of waiting for the entire function call to complete, users now see the arguments being built up incrementally as the LLM generates them. This should make the experience much smoother for things like code generation and API building. Let me know what you think! 🚀
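As a rough illustration of the consumer-side behavior being described (a minimal sketch; the fragment strings and helper names below are hypothetical, not the SDK's actual API), each argument delta becomes usable the moment it arrives rather than after the whole call finishes:

```python
import asyncio

# Hypothetical argument fragments as a model might stream them; these
# strings and helper names are illustrative, not part of the SDK.
FAKE_ARG_DELTAS = ['{"city": "To', 'kyo", "unit"', ': "celsius"}']

async def fake_argument_stream():
    # Stand-in for a network stream of tool-call argument deltas.
    for delta in FAKE_ARG_DELTAS:
        await asyncio.sleep(0)  # yield control, as a real stream would
        yield delta

async def consume() -> str:
    # With incremental streaming, each fragment can be acted on as it
    # arrives (e.g. to update a UI), instead of after the call completes.
    parts: list[str] = []
    async for delta in fake_argument_stream():
        parts.append(delta)
    return "".join(parts)

if __name__ == "__main__":
    print(asyncio.run(consume()))
```

Joining the deltas at the end reproduces the complete JSON arguments, so nothing is lost by rendering them incrementally along the way.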
Force-pushed from 6687ab8 to 3ee4d63
Thanks for your pull request. Left a few comments.
@@ -0,0 +1,222 @@
import asyncio
For examples like this, we prefer using the Responses API model. Also, this example is really good, but it's a bit lengthy, so a more simplified version would be better to have.
Got it, I'll optimize this example.
@@ -53,6 +53,9 @@ class StreamingState:
    refusal_content_index_and_output: tuple[int, ResponseOutputRefusal] | None = None
    reasoning_content_index_and_output: tuple[int, ResponseReasoningItem] | None = None
    function_calls: dict[int, ResponseFunctionToolCall] = field(default_factory=dict)
    # New fields for real-time function call streaming
If I understand correctly, this pull request improves the Chat Completions model's compatibility for function call streaming events, right? If so, focusing this PR on those changes and handling the documentation changes (which use the Responses API and are already available) in a separate pull request would make it easier for us to review.
Summary
This PR implements real-time streaming of function call arguments as requested in #834. Previously, function call arguments were only emitted after the entire function call was complete, causing poor user experience for large parameter generation.
Changes
- `ChatCmplStreamHandler`: Added real-time streaming of function call arguments during generation

Closes #834