Add Streaming of Function Call Arguments to Chat Completions #999


Open
wants to merge 6 commits into main

Conversation

devtalker
Contributor

Summary

This PR implements real-time streaming of function call arguments, as requested in #834. Previously, function call arguments were emitted only after the entire function call was complete, which made for a poor user experience when the model generates large argument payloads.

Changes

  • Enhanced ChatCmplStreamHandler: Added real-time streaming of function call arguments during generation
  • New streaming logic: Function call arguments now stream incrementally as they are generated, similar to text content
  • Backward compatibility: Maintains existing behavior for completed function calls
  • Comprehensive testing: Added tests for both OpenAI and LiteLLM models
  • Example implementation: Created demonstration code showing the new streaming capability
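To illustrate the core idea behind the new streaming logic, here is a minimal, self-contained sketch. It is not the PR's actual `ChatCmplStreamHandler` code; the class and event names below are hypothetical stand-ins showing how argument deltas can be emitted immediately while a running buffer is still kept for the final completed call.

```python
from dataclasses import dataclass, field


@dataclass
class ArgsDeltaEvent:
    """Hypothetical event emitted for each incremental chunk of arguments."""
    call_index: int
    delta: str


@dataclass
class FunctionCallStreamState:
    """Accumulates argument text per tool call while emitting deltas immediately."""
    accumulated: dict[int, str] = field(default_factory=dict)

    def on_arguments_chunk(self, call_index: int, delta: str) -> ArgsDeltaEvent:
        # Emit the delta right away instead of waiting for the full call,
        # while still maintaining the running buffer for the final event.
        self.accumulated[call_index] = self.accumulated.get(call_index, "") + delta
        return ArgsDeltaEvent(call_index=call_index, delta=delta)


state = FunctionCallStreamState()
for chunk in ['{"city": ', '"Tokyo"', "}"]:
    event = state.on_arguments_chunk(0, chunk)
    print(event.delta, end="")  # each chunk is surfaced as soon as it arrives
print()
print(state.accumulated[0])  # full arguments: {"city": "Tokyo"}
```

The backward-compatibility point above corresponds to keeping the `accumulated` buffer: the final "function call complete" event can still carry the full argument string, exactly as before.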

Closes #834

@devtalker devtalker marked this pull request as ready for review July 3, 2025 11:01
@devtalker devtalker changed the title from "# Support Streaming of Function Call Arguments" to "Support Streaming of Function Call Arguments" Jul 3, 2025
@devtalker
Contributor Author

Hey @seratch @rm-openai! 👋

Would you mind taking a look at this PR? I've implemented real-time streaming for function call arguments (fixes #834). Basically, instead of waiting for the entire function call to complete, users now see the arguments being built up incrementally as the LLM generates them. This should make the experience much smoother for things like code generation and API building.

Let me know what you think! 🚀
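On the consumer side, the change means a UI loop can render arguments as they arrive. The sketch below simulates that loop with fake events; the event type string is modeled on the Responses API's `response.function_call_arguments.delta` event, and the `FakeRawEvent` field names are assumptions, not the SDK's actual types.

```python
from dataclasses import dataclass


@dataclass
class FakeRawEvent:
    """Stand-in for a raw response stream event; field names are assumptions."""
    type: str
    delta: str = ""


def render_argument_stream(events) -> str:
    """Collect argument deltas in arrival order, as a live UI loop would."""
    parts = []
    for ev in events:
        if ev.type == "response.function_call_arguments.delta":
            parts.append(ev.delta)  # render each fragment immediately
    return "".join(parts)


events = [
    FakeRawEvent("response.function_call_arguments.delta", '{"query": '),
    FakeRawEvent("response.function_call_arguments.delta", '"weather in Paris"}'),
    FakeRawEvent("response.function_call_arguments.done"),
]
print(render_argument_stream(events))  # {"query": "weather in Paris"}
```

With the pre-PR behavior, only the `done`-style event would carry data, so nothing could be shown until the whole call finished.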

@devtalker devtalker force-pushed the feature/function-call-args-streaming branch from 6687ab8 to 3ee4d63 Compare July 7, 2025 07:14
@seratch seratch requested review from seratch and rm-openai July 8, 2025 13:16
Member

@seratch seratch left a comment


Thanks for your pull request. Left a few comments.

@@ -0,0 +1,222 @@
import asyncio
Member


For examples like this, we prefer using the Responses API model. Also, this example is very good, but it's a bit lengthy, so a more simplified version would be better to have.

Contributor Author


Got it, I'll optimize this example.

@@ -53,6 +53,9 @@ class StreamingState:
refusal_content_index_and_output: tuple[int, ResponseOutputRefusal] | None = None
reasoning_content_index_and_output: tuple[int, ResponseReasoningItem] | None = None
function_calls: dict[int, ResponseFunctionToolCall] = field(default_factory=dict)
# New fields for real-time function call streaming
Member


If I understand correctly, this pull request improves the Chat Completions model's compatibility with function call streaming events, right? If so, it would be easier for us to review if this PR focused on those changes and handled the documentation changes (the docs actually use the Responses API, where this is already available) in a separate pull request.

@seratch seratch changed the title from "Support Streaming of Function Call Arguments" to "Add Streaming of Function Call Arguments to Chat Completions" Jul 10, 2025
Successfully merging this pull request may close these issues.

Support Streaming of Function Call Arguments