WIP: Feature/function calling update #2700

Open · wants to merge 8 commits into main from feature/function-calling-update
Conversation

@YAMY1234 (Contributor) commented Jan 2, 2025

Pull Request Description

Summary

This pull request introduces streaming modes for function calling within the OpenAI API integration and updates the non-streaming framework for better extensibility. The changes include:

  1. New Features:

    • Implementation of a FunctionCallParser for robust, efficient parsing of function calls in both streaming and non-streaming contexts.
    • Added support for incremental streaming responses via the parse_streaming_increment method (see the sketch after this list).
    • Enhanced tooling support with structured parsing of tool calls, enabling function integration with improved parameter handling.
  2. Refactoring:

    • Refactored openai_api/adapter.py to integrate streaming tool call parsing logic.
    • Updated openai_api/protocol.py with additional models (ToolCallItem, DeltaMessage) to support streaming functionality.
  3. Documentation:

    • Added detailed comments and docstrings for new classes and methods to enhance readability and maintainability.
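
To make the streaming-parse behaviour concrete, here is a minimal sketch of the incremental-parsing idea; the class and method names mirror the PR's StreamingJSONParser.parse_streaming_increment, but the body below is an illustrative assumption rather than the PR's actual implementation:

```python
import json
from typing import Optional


class StreamingJSONParser:
    """Toy incremental parser: buffers streamed text and emits a parsed
    tool call once the buffer becomes a complete JSON value."""

    def __init__(self) -> None:
        self._buffer = ""

    def parse_streaming_increment(self, chunk: str) -> Optional[dict]:
        """Append a new chunk; return the parsed object if the buffered
        text is now valid JSON, otherwise None (wait for more chunks)."""
        self._buffer += chunk
        try:
            obj = json.loads(self._buffer)
        except json.JSONDecodeError:
            return None  # still incomplete
        self._buffer = ""  # reset for the next tool call
        return obj


# A tool-call payload arriving in three streamed chunks:
parser = StreamingJSONParser()
for chunk in ['{"name": "get_weather", ', '"arguments": {"city": ', '"Paris"}}']:
    result = parser.parse_streaming_increment(chunk)
    if result is not None:
        print(result)  # {'name': 'get_weather', 'arguments': {'city': 'Paris'}}
```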

Detailed Changes

  • docs/backend/function_calling_streaming.py:

    • Added functionality to demonstrate streaming and non-streaming API calls with mock tool integrations.
    • Included an example for handling tool calls and parsing streamed arguments incrementally.
  • python/sglang/srt/function_call_parser.py:

    • Introduced FunctionCallParser, StreamingJSONParser, and related utility functions to handle function calls during streaming responses.
    • Implemented logic for detecting and parsing incremental JSON inputs with robust error handling.
  • python/sglang/srt/openai_api/adapter.py:

    • Integrated FunctionCallParser to enable real-time function call parsing during streaming response generation.
    • Adjusted tool-related logic to align with the new structured tool parsing approach.
  • python/sglang/srt/openai_api/protocol.py:

    • Modified FunctionResponse and ToolCall models to use Optional fields for compatibility with the new parser.
    • Added ToolCallItem and DeltaMessage models to streamline the representation of parsed tool calls and response deltas (a rough sketch of such models follows this list).
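
As a rough illustration of what such protocol additions could look like (the field names here are assumptions, not the PR's exact definitions), the new models might be plain Pydantic classes:

```python
from typing import List, Optional

from pydantic import BaseModel


class ToolCallItem(BaseModel):
    """One parsed tool call; tool_index tracks its position when a model
    emits several calls in a single response."""
    tool_index: int
    name: Optional[str] = None
    parameters: Optional[str] = None  # JSON-encoded arguments


class DeltaMessage(BaseModel):
    """Incremental slice of a streamed assistant message."""
    role: Optional[str] = None
    content: Optional[str] = None
    tool_calls: Optional[List[ToolCallItem]] = None
```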

Testing

  • Verified streaming and non-streaming API calls using mock scenarios (an example streaming request is shown after this list).
  • Validated the correctness of tool-call parsing through tests and real-time simulations.
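
For reference, a streaming tool-call request of the kind exercised here can be issued with the standard OpenAI Python client; the endpoint URL and model name below are placeholders for a locally launched sglang server, not values taken from this PR:

```python
from openai import OpenAI

# Placeholder endpoint and model: assumes an sglang server running locally
# with its OpenAI-compatible API enabled.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

stream = client.chat.completions.create(
    model="qwen2.5",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    stream=True,
)

# Tool-call fragments arrive incrementally in each chunk's delta.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.tool_calls:
        for call in chunk.choices[0].delta.tool_calls:
            print(call.function.name, call.function.arguments)
```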

@YAMY1234 changed the title from Feature/function calling update to WIP: Feature/function calling update on Jan 2, 2025
@merrymercy (Contributor) commented:

@HaoyuWang4188 @Tushar-ml @YAMY1234 Can you review each other's code? #2576

@YAMY1234 requested a review from HaiShaw as a code owner on January 2, 2025 19:59
@YAMY1234 force-pushed the feature/function-calling-update branch 2 times, most recently from 99bb21f to 25a03b0 on January 2, 2025 20:09
@YAMY1234 force-pushed the feature/function-calling-update branch from 25a03b0 to 63c3d4e on January 2, 2025 20:10
@YAMY1234 (Contributor, Author) commented Jan 3, 2025

> @HaoyuWang4188 @Tushar-ml @YAMY1234 Can you review each other's code? #2576

Sure! We’ll review each other’s code and collaborate to work out a great solution. 🚀

@HaoyuWang4188 (Contributor) commented Jan 3, 2025

Hi! After a general review, I would like to start some discussion to help us determine the best solution:

1. Support for parallel_tool_calls

The OpenAI API supports a parallel_tool_calls option (default true) that determines whether the LLM may output multiple tool calls at once.
In vLLM, this option is accepted but effectively ignored, i.e. it always behaves as if parallel_tool_calls=true (details).
Our current implementation does not consider this option in either #2544 or #2700, and the actual behaviour is summarized as follows:

Static API in #2544 (details)

  • parallel_tool_calls=true for qwen2.5 (can output multiple tool calls at once)
  • parallel_tool_calls=false for internlm2, llama3.1, llama3.2 (only the first parsed tool call is output)

I tried to align these behaviours by setting parallel_tool_calls=false by default and forcing qwen2.5 to output only the first tool call at the API level in link.
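
A minimal sketch of that kind of API-level enforcement (the function name and call shape are illustrative, not the actual #2544 code):

```python
from typing import Any, Dict, List


def enforce_parallel_tool_calls(
    tool_calls: List[Dict[str, Any]], parallel_tool_calls: bool
) -> List[Dict[str, Any]]:
    """If the request sets parallel_tool_calls=False, keep only the first
    parsed tool call so all models behave consistently."""
    if not parallel_tool_calls:
        return tool_calls[:1]
    return tool_calls


calls = [
    {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    {"name": "get_time", "arguments": '{"tz": "CET"}'},
]
print(enforce_parallel_tool_calls(calls, parallel_tool_calls=False))
# -> [{'name': 'get_weather', 'arguments': '{"city": "Paris"}'}]
```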

IMO, parallel_tool_calls should be supported in our function calling API (both static and streaming).
And I suggest supporting it in two steps:

2. Aligning terms for model names
In #2544, we use these names in link:

Name       Special Token (i.e. bot_token)
Llama 3.2  <|python_tag|>
Llama 3.1  <function=
Qwen 2.5   <tool_call>
InternLM   <|plugin|>

I prefer to change Llama 3.1/3.2 into Llama 3.1+ (since Llama 3.3 shares the same pattern) and to use the terms JSON-based and User-defined from Meta's doc for clarification: Llama 3.2 added no new function-calling support during training, and both <|python_tag|> and <function= have been supported since 3.1.
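
As a toy illustration of how a parser could dispatch on these bot_tokens (the mapping comes from the table above; the dict and function names are hypothetical, not #2544's actual code):

```python
from typing import Optional

# Hypothetical dispatch table built from the bot_tokens listed above.
TOOL_CALL_BOT_TOKENS = {
    "llama3.1+ (python tag)": "<|python_tag|>",
    "llama3.1+ (function tag)": "<function=",
    "qwen2.5": "<tool_call>",
    "internlm2": "<|plugin|>",
}


def detect_tool_call_format(text: str) -> Optional[str]:
    """Return the first format whose bot_token appears in the model output."""
    for fmt, token in TOOL_CALL_BOT_TOKENS.items():
        if token in text:
            return fmt
    return None


print(detect_tool_call_format('<tool_call>{"name": "get_weather"}</tool_call>'))
# -> 'qwen2.5'
```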

@HaoyuWang4188 (Contributor) commented:

Please let me know your thoughts on the above discussion. @YAMY1234 @Thunderbeee @Tushar-ml @merrymercy
If you guys agree with my suggestions, I will proceed with the proposed plan. Thank you! 🚀
