WIP: Feature/function calling update #2700

Open · wants to merge 8 commits into main from feature/function-calling-update
Conversation

@YAMY1234 (Contributor) commented Jan 2, 2025

Pull Request Description

Summary

This pull request introduces streaming modes for function calling within the OpenAI API integration and updates the non-streaming framework for better extensibility. The changes include:

  1. New Features:

    • Implementation of a FunctionCallParser for robust, efficient parsing of function calls in both streaming and non-streaming contexts.
    • Added support for incremental streaming responses via the parse_streaming_increment method (see the sketch after this list).
    • Enhanced tooling support with structured parsing of tool calls, enabling function integration with improved parameter handling.
  2. Refactoring:

    • Refactored openai_api/adapter.py to integrate streaming tool call parsing logic.
    • Updated openai_api/protocol.py with additional models (ToolCallItem, DeltaMessage) to support streaming functionality.
  3. Documentation:

    • Added detailed comments and docstrings for new classes and methods to enhance readability and maintainability.
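
To make the streaming-parse behaviour concrete, here is a minimal sketch of the incremental-parsing idea; the class and method names mirror the PR's StreamingJSONParser.parse_streaming_increment, but the body below is an illustrative assumption rather than the PR's actual implementation:

```python
import json
from typing import Optional


class StreamingJSONParser:
    """Toy incremental parser: buffers streamed text and emits a parsed
    tool call once the buffer becomes a complete JSON value."""

    def __init__(self) -> None:
        self._buffer = ""

    def parse_streaming_increment(self, chunk: str) -> Optional[dict]:
        """Append a new chunk; return the parsed object if the buffered
        text is now valid JSON, otherwise None (wait for more chunks)."""
        self._buffer += chunk
        try:
            obj = json.loads(self._buffer)
        except json.JSONDecodeError:
            return None  # still incomplete
        self._buffer = ""  # reset for the next tool call
        return obj


# A tool-call payload arriving in three streamed chunks:
parser = StreamingJSONParser()
for chunk in ['{"name": "get_weather", ', '"arguments": {"city": ', '"Paris"}}']:
    result = parser.parse_streaming_increment(chunk)
    if result is not None:
        print(result)  # {'name': 'get_weather', 'arguments': {'city': 'Paris'}}
```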

Detailed Changes

  • docs/backend/function_calling_streaming.py:

    • Added functionality to demonstrate streaming and non-streaming API calls with mock tool integrations.
    • Included an example for handling tool calls and parsing streamed arguments incrementally.
  • python/sglang/srt/function_call_parser.py:

    • Introduced FunctionCallParser, StreamingJSONParser, and related utility functions to handle function calls during streaming responses.
    • Implemented logic for detecting and parsing incremental JSON inputs with robust error handling.
  • python/sglang/srt/openai_api/adapter.py:

    • Integrated FunctionCallParser to enable real-time function call parsing during streaming response generation.
    • Adjusted tool-related logic to align with the new structured tool parsing approach.
  • python/sglang/srt/openai_api/protocol.py:

    • Modified FunctionResponse and ToolCall models to use Optional fields for compatibility with the new parser.
    • Added ToolCallItem and DeltaMessage models to streamline the representation of parsed tool calls and response deltas (a rough sketch of such models follows this list).
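
As a rough illustration of what such protocol additions could look like (the field names here are assumptions, not the PR's exact definitions), the new models might be plain Pydantic classes:

```python
from typing import List, Optional

from pydantic import BaseModel


class ToolCallItem(BaseModel):
    """One parsed tool call; tool_index tracks its position when a model
    emits several calls in a single response."""
    tool_index: int
    name: Optional[str] = None
    parameters: Optional[str] = None  # JSON-encoded arguments


class DeltaMessage(BaseModel):
    """Incremental slice of a streamed assistant message."""
    role: Optional[str] = None
    content: Optional[str] = None
    tool_calls: Optional[List[ToolCallItem]] = None
```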

Testing

  • Verified streaming and non-streaming API calls using mock scenarios (an example streaming request is shown after this list).
  • Validated the correctness of tool-call parsing through tests and real-time simulations.
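
For reference, a streaming tool-call request of the kind exercised here can be issued with the standard OpenAI Python client; the endpoint URL and model name below are placeholders for a locally launched sglang server, not values taken from this PR:

```python
from openai import OpenAI

# Placeholder endpoint and model: assumes an sglang server running locally
# with its OpenAI-compatible API enabled.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

stream = client.chat.completions.create(
    model="qwen2.5",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    stream=True,
)

# Tool-call fragments arrive incrementally in each chunk's delta.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.tool_calls:
        for call in chunk.choices[0].delta.tool_calls:
            print(call.function.name, call.function.arguments)
```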

@YAMY1234 changed the title from Feature/function calling update to WIP: Feature/function calling update on Jan 2, 2025
@merrymercy (Contributor) commented:

@HaoyuWang4188 @Tushar-ml @YAMY1234 Can you review each other's code? #2576

@YAMY1234 requested a review from HaiShaw as a code owner on January 2, 2025 19:59
@YAMY1234 force-pushed the feature/function-calling-update branch 2 times, most recently from 99bb21f to 25a03b0 on January 2, 2025 20:09
@YAMY1234 force-pushed the feature/function-calling-update branch from 25a03b0 to 63c3d4e on January 2, 2025 20:10
@YAMY1234 (Contributor, Author) commented Jan 3, 2025

> @HaoyuWang4188 @Tushar-ml @YAMY1234 Can you review each other's code? #2576

Sure! We’ll review each other’s code and collaborate to work out a great solution. 🚀

@HaoyuWang4188 (Contributor) commented Jan 3, 2025

Hi! After a general review, I would like to start some discussion to help us determine the best solution:

1. Support for parallel_tool_calls

The OpenAI API supports a parallel_tool_calls option (default true) that determines whether the LLM may output multiple tool calls at once.
In vLLM, this option is accepted but effectively ignored, i.e. it always behaves as if parallel_tool_calls=true (details).
Our current implementation does not consider this option in either #2544 or #2700, and the actual behaviour is summarized as follows:

Static API in #2544 (details)

  • parallel_tool_calls=true for qwen2.5 (can output multiple tool calls at once)
  • parallel_tool_calls=false for internlm2, llama3.1, llama3.2 (only the first parsed tool call is output)

I tried to align these behaviours by setting parallel_tool_calls=false by default and forcing qwen2.5 to output only the first tool call at the API level in link.
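
A minimal sketch of that kind of API-level enforcement (the function name and call shape are illustrative, not the actual #2544 code):

```python
from typing import Any, Dict, List


def enforce_parallel_tool_calls(
    tool_calls: List[Dict[str, Any]], parallel_tool_calls: bool
) -> List[Dict[str, Any]]:
    """If the request sets parallel_tool_calls=False, keep only the first
    parsed tool call so all models behave consistently."""
    if not parallel_tool_calls:
        return tool_calls[:1]
    return tool_calls


calls = [
    {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    {"name": "get_time", "arguments": '{"tz": "CET"}'},
]
print(enforce_parallel_tool_calls(calls, parallel_tool_calls=False))
# -> [{'name': 'get_weather', 'arguments': '{"city": "Paris"}'}]
```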

IMO, parallel_tool_calls should be supported in our function calling API (both static and streaming).
And I suggest supporting it in two steps:

2. Aligning terms for model names
In #2544, we use these names in link:

Name       Special Token (i.e. bot_token)
Llama 3.2  <|python_tag|>
Llama 3.1  <function=
Qwen 2.5   <tool_call>
InternLM   <|plugin|>

I prefer to change Llama 3.1/3.2 into Llama 3.1+ (since Llama 3.3 shares the same pattern) and to use the terms JSON-based and User-defined from Meta's doc for clarification: Llama 3.2 added no new function-calling support during training, and both <|python_tag|> and <function= have been supported since 3.1.
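
As a toy illustration of how a parser could dispatch on these bot_tokens (the mapping comes from the table above; the dict and function names are hypothetical, not #2544's actual code):

```python
from typing import Optional

# Hypothetical dispatch table built from the bot_tokens listed above.
TOOL_CALL_BOT_TOKENS = {
    "llama3.1+ (python tag)": "<|python_tag|>",
    "llama3.1+ (function tag)": "<function=",
    "qwen2.5": "<tool_call>",
    "internlm2": "<|plugin|>",
}


def detect_tool_call_format(text: str) -> Optional[str]:
    """Return the first format whose bot_token appears in the model output."""
    for fmt, token in TOOL_CALL_BOT_TOKENS.items():
        if token in text:
            return fmt
    return None


print(detect_tool_call_format('<tool_call>{"name": "get_weather"}</tool_call>'))
# -> 'qwen2.5'
```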

@HaoyuWang4188 (Contributor) commented:

Please let me know your thoughts on the above discussion. @YAMY1234 @Thunderbeee @Tushar-ml @merrymercy
If you guys agree with my suggestions, I will proceed with the proposed plan. Thank you! 🚀
