Conversation
```python
'internlm/internlm2_5-20b-chat', 'internlm/internlm2_5-20b', 'Qwen/Qwen3-32B', 'OpenGVLab/InternVL3_5-30B-A3B',
'OpenGVLab/InternVL3-38B', 'Qwen/Qwen3-VL-8B-Instruct', 'internlm/internlm3-8b-instruct',
'meta-llama/Llama-3.2-3B-Instruct', 'Qwen/Qwen3-VL-30B-A3B-Instruct'
'Qwen/Qwen3-0.6B',
```
Should we split this list into legacy and latest versions?
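If we do split it, a rough sketch of what the constant could look like is below; the constant names and the legacy/latest grouping here are purely illustrative and not part of this PR (`MODEL_LIST` stands for whichever list this excerpt belongs to):

```python
# Hypothetical split; names and grouping are illustrative only.
LEGACY_MODEL_LIST = [
    'internlm/internlm2_5-20b-chat',
    'internlm/internlm2_5-20b',
]

LATEST_MODEL_LIST = [
    'Qwen/Qwen3-32B',
    'Qwen/Qwen3-0.6B',
    'Qwen/Qwen3-VL-8B-Instruct',
    'Qwen/Qwen3-VL-30B-A3B-Instruct',
    'OpenGVLab/InternVL3_5-30B-A3B',
    'OpenGVLab/InternVL3-38B',
    'internlm/internlm3-8b-instruct',
    'meta-llama/Llama-3.2-3B-Instruct',
]

# Existing parametrizations could keep using the combined list.
MODEL_LIST = LEGACY_MODEL_LIST + LATEST_MODEL_LIST
```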
Pull request overview
Adds a reusable helper module and introduces comprehensive RESTful API tests for tool-calling and reasoning behavior across supported backends/models, while extending the model lists used to parametrize these tests.
Changes:
- Added `autotest/utils/tool_reasoning_definitions.py` to share tool schemas, logging helpers, and stream aggregation utilities between test suites.
- Added new RESTful test suites for tool calls (`test_restful_tool_calls.py`) and reasoning (`test_restful_reasoning.py`), including parser unit tests and API-level streaming/non-streaming checks.
- Extended constants with a dedicated `TOOL_REASONING_MODEL_LIST` and reformatted `RESTFUL_MODEL_LIST` for readability.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| autotest/utils/tool_reasoning_definitions.py | Centralizes shared tool/reasoning test utilities (schemas, logging, stream collectors). |
| autotest/utils/constant.py | Adds TOOL_REASONING_MODEL_LIST and reformats RESTFUL_MODEL_LIST. |
| autotest/interface/restful/test_restful_tool_calls.py | Introduces extensive REST API tests validating tool-call behavior (streaming/non-streaming, tool_choice, parallel calls). |
| autotest/interface/restful/test_restful_reasoning.py | Adds reasoning API tests plus unit tests for reasoning parsers and token accounting. |
```python
_original_create = client.chat.completions.create


def _logged_create(*args, **kwargs):
    kwargs.setdefault('extra_body', dict(spaces_between_special_tokens=False))
```
make_logged_client uses kwargs.setdefault('extra_body', ...), so if the caller already passes extra_body (e.g. reasoning tests set enable_thinking), spaces_between_special_tokens=False is not applied even though the docstring says it is always injected. Consider merging into any existing extra_body dict and setting a default for spaces_between_special_tokens to keep behavior consistent across all calls.
```diff
-    kwargs.setdefault('extra_body', dict(spaces_between_special_tokens=False))
+    extra_body = kwargs.get('extra_body')
+    if extra_body is None:
+        kwargs['extra_body'] = {'spaces_between_special_tokens': False}
+    elif isinstance(extra_body, dict) and 'spaces_between_special_tokens' not in extra_body:
+        extra_body['spaces_between_special_tokens'] = False
```
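For reference, a small standalone snippet illustrating the difference; this is illustrative only and not taken from the PR:

```python
# With setdefault, a caller-supplied extra_body wins wholesale, so the
# default spaces_between_special_tokens=False never reaches the request.
kwargs = {'extra_body': {'enable_thinking': True}}
kwargs.setdefault('extra_body', {'spaces_between_special_tokens': False})
print(kwargs['extra_body'])  # {'enable_thinking': True}

# With the merge shown in the suggestion above, both keys survive.
extra_body = kwargs.get('extra_body')
if extra_body is None:
    kwargs['extra_body'] = {'spaces_between_special_tokens': False}
elif isinstance(extra_body, dict) and 'spaces_between_special_tokens' not in extra_body:
    extra_body['spaces_between_special_tokens'] = False
print(kwargs['extra_body'])  # {'enable_thinking': True, 'spaces_between_special_tokens': False}
```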
```python
# Per OpenAI spec: content should be null or empty when tool_calls
# are present.
if choice.message.content is not None:
    assert choice.message.content.strip() == '', (f'content should be null/empty when tool_calls are '
                                                   f'present, got: {choice.message.content!r}')
```
test_content_null_when_tool_calls_present enforces that message.content must be null/empty whenever tool_calls exist. However, the server-side protocol explicitly notes that content and tool calls can be returned together (rarely / model-dependent). This assertion will fail for models that intentionally emit a short natural-language preface alongside tool_calls. Consider loosening the check (e.g., only assert that tool_calls are present and well-formed, and optionally assert content does not contain tool-call markup) or gating this expectation behind a model/backend capability flag.
```diff
-# Per OpenAI spec: content should be null or empty when tool_calls
-# are present.
-if choice.message.content is not None:
-    assert choice.message.content.strip() == '', (f'content should be null/empty when tool_calls are '
-                                                   f'present, got: {choice.message.content!r}')
+# Some models may return a short natural-language preface alongside
+# tool_calls. We only enforce that any tool call markup is conveyed
+# via the structured tool_calls field, not embedded in content.
+if choice.message.content is not None:
+    assert '"tool_calls"' not in choice.message.content, (
+        f'message.content should not contain tool call markup when tool_calls are '
+        f'present, got: {choice.message.content!r}'
+    )
```
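If the stricter OpenAI-spec check is still worth keeping for models known to return empty content, another option is to gate it behind a capability flag; a rough sketch, where `STRICT_NULL_CONTENT_MODELS` is a hypothetical allow-list that is not defined in this PR:

```python
# Hypothetical allow-list of models observed to return null/empty content
# whenever tool_calls are present; empty by default.
STRICT_NULL_CONTENT_MODELS = set()

content = choice.message.content
if model_name in STRICT_NULL_CONTENT_MODELS:
    assert content is None or content.strip() == '', (
        f'content should be null/empty when tool_calls are present, got: {content!r}')
elif content is not None:
    assert '"tool_calls"' not in content, (
        f'content should not embed tool call markup, got: {content!r}')
```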
```python
def test_empty_tools_list(self, backend, model_case):
    """Empty tools list should behave like no tools."""
    client, model_name = self._get_client()

    try:
        response = client.chat.completions.create(
            model=model_name,
            messages=MESSAGES_NO_TOOL_NEEDED,
            temperature=0,
            max_completion_tokens=100,
            tools=[],
            logprobs=False,
        )

        choice = response.choices[0]
        assert choice.message.role == 'assistant'
        assert choice.message.content is not None
        assert (choice.message.tool_calls is None or len(choice.message.tool_calls) == 0)
    except Exception:
        # Some backends reject an empty tools list
        pass
```
test_empty_tools_list catches a broad Exception and then passes, which will make the test succeed even on unexpected failures (e.g. server down, regression, assertion failures inside the try block). If the goal is to tolerate only backends that reject tools=[], catch the specific error type/status (e.g. OpenAI BadRequestError) and pytest.skip, otherwise re-raise so real failures aren’t hidden.
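A possible shape of the narrower handling, assuming the openai>=1.x client (where a rejected request raises `openai.BadRequestError`) and reusing `client`, `model_name`, and `MESSAGES_NO_TOOL_NEEDED` from the test above:

```python
import openai
import pytest

try:
    response = client.chat.completions.create(
        model=model_name,
        messages=MESSAGES_NO_TOOL_NEEDED,
        temperature=0,
        max_completion_tokens=100,
        tools=[],
        logprobs=False,
    )
except openai.BadRequestError as err:
    # Tolerate only backends that explicitly reject an empty tools list.
    pytest.skip(f'backend rejects tools=[]: {err}')

# Any other failure (server down, regression, bad response) propagates and fails the test.
choice = response.choices[0]
assert choice.message.role == 'assistant'
assert choice.message.content is not None
assert not choice.message.tool_calls
```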
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and easier to review. If you do not understand some items, don't worry, just open the pull request and ask the maintainers for help.
Motivation
Please describe the motivation for this PR and the goal you want to achieve through it.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
Does the modification introduce changes that break the backward compatibility of downstream repositories?
If so, please describe how it breaks compatibility and how downstream projects should modify their code to stay compatible with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
Checklist