Conversation
```python
'internlm/internlm2_5-20b-chat', 'internlm/internlm2_5-20b', 'Qwen/Qwen3-32B', 'OpenGVLab/InternVL3_5-30B-A3B',
'OpenGVLab/InternVL3-38B', 'Qwen/Qwen3-VL-8B-Instruct', 'internlm/internlm3-8b-instruct',
'meta-llama/Llama-3.2-3B-Instruct', 'Qwen/Qwen3-VL-30B-A3B-Instruct'
'Qwen/Qwen3-0.6B',
```
Should we split this list into legacy and latest versions?
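If we do split it, a rough sketch of what the constant could look like is below; the constant names and the legacy/latest grouping here are purely illustrative and not part of this PR (`MODEL_LIST` stands for whichever list this excerpt belongs to):

```python
# Hypothetical split; names and grouping are illustrative only.
LEGACY_MODEL_LIST = [
    'internlm/internlm2_5-20b-chat',
    'internlm/internlm2_5-20b',
]

LATEST_MODEL_LIST = [
    'Qwen/Qwen3-32B',
    'Qwen/Qwen3-0.6B',
    'Qwen/Qwen3-VL-8B-Instruct',
    'Qwen/Qwen3-VL-30B-A3B-Instruct',
    'OpenGVLab/InternVL3_5-30B-A3B',
    'OpenGVLab/InternVL3-38B',
    'internlm/internlm3-8b-instruct',
    'meta-llama/Llama-3.2-3B-Instruct',
]

# Existing parametrizations could keep using the combined list.
MODEL_LIST = LEGACY_MODEL_LIST + LATEST_MODEL_LIST
```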
Pull request overview
Adds a reusable helper module and introduces comprehensive RESTful API tests for tool-calling and reasoning behavior across supported backends/models, while extending the model lists used to parametrize these tests.
Changes:
- Added `autotest/utils/tool_reasoning_definitions.py` to share tool schemas, logging helpers, and stream aggregation utilities between test suites.
- Added new RESTful test suites for tool calls (`test_restful_tool_calls.py`) and reasoning (`test_restful_reasoning.py`), including parser unit tests and API-level streaming/non-streaming checks.
- Extended constants with a dedicated `TOOL_REASONING_MODEL_LIST` and reformatted `RESTFUL_MODEL_LIST` for readability.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| autotest/utils/tool_reasoning_definitions.py | Centralizes shared tool/reasoning test utilities (schemas, logging, stream collectors). |
| autotest/utils/constant.py | Adds TOOL_REASONING_MODEL_LIST and reformats RESTFUL_MODEL_LIST. |
| autotest/interface/restful/test_restful_tool_calls.py | Introduces extensive REST API tests validating tool-call behavior (streaming/non-streaming, tool_choice, parallel calls). |
| autotest/interface/restful/test_restful_reasoning.py | Adds reasoning API tests plus unit tests for reasoning parsers and token accounting. |
```python
_original_create = client.chat.completions.create


def _logged_create(*args, **kwargs):
    kwargs.setdefault('extra_body', dict(spaces_between_special_tokens=False))
```
make_logged_client uses kwargs.setdefault('extra_body', ...), so if the caller already passes extra_body (e.g. reasoning tests set enable_thinking), spaces_between_special_tokens=False is not applied even though the docstring says it is always injected. Consider merging into any existing extra_body dict and setting a default for spaces_between_special_tokens to keep behavior consistent across all calls.
```diff
-    kwargs.setdefault('extra_body', dict(spaces_between_special_tokens=False))
+    extra_body = kwargs.get('extra_body')
+    if extra_body is None:
+        kwargs['extra_body'] = {'spaces_between_special_tokens': False}
+    elif isinstance(extra_body, dict) and 'spaces_between_special_tokens' not in extra_body:
+        extra_body['spaces_between_special_tokens'] = False
```
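For reference, a small standalone snippet illustrating the difference; this is illustrative only and not taken from the PR:

```python
# With setdefault, a caller-supplied extra_body wins wholesale, so the
# default spaces_between_special_tokens=False never reaches the request.
kwargs = {'extra_body': {'enable_thinking': True}}
kwargs.setdefault('extra_body', {'spaces_between_special_tokens': False})
print(kwargs['extra_body'])  # {'enable_thinking': True}

# With the merge shown in the suggestion above, both keys survive.
extra_body = kwargs.get('extra_body')
if extra_body is None:
    kwargs['extra_body'] = {'spaces_between_special_tokens': False}
elif isinstance(extra_body, dict) and 'spaces_between_special_tokens' not in extra_body:
    extra_body['spaces_between_special_tokens'] = False
print(kwargs['extra_body'])  # {'enable_thinking': True, 'spaces_between_special_tokens': False}
```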
```python
# Per OpenAI spec: content should be null or empty when tool_calls
# are present.
if choice.message.content is not None:
    assert choice.message.content.strip() == '', (f'content should be null/empty when tool_calls are '
                                                   f'present, got: {choice.message.content!r}')
```
test_content_null_when_tool_calls_present enforces that message.content must be null/empty whenever tool_calls exist. However, the server-side protocol explicitly notes that content and tool calls can be returned together (rarely / model-dependent). This assertion will fail for models that intentionally emit a short natural-language preface alongside tool_calls. Consider loosening the check (e.g., only assert that tool_calls are present and well-formed, and optionally assert content does not contain tool-call markup) or gating this expectation behind a model/backend capability flag.
```diff
-# Per OpenAI spec: content should be null or empty when tool_calls
-# are present.
-if choice.message.content is not None:
-    assert choice.message.content.strip() == '', (f'content should be null/empty when tool_calls are '
-                                                   f'present, got: {choice.message.content!r}')
+# Some models may return a short natural-language preface alongside
+# tool_calls. We only enforce that any tool call markup is conveyed
+# via the structured tool_calls field, not embedded in content.
+if choice.message.content is not None:
+    assert '"tool_calls"' not in choice.message.content, (
+        f'message.content should not contain tool call markup when tool_calls are '
+        f'present, got: {choice.message.content!r}'
+    )
```
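If the stricter OpenAI-spec check is still worth keeping for models known to return empty content, another option is to gate it behind a capability flag; a rough sketch, where `STRICT_NULL_CONTENT_MODELS` is a hypothetical allow-list that is not defined in this PR:

```python
# Hypothetical allow-list of models observed to return null/empty content
# whenever tool_calls are present; empty by default.
STRICT_NULL_CONTENT_MODELS = set()

content = choice.message.content
if model_name in STRICT_NULL_CONTENT_MODELS:
    assert content is None or content.strip() == '', (
        f'content should be null/empty when tool_calls are present, got: {content!r}')
elif content is not None:
    assert '"tool_calls"' not in content, (
        f'content should not embed tool call markup, got: {content!r}')
```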
```python
def test_empty_tools_list(self, backend, model_case):
    """Empty tools list should behave like no tools."""
    client, model_name = self._get_client()

    try:
        response = client.chat.completions.create(
            model=model_name,
            messages=MESSAGES_NO_TOOL_NEEDED,
            temperature=0,
            max_completion_tokens=100,
            tools=[],
            logprobs=False,
        )

        choice = response.choices[0]
        assert choice.message.role == 'assistant'
        assert choice.message.content is not None
        assert (choice.message.tool_calls is None or len(choice.message.tool_calls) == 0)
    except Exception:
        # Some backends reject an empty tools list
        pass
```
test_empty_tools_list catches a broad Exception and then passes, which will make the test succeed even on unexpected failures (e.g. server down, regression, assertion failures inside the try block). If the goal is to tolerate only backends that reject tools=[], catch the specific error type/status (e.g. OpenAI BadRequestError) and pytest.skip, otherwise re-raise so real failures aren’t hidden.
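A possible shape of the narrower handling, assuming the openai>=1.x client (where a rejected request raises `openai.BadRequestError`) and reusing `client`, `model_name`, and `MESSAGES_NO_TOOL_NEEDED` from the test above:

```python
import openai
import pytest

try:
    response = client.chat.completions.create(
        model=model_name,
        messages=MESSAGES_NO_TOOL_NEEDED,
        temperature=0,
        max_completion_tokens=100,
        tools=[],
        logprobs=False,
    )
except openai.BadRequestError as err:
    # Tolerate only backends that explicitly reject an empty tools list.
    pytest.skip(f'backend rejects tools=[]: {err}')

# Any other failure (server down, regression, bad response) propagates and fails the test.
choice = response.choices[0]
assert choice.message.role == 'assistant'
assert choice.message.content is not None
assert not choice.message.tool_calls
```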
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and easier to review. If you do not understand some items, don't worry, just open the pull request and ask the maintainers for help.
Motivation
Please describe the motivation for this PR and the goal you want to achieve through it.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
Does the modification introduce changes that break the backward compatibility of downstream repositories?
If so, please describe how it breaks compatibility and how downstream projects should modify their code to stay compatible with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
Checklist