Python: Normalize OpenAI tool-call argument envelopes on parse #4741
davidahmann wants to merge 1 commit into microsoft:main
Conversation
This removes an OpenAI-only envelope mismatch where tool-call arguments arrived as raw JSON strings instead of the mapping shape other backends already emit. The change stays on the parse boundary: OpenAI chat and responses payloads now normalize JSON-object arguments immediately, and the targeted tests were updated to lock that in.
Inspired by research context: CAISI publishes independent, reproducible AI agent governance research: https://caisi.dev
moonbox3
left a comment
Automated Code Review
Reviewers: 4 | Confidence: 88%
✓ Correctness
The PR introduces `normalize_function_call_arguments` to eagerly parse JSON-string arguments into dicts at the provider-parsing layer. The helper function itself is correct and well-guarded. The non-streaming paths are properly updated. However, in the chat client, `_parse_tool_calls_from_openai` is shared by both non-streaming (Choice) and streaming (ChunkChoice) code paths. In streaming, partial JSON argument fragments will fail to parse and remain as strings (correct), but this creates a type inconsistency: non-streaming responses produce `dict` arguments while streaming accumulation produces `str` arguments via `_add_function_call_content`. Additionally, in a rare edge case where a single streaming chunk contains complete valid JSON, normalization would produce a `dict`, and a subsequent string fragment chunk would trigger `TypeError('Incompatible argument types')` in `_add_function_call_content`. No blocking bugs are present in practice since OpenAI streams arguments incrementally, but the inconsistency warrants consideration.
✗ Security / Reliability
The new `normalize_function_call_arguments` function safely parses JSON-string arguments into dicts when receiving tool calls from the OpenAI API. However, the responses client's `_prepare_content_for_openai` method at line 1166 passes `content.arguments` directly back to the API without re-serializing to a JSON string. Since `arguments` can now be a `dict` after normalization, this will break the round-trip to the OpenAI Responses API, which expects `arguments` as a JSON string. The chat client already handles this correctly (line 640) with a `json.dumps` guard. The same issue affects line 1202 for `mcp_approval_request` payloads. Additionally, one `function_call` parsing site (line 1478) is inconsistently left un-normalized.
✗ Test Coverage
The diff introduces `normalize_function_call_arguments` in `_types.py` to parse JSON-string tool-call arguments into dicts. The function is used in 4 call sites (1 in the chat client, 3 in the responses client) but has zero unit tests of its own. The integration tests only cover the happy-path JSON-object-string case. There are no tests for: non-JSON strings, `None` input, already-dict/`Mapping` input, JSON arrays (should pass through as string), or `json.JSONDecodeError` fallback. The streaming code path in `_parse_chunk_from_openai` (line 1918 in the responses client) also lacks test coverage for the normalization. Additionally, the `mcp_call` branches pass `item.arguments` without normalization; this may be intentional but is inconsistent with the other tool-call types.
✗ Design Approach
The PR introduces `normalize_function_call_arguments` to eagerly parse JSON-string arguments into dicts at parse time. While the intent is reasonable, there are three significant design problems: (1) normalization is applied to only a subset of `from_function_call` call sites, leaving others inconsistent; (2) the responses client serialization path at line 1166 passes `content.arguments` directly to the API payload without the `json.dumps` guard that the chat client already has, so a round-tripped function call will send a raw dict where OpenAI expects a JSON string; (3) the normalization logic belongs inside `Content.from_function_call()` (or `Content.__init__`) rather than scattered at individual call sites; there is already a `parse_arguments()` method on `Content` that duplicates this logic on-demand, making the new eagerly-parsed path a parallel but incomplete alternative.
Flagged Issues
- Serialization regression in `_responses_client._prepare_content_for_openai` (line 1166): `content.arguments` is passed directly into the API payload, but after normalization the value may be a `dict`. The OpenAI Responses API expects a JSON string. The chat client already guards against this at line 640 with `json.dumps(content.arguments) if isinstance(content.arguments, Mapping) else content.arguments`, but the responses client does not. Multi-turn conversations that serialize a previously parsed function call back to the provider will send malformed payloads.
- `_responses_client.py` line 1202: same serialization issue; `content.function_call.arguments` (which may now be a dict after normalization) is passed directly in the `mcp_approval_request` payload without re-serialization to a JSON string.
- `normalize_function_call_arguments` has no direct unit tests. It is a pure utility with multiple branches (str→dict, str→passthrough for non-dict JSON, Mapping→dict, None→None, JSONDecodeError→passthrough) that are not exercised by any test.
- Normalization is applied to only a subset of call sites in `_responses_client.py`. The `case "function_call"` path (line 1478), `mcp_call` (line 1503), and untouched streaming paths (lines 1867, 1880, 2080) remain unnormalized, so the `content.arguments` type will be `str` or `dict` depending on which code path produced the Content.
- The normalization logic is placed at individual call sites rather than inside `Content.from_function_call()`. Every future provider adapter must independently remember to apply it. Centralizing in `Content.from_function_call()` (or `Content.__init__`) would fix all existing and future call sites at once, and the existing `Content.parse_arguments()` method already performs the same on-demand JSON parsing, making the new helper a parallel but incomplete alternative.
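The outbound guard the review quotes from the chat client can be expressed as a small helper. The sketch below is illustrative only; the helper name `ensure_json_string_arguments` is hypothetical and does not appear in the PR:

```python
import json
from collections.abc import Mapping
from typing import Any


def ensure_json_string_arguments(arguments: Any) -> Any:
    """Re-serialize mapping arguments before sending them back to the API.

    The OpenAI wire format expects tool-call arguments as a JSON string, so
    arguments that were normalized to a dict on parse must be dumped again
    on the way out. Non-mapping values (already-string arguments, None)
    pass through unchanged.
    """
    if isinstance(arguments, Mapping):
        return json.dumps(dict(arguments))
    return arguments
```

Applying this guard (or the inline equivalent the chat client already uses) at the responses-client sites at lines 1166 and 1202 would close the round-trip regression.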
Suggestions
- In `_chat_client.py`, `_parse_tool_calls_from_openai` is shared by both non-streaming (Choice) and streaming (ChunkChoice) paths. Normalizing here means streaming accumulation produces `str` arguments while non-streaming results become `dict`. In a rare edge case, if a streaming chunk contains complete valid JSON, normalization converts it to a `dict`, and a subsequent string-fragment chunk triggers `TypeError('Incompatible argument types')` in `_add_function_call_content`. Consider gating normalization on the non-streaming path or normalizing after accumulation.
- The `function_call` case at line 1478 and `mcp_call` branches at lines 1503 and 1880 in `_responses_client.py` are not wrapped with `normalize_function_call_arguments`, unlike the other call sites. These should also normalize arguments for consistency.
- Add parametrized unit tests for `normalize_function_call_arguments` covering: valid JSON object string, non-JSON string, `None`, already-a-dict, Mapping subclass, JSON array string, and malformed JSON.
- Add a test for the streaming path (`_parse_chunk_from_openai`) in the responses client to verify normalization applies to streamed `mcp_approval_request` function calls.
- If eager normalization is truly needed, consider unifying around a single implementation: normalize inside `Content.from_function_call()`, remove the standalone helper, and reconcile with the existing `Content.parse_arguments()` method to avoid duplicated JSON-parsing logic.
Automated review by moonbox3's agents
```diff
 call_id=tool.id if tool.id else "",
 name=tool.function.name if tool.function.name else "",
-arguments=tool.function.arguments if tool.function.arguments else "",
+arguments=normalize_function_call_arguments(tool.function.arguments if tool.function.arguments else ""),
```
This method is called for both non-streaming `Choice` and streaming `ChunkChoice`. Normalizing here means streaming chunks with partial JSON stay as strings (fine), but if a chunk contains complete valid JSON it becomes a dict, and a subsequent string chunk causes `TypeError('Incompatible argument types')` in `_add_function_call_content`. Consider gating normalization on the non-streaming path (`isinstance(choice, Choice)`) or applying it after accumulation.
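One way to gate the parse as suggested, sketched with a boolean flag standing in for the `isinstance(choice, Choice)` check (the function name is illustrative, not from the PR):

```python
import json
from typing import Any


def parse_tool_call_arguments(raw: str, *, streaming: bool) -> Any:
    """Normalize JSON-object argument strings only on the non-streaming path.

    Streaming chunks carry partial JSON fragments that must stay strings so
    that accumulation can concatenate them consistently; eagerly parsing a
    chunk that happens to be complete JSON would mix dict and str types.
    """
    if streaming:
        return raw
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    return parsed if isinstance(parsed, dict) else raw
```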
```python
if not isinstance(arguments, str):
    return dict(arguments) if isinstance(arguments, Mapping) else arguments

try:
```
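Extrapolating from the visible fragment and the branch behavior the review enumerates, the full helper plausibly looks like the following. This is a reconstruction, not the actual `_types.py` source:

```python
import json
from collections.abc import Mapping
from typing import Any


def normalize_function_call_arguments(arguments: Any) -> Any:
    """Parse JSON-object argument strings into plain dicts.

    Non-string inputs pass through (Mappings are copied to plain dicts,
    None stays None); strings that are not valid JSON, or that decode to
    something other than an object, are returned unchanged.
    """
    if not isinstance(arguments, str):
        return dict(arguments) if isinstance(arguments, Mapping) else arguments
    try:
        parsed = json.loads(arguments)
    except json.JSONDecodeError:
        return arguments
    return parsed if isinstance(parsed, dict) else arguments
```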
The normalization belongs inside `Content.from_function_call()` rather than as a standalone helper. All current and future call sites automatically benefit, and the existing `parse_arguments()` method's JSON-parsing logic is no longer duplicated. Applying it only at selected call sites leads to a `Content` whose arguments type depends on which parsing path produced it.
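Centralizing the normalization as suggested could look roughly like this minimal stand-in. The real `Content` type has many more fields and constructors; this sketch only shows the shape of the factory-level normalization:

```python
import json
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


@dataclass
class Content:
    """Minimal stand-in for the real Content type (illustrative only)."""

    call_id: str
    name: str
    arguments: Any

    @classmethod
    def from_function_call(cls, call_id: str, name: str, arguments: Any) -> "Content":
        # Normalize once here so every current and future provider adapter
        # benefits without remembering to call a standalone helper.
        if isinstance(arguments, str):
            try:
                parsed = json.loads(arguments)
            except json.JSONDecodeError:
                parsed = None
            if isinstance(parsed, dict):
                arguments = parsed
        elif isinstance(arguments, Mapping):
            arguments = dict(arguments)
        return cls(call_id=call_id, name=name, arguments=arguments)
```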
```python
# region Internal Helper functions for unified Content
```
This utility function has five distinct code paths (None, non-str Mapping, non-str non-Mapping, valid JSON dict string, and non-dict-JSON/invalid-JSON string) but no direct unit tests. Add a parametrized test covering at least: `None` → `None`, `'{"a":1}'` → `{"a":1}`, `'not json'` → `'not json'`, `'[1,2]'` → `'[1,2]'`, `{"a":1}` (dict) → `{"a":1}`, and an `OrderedDict` → plain dict.
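A parametrized test along the lines suggested; the helper is redefined locally here so the sketch stays self-contained (the real tests would import it from `_types.py`):

```python
import json
from collections import OrderedDict
from collections.abc import Mapping

import pytest


def normalize_function_call_arguments(arguments):
    # Local stand-in mirroring the behavior described in the review.
    if not isinstance(arguments, str):
        return dict(arguments) if isinstance(arguments, Mapping) else arguments
    try:
        parsed = json.loads(arguments)
    except json.JSONDecodeError:
        return arguments
    return parsed if isinstance(parsed, dict) else arguments


@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        (None, None),                  # None passes through
        ('{"a": 1}', {"a": 1}),        # JSON object string -> dict
        ("not json", "not json"),      # invalid JSON -> unchanged
        ("[1, 2]", "[1, 2]"),          # JSON array -> unchanged string
        ({"a": 1}, {"a": 1}),          # already a dict
        (OrderedDict(a=1), {"a": 1}),  # Mapping subclass -> plain dict
    ],
)
def test_normalize_function_call_arguments(raw, expected):
    result = normalize_function_call_arguments(raw)
    assert result == expected
    if isinstance(raw, Mapping):
        assert type(result) is dict  # Mapping subclasses become plain dicts
```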
```python
assert contents[0].arguments == {"city": "Seattle"}

# ...

def test_usage_content_in_streaming_response(
```
This test only covers the JSON-object-string → dict path. Consider adding cases for non-JSON arguments (e.g., `arguments = 'plain text'` should pass through as-is) and empty string (`arguments = ''`) to verify the fallback behavior.
```diff
@@ -1037,7 +1037,7 @@ def test_response_content_creation_with_function_call() -> None:
 function_call = response.messages[0].contents[0]
```
This is the only responses-client test covering function_call argument normalization in the non-streaming path. Consider adding a case where arguments is not valid JSON (e.g., partial/truncated) to confirm passthrough behavior.
Closes #4740
Problem
OpenAI chat and responses parsers return function-call arguments as raw JSON strings, while the other backends already normalize equivalent tool-call payloads to mappings. That backend-dependent envelope shape leaks into downstream handling.
What changed
OpenAI chat and responses payloads now normalize JSON-object tool-call arguments immediately at the parse boundary, matching the mapping shape other backends already emit, and the targeted tests were updated to lock that in.
Validation
```shell
uv run pytest \
  packages/core/tests/openai/test_openai_chat_client.py::test_parse_tool_calls_from_openai_normalizes_json_object_arguments \
  packages/core/tests/openai/test_openai_responses_client.py::test_response_content_creation_with_function_call \
  packages/core/tests/openai/test_openai_responses_client.py::test_parse_response_from_openai_function_call_includes_status \
  -m "not integration"
```