Feat: structured session options and extensions array (#106) #116
Merged
Address the asymmetry reported in #106 between the HTTP request body and ONNX_SERVER_PREPARE_MODEL: only `cuda` was forwarded from the env-var grammar, so even though `ortextensions_path` was supported in the JSON option path, it was unreachable from the prepare-model path. Restructure the option surface so the same set of keys is reachable from both entry points, and so response echoes reflect what ORT actually applied, not what the caller sent.

Option grammar (env var / --prepare-model)
- Dotted keys map to nested objects (e.g. cuda.device_id, session_options.intra_op_num_threads).
- Repeating `extensions` accumulates a deduplicated array; legacy `ortextensions_path` normalizes into the same array.
- Malformed option entries are skipped silently rather than failing the whole list.

JSON option surface (HTTP request body)
- New `extensions` array; legacy `ortextensions_path` is accepted as an alias and normalized into the array on input and on echo.
- New `session_options` group: intra/inter_op_num_threads, execution_mode, graph_optimization_level, enable_cpu_mem_arena, enable_mem_pattern, log_severity_level, logid, enable_profiling, profile_file_prefix, optimized_model_filepath, free_dimension_overrides, config_entries.
- Extended `cuda` object covering CUDA EP V2 keys (device_id, gpu_mem_limit, arena_extend_strategy, cudnn_conv_algo_search, cudnn_conv_use_max_workspace, do_copy_in_default_stream, enable_cuda_graph, tunable_op_*, cudnn_conv1d_pad_to_nc1d, plus forward-compatible passthrough of additional ORT-known keys).
- Backward-compatible shortcuts kept: `cuda: true|false|<device_id>`.

Echo accuracy
- Server only validates JSON shape and our enum-string mapping; the value itself is delegated to ORT. Keys whose ORT setter throws are silently dropped from the echo so the response never claims an option was applied when it was not.
- `config_entries` are round-tripped through GetSessionConfigEntry so the echo reflects what ORT actually stored (always strings).
- CUDA EP options are forwarded in a single batched UpdateCUDAProviderOptions call. ORT V2 silently resets sibling keys when called per-key (e.g. updating arena_extend_strategy reverts gpu_mem_limit to its default), so a single batched call is the only safe way to apply multiple keys; any rejected key aborts the whole batch with the ORT error message identifying the offending key. The echo is built from GetCUDAProviderOptionsAsString readback, filtered to caller-supplied keys plus device_id.

Safety
- Every throw path releases its CUDA provider options handle and ORT status; the readback section is additionally wrapped in try/catch as a guard against std::bad_alloc during string/json construction.
- Session::collect_extensions is exposed as a static method so the normalization can be unit-tested without touching the file system or registering a real custom-ops library.

Tests
- Parse: dotted notation, repeated `extensions`, value type inference, unknown-key passthrough, lenient skipping of malformed entries, mixed legacy `ortextensions_path` + new `extensions` with dedupe, all on top of the existing parse fixtures.
- Session: SessionWithSessionOptions, SessionOptionsConfigEntriesReadback, SessionOptionsFreeDimensionOverrides, SessionOptionsIgnoresInvalidEntries, CollectExtensionsNormalization (12 cases over input shapes, dedupe, legacy normalization, garbage entries, non-object input), ExtensionsRegistrationFailsLoudly.
- CUDA: CudaObjectOptionsEcho asserts round-trip of every supplied key; CudaObjectRejectsUnknownKey asserts session construction aborts with a clear error when ORT rejects an unknown key; CudaScalarShortcutStillWorks pins the legacy boolean shorthand.

Docs
- README: prepare-model description refreshed with the new grammar and examples; new sections covering session_options and CUDA EP options; Extensions Support section now leads with the `extensions` array and notes `ortextensions_path` as a backward-compatible alias.
- OpenAPI: ONNXSessionOption / ONNXSessionOptionRequest restructured; new ONNXSessionOptionsGroup schema; ONNXSessionOptionCUDA enumerates V2 keys with additionalProperties for forward compatibility; ortextensions_path marked deprecated.

Closes #106.
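The option grammar described in the commit message above (dotted keys, repeated `extensions`, the legacy `ortextensions_path` alias, and lenient skipping of malformed entries) can be illustrated with a small standalone sketch. This is not the server's parser: `ParsedOptions` and `parse_options` are hypothetical names, the comma separator between entries is an assumption, only one level of nesting is handled, and values are kept as strings rather than type-inferred.

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type; the real server builds a JSON object instead.
struct ParsedOptions {
    std::map<std::string, std::string> scalars;  // e.g. "cuda" -> "true"
    std::map<std::string, std::map<std::string, std::string>> groups;  // "cuda" -> {"device_id" -> "1"}
    std::vector<std::string> extensions;  // deduplicated, in first-seen order
};

// Parse "key=value,key=value,..." (the comma separator is an assumption).
ParsedOptions parse_options(const std::string& text) {
    ParsedOptions out;
    std::istringstream stream(text);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        const std::string::size_type eq = entry.find('=');
        // Lenient: entries without '=' or with an empty key are skipped.
        if (eq == std::string::npos || eq == 0) continue;
        const std::string key = entry.substr(0, eq);
        const std::string value = entry.substr(eq + 1);

        // Both the new key and the legacy alias feed the same array.
        if (key == "extensions" || key == "ortextensions_path") {
            if (!value.empty() &&
                std::find(out.extensions.begin(), out.extensions.end(), value) ==
                    out.extensions.end()) {
                out.extensions.push_back(value);
            }
            continue;
        }

        const std::string::size_type dot = key.find('.');
        if (dot == std::string::npos) {
            out.scalars[key] = value;  // scalar shortcut such as cuda=true
        } else if (dot > 0 && dot + 1 < key.size()) {
            out.groups[key.substr(0, dot)][key.substr(dot + 1)] = value;
        }
        // A key such as ".x" or "cuda." is treated as malformed and skipped.
    }
    return out;
}
```

With this sketch, `cuda.device_id=1,extensions=a.so,ortextensions_path=a.so` yields one `cuda` group holding `device_id` and a single-element `extensions` list, because the legacy alias repeats a path that was already collected.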
Summary
Closes #106.
The asymmetry reported in #106 (`ortextensions_path` was reachable through the HTTP request body but not through `ONNX_SERVER_PREPARE_MODEL`, because the env-var grammar only forwarded `cuda`) turned out to be the tip of a larger contract problem. The same option grammar that could not carry `ortextensions_path` also could not carry any other useful ORT session knob, and the response echo silently asserted that options had been applied even when ORT had ignored them. This PR fixes #106 by restructuring the option surface so the same set of keys is reachable from both the env-var path and the HTTP path, and so response echoes only contain values that ORT actually applied.

Option grammar (env var / `--prepare-model`)
- Dotted keys map to nested objects: `cuda.device_id`, `session_options.intra_op_num_threads`.
- Repeating `extensions` accumulates a deduplicated array.
- Legacy `ortextensions_path` is normalized into the `extensions` array.
- Scalar shortcuts (`cuda=true`, `cuda=1`) keep working unchanged.

JSON option surface (HTTP request body)
- New `extensions` array; legacy `ortextensions_path` is still accepted as an alias and normalized into the array on input and on echo.
- New `session_options` group covering ORT `SessionOptions`: `intra/inter_op_num_threads`, `execution_mode`, `graph_optimization_level`, `enable_cpu_mem_arena`, `enable_mem_pattern`, `log_severity_level`, `logid`, `enable_profiling` + `profile_file_prefix`, `optimized_model_filepath`, `free_dimension_overrides`, `config_entries`.
- Extended `cuda` object covering CUDA EP V2 keys: `device_id`, `gpu_mem_limit`, `arena_extend_strategy`, `cudnn_conv_algo_search`, `cudnn_conv_use_max_workspace`, `do_copy_in_default_stream`, `enable_cuda_graph`, `tunable_op_*`, `cudnn_conv1d_pad_to_nc1d`, plus forward-compatible passthrough of any additional ORT-known key.
- Backward-compatible shortcuts kept: `cuda: true | false | <device_id>`.

Echo accuracy
The most important behavioural change is that the response echo now reflects what ORT actually stored, not what the caller sent.
- `config_entries` are round-tripped through `GetSessionConfigEntry` (string values; `true` / `42` come back as `"1"` / `"42"`).
- CUDA EP options are forwarded in a single batched `UpdateCUDAProviderOptions` call. ORT V2 silently resets sibling keys when called per-key (e.g. updating `arena_extend_strategy` reverts `gpu_mem_limit` to its default), so a single batched call is the only safe way to apply multiple keys; any rejected key aborts the whole batch with the ORT error message identifying the offending key. The echo is built from `GetCUDAProviderOptionsAsString` readback, filtered to caller-supplied keys plus `device_id`.

Safety
- Every throw path releases its CUDA provider options handle and ORT status; the readback section is additionally wrapped in try/catch as a guard against `std::bad_alloc` during string/json construction.
- Callers (`main.cpp` `prepare_models`, HTTP handler, TCP handler, `session_manager::create_session`) continue to catch and convert option errors into the existing error response paths, so option processing cannot crash the process.
- `Session::collect_extensions` is exposed as a static method so the normalization is unit-testable without touching the file system or registering a real custom-ops library.

Test plan
- `cmake -DCMAKE_BUILD_TYPE=Debug` build clean.
- `ctest` 8/8 pass on a CUDA-enabled host (CPU and CUDA suites).
- Parse: dotted notation, repeated `extensions`, value type inference, unknown-key passthrough, lenient skipping of malformed entries, mixed legacy `ortextensions_path` + new `extensions` with dedupe.
- Session: `SessionWithSessionOptions`, `SessionOptionsConfigEntriesReadback`, `SessionOptionsFreeDimensionOverrides`, `SessionOptionsIgnoresInvalidEntries`, `CollectExtensionsNormalization` (12 cases), `ExtensionsRegistrationFailsLoudly`.
- CUDA: `CudaObjectOptionsEcho` round-trips every supplied key; `CudaObjectRejectsUnknownKey` confirms session creation aborts with a clear error on unknown CUDA keys; `CudaScalarShortcutStillWorks` pins the legacy boolean shorthand.
- `--prepare-model="model:v1(cuda=true)"` invocations continue to work (backward compatibility).

Docs
- README: prepare-model description refreshed with the new grammar and examples; new sections covering `session_options` and CUDA EP options; Extensions Support section now leads with the `extensions` array and notes `ortextensions_path` as a backward-compatible alias.
- OpenAPI: `ONNXSessionOption` / `ONNXSessionOptionRequest` restructured; new `ONNXSessionOptionsGroup` schema; `ONNXSessionOptionCUDA` enumerates V2 keys with `additionalProperties` for forward compatibility; `ortextensions_path` marked `deprecated: true`.
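To make the CUDA echo rule from the Echo accuracy section concrete, here is a minimal sketch of filtering a provider-options readback down to caller-supplied keys plus `device_id`. It makes no ORT calls: `build_cuda_echo` and `trim` are hypothetical helpers, and the sketch assumes the readback string has the form `key=value;key=value`. The PR's actual implementation may differ.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Strip leading and trailing spaces or tabs.
static std::string trim(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Build the echo from a readback string (assumed "key=value;key=value"),
// keeping only the keys the caller supplied, plus device_id.
std::map<std::string, std::string> build_cuda_echo(
    const std::string& readback, const std::set<std::string>& requested) {
    std::map<std::string, std::string> echo;
    std::istringstream stream(readback);
    std::string entry;
    while (std::getline(stream, entry, ';')) {
        const std::string::size_type eq = entry.find('=');
        if (eq == std::string::npos) continue;  // not a key=value pair
        const std::string key = trim(entry.substr(0, eq));
        if (key == "device_id" || requested.count(key) > 0) {
            // The value is the one read back, not the one the caller sent.
            echo[key] = trim(entry.substr(eq + 1));
        }
    }
    return echo;
}
```

Because the echo values come from the readback rather than from the request, defaults for keys the caller never set stay out of the response, and the response cannot claim a value that was not stored.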