Feat: structured session options and extensions array (#106) by kibae · Pull Request #116 · kibae/onnxruntime-server

kibae · 2026-05-10T14:03:32Z

Summary

Closes #106.

The asymmetry reported in #106 — ortextensions_path was reachable through the HTTP request body but not through ONNX_SERVER_PREPARE_MODEL because the env-var grammar only forwarded cuda — turned out to be the tip of a larger contract problem. The same option grammar that could not carry ortextensions_path also could not carry any other useful ORT session knob, and the response echo silently asserted that options had been applied even when ORT had ignored them. This PR fixes #106 by restructuring the option surface so the same set of keys is reachable from both the env-var path and the HTTP path, and so response echoes only contain values that ORT actually applied.

Option grammar (env var / `--prepare-model`)

Dotted keys map to nested objects: cuda.device_id, session_options.intra_op_num_threads.
Repeating extensions accumulates a deduplicated array.
Legacy ortextensions_path is normalized into the extensions array.
Malformed option entries are skipped silently rather than failing the whole list.
Existing inputs (cuda=true, cuda=1) keep working unchanged.

ONNX_SERVER_PREPARE_MODEL="bert:v1(cuda.device_id=0, cuda.gpu_mem_limit=2147483648, session_options.intra_op_num_threads=4, extensions=/usr/local/lib/libortextensions.so)"
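The grammar rules above can be sketched roughly as follows. This is a minimal illustration, not the parser shipped in this PR: `ParsedOptions`, `trim`, and `parse_options` are hypothetical names, values are kept as raw strings (the real parser also infers value types and builds nested JSON), and only one level of nesting is modeled.

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct ParsedOptions {
    // group -> key -> raw value; ungrouped keys (e.g. cuda=true) live under "".
    std::map<std::string, std::map<std::string, std::string>> groups;
    // Deduplicated extension paths, in first-seen order.
    std::vector<std::string> extensions;
};

static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parses the text between the parentheses of one prepare-model entry.
ParsedOptions parse_options(const std::string& text) {
    ParsedOptions out;
    std::istringstream in(text);
    std::string entry;
    while (std::getline(in, entry, ',')) {
        const auto eq = entry.find('=');
        if (eq == std::string::npos) continue;        // malformed: no '='
        const std::string key = trim(entry.substr(0, eq));
        const std::string value = trim(entry.substr(eq + 1));
        if (key.empty() || value.empty()) continue;   // malformed: empty side

        if (key == "extensions" || key == "ortextensions_path") {
            // The legacy key and the new key feed the same array.
            if (std::find(out.extensions.begin(), out.extensions.end(), value) ==
                out.extensions.end()) {
                out.extensions.push_back(value);
            }
            continue;
        }

        const auto dot = key.find('.');
        if (dot == std::string::npos) {
            out.groups[""][key] = value;              // e.g. cuda=true
        } else if (dot > 0 && dot + 1 < key.size()) {
            out.groups[key.substr(0, dot)][key.substr(dot + 1)] = value;
        }
        // A leading or trailing dot is treated as malformed and skipped.
    }
    return out;
}
```

Skipping a bad entry with `continue` instead of throwing is what keeps one typo from discarding the rest of the list.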

JSON option surface (HTTP request body)

New extensions array; legacy ortextensions_path is still accepted as an alias and normalized into the array on input and on echo.
New session_options group covering ORT SessionOptions: intra_op_num_threads, inter_op_num_threads, execution_mode, graph_optimization_level, enable_cpu_mem_arena, enable_mem_pattern, log_severity_level, logid, enable_profiling with profile_file_prefix, optimized_model_filepath, free_dimension_overrides, config_entries.
Extended cuda object covering CUDA EP V2 keys: device_id, gpu_mem_limit, arena_extend_strategy, cudnn_conv_algo_search, cudnn_conv_use_max_workspace, do_copy_in_default_stream, enable_cuda_graph, tunable_op_*, cudnn_conv1d_pad_to_nc1d, plus forward-compatible passthrough of any additional ORT-known key.
Backward-compatible shortcuts kept: cuda: true | false | <device_id>.
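The alias handling for extensions could look something like the sketch below. It is only an illustration of the normalization idea: `ExtensionInput` and `collect_extensions_sketch` are hypothetical stand-ins (the real Session::collect_extensions works on the JSON option object), and the ordering shown, array entries first and the legacy path last, is an assumption.

```cpp
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the request's option object; only the two
// fields relevant to extension loading are modeled.
struct ExtensionInput {
    std::vector<std::string> extensions;            // new array form
    std::optional<std::string> ortextensions_path;  // legacy alias
};

// Merges both spellings into one deduplicated list, preserving first-seen order.
std::vector<std::string> collect_extensions_sketch(const ExtensionInput& in) {
    std::vector<std::string> out;
    std::unordered_set<std::string> seen;
    auto add = [&](const std::string& path) {
        if (path.empty()) return;                          // ignore empty entries
        if (seen.insert(path).second) out.push_back(path); // keep first occurrence
    };
    for (const auto& p : in.extensions) add(p);
    if (in.ortextensions_path) add(*in.ortextensions_path);
    return out;
}
```

Because the function touches neither the file system nor ORT, it can be exercised with plain values in a unit test.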

Echo accuracy

The most important behavioural change is that the response echo now reflects what ORT actually stored, not what the caller sent.

The server validates only the JSON shape and its own enum-string mapping; validation of the value itself is delegated to ORT. Keys whose ORT setter throws are silently dropped from the echo, so the response never claims an option was applied when it was not.
config_entries are round-tripped through GetSessionConfigEntry, so the echo reflects what ORT actually stored. ORT stores these values as strings, so true and 42 come back as "1" and "42".
CUDA EP options are forwarded to ORT in a single batched UpdateCUDAProviderOptions call. ORT V2 silently resets sibling keys when called per-key (e.g. updating arena_extend_strategy reverts gpu_mem_limit to its default), so a single batched call is the only safe way to apply multiple keys; any rejected key aborts the whole batch with the ORT error message identifying the offending key. The echo is built from GetCUDAProviderOptionsAsString readback, filtered to caller-supplied keys plus device_id.
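The echo-filtering step described above can be sketched as follows. This is not the PR's code: `parse_readback` and `build_cuda_echo` are hypothetical helpers, and the `key=value;key=value` layout of the readback string is an assumption about what GetCUDAProviderOptionsAsString returns.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Splits a "key=value;key=value" string into a map.
// The ';'-separated layout is an assumption made for this sketch.
std::map<std::string, std::string> parse_readback(const std::string& s) {
    std::map<std::string, std::string> out;
    std::istringstream in(s);
    std::string pair;
    while (std::getline(in, pair, ';')) {
        const auto eq = pair.find('=');
        if (eq == std::string::npos || eq == 0) continue;  // skip malformed pairs
        out[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    return out;
}

// Keeps only the keys the caller supplied, plus device_id. Every value comes
// from the readback, never from the request.
std::map<std::string, std::string> build_cuda_echo(
    const std::string& readback, const std::set<std::string>& requested) {
    std::map<std::string, std::string> echo;
    for (const auto& [key, value] : parse_readback(readback)) {
        if (key == "device_id" || requested.count(key) > 0) {
            echo[key] = value;
        }
    }
    return echo;
}
```

Taking values from the readback rather than the request is what lets the echo differ from the input whenever ORT normalized or ignored something.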

Safety

Every throw path in the CUDA option code releases its provider-options handle and ORT status; the readback section is additionally wrapped in try/catch as a guard against std::bad_alloc during string/json construction.
All existing call sites that handle option errors (main.cpp prepare_models, HTTP handler, TCP handler, session_manager::create_session) continue to catch them and convert them into the existing error response paths, so option processing cannot crash the process.
Session::collect_extensions is exposed as a static method so the normalization is unit-testable without touching the file system or registering a real custom-ops library.
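One common way to guarantee that a C handle is released on every throw path is an RAII wrapper. The sketch below illustrates that general technique and is not necessarily how this PR implements it: `FakeProviderOptions`, `create_options`, `release_options`, and `apply_options` are hypothetical stand-ins for an opaque handle and its create/release pair.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins for an opaque C handle and its create/release pair.
struct FakeProviderOptions { int dummy = 0; };
static int g_live_handles = 0;  // counts handles that have not been released

FakeProviderOptions* create_options() {
    ++g_live_handles;
    return new FakeProviderOptions{};
}

void release_options(FakeProviderOptions* p) {
    --g_live_handles;
    delete p;
}

// Deleter that routes unique_ptr cleanup through the release function.
struct OptionsReleaser {
    void operator()(FakeProviderOptions* p) const noexcept { release_options(p); }
};
using OptionsHandle = std::unique_ptr<FakeProviderOptions, OptionsReleaser>;

// Applies options; throws when `fail` is set, as a rejected key would.
void apply_options(bool fail) {
    OptionsHandle handle(create_options());
    if (fail) throw std::runtime_error("rejected option key");
    // ... pass handle.get() to the C API here ...
}  // handle is released here on both the normal and the throwing path
```

With the release tied to scope exit, adding a new throw site later cannot leak the handle.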

Test plan

cmake -DCMAKE_BUILD_TYPE=Debug: the build is clean.
ctest: 8/8 tests pass on a CUDA-enabled host (CPU and CUDA suites).
New parse cases: dotted notation, repeated extensions, value type inference, unknown-key passthrough, lenient skipping of malformed entries, mixed legacy ortextensions_path + new extensions with dedupe.
New session-level cases: SessionWithSessionOptions, SessionOptionsConfigEntriesReadback, SessionOptionsFreeDimensionOverrides, SessionOptionsIgnoresInvalidEntries, CollectExtensionsNormalization (12 cases), ExtensionsRegistrationFailsLoudly.
New CUDA cases: CudaObjectOptionsEcho round-trips every supplied key; CudaObjectRejectsUnknownKey confirms session creation aborts with a clear error on unknown CUDA keys; CudaScalarShortcutStillWorks pins the legacy boolean shorthand.
Manual smoke test: existing --prepare-model="model:v1(cuda=true)" invocations continue to work (backward compatibility).

Docs

README: prepare-model description refreshed with the new grammar and examples; new sections covering session_options and CUDA EP options; Extensions Support section now leads with the extensions array and notes ortextensions_path as a backward-compatible alias.
OpenAPI: ONNXSessionOption / ONNXSessionOptionRequest restructured; new ONNXSessionOptionsGroup schema; ONNXSessionOptionCUDA enumerates V2 keys with additionalProperties for forward compatibility; ortextensions_path marked deprecated: true.

Address the asymmetry reported in #106 between the HTTP request body and ONNX_SERVER_PREPARE_MODEL: only `cuda` was forwarded from the env-var grammar, so even though `ortextensions_path` was supported in the JSON option path, it was unreachable from the prepare-model path.

Restructure the option surface so the same set of keys is reachable from both entry points, and so response echoes reflect what ORT actually applied, not what the caller sent.

Option grammar (env var / --prepare-model)
- Dotted keys map to nested objects (e.g. cuda.device_id, session_options.intra_op_num_threads).
- Repeating `extensions` accumulates a deduplicated array; legacy `ortextensions_path` normalizes into the same array.
- Malformed option entries are skipped silently rather than failing the whole list.

JSON option surface (HTTP request body)
- New `extensions` array; legacy `ortextensions_path` is accepted as an alias and normalized into the array on input and on echo.
- New `session_options` group: intra/inter_op_num_threads, execution_mode, graph_optimization_level, enable_cpu_mem_arena, enable_mem_pattern, log_severity_level, logid, enable_profiling, profile_file_prefix, optimized_model_filepath, free_dimension_overrides, config_entries.
- Extended `cuda` object covering CUDA EP V2 keys (device_id, gpu_mem_limit, arena_extend_strategy, cudnn_conv_algo_search, cudnn_conv_use_max_workspace, do_copy_in_default_stream, enable_cuda_graph, tunable_op_*, cudnn_conv1d_pad_to_nc1d, plus forward-compatible passthrough of additional ORT-known keys).
- Backward-compatible shortcuts kept: `cuda: true|false|<device_id>`.

Echo accuracy
- Server only validates JSON shape and our enum-string mapping; the value itself is delegated to ORT. Keys whose ORT setter throws are silently dropped from the echo so the response never claims an option was applied when it was not.
- `config_entries` are round-tripped through GetSessionConfigEntry so the echo reflects what ORT actually stored (always strings).
- CUDA EP options are forwarded in a single batched UpdateCUDAProviderOptions call. ORT V2 silently resets sibling keys when called per-key (e.g. updating arena_extend_strategy reverts gpu_mem_limit to its default), so a single batched call is the only safe way to apply multiple keys; any rejected key aborts the whole batch with the ORT error message identifying the offending key. The echo is built from GetCUDAProviderOptionsAsString readback, filtered to caller-supplied keys plus device_id.

Safety
- Every throw path releases its CUDA provider options handle and ORT status; the readback section is additionally wrapped in try/catch as a guard against std::bad_alloc during string/json construction.
- Session::collect_extensions is exposed as a static method so the normalization can be unit-tested without touching the file system or registering a real custom-ops library.

Tests
- Parse: dotted notation, repeated `extensions`, value type inference, unknown-key passthrough, lenient skipping of malformed entries, mixed legacy `ortextensions_path` + new `extensions` with dedupe, all on top of the existing parse fixtures.
- Session: SessionWithSessionOptions, SessionOptionsConfigEntriesReadback, SessionOptionsFreeDimensionOverrides, SessionOptionsIgnoresInvalidEntries, CollectExtensionsNormalization (12 cases over input shapes, dedupe, legacy normalization, garbage entries, non-object input), ExtensionsRegistrationFailsLoudly.
- CUDA: CudaObjectOptionsEcho asserts round-trip of every supplied key; CudaObjectRejectsUnknownKey asserts session construction aborts with a clear error when ORT rejects an unknown key; CudaScalarShortcutStillWorks pins the legacy boolean shorthand.

Docs
- README: prepare-model description refreshed with the new grammar and examples; new sections covering session_options and CUDA EP options; Extensions Support section now leads with the `extensions` array and notes `ortextensions_path` as a backward-compatible alias.
- OpenAPI: ONNXSessionOption / ONNXSessionOptionRequest restructured; new ONNXSessionOptionsGroup schema; ONNXSessionOptionCUDA enumerates V2 keys with additionalProperties for forward compatibility; ortextensions_path marked deprecated.

Closes #106.

kibae mentioned this pull request May 10, 2026

Add support for ortextensions_path in ONNX_SERVER_PREPARE_MODEL #106

Closed

kibae merged commit 9dc8618 into main May 10, 2026
6 of 11 checks passed

kibae merged 1 commit into main from feat/session-options-restructure
