Feat: structured session options and extensions array (#106) #116

Merged: kibae merged 1 commit into main from feat/session-options-restructure on May 10, 2026
@kibae kibae commented May 10, 2026

Summary

Closes #106.

The asymmetry reported in #106 (ortextensions_path was reachable through the HTTP request body but not through ONNX_SERVER_PREPARE_MODEL, because the env-var grammar only forwarded cuda) turned out to be the tip of a larger contract problem. The same option grammar that could not carry ortextensions_path also could not carry any other useful ORT session knob, and the response echo silently asserted that options had been applied even when ORT had ignored them. This PR fixes #106 by restructuring the option surface so the same set of keys is reachable from both the env-var path and the HTTP path, and so response echoes only contain values that ORT actually applied.

Option grammar (env var / --prepare-model)

  • Dotted keys map to nested objects: cuda.device_id, session_options.intra_op_num_threads.
  • Repeating extensions accumulates a deduplicated array.
  • Legacy ortextensions_path is normalized into the extensions array.
  • Malformed option entries are skipped silently rather than failing the whole list.
  • Existing inputs (cuda=true, cuda=1) keep working unchanged.
ONNX_SERVER_PREPARE_MODEL="bert:v1(cuda.device_id=0, cuda.gpu_mem_limit=2147483648, session_options.intra_op_num_threads=4, extensions=/usr/local/lib/libortextensions.so)"
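
The grammar rules above can be illustrated with a rough Python sketch. This is not the server's parser (the server is C++); `parse_model_entry` and `infer_value` are hypothetical names, the type-inference rules are assumed, and the sketch handles a single `name:version(...)` entry only.

```python
import re


def infer_value(text):
    """Guess a JSON-like type for an option value (assumed inference rules)."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(text)
    except ValueError:
        return text  # anything else, such as a path, stays a string


def parse_model_entry(entry):
    """Parse 'name:version(key=value, ...)' into (name, version, options)."""
    pattern = r"\s*([^:()\s]+):([^:()\s]+)\s*(?:\((.*)\))?\s*"
    match = re.fullmatch(pattern, entry)
    if match is None:
        raise ValueError(f"not a model entry: {entry!r}")
    name, version, body = match.groups()

    options = {}
    extensions = []
    for item in (body or "").split(","):
        key, sep, value = item.partition("=")
        key, value = key.strip(), value.strip()
        parts = key.split(".")
        if not sep or not value or not all(parts):
            continue  # malformed entry: skip it and keep the rest of the list
        if key in ("extensions", "ortextensions_path"):
            # Repeated and legacy keys accumulate into one deduplicated array.
            if value not in extensions:
                extensions.append(value)
            continue
        # Dotted keys become nested objects.
        node = options
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                child = node[part] = {}
            node = child
        node[parts[-1]] = infer_value(value)

    if extensions:
        options["extensions"] = extensions
    return name, version, options
```

Under these assumptions, the example entry above yields a nested `cuda` object, a nested `session_options` object, and a one-element `extensions` array, while `cuda=true` still parses to a plain boolean.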

JSON option surface (HTTP request body)

  • New extensions array; legacy ortextensions_path is still accepted as an alias and normalized into the array on input and on echo.
  • New session_options group covering ORT SessionOptions: intra/inter_op_num_threads, execution_mode, graph_optimization_level, enable_cpu_mem_arena, enable_mem_pattern, log_severity_level, logid, enable_profiling + profile_file_prefix, optimized_model_filepath, free_dimension_overrides, config_entries.
  • Extended cuda object covering CUDA EP V2 keys: device_id, gpu_mem_limit, arena_extend_strategy, cudnn_conv_algo_search, cudnn_conv_use_max_workspace, do_copy_in_default_stream, enable_cuda_graph, tunable_op_*, cudnn_conv1d_pad_to_nc1d, plus forward-compatible passthrough of any additional ORT-known key.
  • Backward-compatible shortcuts kept: cuda: true | false | <device_id>.
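
As an illustration of how the `extensions` array and the legacy `ortextensions_path` alias could be normalized into one list, here is a small Python sketch. It only borrows the name of `Session::collect_extensions`; the accepted input shapes (a single string or an array), the ordering, and the exact layout of the options object are assumptions, not the server's confirmed behaviour.

```python
import json


def collect_extensions(options):
    """Return a deduplicated list of extension paths from a JSON-like value."""
    if not isinstance(options, dict):
        return []  # non-object input: nothing to collect

    candidates = []
    new_style = options.get("extensions")
    if isinstance(new_style, str):
        candidates.append(new_style)  # assumed: a bare string is tolerated
    elif isinstance(new_style, list):
        candidates.extend(new_style)
    candidates.append(options.get("ortextensions_path"))  # legacy alias

    result = []
    for path in candidates:
        # Ignore garbage entries (non-strings, empty strings) and duplicates.
        if isinstance(path, str) and path and path not in result:
            result.append(path)
    return result


# Hypothetical options object mixing the new array with the legacy alias.
request_options = {
    "extensions": ["/usr/local/lib/libortextensions.so"],
    "ortextensions_path": "/usr/local/lib/libortextensions.so",
    "session_options": {"intra_op_num_threads": 4, "enable_mem_pattern": True},
    "cuda": {"device_id": 0, "gpu_mem_limit": 2147483648},
}
print(json.dumps(collect_extensions(request_options)))
```

Because the function takes plain data and never touches the file system, it can be exercised with table-driven tests.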

Echo accuracy

The most important behavioural change is that the response echo now reflects what ORT actually stored, not what the caller sent.

  • The server validates only the JSON shape and its own enum-string mapping; validation of the value itself is delegated to ORT. Keys whose ORT setter throws are silently dropped from the echo, so the response never claims an option was applied when it was not.
  • config_entries are round-tripped through GetSessionConfigEntry, so the echo always contains strings: true and 42 come back as "1" and "42".
  • CUDA EP options are forwarded to ORT in a single batched UpdateCUDAProviderOptions call. ORT V2 silently resets sibling keys when called per-key (e.g. updating arena_extend_strategy reverts gpu_mem_limit to its default), so a single batched call is the only safe way to apply multiple keys; any rejected key aborts the whole batch with the ORT error message identifying the offending key. The echo is built from GetCUDAProviderOptionsAsString readback, filtered to caller-supplied keys plus device_id.
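
The echo rules above can be sketched in Python as follows. The helper names are hypothetical, the readback is assumed to be a `key=value;key=value` string, echoed values are left as strings for simplicity, and mapping `false` to `"0"` is an assumption (the description only states that `true` comes back as `"1"`).

```python
def parse_provider_options(readback):
    """Split a 'key=value;key=value' readback string into a dict of strings."""
    parsed = {}
    for item in readback.split(";"):
        key, sep, value = item.partition("=")
        key = key.strip()
        if sep and key:
            parsed[key] = value.strip()
    return parsed


def build_cuda_echo(requested, readback):
    """Keep only caller-supplied keys plus device_id from the readback."""
    wanted = set(requested) | {"device_id"}
    applied = parse_provider_options(readback)
    return {key: value for key, value in applied.items() if key in wanted}


def stringify_config_value(value):
    """Model how a config entry looks after a string round trip."""
    # bool must be checked before any numeric handling: bool subclasses int.
    if isinstance(value, bool):
        return "1" if value else "0"
    return str(value)
```

For instance, a caller that supplied only `gpu_mem_limit` would see `device_id` and `gpu_mem_limit` in the echo, and none of the other provider defaults that the readback string happens to contain.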

Safety

  • Every throw path in the CUDA option code releases its provider-options handle and ORT status; the readback section is additionally wrapped in try/catch as a guard against std::bad_alloc during string/json construction.
  • All existing call sites that can receive these exceptions (main.cpp prepare_models, HTTP handler, TCP handler, session_manager::create_session) continue to catch option errors and convert them into the existing error response paths, so option processing cannot crash the process.
  • Session::collect_extensions is exposed as a static method so the normalization is unit-testable without touching the file system or registering a real custom-ops library.

Test plan

  • cmake -DCMAKE_BUILD_TYPE=Debug build is clean.
  • ctest 8/8 pass on a CUDA-enabled host (CPU and CUDA suites).
  • New parse cases: dotted notation, repeated extensions, value type inference, unknown-key passthrough, lenient skipping of malformed entries, mixed legacy ortextensions_path + new extensions with dedupe.
  • New session-level cases: SessionWithSessionOptions, SessionOptionsConfigEntriesReadback, SessionOptionsFreeDimensionOverrides, SessionOptionsIgnoresInvalidEntries, CollectExtensionsNormalization (12 cases), ExtensionsRegistrationFailsLoudly.
  • New CUDA cases: CudaObjectOptionsEcho round-trips every supplied key; CudaObjectRejectsUnknownKey confirms session creation aborts with a clear error on unknown CUDA keys; CudaScalarShortcutStillWorks pins the legacy boolean shorthand.
  • Manual smoke test: existing --prepare-model="model:v1(cuda=true)" invocations continue to work (backward compatibility).

Docs

  • README: prepare-model description refreshed with the new grammar and examples; new sections covering session_options and CUDA EP options; Extensions Support section now leads with the extensions array and notes ortextensions_path as a backward-compatible alias.
  • OpenAPI: ONNXSessionOption / ONNXSessionOptionRequest restructured; new ONNXSessionOptionsGroup schema; ONNXSessionOptionCUDA enumerates V2 keys with additionalProperties for forward compatibility; ortextensions_path marked deprecated: true.

Address the asymmetry reported in #106 between the HTTP request body
and ONNX_SERVER_PREPARE_MODEL: only `cuda` was forwarded from the env-
var grammar, so even though `ortextensions_path` was supported in the
JSON option path, it was unreachable from the prepare-model path.
Restructure the option surface so the same set of keys is reachable
from both entry points, and so response echoes reflect what ORT
actually applied — not what the caller sent.

Option grammar (env var / --prepare-model)
- Dotted keys map to nested objects (e.g. cuda.device_id,
  session_options.intra_op_num_threads).
- Repeating `extensions` accumulates a deduplicated array; legacy
  `ortextensions_path` normalizes into the same array.
- Malformed option entries are skipped silently rather than failing
  the whole list.

JSON option surface (HTTP request body)
- New `extensions` array; legacy `ortextensions_path` is accepted as
  an alias and normalized into the array on input and on echo.
- New `session_options` group: intra/inter_op_num_threads,
  execution_mode, graph_optimization_level, enable_cpu_mem_arena,
  enable_mem_pattern, log_severity_level, logid, enable_profiling,
  profile_file_prefix, optimized_model_filepath,
  free_dimension_overrides, config_entries.
- Extended `cuda` object covering CUDA EP V2 keys (device_id,
  gpu_mem_limit, arena_extend_strategy, cudnn_conv_algo_search,
  cudnn_conv_use_max_workspace, do_copy_in_default_stream,
  enable_cuda_graph, tunable_op_*, cudnn_conv1d_pad_to_nc1d, plus
  forward-compatible passthrough of additional ORT-known keys).
- Backward-compatible shortcuts kept: `cuda: true|false|<device_id>`.

Echo accuracy
- Server only validates JSON shape and our enum-string mapping; the
  value itself is delegated to ORT. Keys whose ORT setter throws are
  silently dropped from the echo so the response never claims an
  option was applied when it was not.
- `config_entries` are round-tripped through GetSessionConfigEntry so
  the echo reflects what ORT actually stored (always strings).
- CUDA EP options are forwarded in a single batched
  UpdateCUDAProviderOptions call. ORT V2 silently resets sibling keys
  when called per-key (e.g. updating arena_extend_strategy reverts
  gpu_mem_limit to its default), so a single batched call is the only
  safe way to apply multiple keys; any rejected key aborts the whole
  batch with the ORT error message identifying the offending key. The
  echo is built from GetCUDAProviderOptionsAsString readback, filtered
  to caller-supplied keys plus device_id.

Safety
- Every throw path releases its CUDA provider options handle and ORT
  status; the readback section is additionally wrapped in try/catch as
  a guard against std::bad_alloc during string/json construction.
- Session::collect_extensions is exposed as a static method so the
  normalization can be unit-tested without touching the file system or
  registering a real custom-ops library.

Tests
- Parse: dotted notation, repeated `extensions`, value type
  inference, unknown-key passthrough, lenient skipping of malformed
  entries, mixed legacy `ortextensions_path` + new `extensions` with
  dedupe, all on top of the existing parse fixtures.
- Session: SessionWithSessionOptions,
  SessionOptionsConfigEntriesReadback,
  SessionOptionsFreeDimensionOverrides,
  SessionOptionsIgnoresInvalidEntries, CollectExtensionsNormalization
  (12 cases over input shapes, dedupe, legacy normalization, garbage
  entries, non-object input), ExtensionsRegistrationFailsLoudly.
- CUDA: CudaObjectOptionsEcho asserts round-trip of every supplied
  key; CudaObjectRejectsUnknownKey asserts session construction aborts
  with a clear error when ORT rejects an unknown key;
  CudaScalarShortcutStillWorks pins the legacy boolean shorthand.

Docs
- README: prepare-model description refreshed with the new grammar
  and examples; new sections covering session_options and CUDA EP
  options; Extensions Support section now leads with the `extensions`
  array and notes `ortextensions_path` as a backward-compatible alias.
- OpenAPI: ONNXSessionOption / ONNXSessionOptionRequest restructured;
  new ONNXSessionOptionsGroup schema; ONNXSessionOptionCUDA enumerates
  V2 keys with additionalProperties for forward compatibility;
  ortextensions_path marked deprecated.

Closes #106.
@kibae kibae merged commit 9dc8618 into main May 10, 2026
6 of 11 checks passed
Development

Successfully merging this pull request may close these issues.

Add support for ortextensions_path in ONNX_SERVER_PREPARE_MODEL