kibae · kibae · May 10, 2026 · May 10, 2026
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,8 @@
 .idea/
 .claude/
 cmake-build-debug*/
+.claude/
+CLAUDE.md
 
 *.onnx
 

diff --git a/README.md b/README.md
@@ -150,7 +150,7 @@ sudo cmake --install build --prefix /usr/local/onnxruntime-server
 | `--workers`               | `ONNX_SERVER_WORKERS`               | Worker thread pool size.<br/>Default: `4`                                                                                                                                                                                                                                                                                                       |
 | `--request-payload-limit` | `ONNX_SERVER_REQUEST_PAYLOAD_LIMIT` | HTTP/HTTPS request payload size limit.<br />Default: 1024 * 1024 * 10(10MB)`                                                                                                                                                                                                                                                                    |
 | `--model-dir`             | `ONNX_SERVER_MODEL_DIR`             | Model directory path<br/>The onnx model files must be located in the following path:<br/>`${model_dir}/${model_name}/${model_version}/model.onnx` or<br/>`${model_dir}/${model_name}/${model_version}.onnx`<br/>Default: `models`                                                                                                               |
-| `--prepare-model`         | `ONNX_SERVER_PREPARE_MODEL`         | Pre-create some model sessions at server startup.<br/><br/>Format as a space-separated list of `model_name:model_version` or `model_name:model_version(session_options, ...)`.<br/><br/>Available session_options are<br/>- cuda=device_id`[ or true or false]`<br/><br/>eg) `model1:v1 model2:v9`<br/>`model1:v1(cuda=true) model2:v9(cuda=1)` |
+| `--prepare-model`         | `ONNX_SERVER_PREPARE_MODEL`         | Pre-create some model sessions at server startup.<br/><br/>Format as a space-separated list of `model_name:model_version` or `model_name:model_version(opt1=val1, opt2=val2, ...)`. Option keys may use dotted notation to address nested groups (e.g. `cuda.device_id`, `session_options.intra_op_num_threads`). Repeating the `extensions` key accumulates a deduplicated array. Option entries that do not match the grammar are skipped silently rather than failing the whole list.<br/><br/>Examples:<br/>- `model1:v1 model2:v9`<br/>- `model1:v1(cuda=true) model2:v9(cuda=1)`<br/>- `bert:v1(cuda.device_id=0, cuda.gpu_mem_limit=2147483648)`<br/>- `bert:v1(session_options.intra_op_num_threads=4, session_options.graph_optimization_level=all)`<br/>- `bert:v1(extensions=/usr/local/lib/libortextensions.so)` |
 
 ### Backend options
 
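The `--prepare-model` grammar described in the table row above can be sketched in a few lines. This is a minimal Python illustration, not the server's actual parser: the function names (`parse_prepare_model`, `parse_options`, `coerce`), the regular expressions, and the value typing (booleans first, then integers, otherwise plain strings) are all assumptions made for the example.

```python
import re

# Hypothetical grammar: name:version or name:version(key=value, ...).
ENTRY_RE = re.compile(r"([^\s:()]+):([^\s:()]+)(?:\(([^)]*)\))?")
# Keys are identifiers, optionally joined by dots (e.g. cuda.device_id).
OPTION_RE = re.compile(r"^([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)=(.+)$")


def coerce(value):
    """Assumed value typing: booleans, then integers, else plain strings."""
    if value in ("true", "false"):
        return value == "true"
    try:
        return int(value)
    except ValueError:
        return value


def parse_options(text):
    """Turn 'a.b=1, extensions=/x.so' into a nested dict."""
    options = {}
    for raw in text.split(","):
        match = OPTION_RE.match(raw.strip())
        if match is None:
            continue  # entries that do not match the grammar are skipped
        key, value = match.group(1), match.group(2).strip()
        if key == "extensions":
            # Repeating the key accumulates a deduplicated, ordered list.
            paths = options.setdefault("extensions", [])
            if value not in paths:
                paths.append(value)
            continue
        node = options
        *parents, leaf = key.split(".")
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                child = node[part] = {}
            node = child
        node[leaf] = coerce(value)
    return options


def parse_prepare_model(spec):
    """Parse a space-separated list of model entries."""
    models = []
    for match in ENTRY_RE.finditer(spec):
        name, version, opts = match.groups()
        models.append(
            {"model": name, "version": version, "option": parse_options(opts or "")}
        )
    return models
```

Matching whole entries with a regular expression, rather than splitting on whitespace first, keeps the spaces inside `(opt1=val1, opt2=val2)` from breaking an entry apart.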
@@ -223,20 +223,92 @@ docker run --name onnxruntime_server_container -d --rm --gpus all \
 
 ## ONNXRuntime Extensions Support
 
-To use the [onnxruntime-extensions](https://github.com/microsoft/onnxruntime-extensions)(Custom Ops Library), set the
-options as follows when creating a session.
+To use the [onnxruntime-extensions](https://github.com/microsoft/onnxruntime-extensions) (Custom Ops Library), supply
+one or more library paths through the `extensions` array. The server registers each path with ORT in order and
+deduplicates entries.
 
 ```json
 {
   "model": "string",
   "version": "string",
   "option": {
     "cuda": ...,
-    "ortextensions_path": "/absolute/path/to/libonnxruntime_extensions.so"
+    "extensions": [
+      "/absolute/path/to/libonnxruntime_extensions.so"
+    ]
   }
 }
 ```
 
+The legacy `ortextensions_path` (single string) is still accepted for backward compatibility; it is normalized into the
+`extensions` array on the server side and the response always echoes the normalized form.
+
+## Session-level options
+
+The optional `session_options` object on a session-create request forwards the listed keys to the underlying
+onnxruntime `SessionOptions`. Only the JSON shape (types and our enum-string mapping) is validated on the server side;
+the actual value validation is delegated to ORT, and the response echoes only the values ORT accepted.
+
+```json
+{
+  "model": "string",
+  "version": "string",
+  "option": {
+    "session_options": {
+      "intra_op_num_threads": 4,
+      "inter_op_num_threads": 1,
+      "execution_mode": "sequential",
+      "graph_optimization_level": "all",
+      "enable_cpu_mem_arena": true,
+      "enable_mem_pattern": true,
+      "log_severity_level": 2,
+      "logid": "my-model",
+      "enable_profiling": false,
+      "profile_file_prefix": "/var/log/onnx/profile-",
+      "optimized_model_filepath": "/cache/optimized.onnx",
+      "free_dimension_overrides": { "batch": 1 },
+      "config_entries": {
+        "session.disable_prepacking": "1"
+      }
+    }
+  }
+}
+```
+
+`config_entries` is round-tripped through `GetSessionConfigEntry`, so the response shows what ORT actually stored.
+Values are always strings: `true` becomes `"1"` and `42` becomes `"42"`.
+
+## CUDA execution provider options
+
+When CUDA is enabled, the `cuda` field accepts either a boolean or integer (legacy shorthand) or an object that maps to
+[CUDA Execution Provider options](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html). The
+server forwards the object to ORT via `UpdateCUDAProviderOptions` in a single batched call, because updating keys one
+call at a time resets sibling keys in the V2 provider options API. If ORT rejects any key, session creation fails with
+the ORT error message identifying the offending key. The response is built from `GetCUDAProviderOptionsAsString`
+readback, so it reflects exactly what ORT stored.
+
+```json
+{
+  "model": "string",
+  "version": "string",
+  "option": {
+    "cuda": {
+      "device_id": 0,
+      "gpu_mem_limit": 2147483648,
+      "arena_extend_strategy": "kNextPowerOfTwo",
+      "cudnn_conv_algo_search": "EXHAUSTIVE",
+      "cudnn_conv_use_max_workspace": true,
+      "do_copy_in_default_stream": true,
+      "enable_cuda_graph": false
+    }
+  }
+}
+```
+
+Backward-compatible shortcuts:
+- `"cuda": true`  — enable CUDA with all defaults (`device_id=0`).
+- `"cuda": 1`     — enable CUDA on `device_id=1`.
+
 For more details on the session creation request, please refer to
 the [API documentation](https://kibae.github.io/onnxruntime-server/swagger/#/ONNX%20Runtime%20Session/createSession).
 

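The README changes above describe three normalizations applied to the `option` object: the legacy `ortextensions_path` is folded into `extensions`, the boolean or integer `cuda` shorthand selects a device, and `config_entries` values end up as strings. The sketch below illustrates those rules in Python. It is not the server's implementation; `normalize_option` and `stringify` are hypothetical names, and two details are assumptions the README does not state: the legacy path is appended after any `extensions` entries, and `false` is stringified as `"0"`.

```python
def stringify(value):
    """Render a config_entries value as a string."""
    if isinstance(value, bool):
        return "1" if value else "0"  # "0" for false is an assumption
    return str(value)


def normalize_option(option):
    """Return a normalized copy of a session-create `option` object."""
    normalized = dict(option)

    # Fold the legacy single path into `extensions`, keeping first-seen order.
    paths = list(normalized.pop("extensions", []))
    legacy = normalized.pop("ortextensions_path", None)
    if legacy is not None:
        paths.append(legacy)  # assumed ordering: legacy path goes last
    deduped = list(dict.fromkeys(paths))
    if deduped:
        normalized["extensions"] = deduped

    # Expand the CUDA shorthand: true selects device 0, an integer selects
    # that device. `false` is left untouched (CUDA disabled).
    cuda = normalized.get("cuda")
    if cuda is True:
        normalized["cuda"] = {"device_id": 0}
    elif isinstance(cuda, int) and not isinstance(cuda, bool):
        normalized["cuda"] = {"device_id": cuda}

    # Stringify config_entries without mutating the caller's dictionaries.
    session_options = normalized.get("session_options")
    if isinstance(session_options, dict) and "config_entries" in session_options:
        session_options = dict(session_options)
        session_options["config_entries"] = {
            key: stringify(value)
            for key, value in session_options["config_entries"].items()
        }
        normalized["session_options"] = session_options
    return normalized
```

A client could run a request through such a helper before comparing it with the echoed response, keeping in mind that the real echo reflects what ORT stored and may differ further.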
diff --git a/docs/swagger/openapi.yaml b/docs/swagger/openapi.yaml
@@ -269,15 +269,30 @@ components:
           $ref: '#/components/schemas/ONNXSessionOption'
     ONNXSessionOption:
       type: object
+      description: |
+        Normalized echo of the options applied to the session. The server only includes
+        keys whose corresponding ORT calls succeeded; values reflect what ORT actually
+        stored (read back via GetCUDAProviderOptionsAsString and GetSessionConfigEntry
+        where applicable).
       nullable: true
       properties:
         cuda:
           nullable: true
           required: false
           oneOf:
             - type: boolean
-              description: Use CUDA
+              description: CUDA disabled (false) — present for backward compatibility.
             - $ref: '#/components/schemas/ONNXSessionOptionCUDA'
+        extensions:
+          type: array
+          description: Registered onnxruntime-extensions library paths in registration order, deduplicated.
+          required: false
+          items:
+            type: string
+          example:
+            - /absolute/path/to/libonnxruntime_extensions.so
+        session_options:
+          $ref: '#/components/schemas/ONNXSessionOptionsGroup'
     ONNXSessionOptionRequest:
       type: object
       nullable: true
@@ -287,11 +302,13 @@ components:
           required: false
           oneOf:
             - type: boolean
-              description: Use CUDA
+              description: Enable CUDA with all defaults (device_id=0).
+            - type: integer
+              description: Enable CUDA on the given device_id.
             - $ref: '#/components/schemas/ONNXSessionOptionCUDA'
         input_shape:
           type: object
-          description: Input shape
+          description: Input shape overrides keyed by input name.
           nullable: false
           required: false
           example: {
@@ -301,25 +318,157 @@ components:
           }
         output_shape:
           type: object
-          description: Output shape
+          description: Output shape overrides keyed by output name.
           nullable: false
           required: false
           example: {
             "output": [ 1, 1 ]
           }
+        extensions:
+          type: array
+          description: |
+            One or more absolute paths to onnxruntime-extensions custom-ops libraries.
+            Each path is registered with ORT in array order; duplicate paths are deduplicated.
+          nullable: false
+          required: false
+          items:
+            type: string
+          example:
+            - /absolute/path/to/libonnxruntime_extensions.so
         ortextensions_path:
           type: string
-          description: To use the ONNXRuntime Extension (Custom Ops Library), you must provide the library path.
+          description: |
+            Deprecated alias for `extensions`. A single library path. The server normalizes
+            it into the `extensions` array on input and the response always echoes the
+            normalized form.
+          deprecated: true
           nullable: false
           required: false
-          example: /absolute/path/to/libonnxruntime_extensions
+          example: /absolute/path/to/libonnxruntime_extensions.so
+        session_options:
+          $ref: '#/components/schemas/ONNXSessionOptionsGroup'
     ONNXSessionOptionCUDA:
       type: object
+      description: |
+        CUDA Execution Provider V2 options. The server forwards every supplied key to
+        UpdateCUDAProviderOptions in a single batched call; if ORT rejects any key the
+        whole session creation fails with the ORT error message. The response is built
+        from GetCUDAProviderOptionsAsString readback, so it shows exactly what ORT
+        stored (which may differ from the requested value if ORT normalized it).
       properties:
         device_id:
           type: integer
           description: CUDA device ID
           nullable: false
+        gpu_mem_limit:
+          type: integer
+          description: Per-session GPU memory limit, in bytes.
+          nullable: false
+        arena_extend_strategy:
+          type: string
+          description: Arena extension strategy, e.g. "kNextPowerOfTwo" or "kSameAsRequested".
+          nullable: false
+        cudnn_conv_algo_search:
+          type: string
+          description: cuDNN convolution algorithm search policy. Accepted values are the ORT-defined enum names "EXHAUSTIVE", "HEURISTIC", and "DEFAULT".
+          nullable: false
+        cudnn_conv_use_max_workspace:
+          type: boolean
+          nullable: false
+        do_copy_in_default_stream:
+          type: boolean
+          nullable: false
+        enable_cuda_graph:
+          type: boolean
+          description: Capture and replay a CUDA graph (requires static input shapes).
+          nullable: false
+        tunable_op_enable:
+          type: boolean
+          nullable: false
+        tunable_op_tuning_enable:
+          type: boolean
+          nullable: false
+        cudnn_conv1d_pad_to_nc1d:
+          type: boolean
+          nullable: false
+      additionalProperties:
+        description: |
+          Any additional CUDA Execution Provider V2 key understood by your ORT build is
+          forwarded as-is. Refer to the ORT CUDA EP documentation for the full list of
+          accepted keys.
+    ONNXSessionOptionsGroup:
+      type: object
+      description: |
+        Session-level options forwarded to onnxruntime SessionOptions. The server only
+        validates JSON shape (types and our enum-string mapping); ORT decides whether the
+        value itself is acceptable. Keys whose ORT setter throws are silently dropped from
+        the echoed response. The `config_entries` object is round-tripped through
+        GetSessionConfigEntry so the echo shows what ORT actually stored (always strings).
+      nullable: false
+      required: false
+      properties:
+        intra_op_num_threads:
+          type: integer
+          description: Number of threads used to parallelize execution within an operator. 0 means ORT default.
+          nullable: false
+        inter_op_num_threads:
+          type: integer
+          description: Number of threads used to parallelize execution across graph nodes. 0 means ORT default. Ignored when execution_mode is sequential.
+          nullable: false
+        execution_mode:
+          type: string
+          enum: [sequential, parallel]
+          nullable: false
+        graph_optimization_level:
+          type: string
+          enum: [disable, basic, extended, all]
+          nullable: false
+        enable_cpu_mem_arena:
+          type: boolean
+          nullable: false
+        enable_mem_pattern:
+          type: boolean
+          nullable: false
+        log_severity_level:
+          type: integer
+          description: ORT log severity level (0=verbose ... 4=fatal).
+          nullable: false
+        logid:
+          type: string
+          nullable: false
+        enable_profiling:
+          type: boolean
+          description: Enable profiling. When true, profile_file_prefix must also be supplied.
+          nullable: false
+        profile_file_prefix:
+          type: string
+          nullable: false
+        optimized_model_filepath:
+          type: string
+          description: Filepath where ORT writes the optimized model after graph transformations.
+          nullable: false
+        free_dimension_overrides:
+          type: object
+          description: Map of free dimension name to a fixed integer size.
+          additionalProperties:
+            type: integer
+          nullable: false
+          example:
+            batch: 1
+        config_entries:
+          type: object
+          description: |
+            Generic passthrough to AddSessionConfigEntry (e.g. "session.disable_prepacking").
+            Booleans and integers are stringified before being passed to ORT; values in the
+            response are always strings (round-tripped through GetSessionConfigEntry).
+          additionalProperties:
+            oneOf:
+              - type: string
+              - type: boolean
+              - type: integer
+          nullable: false
+          example:
+            session.disable_prepacking: "1"
     ONNXSessionCreateRequest:
       type: object
       properties: