Skip to content

[SYCL][SYCLBIN] Implement kernel_bundle::ext_oneapi_get_content#22061

Open
koparasy wants to merge 21 commits into
intel:syclfrom
koparasy:sycl-bin/fix-CMPLRLLVM-67311
Open

[SYCL][SYCLBIN] Implement kernel_bundle::ext_oneapi_get_content#22061
koparasy wants to merge 21 commits into
intel:syclfrom
koparasy:sycl-bin/fix-CMPLRLLVM-67311

Conversation

@koparasy

Copy link
Copy Markdown
Contributor

Adds the runtime-side serializer for the sycl_ext_oneapi_syclbin extension. Users can now produce SYCLBIN bytes from any non-source kernel_bundle by calling ext_oneapi_get_content(), the inverse of the existing SYCLBIN load path.

Implementation

  • New SYCLBIN::SYCLBINDesc / AbstractModuleDesc / ImageDesc descriptor structs and SYCLBIN::write in the runtime SYCLBIN module. Mirrors the llvm/lib/Object/SYCLBIN.cpp writer but operates on in-memory image bytes rather than file paths, since kernel_bundles hold their device images in memory. Output is wrapped in an OffloadBinary v2 entry so it round-trips through the existing reader path.

  • SYCLBIN::serializeImages walks a kernel_bundle's live device images and emits one abstract module per image. Per-image property sets (notably [SYCL/device requirements] carrying compile_target) are forwarded verbatim into the abstract module metadata so device-compatibility matching survives the round-trip.

  • kernel_bundle_impl::ext_oneapi_get_content always re-serializes from the current MUniqueDeviceImages, so state-promoted bundles (compile/link/ build) emit the post-promotion images. ext_oneapi_source bundles are rejected at compile time via std::enable_if and at runtime as a belt-and-suspenders check.

  • Public API: kernel_bundle::ext_oneapi_get_content for any non-source state, plumbed through kernel_bundle_plain. New ABI symbol registered for both linux and windows symbol dumps.

  • Feature macro SYCL_EXT_ONEAPI_SYCLBIN added.

  • Helpers: Base64::encode (encode side missing in the runtime copy of Base64) and PropertySetRegistry::write (text serialization in the runtime copy of PropertySetIO).

Tests

  • 13 unit tests across sycl/unittests/SYCL2020/SYCLBINSerialize{,JIT,Multi} cover: compile_target round-trip, all three non-source bundle states, image payload round-trip, magic / version=1 headers, IR vs native image classification, multi-image bundles (one abstract module per image), empty bundles, mixed IR + native, spec constant property byte round-trip, device global property byte round-trip.

  • 2 E2E tests under sycl/test-e2e/SYCLBIN: a baseline round-trip (compile -> load .syclbin -> ext_oneapi_get_content -> reload from bytes -> run kernel) and a multi-target variant gated on opencl-aot + opencl-cpu-rt.

Spec: sycl/doc/extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc
Design: sycl/doc/design/SYCLBINDesign.md, sycl/doc/design/PropertySets.md

@koparasy koparasy marked this pull request as ready for review May 21, 2026 18:20
@koparasy koparasy requested a review from a team as a code owner May 21, 2026 18:20
@koparasy koparasy requested a review from sergey-semenov May 21, 2026 18:20

@iclsrc iclsrc left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implements kernel_bundle::ext_oneapi_get_content by serializing live device images into a SYCLBIN binary wrapped in an OffloadBinary envelope. The approach (per-image abstract modules, property-set forwarding, round-trip tests) looks sound. A few issues flagged below ranging from a UB risk in release-mode tests to a reserved C++ identifier.

Comment thread sycl/test-e2e/SYCLBIN/get_content_round_trip.cpp Outdated
Comment thread sycl/test-e2e/SYCLBIN/get_content_round_trip_multi_target.cpp
Comment on lines +557 to +558
/////////////////////////
template <

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Important: Reserved identifier _State

In C++, any name starting with an underscore followed by an uppercase letter is reserved in all scopes (ISO C++17 [lex.name]). _State falls into this category and is technically UB to use as a user-defined name. Rename to e.g. BundleState_ or CurState:

Suggested change
/////////////////////////
template <
template <
bundle_state BundleState_ = State,
typename = std::enable_if_t<BundleState_ != bundle_state::ext_oneapi_source>>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_State is the established convention in this header (23 occurrences across kernel_bundle template members). Renaming only this one site would create inconsistency. Happy to file a follow-up to rename the convention file-wide if desired, but suggest keeping this PR scoped to the SYCLBIN ticket.

static size_t getEncodedSize(size_t SrcSize) {
constexpr int ByteSizeInBits = 8;
constexpr int EncBitsPerChar = 6;
return (SrcSize * ByteSizeInBits + (EncBitsPerChar - 1)) / EncBitsPerChar;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Important: Integer overflow when SrcSize is large

SrcSize * ByteSizeInBits (i.e. SrcSize * 8) overflows size_t for any SrcSize > SIZE_MAX / 8. The formula can be rearranged to avoid the overflow:

Suggested change
return (SrcSize * ByteSizeInBits + (EncBitsPerChar - 1)) / EncBitsPerChar;
return (SrcSize / EncBitsPerChar) * ByteSizeInBits + ((SrcSize % EncBitsPerChar) * ByteSizeInBits + EncBitsPerChar - 1) / EncBitsPerChar;

Alternatively, since the ratio is fixed (8/6 = 4/3), the overflow-safe equivalent is (SrcSize * 4 + 2) / 3.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our copy mirrors LLVM Base64Impl::getEncodedSize line-for-line. Same overflow hazard upstream. I believe we should leave it as is. Or maybe modify the upstream part and use it. This is somewhat related to #22089

Comment thread sycl/unittests/SYCL2020/SYCLBINSerialize.cpp Outdated
Comment thread sycl/test-e2e/SYCLBIN/get_content_round_trip.cpp Outdated
Comment thread sycl/source/detail/syclbin.cpp Outdated
}
}

std::ostringstream OS{std::ios::binary};

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Suggestion: std::ios::binary has no effect on std::ostringstream

The std::ios::binary open-mode flag is meaningful only for file streams; on an ostringstream it is silently ignored by every standard library implementation. Passing it to the constructor (std::ostringstream OS{std::ios::binary}) is misleading — readers may assume it suppresses newline translation, but it does not. Remove the flag (or leave a comment explaining why it is present if there is a specific reason):

Suggested change
std::ostringstream OS{std::ios::binary};
std::ostringstream OS;

The same applies to the OB stream declared further down in the same function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8c599fd

Comment thread sycl/source/detail/property_set_io.hpp Outdated
Comment thread sycl/source/detail/base64.hpp
@koparasy koparasy force-pushed the sycl-bin/fix-CMPLRLLVM-67311 branch from a488c46 to 40a3430 Compare June 1, 2026 15:25
@koparasy koparasy requested review from 0x12CC and iclsrc June 1, 2026 15:56
@koparasy

koparasy commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

@sergey-semenov, @intel/llvm-reviewers-runtime can you please review this?

Comment thread sycl/source/detail/syclbin.cpp Outdated
// Single entry pointing at the SYCLBIN payload following the entry array.
OffloadBinaryEntryType OBEntry{};
OBEntry.ImageKind = /*IMG_SYCLBIN*/ 7;
OBEntry.OffloadKind = /*OFK_SYCL*/ 5;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like SYCL offload kind is 8:

OFK_SYCL = (1 << 3),

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2d1afb9

Comment on lines +1028 to +1029
if (const RTDeviceBinaryImage *Bin = DevImg.get_bin_image_ref())
Images.push_back(Bin);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to specification:
Returns: A vector of bytes containing the data of the kernel bundle in the SYCLBIN format for this implementation. The corresponding SYCLBIN format will be in State state.
So, an executable-state bundle should produce an executable-state SYCLBIN.

However, with the current implementation, this does not seem to always be the case.
For example, if the original device image is in SPIR-V format and is JIT-compiled:

auto KBInput = syclexp::get_kernel_bundle<bundle_state::input>(ctx, path);
auto KBExe   = sycl::build(KBInput);  // JIT-compiles SPIR-V -> native binary
auto bytes   = KBExe.ext_oneapi_get_content();

I believe ext_oneapi_get_content() currently serializes the original SPIR-V image rather than the compiled native binary.

Could you confirm the expected behavior here? Specifically, when a kernel_bundle is in the executable state after JIT-compiling SPIR-V device images, should ext_oneapi_get_content() return a SYCLBIN containing the executable/native form, or is it allowed to contain the original SPIR-V?

Could we also add a test that covers this case explicitly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.

Fixed in 2d1afb9

When the source image is IR (SPIR-V or LLVM-IR bitcode) and the bundle is in executable state with a built UR program, the runtime now extracts the per-device native binaries via urProgramGetInfo(UR_PROGRAM_INFO_BINARIES) and emits one NativeDeviceCodeImage per device (labeled with the device's arch from getArchName). For input/object states the IR payload is preserved as before. AOT-native images are unchanged.

@koparasy koparasy force-pushed the sycl-bin/fix-CMPLRLLVM-67311 branch from 2d1afb9 to 6d5f668 Compare June 8, 2026 22:16
@koparasy koparasy requested a review from againull June 9, 2026 14:06
@koparasy

koparasy commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

@sergey-semenov @againull can you review this one.

Comment thread sycl/source/detail/syclbin.cpp Outdated
Comment on lines +401 to +404
// Forward all property sets from the source image into the abstract
// module metadata. This carries [SYCL/device requirements],
// [SYCL/specialization constants], etc., verbatim so that compatibility
// matching at re-load time uses the same predicates.

@againull againull Jun 9, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider the following scenario with specialization constants:
Source image properties contain specialization constant with some default value.
In the program user sets some different specialization constant value on input state bundle.
Then gets the content - current implementation will serialize the default value from source image properties, not the one set by user, so user's value will be silently dropped:
Snippet:

  auto KB = sycl::get_kernel_bundle<bundle_state::input>(ctx, devs);
  KB.set_specialization_constant<MyConst>(12345);
  auto bytes = KB.ext_oneapi_get_content();   // silently drops 12345
  auto KB2 = syclexp::get_kernel_bundle<bundle_state::input>(ctx, devs, bytes);
  auto KBExe = sycl::build(KB2);              // builds with default value, not 12345

claude generated test:
https://gist.github.com/againull/150778c7296e60047b0341459646d1bd

Fails like this:

# .---command stdout------------
# | FAIL: round-tripped spec const reverted to default (got 42, expected 12345). The user-set override did not survive the SYCLBIN round-trip.
# `-----------------------------
# error: command failed with exit status: 1

This and other related scenarios either should be fixed or error should be reported accordingly or expected behavior documented.

ImageDesc &ID = AMD.NativeDeviceCodeImages.emplace_back();
ID.Bytes = Bytes;
PropertySetRegistry NDCIProps;
// arch is informational; the SYCLBIN reader does not consume it. Forward

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like device arch doesn't survive the second round trip, please see the test and details in comments here:
https://gist.github.com/againull/37123763d0f74d84e47c279af2a48641

Fail with the error:

/home/sycl/build/bin/syclbin-dump /home/sycl/build/tools/sycl/test-e2e/SYCLBIN/Output/get_content_arch_round_trip.cpp.tmp.first.syclbin  | /home/sycl/build/bin/FileCheck /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp --check-prefix CHECK-FIRST
# executed command: /home/sycl/build/bin/syclbin-dump /home/sycl/build/tools/sycl/test-e2e/SYCLBIN/Output/get_content_arch_round_trip.cpp.tmp.first.syclbin
# note: command had no output on stdout or stderr
# executed command: /home/sycl/build/bin/FileCheck /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp --check-prefix CHECK-FIRST
# note: command had no output on stdout or stderr
# RUN: at line 22
/home/sycl/build/bin/syclbin-dump /home/sycl/build/tools/sycl/test-e2e/SYCLBIN/Output/get_content_arch_round_trip.cpp.tmp.second.syclbin | /home/sycl/build/bin/FileCheck /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp --check-prefix CHECK-SECOND
# executed command: /home/sycl/build/bin/syclbin-dump /home/sycl/build/tools/sycl/test-e2e/SYCLBIN/Output/get_content_arch_round_trip.cpp.tmp.second.syclbin
# note: command had no output on stdout or stderr
# executed command: /home/sycl/build/bin/FileCheck /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp --check-prefix CHECK-SECOND
# .---command stderr------------
# | /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp:104:18: error: CHECK-SECOND: expected string not found in input
# | // CHECK-SECOND: arch:{{.*[a-zA-Z].*}}
# |                  ^
# | <stdin>:18:44: note: scanning from here
# |  SYCLBIN/native device code image metadata:
# |                                            ^
# | <stdin>:20:2: note: possible intended match here
# |  arch: 
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |              .
# |              .
# |              .
# |             13:  aspects:  
# |             14:  Number of IR Modules: 0 
# |             15:  Number of Native Device Code Images: 1 
# |             16:  Native device code image 0: 
# |             17:  Metadata: 
# |             18:  SYCLBIN/native device code image metadata: 
# | check:104'0                                                X error: no match found
# |             19:  target: �spir64-unknown 
# | check:104'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             20:  arch:  
# | check:104'0     ~~~~~~~~
# | check:104'1      ?       possible intended match
# |             21:  Raw native device code image bytes: <Binary blob of 4360 bytes> 
# | check:104'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

********************
********************
Failed Tests (1):
  SYCL :: SYCLBIN/get_content_arch_round_trip.cpp


Testing Time: 3.97s

Total Discovered Tests: 1
  Failed: 1 (100.00%)

1 warning(s) in tests

}

std::vector<char>
SYCLBIN::serializeImages(const std::vector<ImageInput> &Inputs, uint8_t State) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, serializer doesn't properly emit [SYCL/kernel names] property set for bundle compiled normally with a lambda kernel (no -fsyclbin).

Reason: the SYCLBIN reader populates kernel IDs from the image's [SYCL/kernel names] property set. That property set is only emitted when invoked with -fsyclbin. For normal compiles (no -fsyclbin), those names are not part of the device image, the kernel names live in a separate static registration table so they are not getting serialized currently.

Test: https://gist.github.com/againull/b96b791a4bd339e4cefdb1c87b5f0871
Fails with error:

# .---command stdout------------
# | KB    has_kernel<K1>=1 kernel_ids=1
# | KBR   has_kernel<K1>=0 kernel_ids=0
# | FAIL: reloaded bundle dropped all kernel-id registration. ext_oneapi_get_content did not synthesize a [SYCL/kernel names] property set, so the reader has nothing to populate kernel ids from.
# `-----------------------------
# error: command failed with exit status: 1

@againull

againull commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Just wanted to note that these highlighted issues somewhat share a single root cause: the kernel bundler serializer derives output exclusively from RTDeviceBinaryImage::Raw.PropertySetsBegin/End and the original image bytes, ignoring the runtime overlays a kernel_bundle_impl carries on top. Anything the runtime added — user-set spec-constant overrides, live kernel-id registrations, etc.

@koparasy

koparasy commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Just wanted to note that these highlighted issues somewhat share a single root cause: the kernel bundler serializer derives output exclusively from RTDeviceBinaryImage::Raw.PropertySetsBegin/End and the original image bytes, ignoring the runtime overlays a kernel_bundle_impl carries on top. Anything the runtime added — user-set spec-constant overrides, live kernel-id registrations, etc.

Thanks. That was a deep design flaw in my side. I address this as follows in the next 3 commits:

SYCLBIN::serializeImages now composes per-abstract-module property
registries in two passes:

  1. Raw forward of the static RTDeviceBinaryImage's property sets (what we were doing before, minus the SYCLBIN-reserved sets).
  2. A runtime-overlay pass that consults a fixed table of category-specific overrides in OverrideTable. Each override edits th registry with information that lives on the runtime device_image_impl rather than (or in addition to) the static image.

Free-function kernel ids don't round-trip through the reader-side tryGetSYCLKernelID lookup (their DeviceKernelInfo carries no kernel_id today). Currently I am throwing an error. I will try to address this on a separate ticket.

@dm-vodopyanov

Copy link
Copy Markdown
Contributor

@koparasy one of the tests introduced in this PR fails on Linux. Also there are some failures on Windows, can you please take a look?
And is this PR ready for the next round of review from Artur?

@koparasy

Copy link
Copy Markdown
Contributor Author

@koparasy one of the tests introduced in this PR fails on Linux. Also there are some failures on Windows, can you please take a look? And is this PR ready for the next round of review from Artur?

Thank you for the reminder. No it is not ready for review, I need to see the failing tests. Will do so and ping accordingly.

koparasy added 7 commits June 17, 2026 17:33
…RLLVM-67311)

Adds the runtime-side serializer for the sycl_ext_oneapi_syclbin extension.
Users can now produce SYCLBIN bytes from any non-source kernel_bundle by
calling ext_oneapi_get_content(), the inverse of the existing SYCLBIN load
path.

Implementation
--------------
- New SYCLBIN::SYCLBINDesc / AbstractModuleDesc / ImageDesc descriptor
  structs and SYCLBIN::write in the runtime SYCLBIN module. Mirrors the
  llvm/lib/Object/SYCLBIN.cpp writer but operates on in-memory image bytes
  rather than file paths, since kernel_bundles hold their device images in
  memory. Output is wrapped in an OffloadBinary v2 entry so it round-trips
  through the existing reader path.

- SYCLBIN::serializeImages walks a kernel_bundle's live device images and
  emits one abstract module per image. Per-image property sets (notably
  [SYCL/device requirements] carrying compile_target) are forwarded
  verbatim into the abstract module metadata so device-compatibility
  matching survives the round-trip.

- kernel_bundle_impl::ext_oneapi_get_content always re-serializes from the
  current MUniqueDeviceImages, so state-promoted bundles (compile/link/
  build) emit the post-promotion images. ext_oneapi_source bundles are
  rejected at compile time via std::enable_if and at runtime as a
  belt-and-suspenders check.

- Public API: kernel_bundle<State>::ext_oneapi_get_content for any
  non-source state, plumbed through kernel_bundle_plain. New ABI symbol
  registered for both linux and windows symbol dumps.

- Feature macro SYCL_EXT_ONEAPI_SYCLBIN added.

- Helpers: Base64::encode (encode side missing in the runtime copy of
  Base64) and PropertySetRegistry::write (text serialization in the
  runtime copy of PropertySetIO).

Tests
-----
- 13 unit tests across sycl/unittests/SYCL2020/SYCLBINSerialize{,JIT,Multi}
  cover: compile_target round-trip, all three non-source bundle states,
  image payload round-trip, magic / version=1 headers, IR vs native image
  classification, multi-image bundles (one abstract module per image),
  empty bundles, mixed IR + native, spec constant property byte
  round-trip, device global property byte round-trip.

- 2 E2E tests under sycl/test-e2e/SYCLBIN: a baseline round-trip
  (compile -> load .syclbin -> ext_oneapi_get_content -> reload from
  bytes -> run kernel) and a multi-target variant gated on opencl-aot +
  opencl-cpu-rt.

Spec: sycl/doc/extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc
Design: sycl/doc/design/SYCLBINDesign.md, sycl/doc/design/PropertySets.md
when the round-tripped SYCLBIN contains both spir64 (JIT) and
spir64_x86_64 (AOT) images. GPU coverage is sufficient for this test.
functions; per-member __SYCL_EXPORT is redundant on Linux and rejected by
MSVC under MSVC error C2487 ("member of dll interface class may not be
declared with dll interface"). Matches all other member declarations in
kernel_bundle_plain.
which SYCL conventionally enforces via SFINAE at compile time, not via
runtime exception. The SFINAE guard already lives on
kernel_bundle<State>::ext_oneapi_get_content. The kernel_bundle_impl-side
runtime throw was unauthorized by spec; replace it with an assert that
backstops the SFINAE for internal callers.
koparasy added 8 commits June 17, 2026 17:33
Address reviewer feedback that the SYCLBIN serializer derived its output
exclusively from the static RTDeviceBinaryImage and missed runtime overlays
that kernel_bundle_impl carries on top.

 New SYCLBIN::ImageInput::DevImg pointer plus an always-populated
  Devices list. The serializer composes per-abstract-module property
  registries in two passes: a raw forward of the static image's property
  sets, then a runtime-overlay pass that consults DevImg for categories
  whose authoritative source is the runtime view.
@koparasy koparasy force-pushed the sycl-bin/fix-CMPLRLLVM-67311 branch from b7cf3b5 to 383c5d1 Compare June 18, 2026 14:11
@koparasy

Copy link
Copy Markdown
Contributor Author

@againull, I dug into the spec-constant round-trip failure from your test and want your guidance before I pick a fix.

What works

The serializer side is correct now: a user-set spec-constant value reaches ext_oneapi_get_content and is written into the SYCL/specialization constants default values blob. (There was a layout bug I fixed, it must be a single "all" property, not one-per-name, to match what the reader expects in get SpecConstsDefValBlob). I verified via tracing that the reloaded image's default-value blob holds the user value (12345), not the compile-time default (42).

Where it still breaks

The reloaded bundle's host-side read returns the compile-time default anyway. The public API short-circuits here kernel_bundle.hpp#L440-L442

SCType Res{SpecName.getDefaultValue()};          // 42, the C++ static default
if (!is_specialization_constant_set(SpecSymName))
  return Res;                                     // returns here
is_specialization_constant_set is image-backed by the per-descriptor IsSet flag:

device_image_impl.hpp#L504-L514

On a SYCLBIN-reloaded image IsSet is false . The reload constructor (kernel_bundle_impl.hpp#L777-L780) populates the default-value blob but never marks anything "set". So the public getter trusts the C++ getDefaultValue() (42) and never consults the image blob (the blob holds the "new" value of 12345), even though the blob is correct.

SYCLBIN has no concept of "this spec constant was user-set vs. left default". ext_oneapi_get_content necessarily folds a user-set value into the default-value blob (the only place to put it). On reload the value is therefore a "default", and get_specialization_constant_raw_value only reads the blob when IsSet is true. Result: the value is in the SYCLBIN and reloaded into memory, but the host query returns the static default.

Options (I see)

  1. Mark all reloaded SYCLBIN spec consts as "set" (~15 LOC, localized to the reload ctor). Simple, but semantically claims everything is user-set, and flipping IsSet may trigger redundant UR spec-const application on an already-native executable image.

  2. Make the public getter fall back to the image default-value blob when unset (~30-50 LOC). Cleanest semantically, but changes get_specialization_constant behavior. Offline + KernelCompiler bundles currently rely on getDefaultValue() (C++ static) for unset consts. Changing to read the image default-value blob instead would alter their behavior. Even though for those the image default should equal the C++ default. If they ever diverge, regression.

  3. Emit an optional SYCL/specialization constants set values property recording which consts were user-set; reload restores IsSet from it. Backward-compatible by presence-check (isAvailable()): old readers ignore an unknown set, new readers skip when absent, no version bump. ~55 LOC + format addition (new property-set name constant in both the LLVM and runtime PropertySetIO copies) + update documentation.

My lean is option 3. it preserves the default-vs-set distinction and round-trips losslessly, and the optional-property detection keeps it backward compatible both directions. But it adds a SYCLBIN metadata property, so I want to confirm that's acceptable before implementing.

Questions:

What do you think? Is there maybe some other better approach? Or do you agree on 3?

@againull

againull commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@againull, I dug into the spec-constant round-trip failure from your test and want your guidance before I pick a fix.

What works

The serializer side is correct now: a user-set spec-constant value reaches ext_oneapi_get_content and is written into the SYCL/specialization constants default values blob. (There was a layout bug I fixed, it must be a single "all" property, not one-per-name, to match what the reader expects in get SpecConstsDefValBlob). I verified via tracing that the reloaded image's default-value blob holds the user value (12345), not the compile-time default (42).

Where it still breaks

The reloaded bundle's host-side read returns the compile-time default anyway. The public API short-circuits here kernel_bundle.hpp#L440-L442

SCType Res{SpecName.getDefaultValue()};          // 42, the C++ static default
if (!is_specialization_constant_set(SpecSymName))
  return Res;                                     // returns here
is_specialization_constant_set is image-backed by the per-descriptor IsSet flag:

device_image_impl.hpp#L504-L514

On a SYCLBIN-reloaded image IsSet is false . The reload constructor (kernel_bundle_impl.hpp#L777-L780) populates the default-value blob but never marks anything "set". So the public getter trusts the C++ getDefaultValue() (42) and never consults the image blob (the blob holds the "new" value of 12345), even though the blob is correct.

SYCLBIN has no concept of "this spec constant was user-set vs. left default". ext_oneapi_get_content necessarily folds a user-set value into the default-value blob (the only place to put it). On reload the value is therefore a "default", and get_specialization_constant_raw_value only reads the blob when IsSet is true. Result: the value is in the SYCLBIN and reloaded into memory, but the host query returns the static default.

Options (I see)

  1. Mark all reloaded SYCLBIN spec consts as "set" (~15 LOC, localized to the reload ctor). Simple, but semantically claims everything is user-set, and flipping IsSet may trigger redundant UR spec-const application on an already-native executable image.
  2. Make the public getter fall back to the image default-value blob when unset (~30-50 LOC). Cleanest semantically, but changes get_specialization_constant behavior. Offline + KernelCompiler bundles currently rely on getDefaultValue() (C++ static) for unset consts. Changing to read the image default-value blob instead would alter their behavior. Even though for those the image default should equal the C++ default. If they ever diverge, regression.
  3. Emit an optional SYCL/specialization constants set values property recording which consts were user-set; reload restores IsSet from it. Backward-compatible by presence-check (isAvailable()): old readers ignore an unknown set, new readers skip when absent, no version bump. ~55 LOC + format addition (new property-set name constant in both the LLVM and runtime PropertySetIO copies) + update documentation.

My lean is option 3. it preserves the default-vs-set distinction and round-trips losslessly, and the optional-property detection keeps it backward compatible both directions. But it adds a SYCLBIN metadata property, so I want to confirm that's acceptable before implementing.

Questions:

What do you think? Is there maybe some other better approach? Or do you agree on 3?

In my opinion, correct way is #3, we should change the serializer to keep the original compile-time defaults in MSpecConstsDefValBlob and put user overrides in the separate property set only.

Tagging @intel/dpcpp-tools-reviewers in case if there are concerns.

…perty

ext_oneapi_get_content did not preserve a user-set specialization
constant: on reload the host getter returned the C++ static default
instead of the value set before serialization.

Root cause: SYCLBIN has no "user-set vs default" concept. A value set
via set_specialization_constant was folded into the default-value blob,
so on reload every descriptor started IsSet=false and
kernel_bundle::get_specialization_constant returned the static default
without consulting the blob.

Add an optional [SYCL/specialization constants set values] property set
(SYCL_SPEC_CONSTANTS_SET_VALUES) that the serializer emits only when at
least one descriptor is set. It carries the runtime-effective value blob
under the single "all" key, mirroring the default-values layout. On
reload, device_image_impl::updateSpecConstSymMap copies the blob into
MSpecConstsBlob and flips every descriptor's IsSet, so the host getter
returns the round-tripped value.

Backward compatible via presence-check (isAvailable()): compiler-produced
SYCLBIN never carries the set, old readers ignore an unknown set, and new
readers fall back to default-value behavior when it is absent. No SYCLBIN
version bump.
@koparasy koparasy requested a review from a team as a code owner June 22, 2026 23:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants