[SYCL][SYCLBIN] Implement kernel_bundle::ext_oneapi_get_content#22061
[SYCL][SYCLBIN] Implement kernel_bundle::ext_oneapi_get_content#22061koparasy wants to merge 21 commits into
Conversation
iclsrc
left a comment
There was a problem hiding this comment.
Implements kernel_bundle::ext_oneapi_get_content by serializing live device images into a SYCLBIN binary wrapped in an OffloadBinary envelope. The approach (per-image abstract modules, property-set forwarding, round-trip tests) looks sound. A few issues flagged below ranging from a UB risk in release-mode tests to a reserved C++ identifier.
| ///////////////////////// | ||
| template < |
There was a problem hiding this comment.
🟡 Important: Reserved identifier _State
In C++, any name starting with an underscore followed by an uppercase letter is reserved in all scopes (ISO C++17 [lex.name]). _State falls into this category and is technically UB to use as a user-defined name. Rename to e.g. BundleState_ or CurState:
| ///////////////////////// | |
| template < | |
| template < | |
| bundle_state BundleState_ = State, | |
| typename = std::enable_if_t<BundleState_ != bundle_state::ext_oneapi_source>> |
There was a problem hiding this comment.
_State is the established convention in this header (23 occurrences across kernel_bundle template members). Renaming only this one site would create inconsistency. Happy to file a follow-up to rename the convention file-wide if desired, but suggest keeping this PR scoped to the SYCLBIN ticket.
| static size_t getEncodedSize(size_t SrcSize) { | ||
| constexpr int ByteSizeInBits = 8; | ||
| constexpr int EncBitsPerChar = 6; | ||
| return (SrcSize * ByteSizeInBits + (EncBitsPerChar - 1)) / EncBitsPerChar; |
There was a problem hiding this comment.
🟡 Important: Integer overflow when SrcSize is large
SrcSize * ByteSizeInBits (i.e. SrcSize * 8) overflows size_t for any SrcSize > SIZE_MAX / 8. The formula can be rearranged to avoid the overflow:
| return (SrcSize * ByteSizeInBits + (EncBitsPerChar - 1)) / EncBitsPerChar; | |
| return (SrcSize / EncBitsPerChar) * ByteSizeInBits + ((SrcSize % EncBitsPerChar) * ByteSizeInBits + EncBitsPerChar - 1) / EncBitsPerChar; |
Alternatively, since the ratio is fixed (8/6 = 4/3), the overflow-safe equivalent is (SrcSize * 4 + 2) / 3.
There was a problem hiding this comment.
Our copy mirrors LLVM Base64Impl::getEncodedSize line-for-line. Same overflow hazard upstream. I believe we should leave it as is. Or maybe modify the upstream part and use it. This is somewhat related to #22089
| } | ||
| } | ||
|
|
||
| std::ostringstream OS{std::ios::binary}; |
There was a problem hiding this comment.
🔵 Suggestion: std::ios::binary has no effect on std::ostringstream
The std::ios::binary open-mode flag is meaningful only for file streams; on an ostringstream it is silently ignored by every standard library implementation. Passing it to the constructor (std::ostringstream OS{std::ios::binary}) is misleading — readers may assume it suppresses newline translation, but it does not. Remove the flag (or leave a comment explaining why it is present if there is a specific reason):
| std::ostringstream OS{std::ios::binary}; | |
| std::ostringstream OS; |
The same applies to the OB stream declared further down in the same function.
a488c46 to
40a3430
Compare
|
@sergey-semenov, @intel/llvm-reviewers-runtime can you please review this? |
| // Single entry pointing at the SYCLBIN payload following the entry array. | ||
| OffloadBinaryEntryType OBEntry{}; | ||
| OBEntry.ImageKind = /*IMG_SYCLBIN*/ 7; | ||
| OBEntry.OffloadKind = /*OFK_SYCL*/ 5; |
There was a problem hiding this comment.
It looks like SYCL offload kind is 8:
| if (const RTDeviceBinaryImage *Bin = DevImg.get_bin_image_ref()) | ||
| Images.push_back(Bin); |
There was a problem hiding this comment.
According to specification:
Returns: A vector of bytes containing the data of the kernel bundle in the SYCLBIN format for this implementation. The corresponding SYCLBIN format will be in State state.
So, an executable-state bundle should produce an executable-state SYCLBIN.
However, with the current implementation, this does not seem to always be the case.
For example, if the original device image is in SPIR-V format and is JIT-compiled:
auto KBInput = syclexp::get_kernel_bundle<bundle_state::input>(ctx, path);
auto KBExe = sycl::build(KBInput); // JIT-compiles SPIR-V -> native binary
auto bytes = KBExe.ext_oneapi_get_content();
I believe ext_oneapi_get_content() currently serializes the original SPIR-V image rather than the compiled native binary.
Could you confirm the expected behavior here? Specifically, when a kernel_bundle is in the executable state after JIT-compiling SPIR-V device images, should ext_oneapi_get_content() return a SYCLBIN containing the executable/native form, or is it allowed to contain the original SPIR-V?
Could we also add a test that covers this case explicitly?
There was a problem hiding this comment.
You are right.
Fixed in 2d1afb9
When the source image is IR (SPIR-V or LLVM-IR bitcode) and the bundle is in executable state with a built UR program, the runtime now extracts the per-device native binaries via urProgramGetInfo(UR_PROGRAM_INFO_BINARIES) and emits one NativeDeviceCodeImage per device (labeled with the device's arch from getArchName). For input/object states the IR payload is preserved as before. AOT-native images are unchanged.
2d1afb9 to
6d5f668
Compare
|
@sergey-semenov @againull can you review this one. |
| // Forward all property sets from the source image into the abstract | ||
| // module metadata. This carries [SYCL/device requirements], | ||
| // [SYCL/specialization constants], etc., verbatim so that compatibility | ||
| // matching at re-load time uses the same predicates. |
There was a problem hiding this comment.
Please consider the following scenario with specialization constants:
Source image properties contain specialization constant with some default value.
In the program user sets some different specialization constant value on input state bundle.
Then gets the content - current implementation will serialize the default value from source image properties, not the one set by user, so user's value will be silently dropped:
Snippet:
auto KB = sycl::get_kernel_bundle<bundle_state::input>(ctx, devs);
KB.set_specialization_constant<MyConst>(12345);
auto bytes = KB.ext_oneapi_get_content(); // silently drops 12345
auto KB2 = syclexp::get_kernel_bundle<bundle_state::input>(ctx, devs, bytes);
auto KBExe = sycl::build(KB2); // builds with default value, not 12345
claude generated test:
https://gist.github.com/againull/150778c7296e60047b0341459646d1bd
Fails like this:
# .---command stdout------------
# | FAIL: round-tripped spec const reverted to default (got 42, expected 12345). The user-set override did not survive the SYCLBIN round-trip.
# `-----------------------------
# error: command failed with exit status: 1
This and other related scenarios either should be fixed or error should be reported accordingly or expected behavior documented.
| ImageDesc &ID = AMD.NativeDeviceCodeImages.emplace_back(); | ||
| ID.Bytes = Bytes; | ||
| PropertySetRegistry NDCIProps; | ||
| // arch is informational; the SYCLBIN reader does not consume it. Forward |
There was a problem hiding this comment.
It looks like device arch doesn't survive the second round trip, please see the test and details in comments here:
https://gist.github.com/againull/37123763d0f74d84e47c279af2a48641
Fail with the error:
/home/sycl/build/bin/syclbin-dump /home/sycl/build/tools/sycl/test-e2e/SYCLBIN/Output/get_content_arch_round_trip.cpp.tmp.first.syclbin | /home/sycl/build/bin/FileCheck /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp --check-prefix CHECK-FIRST
# executed command: /home/sycl/build/bin/syclbin-dump /home/sycl/build/tools/sycl/test-e2e/SYCLBIN/Output/get_content_arch_round_trip.cpp.tmp.first.syclbin
# note: command had no output on stdout or stderr
# executed command: /home/sycl/build/bin/FileCheck /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp --check-prefix CHECK-FIRST
# note: command had no output on stdout or stderr
# RUN: at line 22
/home/sycl/build/bin/syclbin-dump /home/sycl/build/tools/sycl/test-e2e/SYCLBIN/Output/get_content_arch_round_trip.cpp.tmp.second.syclbin | /home/sycl/build/bin/FileCheck /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp --check-prefix CHECK-SECOND
# executed command: /home/sycl/build/bin/syclbin-dump /home/sycl/build/tools/sycl/test-e2e/SYCLBIN/Output/get_content_arch_round_trip.cpp.tmp.second.syclbin
# note: command had no output on stdout or stderr
# executed command: /home/sycl/build/bin/FileCheck /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp --check-prefix CHECK-SECOND
# .---command stderr------------
# | /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp:104:18: error: CHECK-SECOND: expected string not found in input
# | // CHECK-SECOND: arch:{{.*[a-zA-Z].*}}
# | ^
# | <stdin>:18:44: note: scanning from here
# | SYCLBIN/native device code image metadata:
# | ^
# | <stdin>:20:2: note: possible intended match here
# | arch:
# | ^
# |
# | Input file: <stdin>
# | Check file: /home/sycl/llvm/sycl/test-e2e/SYCLBIN/get_content_arch_round_trip.cpp
# |
# | -dump-input=help explains the following input dump.
# |
# | Input was:
# | <<<<<<
# | .
# | .
# | .
# | 13: aspects:
# | 14: Number of IR Modules: 0
# | 15: Number of Native Device Code Images: 1
# | 16: Native device code image 0:
# | 17: Metadata:
# | 18: SYCLBIN/native device code image metadata:
# | check:104'0 X error: no match found
# | 19: target: �spir64-unknown
# | check:104'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | 20: arch:
# | check:104'0 ~~~~~~~~
# | check:104'1 ? possible intended match
# | 21: Raw native device code image bytes: <Binary blob of 4360 bytes>
# | check:104'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1
--
********************
********************
Failed Tests (1):
SYCL :: SYCLBIN/get_content_arch_round_trip.cpp
Testing Time: 3.97s
Total Discovered Tests: 1
Failed: 1 (100.00%)
1 warning(s) in tests
| } | ||
|
|
||
| std::vector<char> | ||
| SYCLBIN::serializeImages(const std::vector<ImageInput> &Inputs, uint8_t State) { |
There was a problem hiding this comment.
Currently, serializer doesn't properly emit [SYCL/kernel names] property set for bundle compiled normally with a lambda kernel (no -fsyclbin).
Reason: the SYCLBIN reader populates kernel IDs from the image's [SYCL/kernel names] property set. That property set is only emitted when invoked with -fsyclbin. For normal compiles (no -fsyclbin), those names are not part of the device image, the kernel names live in a separate static registration table so they are not getting serialized currently.
Test: https://gist.github.com/againull/b96b791a4bd339e4cefdb1c87b5f0871
Fails with error:
# .---command stdout------------
# | KB has_kernel<K1>=1 kernel_ids=1
# | KBR has_kernel<K1>=0 kernel_ids=0
# | FAIL: reloaded bundle dropped all kernel-id registration. ext_oneapi_get_content did not synthesize a [SYCL/kernel names] property set, so the reader has nothing to populate kernel ids from.
# `-----------------------------
# error: command failed with exit status: 1
|
Just wanted to note that these highlighted issues somewhat share a single root cause: the kernel bundler serializer derives output exclusively from RTDeviceBinaryImage::Raw.PropertySetsBegin/End and the original image bytes, ignoring the runtime overlays a kernel_bundle_impl carries on top. Anything the runtime added — user-set spec-constant overrides, live kernel-id registrations, etc. |
Thanks. That was a deep design flaw in my side. I address this as follows in the next 3 commits:
Free-function kernel ids don't round-trip through the reader-side tryGetSYCLKernelID lookup (their DeviceKernelInfo carries no kernel_id today). Currently I am throwing an error. I will try to address this on a separate ticket. |
|
@koparasy one of the tests introduced in this PR fails on Linux. Also there are some failures on Windows, can you please take a look? |
Thank you for the reminder. No it is not ready for review, I need to see the failing tests. Will do so and ping accordingly. |
…RLLVM-67311)
Adds the runtime-side serializer for the sycl_ext_oneapi_syclbin extension.
Users can now produce SYCLBIN bytes from any non-source kernel_bundle by
calling ext_oneapi_get_content(), the inverse of the existing SYCLBIN load
path.
Implementation
--------------
- New SYCLBIN::SYCLBINDesc / AbstractModuleDesc / ImageDesc descriptor
structs and SYCLBIN::write in the runtime SYCLBIN module. Mirrors the
llvm/lib/Object/SYCLBIN.cpp writer but operates on in-memory image bytes
rather than file paths, since kernel_bundles hold their device images in
memory. Output is wrapped in an OffloadBinary v2 entry so it round-trips
through the existing reader path.
- SYCLBIN::serializeImages walks a kernel_bundle's live device images and
emits one abstract module per image. Per-image property sets (notably
[SYCL/device requirements] carrying compile_target) are forwarded
verbatim into the abstract module metadata so device-compatibility
matching survives the round-trip.
- kernel_bundle_impl::ext_oneapi_get_content always re-serializes from the
current MUniqueDeviceImages, so state-promoted bundles (compile/link/
build) emit the post-promotion images. ext_oneapi_source bundles are
rejected at compile time via std::enable_if and at runtime as a
belt-and-suspenders check.
- Public API: kernel_bundle<State>::ext_oneapi_get_content for any
non-source state, plumbed through kernel_bundle_plain. New ABI symbol
registered for both linux and windows symbol dumps.
- Feature macro SYCL_EXT_ONEAPI_SYCLBIN added.
- Helpers: Base64::encode (encode side missing in the runtime copy of
Base64) and PropertySetRegistry::write (text serialization in the
runtime copy of PropertySetIO).
Tests
-----
- 13 unit tests across sycl/unittests/SYCL2020/SYCLBINSerialize{,JIT,Multi}
cover: compile_target round-trip, all three non-source bundle states,
image payload round-trip, magic / version=1 headers, IR vs native image
classification, multi-image bundles (one abstract module per image),
empty bundles, mixed IR + native, spec constant property byte
round-trip, device global property byte round-trip.
- 2 E2E tests under sycl/test-e2e/SYCLBIN: a baseline round-trip
(compile -> load .syclbin -> ext_oneapi_get_content -> reload from
bytes -> run kernel) and a multi-target variant gated on opencl-aot +
opencl-cpu-rt.
Spec: sycl/doc/extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc
Design: sycl/doc/design/SYCLBINDesign.md, sycl/doc/design/PropertySets.md
when the round-tripped SYCLBIN contains both spir64 (JIT) and spir64_x86_64 (AOT) images. GPU coverage is sufficient for this test.
functions; per-member __SYCL_EXPORT is redundant on Linux and rejected by
MSVC under MSVC error C2487 ("member of dll interface class may not be
declared with dll interface"). Matches all other member declarations in
kernel_bundle_plain.
which SYCL conventionally enforces via SFINAE at compile time, not via runtime exception. The SFINAE guard already lives on kernel_bundle<State>::ext_oneapi_get_content. The kernel_bundle_impl-side runtime throw was unauthorized by spec; replace it with an assert that backstops the SFINAE for internal callers.
Address reviewer feedback that the SYCLBIN serializer derived its output exclusively from the static RTDeviceBinaryImage and missed runtime overlays that kernel_bundle_impl carries on top. New SYCLBIN::ImageInput::DevImg pointer plus an always-populated Devices list. The serializer composes per-abstract-module property registries in two passes: a raw forward of the static image's property sets, then a runtime-overlay pass that consults DevImg for categories whose authoritative source is the runtime view.
b7cf3b5 to
383c5d1
Compare
|
@againull, I dug into the spec-constant round-trip failure from your test and want your guidance before I pick a fix. What worksThe serializer side is correct now: a user-set spec-constant value reaches Where it still breaksThe reloaded bundle's host-side read returns the compile-time default anyway. The public API short-circuits here kernel_bundle.hpp#L440-L442 SCType Res{SpecName.getDefaultValue()}; // 42, the C++ static default
if (!is_specialization_constant_set(SpecSymName))
return Res; // returns here
is_specialization_constant_set is image-backed by the per-descriptor IsSet flag:device_image_impl.hpp#L504-L514 On a SYCLBIN-reloaded image SYCLBIN has no concept of "this spec constant was user-set vs. left default". Options (I see)
My lean is option 3. it preserves the default-vs-set distinction and round-trips losslessly, and the optional-property detection keeps it backward compatible both directions. But it adds a SYCLBIN metadata property, so I want to confirm that's acceptable before implementing. Questions:What do you think? Is there maybe some other better approach? Or do you agree on 3? |
In my opinion, correct way is #3, we should change the serializer to keep the original compile-time defaults in MSpecConstsDefValBlob and put user overrides in the separate property set only. Tagging @intel/dpcpp-tools-reviewers in case if there are concerns. |
…perty ext_oneapi_get_content did not preserve a user-set specialization constant: on reload the host getter returned the C++ static default instead of the value set before serialization. Root cause: SYCLBIN has no "user-set vs default" concept. A value set via set_specialization_constant was folded into the default-value blob, so on reload every descriptor started IsSet=false and kernel_bundle::get_specialization_constant returned the static default without consulting the blob. Add an optional [SYCL/specialization constants set values] property set (SYCL_SPEC_CONSTANTS_SET_VALUES) that the serializer emits only when at least one descriptor is set. It carries the runtime-effective value blob under the single "all" key, mirroring the default-values layout. On reload, device_image_impl::updateSpecConstSymMap copies the blob into MSpecConstsBlob and flips every descriptor's IsSet, so the host getter returns the round-tripped value. Backward compatible via presence-check (isAvailable()): compiler-produced SYCLBIN never carries the set, old readers ignore an unknown set, and new readers fall back to default-value behavior when it is absent. No SYCLBIN version bump.
Adds the runtime-side serializer for the sycl_ext_oneapi_syclbin extension. Users can now produce SYCLBIN bytes from any non-source kernel_bundle by calling ext_oneapi_get_content(), the inverse of the existing SYCLBIN load path.
Implementation
New SYCLBIN::SYCLBINDesc / AbstractModuleDesc / ImageDesc descriptor structs and SYCLBIN::write in the runtime SYCLBIN module. Mirrors the llvm/lib/Object/SYCLBIN.cpp writer but operates on in-memory image bytes rather than file paths, since kernel_bundles hold their device images in memory. Output is wrapped in an OffloadBinary v2 entry so it round-trips through the existing reader path.
SYCLBIN::serializeImages walks a kernel_bundle's live device images and emits one abstract module per image. Per-image property sets (notably [SYCL/device requirements] carrying compile_target) are forwarded verbatim into the abstract module metadata so device-compatibility matching survives the round-trip.
kernel_bundle_impl::ext_oneapi_get_content always re-serializes from the current MUniqueDeviceImages, so state-promoted bundles (compile/link/ build) emit the post-promotion images. ext_oneapi_source bundles are rejected at compile time via std::enable_if and at runtime as a belt-and-suspenders check.
Public API: kernel_bundle::ext_oneapi_get_content for any non-source state, plumbed through kernel_bundle_plain. New ABI symbol registered for both linux and windows symbol dumps.
Feature macro SYCL_EXT_ONEAPI_SYCLBIN added.
Helpers: Base64::encode (encode side missing in the runtime copy of Base64) and PropertySetRegistry::write (text serialization in the runtime copy of PropertySetIO).
Tests
13 unit tests across sycl/unittests/SYCL2020/SYCLBINSerialize{,JIT,Multi} cover: compile_target round-trip, all three non-source bundle states, image payload round-trip, magic / version=1 headers, IR vs native image classification, multi-image bundles (one abstract module per image), empty bundles, mixed IR + native, spec constant property byte round-trip, device global property byte round-trip.
2 E2E tests under sycl/test-e2e/SYCLBIN: a baseline round-trip (compile -> load .syclbin -> ext_oneapi_get_content -> reload from bytes -> run kernel) and a multi-target variant gated on opencl-aot + opencl-cpu-rt.
Spec: sycl/doc/extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc
Design: sycl/doc/design/SYCLBINDesign.md, sycl/doc/design/PropertySets.md