New features
SYCL Compiler
Added support for -fgpu-inline-threshold which allows controlling the inline
threshold of the SYCL device code [5f7b607]
Added experimental support for CUDA backend on Windows [8aa3513]
Added support for experimental option -fsycl-max-parallel-link-jobs=<N>
which can be used to specify how many processes the compiler can use for
linking the device code [c2221f0]
Added sycl::ext::intel::experimental::esimd::simd_mask as a replacement for sycl::ext::intel::experimental::esimd::mask_type_t to represent Gen
predicates [01351f1]
Allowed ::printf builtin for CUDA backend only [0c55d3a]
Implemented nextafter for sycl::half on CUDA backend [53c3268]
Added atomics with scopes and memory orders for CUDA backend
[2ebde5f] [00f43b3]
Added support for missing mathematical builtins in CUDA backend
[789ec8b] [f074774] [390e105]
Added diagnostic for non-forward declarable kernel name types [653bae9]
Added group_ballot intrinsic for CUDA backend [0680e5c]
Added support for device side assert for CUDA backend [5a87b8c]
Turned on -fsycl-dead-args-optimization by default [5983dfd]
Improved compilation time by removing detection of free function query calls
[e4791d1]
Reduced memory consumption of device code linking [6266820]
Improved UX of sycl::ext::oneapi::experimental::printf by allowing format
string to reside in a non-constant address space [2d62e51]
Improved barrier and sync instructions to use full mask when targeting NVPTX
[5ce99b8]
Added match for default SPIR device architecture with host architecture i.e. x86_64 matches spir64 and i686 matches spir [f4d01cd]
Set default device code split mode to off for FPGA [bea72e6]
Improved diagnostic for invalid SYCL kernel names
[455dce8] [df1ff7a]
Made -Xsycl-target-frontend= accept device triple aliases [7fa0569]
Improved diagnostic messages for -fsycl-libspirv-path [c54c605]
Made the implied default device force emulation for FPGA builds [074944e]
Added support for sycl::ext::oneapi::sub_group::get_local_id for HIP
backend [7a9335d]
Added a diagnostic for indirect implicit capture of this in a kernel lambda
[dce4c6a]
SYCL Library
Updated joint matrix queries to report if unsigned int variants of mad
matrix instruction are supported [dd7ebce]
Reduced overhead of device code assert implementation [b94f23a] [58ac74e]
Added a diagnostic on attempt to call sycl::get_kernel_id with an invalid
kernel [9dd1ea3]
Reduced overhead on kernel submission for CUDA backend [b79ae69]
Reduced overhead on kernel submission in backend independent part of runtime
[e292aa5]
Aligned Level-Zero Interoperability API with SYCL 2020 specification
[dd7f82c] [e662166]
Made sycl::half default constructor constexpr [d32a444]
Changed CUDA and HIP backends to report each device in a separate platform
[8dddb11]
Added initial support for SYCL2020 exceptions [15e0ab1]
Added SYCL 2020 sycl::target::device enumeration value [f710886]
Added a diagnostic on attempt to print std::byte using sycl::stream
[dd5e094]
Added possibility to specify ownership of ze_module_handle_t when creating
a sycl::kernel_bundle from it [e3c9c92]
Improved performance of sycl::nd_item::get_group_range() [0cd7b7e]
Deprecated sycl::target::global_buffer
Made device_num which can be passed to SYCL_DEVICE_FILTER unique
[7aa5be0]
Added a diagnostic on using mutually exclusive sycl::handler methods
[6f620a4]
Added support for std::byte to sycl::vec class [8fa04fe]
Added sycl::make_kernel interoperability support for Level-Zero backend
[98896fd]
Optimized work with events in the Level Zero backend [973aee9]
Added support for sycl::ext::oneapi::experimental::matrix::wi_slice and sycl::ext::oneapi::experimental::matrix::joint_matrix_fill
[97127eb] [cbad428]
Enabled code location information when NDEBUG is not defined in XPTI
notifications [e9f2d64] [9ca7cea]
Added a diagnostic on attempt to pass a command group function object to sycl::queue::single_task [2614d4d]
Enlarged the maximum batch size to 64 for Level Zero backend to improve
performance [596f693]
Reduced kernel submission overhead for CUDA backend [35729a7]
Improved translation of Level Zero error codes [6699a5d], [5d9a04b]
Added support for an arbitrary number of elements to sycl::ext::intel::experimental::esimd::simd::copy_from/to methods
[2bdc4c4]
Added HIP support to sycl::ext::oneapi::filter_selector
[7224cb2], [b7cee06]
Added support for batching copy commands for Level Zero backend [4c3e699]
Reduced sycl::queue::submit overhead by enabling post-enqueue execution
graph cleanup [6fd6098]
Added support for classes implicitly converted from sycl::item in sycl::handler::parallel_for parameter to align with the SYCL 2020
specification [34b93bf]
Removed direct initialization constructor from sycl::ext::intel::experimental::bfloat16 class [81154ec]
Added sycl::vec and sycl::marray support to sycl::known_identity type
trait [8fefb25]
Added minimal support for the generic address space to match the sycl::atomic_ref class definition in the specification [e99f298]
Improved cache of command-lists in the context to be per-device for Level
Zero backend [ca457d9]
Extended group algorithms to support broadened types [3205368]
Added support for alignment flags in sycl::ext::intel::experimental::esimd::simd::copy_from/copy_to operations
[27f5c12]
Made sycl::ext::oneapi::atomic_ref available in sycl namespace
[2cdcbed]
Renamed cuda and hip enum values and namespaces to ext_oneapi_cuda and ext_oneapi_hip to align with SYCL 2020 specification [97f916e]
Improved performance of kernel submission process [535ad1e]
Eliminated build of unwanted kernels when creating one with make_kernel
[53ea8b9]
Removed duplicate devices on submission to kernel_bundle API functions
[c222497]
Deprecated sycl::aspect::int64_base_atomics and sycl::aspect::int64_extended_atomics [554b79c]
Made backend specific headers be included implicitly [bc8a00a]
Excluded current working directory from DLL search path when looking for
runtime dependencies [0a65cb4]
Enabled persistent device code cache for kernel bundles [810d67a]
Removed SYCL 1.2.1-style OpenCL interoperability API [bbafe08]
Added a diagnostic on attempt to pass incorrect local size for HIP backend
[c56499c]
Added a diagnostic on attempt to create an object of sycl::buffer with
non-device copyable underlying data type [61f1ae6]
Added a diagnostic (exception) on attempt to pass an empty sycl::device
list to sycl::link [e95c184]
Disabled fallback assert mechanism for objects of sycl::kernel created
using interoperability API [f8014f5]
Added a diagnostic (exception) on attempt to pass a sycl::kernel_bundle to sycl::handler::use_kernel_bundle when kernel bundle and handler are bound
to different contexts [49eb2d7]
Tools
Improved sycl-ls device indexing to handle new backends [0cee18e]
Updated OpenCL CPU and FPGA emulator runtimes configuration on Windows to
use OpenCL ICD registry records instead of OCL_ICD_FILENAMES. That is
done to use the latest OpenCL ICD loader which ignores OCL_ICD_FILENAMES
configuration in the administrative console [92bcb41]
Bug fixes
SYCL Compiler
Fixed out-of-bound behavior for none addressing mode on CUDA [fceb10e]
Fixed memory leak which happened during device code compilation [bac0a25]
Fixed compiler crash when specialization constant is value dependent
[6dcc988]
Updated the device triple population logic to also incorporate usage of -fsycl-link-targets [e877e3b]
Fixed -Xsycl option triple check with multiple -fsycl-targets to only
use the last value of the option [c623223]
Fixed a bug with default value generation for specialization constant
[4f5fa0e]
SYCL Library
Fixed sycl::is_device_copyable type trait which could conflict with range
rounding optimization [2d28cd4]
Aligned sycl::id and sycl::range implementations with the specification.
The conversion operator from sycl::id to sycl::range has been deprecated
and will be removed in future versions [560a214]
Fixed a problem with some non-user kernels showing up in kernel bundles
[49e1e74] [4112cbc]
Fixed sycl::event::get_wait_list() which could return incorrect result
[02852c5] [5bb3ab9]
Fixed conversion of zero dimension sycl::accessor with sycl::access::address_space::global_device_space to an object of sycl::atomic type [ce7725d]
Fixed possible problem which could happen if there is no active CUDA context
during a kernel submission in CUDA backend [0d3cc99]
Fixed memory leak which could happen in case of unsuccessful sycl::stream
construction [4ceba5b]
Fixed a problem which could lead to memory corruption during graph cleanup
[42c6f44]
Fixed a problem with sycl::accessor::get_range returning incorrect results
in case of 3 dimensional image accessor [266515a]
Removed SYCL 1.2.1-style OpenCL interoperability API [bbafe08]
Removed half class from global namespace [c9128e6]
Known issues
[new] Having a MESA OpenCL implementation which provides no devices on a
system may cause incorrect device discovery. As a workaround, such an OpenCL
implementation can be disabled by removing /etc/OpenCL/vendors/mesa.icd.
[new] Compilation may fail on Windows in debug mode if a kernel uses std::array. This happens because debug version of std::array in
Microsoft STL C++ headers calls functions that are illegal for the device
code. As a workaround the following can be done:
Dump compiler pipeline execution strings by passing the -### option to the
compiler. The compiler will print the internal execution strings of the
compilation tools. The actual compilation will not happen.
Modify the (usually) first execution string (it should have the -fsycl-is-device option) by adding the -D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0 options to the
end of the string. Execute all strings one by one.
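The string-editing step of this workaround could be scripted roughly as follows. This is only a sketch, not part of the toolchain: the helper name patch_device_cmd and the file names in the comments are hypothetical, and it assumes a POSIX shell with awk is available (for example Git Bash on Windows) and that each execution string printed by -### occupies a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the compiler).
# Reads execution strings from stdin, one per line, and appends the two
# debug-level defines to the FIRST line that contains -fsycl-is-device.
# All other lines are passed through unchanged.
patch_device_cmd() {
  awk '
    !done && /-fsycl-is-device/ {
      print $0 " -D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0"
      done = 1
      next
    }
    { print }
  '
}

# Assumed usage (app.cpp, cmds.txt and patched.txt are placeholder names):
#   clang++ -fsycl -### app.cpp 2> cmds.txt   # clang prints -### output to stderr
#   patch_device_cmd < cmds.txt > patched.txt
```

The dump also contains non-command lines such as the compiler version banner, so the patched file should be reviewed before its command lines are executed in order. If several device compilation strings are present, only the first one is modified here, which follows the wording of the workaround but may not cover every configuration.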
[new] -fsycl-dead-args-optimization cannot eliminate the offset of an
accessor even though it was created with no offset specified
[new] cuMemPrefetchAsync has issues on Windows. Hence, using sycl::queue::prefetch API on Windows might lead to failure [0c33048]
SYCL 2020 barriers show worse performance than SYCL 1.2.1 do [18c80fa]
When using fallback assert in separate compilation flow it requires explicit
linking against lib/libsycl-fallback-cassert.o or lib/libsycl-fallback-cassert.spv
Performance may be impacted by JIT-ing an extra 'copier' kernel and due to
running the 'copier' kernel and host-task after each kernel which uses
assert
Limit alignment of allocation requests at 64KB which is the only alignment
supported by Level Zero [7dfaf3b]
In the following scenario on the Level Zero backend:
1. Kernel A, which uses buffer A, is submitted to queue A.
2. Kernel B, which uses buffer B, is submitted to queue B.
3. queueA.wait().
4. queueB.wait().
DPCPP runtime used to treat unmap/write commands for buffer A/B as host
dependencies (i.e. they were waited for prior to enqueueing any command
that's dependent on them). This allowed Level Zero plugin to detect that
each queue is idle on steps 1/2 and submit the command list right away.
This is no longer the case since we started passing these dependencies in an
event waitlist and Level Zero plugin attempts to batch these commands, so
the execution of kernel B starts only on step 4. The workaround restores the
old behavior in this case until this is resolved [2023e10] [6c137f8].
User-defined functions with the name and signature matching those of any
OpenCL C built-in function (i.e. an exact match of arguments, return type
doesn't matter) can lead to Undefined Behavior.
A DPC++ system that has FPGAs installed does not support multi-process
execution. Creating a context opens the device associated with the context
and places a lock on it for that process. No other process may use that
device. Some queries about the device through device.get_info<>() also
open up the device and lock it to that process since the runtime needs
to query the actual device to obtain that information.
The format of the object files produced by the compiler can change between
versions. The workaround is to rebuild the application.
Using sycl::program/sycl::kernel_bundle API to refer to a kernel defined
in another translation unit leads to undefined behavior
Linkage errors with the following message: error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined
can happen when a SYCL application is built using MS Visual Studio 2019
version below 16.3.0 and user specifies -std=c++14 or /std:c++14.
Printing internal defines isn't supported on Windows [50628db]