oneAPI DPC++ Compiler 2021-12

@tfzhu tfzhu released this 07 Mar 00:46
· 148324 commits to sycl since this release
27f59d8

New features

SYCL Compiler

  • Added support for -fgpu-inline-threshold which allows controlling inline
    threshold of the SYCL device code [5f7b607]
  • Added experimental support for CUDA backend on Windows [8aa3513]
  • Added support for experimental option -fsycl-max-parallel-link-jobs=<N>
    which can be used to specify how many processes the compiler can use for
    linking the device code [c2221f0]

SYCL Library

Documentation

Improvements

SYCL Compiler

  • Added diagnostics on attempt to pass an incorrect value to
    -fsycl-device-code-split [631fd69]
  • Improved output of -fsycl-help [2404d02]
  • Allowed ::printf builtin for CUDA backend only [0c55d3a]
  • Implemented nextafter for sycl::half on CUDA backend [53c3268]
  • Added atomics with scopes and memory orders for CUDA backend
    [2ebde5f] [00f43b3]
  • Added support for missing mathematical builtins in CUDA backend
    [789ec8b] [f074774] [390e105]
  • Added diagnostic for non-forward declarable kernel name types [653bae9]
  • Added group_ballot intrinsic for CUDA backend [0680e5c]
  • Added support for device side assert for CUDA backend [5a87b8c]
  • Turned on -fsycl-dead-args-optimization by default [5983dfd]
  • Improved compilation time by removing free function queries calls detection
    [e4791d1]
  • Reduced memory consumption of device code linking [6266820]
  • Improved UX of sycl::ext::oneapi::experimental::printf by allowing format
    string to reside in a non-constant address space [2d62e51]
  • Improved barrier and sync instructions to use full mask when targeting NVPTX
    [5ce99b8]
  • Added match for default SPIR device architecture with host architecture i.e.
    x86_64 matches spir64 and i686 matches spir [f4d01cd]
  • Set default device code split mode to off for FPGA [bea72e6]
  • Improved diagnostic for invalid SYCL kernel names
    [455dce8] [df1ff7a]
  • Made -Xsycl-target-frontend= accept device triple aliases [7fa0569]
  • Improved diagnostic messages for -fsycl-libspirv-path [c54c605]
  • Made the implied default device force emulation for FPGA builds [074944e]
  • Added support for sycl::ext::oneapi::sub_group::get_local_id for HIP
    backend [7a9335d]
  • Added a diagnostic for indirect implicit capture of this in a kernel lambda
    [dce4c6a]

SYCL Library

  • Updated joint matrix queries to report if unsigned int variants of mad
    matrix instruction are supported [dd7ebce]
  • Reduced overhead of device code assert implementation [b94f23a] [58ac74e]
  • Added a diagnostic on attempt to call sycl::get_kernel_id with an invalid
    kernel [9dd1ea3]
  • Reduced overhead on kernel submission for CUDA backend [b79ae69]
  • Reduced overhead on kernel submission in backend independent part of runtime
    [e292aa5]
  • Aligned Level-Zero Interoperability API with SYCL 2020 specification
    [dd7f82c] [e662166]
  • Made sycl::half default constructor constexpr [d32a444]
  • Changed CUDA and HIP backends to report each device in a separate platform
    [8dddb11]
  • Added initial support for SYCL2020 exceptions [15e0ab1]
  • Added SYCL 2020 sycl::target::device enumeration value [f710886]
  • Added a diagnostic on attempt to print std::byte using sycl::stream
    [dd5e094]
  • Added possibility to specify ownership of ze_module_handle_t when creating
    a sycl::kernel_bundle from it [e3c9c92]
  • Improved performance of sycl::nd_item::get_group_range() [0cd7b7e]
  • Deprecated sycl::target::global_buffer
  • Made device_num which can be passed to SYCL_DEVICE_FILTER unique
    [7aa5be0]
  • Added a diagnostic on using mutually exclusive sycl::handler methods
    [6f620a4]
  • Added support for std::byte to sycl::vec class [8fa04fe]
  • Added sycl::make_kernel interoperability support for Level-Zero backend
    [98896fd]
  • Optimized work with events in the Level Zero backend [973aee9]
  • Added support for sycl::ext::oneapi::experimental::matrix::wi_slice and
    sycl::ext::oneapi::experimental::matrix::joint_matrix_fill
    [97127eb] [cbad428]
  • Enabled code location information when NDEBUG is not defined in XPTI
    notifications [e9f2d64] [9ca7cea]
  • Added a diagnostic on attempt to pass a command group function object to
    sycl::queue::single_task [2614d4d]
  • Enlarged the maximum batch size to 64 for Level Zero backend to improve
    performance [596f693]
  • Reduced kernel submission overhead for CUDA backend [35729a7]
  • Improved translation of Level Zero error codes [6699a5d], [5d9a04b]
  • Added support for an arbitrary number of elements to
    sycl::ext::intel::experimental::esimd::simd::copy_from/to methods
    [2bdc4c4]
  • Added HIP support to sycl::ext::oneapi::filter_selector
    [7224cb2], [b7cee06]
  • Added support for batching copy commands for Level Zero backend [4c3e699]
  • Reduced sycl::queue::submit overhead by enabling post-enqueue execution
    graph cleanup [6fd6098]
  • Added support for classes implicitly converted from sycl::item in
    sycl::handler::parallel_for parameter to align with the SYCL 2020
    specification [34b93bf]
  • Removed direct initialization constructor from
    sycl::ext::intel::experimental::bfloat16 class [81154ec]
  • Added sycl::vec and sycl::marray support to sycl::known_identity type
    trait [8fefb25]
  • Added minimal support for the generic address space to match the
    sycl::atomic_ref class definition in the specification [e99f298]
  • Improved cache of command-lists in the context to be per-device for Level
    Zero backend [ca457d9]
  • Extended group algorithms to support broadened types [3205368]
  • Added support for alignment flags in
    sycl::ext::intel::experimental::esimd::simd::copy_from/copy_to operations
    [27f5c12]
  • Made sycl::ext::oneapi::atomic_ref available in sycl namespace
    [2cdcbed]
  • Renamed cuda and hip enum values and namespaces to ext_oneapi_cuda and
    ext_oneapi_hip to align with SYCL 2020 specification [97f916e]
  • Improved performance of kernel submission process [535ad1e]
  • Eliminated build of unwanted kernels when creating one with make_kernel
    [53ea8b9]
  • Removed duplicate devices on submission to kernel_bundle API functions
    [c222497]
  • Deprecated sycl::aspect::int64_base_atomics and
    sycl::aspect::int64_extended_atomics [554b79c]
  • Made backend specific headers be included implicitly [bc8a00a]
  • Removed program class and related API [e7cc7b0]
  • Excluded current working directory from DLL search path when looking for
    runtime dependencies [0a65cb4]
  • Enabled persistent device code cache for kernel bundles [810d67a]
  • Removed SYCL 1.2.1-style OpenCL interoperability API [bbafe08]
  • Added a diagnostic on attempt to pass incorrect local size for HIP backend
    [c56499c]
  • Added a diagnostic on attempt to create an object of sycl::buffer with a
    non-device-copyable underlying data type [61f1ae6]
  • Added a diagnostic (exception) on attempt to pass an empty sycl::device
    list to sycl::link [e95c184]
  • Disabled fallback assert mechanism for objects of sycl::kernel created
    using interoperability API [f8014f5]
  • Added a diagnostic (exception) on attempt to pass a sycl::kernel_bundle to
    sycl::handler::use_kernel_bundle when kernel bundle and handler are bound
    to different contexts [49eb2d7]

Tools

  • Improved sycl-ls device indexing to handle new backends [0cee18e]

Documentation

Bug fixes

SYCL Compiler

  • Fixed a problem which might affect headers searching [aad223f]
  • Fixed a problem which prevented submitting kernels from a static library
    when using CUDA backend [665a0a2]
  • Fixed backwards compatibility for libraries in archive format [53308b1]
  • Fixed crash on attempt to compile a kernel which uses templated global
    variable [226679a]
  • Fixed a possible hang caused by recursive calls to memcpy or memset on
    CPU device [cb5e8ae]
  • Allowed SYCL headers to be included in an extern "C++" block [fcd7e31]
  • Disabled the warning when -fsycl-host-compiler is used [756c2e8]
  • Fixed handling of specialization constants when compiling for AOT and
    SPIR-V images [f82ddf4]
  • Fixed nan builtin for doubles in libclc [cc0d67f]
  • Fixed out-of-bound behavior for none addressing mode on CUDA [fceb10e]
  • Fixed memory leak which happened during device code compilation [bac0a25]
  • Fixed compiler crash when specialization constant is value dependent
    [6dcc988]
  • Updated the device triple population logic to also incorporate usage of
    -fsycl-link-targets [e877e3b]
  • Fixed -Xsycl option triple check with multiple -fsycl-targets to only
    use the last value of the option [c623223]
  • Fixed a bug with default value generation for specialization constant
    [4f5fa0e]

SYCL Library

  • Fixed sycl::is_device_copyable type trait which could conflict with range
    rounding optimization [2d28cd4]
  • Aligned sycl::id and sycl::range implementations with the specification.
    The conversion operator from sycl::id to sycl::range has been deprecated
    and will be removed in future versions [560a214]
  • Fixed a problem with some non-user kernels showing up in kernel bundles
    [49e1e74] [4112cbc]
  • Fixed sycl::event::get_wait_list() which could return incorrect result
    [02852c5] [5bb3ab9]
  • Fixed conversion of zero dimension sycl::accessor with
    sycl::access::address_space::global_device_space to an object of
    sycl::atomic type [ce7725d]
  • Fixed a possible problem which could happen if there is no active CUDA
    context during a kernel submission in CUDA backend [0d3cc99]
  • Fixed memory leak which could happen in case of unsuccessful sycl::stream
    construction [4ceba5b]
  • Fixed a problem which could lead to memory corruption during graph cleanup
    [42c6f44]
  • Fixed a problem with sycl::accessor::get_range returning incorrect results
    in case of 3 dimensional image accessor [266515a]
  • Fixed sycl::ext::oneapi::sub_group_mask::insert_bits [f88a19e]
  • Fixed sycl::ext::intel::experimental::esimd::simd_mask behavior for
    sub-group sizes less than 32 [c855fd1]
  • Fixed a problem with sycl::handler::fill for HIP backend [cee76d9]
  • Fixed a problem resulting in an exception when trying to allocate a big
    sycl::image on the host [3061bc7]
  • Fixed behavior of sycl::handler::parallel_for which takes a sycl::kernel
    to ignore a sycl::kernel_bundle which may be set earlier [0b97344]
  • Made sycl::kernel_bundle::get_native const [91fef67]
  • Fixed logic, arithmetic and compare operators for
    sycl::ext::intel::experimental::esimd::simd_view with scalar values
    [7e11e48] [b88c455]
  • Aligned sycl::device::create_sub_devices exceptions with SYCL 2020
    specification [d2252e6]
  • Added a diagnostic on attempt to submit a kernel which is missing from a
    sycl::kernel_bundle set for a handler [7dbb6fb]
  • Disabled fallback assert for CUDA, HIP and FPGA devices [b0411f8] [09ece34]
  • Fixed a race which could happen when submitting a host task to the in-order
    sycl::queue [20d9346]
  • Fixed memory leak when constructing a sycl::queue using interoperability
    API with Level Zero [299f506]
  • Fixed a problem with releasing USM memory while unfinished kernels may
    still access this memory on GPU devices of the OpenCL backend
    [c4a7290]
  • Fixed a hang which could happen when cross queues dependencies are used
    [23e180b]
  • Fixed a memory leak in ext::intel::experimental::online_compiler
    [2af6ccd]
  • Fixed alignment of sycl::vec class on Windows [826c569]
  • Fixed an issue with incorrect arguments setting in case of multiple device
    images [5ca3628]
  • Made accessor::get_pointer always return base pointer, even for accessors
    with offsets [59fcb82]
  • Fixed problems with selection of the correct binary [a346c08]
  • Fixed sycl::span type deduction [f5d08c5]
  • Made initialization of host task thread pool thread-safe [2caf555]
  • Fixed infinite loop when sycl::handler::parallel_for range exceeds
    INT_MAX [64720d8], [fd0b108]
  • Fixed platform query in USM allocation info for HIP backend [2bc3c92]
  • Fixed dangling pointer in sycl::device_event [8a3bd1c]
  • Widened (u)int8/16 to (u)int32 and half to float in
    sycl::group_broadcast to work around missing support of (u)int8/16 and
    half on OpenCL CPU [1f3f9b9]
  • Fixed an issue with sycl::make_kernel [13443a9]
  • Fixed a memory leak of specialization constant buffer [ae711ab]
  • Fixed a memory leak in in-memory program cache [0b8cff4]
  • Fixed a problem with an invalid sycl::kernel_bundle being associated with a
    sycl::kernel created using sycl::make_kernel [7431afa]
  • Disabled use of std::byte when __HAS_STD_BYTE=0 definition supplied on
    MSVC [347b114]
  • Removed half class from global namespace [c9128e6]
  • Defined missing feature test macros [504ac97]
  • Fixed a problem with using sycl::multi_ptr with std::byte as an
    underlying type [cb388aa]
  • Fixed an issue with sycl::queue::[memcpy|memset] APIs breaking dependency
    chain when used for USM memory and zero size is specified [1a08b7e]
  • Fixed device aspect check in sycl::has_kernel_bundle [88cc524]

API/ABI breakages

  • The following classes have been removed:
    • sycl::vector_class
    • sycl::string_class
    • sycl::function_class
    • sycl::mutex_class
    • sycl::unique_ptr_class
    • sycl::shared_ptr_class
    • sycl::weak_ptr_class
    • sycl::hash_class
    • sycl::exception_ptr_class.
      Replacement: analogs from STL
  • Support for the following attributes has been removed:
    • intelfpga::num_simd_work_items
    • intelfpga::max_work_group_size
    • intelfpga::max_global_work_dim
    • intelfpga::no_global_work_offset
    • intelfpga::ivdep
    • intelfpga::ii
    • intelfpga::max_concurrency
    • intelfpga::loop_coalesce
    • intelfpga::disable_loop_pipelining
    • intelfpga::max_interleaving
    • intelfpga::speculated_iterations
    • intelfpga::doublepump
    • intelfpga::singlepump
    • intelfpga::memory
    • intelfpga::register
    • intelfpga::bankwidth
    • intelfpga::numbanks
    • intelfpga::private_copies
    • intelfpga::merge
    • intelfpga::max_replicates
    • intelfpga::simple_dual_port
    • intelfpga::bank_bits
    • intelfpga::force_pow2_depth
    • intelfpga::scheduler_target_fmax_mhz
      Replacement: the same attributes, but in intel:: namespace
  • Removed sycl::ONEAPI and sycl::INTEL
    Replacement: sycl::ext::oneapi and sycl::ext::intel
  • The patch [8a3bd1c] should not break SYCL library ABI but can affect
    device-code ABI
  • Removed program class and related API [e7cc7b0]
  • Removed SYCL 1.2.1-style OpenCL interoperability API [bbafe08]
  • Removed half class from global namespace [c9128e6]
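
For the removed sycl::*_class names, migration is mostly a mechanical rename to the standard library types named in the replacement note. The following is a minimal host-only sketch, assuming the removed names were plain aliases of their STL analogs. The KernelNames alias and the join_names helper are hypothetical and exist only for illustration.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Before (no longer compiles with this release):
//   sycl::vector_class<sycl::string_class> names;
//   sycl::function_class<void(const sycl::string_class &)> callback;
//   sycl::shared_ptr_class<int> counter;

// After: use the STL types directly.
using KernelNames = std::vector<std::string>;  // hypothetical alias

// Hypothetical helper: joins the names with commas.
inline std::string join_names(const KernelNames &names) {
  std::string out;
  for (const std::string &name : names) {
    if (!out.empty()) out += ",";
    out += name;
  }
  return out;
}

// Hypothetical helper: invokes a std::function callback once per name and
// counts the calls through a std::shared_ptr.
inline int visit_names(const KernelNames &names,
                       const std::function<void(const std::string &)> &callback) {
  std::shared_ptr<int> counter = std::make_shared<int>(0);
  for (const std::string &name : names) {
    callback(name);
    ++(*counter);
  }
  return *counter;
}
```

Since only the spelling of the type changes under that assumption, a project-wide search and replace of each sycl::*_class name with its std:: counterpart is usually a reasonable first step.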

Known issues

  • [new] Having MESA OpenCL implementation which provides no devices on a
    system may cause incorrect device discovery. As a workaround such an OpenCL
    implementation can be disabled by removing /etc/OpenCL/vendor/mesa.icd.
  • [new] Compilation may fail on Windows in debug mode if a kernel uses
    std::array. This happens because debug version of std::array in
    Microsoft STL C++ headers calls functions that are illegal for the device
    code. As a workaround the following can be done:
    1. Dump compiler pipeline execution strings by passing -### option to the
      compiler. The compiler will print the internal execution strings of
      compilation tools. The actual compilation will not happen.
    2. Modify the (usually) first execution string (it should have
      -fsycl-is-device option) by adding
      -D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0 options to the
      end of the string. Execute all strings one by one.
  • [new] -fsycl-dead-args-optimization can't help eliminate offset of
    accessor even though it's created with no offset specified
  • [new] cuMemPrefetchAsync has issues on Windows. Hence, using
    sycl::queue::prefetch API on Windows might lead to failure [0c33048]
  • SYCL 2020 barriers show worse performance than SYCL 1.2.1 do [18c80fa]
  • When using fallback assert in separate compilation flow it requires explicit
    linking against lib/libsycl-fallback-cassert.o or
    lib/libsycl-fallback-cassert.spv
  • Performance may be impacted by JIT-ing an extra 'copier' kernel and by
    running the 'copier' kernel and host-task after each kernel which uses
    assert
  • Alignment of allocation requests is limited to 64KB, which is the only
    alignment supported by Level Zero [7dfaf3b]
  • In the following scenario on Level Zero backend:
    1. Kernel A, which uses buffer A, is submitted to queue A.
    2. Kernel B, which uses buffer B, is submitted to queue B.
    3. queueA.wait().
    4. queueB.wait().
      DPCPP runtime used to treat unmap/write commands for buffer A/B as host
      dependencies (i.e. they were waited for prior to enqueueing any command
      that's dependent on them). This allowed Level Zero plugin to detect that
      each queue is idle on steps 1/2 and submit the command list right away.
      This is no longer the case since we started passing these dependencies in an
      event waitlist and Level Zero plugin attempts to batch these commands, so
      the execution of kernel B starts only on step 4. The workaround restores the
      old behavior in this case until this is resolved [2023e10] [6c137f8].
  • User-defined functions with the name and signature matching those of any
    OpenCL C built-in function (i.e. an exact match of arguments, return type
    doesn't matter) can lead to Undefined Behavior.
  • A DPC++ system that has FPGAs installed does not support multi-process
    execution. Creating a context opens the device associated with the context
    and places a lock on it for that process. No other process may use that
    device. Some queries about the device through device.get_info<>() also
    open up the device and lock it to that process since the runtime needs
    to query the actual device to obtain that information.
  • The format of the object files produced by the compiler can change between
    versions. The workaround is to rebuild the application.
  • Using sycl::program/sycl::kernel_bundle API to refer to a kernel defined
    in another translation unit leads to undefined behavior
  • Linkage errors with the following message:
    error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined
    can happen when a SYCL application is built using MS Visual Studio 2019
    version below 16.3.0 and the user specifies -std=c++14 or /std:c++14.
  • Printing internal defines isn't supported on Windows [50628db]
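
For the known issue about user-defined functions whose name and argument types match an OpenCL C built-in, one possible way to avoid the clash is to keep such helpers inside a namespace (or rename them), so that they no longer share an unqualified global name with the built-in. This is a sketch under that assumption, not a workaround documented in these notes. The app namespace and its clamp helper are hypothetical, and the code is plain host C++ that only illustrates the naming pattern.

```cpp
// Risky at global scope in device code: the name and argument types match
// the OpenCL C built-in clamp(float, float, float).
//   float clamp(float x, float lo, float hi);

namespace app {  // hypothetical namespace for user helpers

// Same behavior, but the qualified name app::clamp is distinct from the
// global built-in name.
inline float clamp(float x, float lo, float hi) {
  return x < lo ? lo : (x > hi ? hi : x);
}

}  // namespace app
```

Call sites then use the qualified name, for example app::clamp(v, 0.0f, 1.0f).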