Releases: intel/llvm
DPC++ daily 2022-06-30
[SYCL] Use std::ignore for all unused args in bfloat builtins (#6381) Signed-off-by: Larsen, Steffen
DPC++ daily 2022-06-29
sycl-nightly/20220629 [SYCL][FPGA][NFC] Refactor [[intel::num_simd_work_items()]] attribute…
oneAPI DPC++ Compiler 2022-06
New features
SYCL Compiler
- Added `-fcuda-prec-sqrt` frontend compiler option which enables a higher precision version of `sqrt` in the device code. [ebf9ea8]
- Added support for local memory accessors for the HIP backend. [58508ba]
- Added initial support of `-lname` processing when searching for fat static libraries. [35e32d8] [a33f9c8]
- Added `-fsycl-fp32-prec-sqrt` flag which enables correctly rounded `sycl::sqrt`. [5c8b7e7]
- Added support for `[[intel::loop_count()]]` attribute. [c536e76]
- Added support for passing driver options to JIT compiler and linker. [1c93bfe]
- Added default argument support for `work_group_size_hint` attribute. [0cff80e]
- Added support for float and double exchange and compare exchange atomic operations in CUDA libclc. [1d84c99]
- Added `--ffast-math` support for CUDA libclc. [0f0c5d1]
- Added support for software atomics (except for the ones using system scope) for lower sm versions of CUDA architecture. Enabled `SYCL_USE_NATIVE_FP_ATOMICS` by default. [7bc8447]
- Added support for the global offset for AMDGPU. [2dc3c06]
- Added support for asynchronous barrier for CUDA backend sm 80+. [6770421]
- Added `-f[no-]sycl-device-lib-jit-link` option to control JIT linking of SYCL device libraries. [dfb37a8] [c946286]
- Added support for the new FPGA attribute `[[intel::fpga_pipeline(N)]]` for loop pipelining. [92aadf3]
- Added `assert` support for Windows NVPTX. [f29b498]
- Added support for `sycl_ext_oneapi_properties` extension. [87f60f6][1984e74][a2583ec][cdf561a][d2982c6][35c2e00]
SYCL Library
- Added support for Nvidia MMA for `bf16`, mixed precision int ((u)int8/int32), and mixed precision float (half/float). [5373362]
- Added a mode for the Level Zero plugin where only the last command in each batch yields a host-visible event. Enabled this mode by default. [c6b7b8e]
- Added an option to query for atomic scope capabilities for the CUDA backend. Updated returns for atomics memory order capabilities. [43a4192]
- Added support for an experimental Level Zero API for host pointer import into USM. The feature can be enabled using the `SYCL_USM_HOSTPTR_IMPORT` environment variable. [844d7b6]
- Added support for the `wi_element` for `bf16` type. [9f2b7bd]
- Added complex support for the reduce and scan group algorithms. [90a4dc7]
- Added support for SYCL 2020 methods in the `group` class. [73d59ce]
- Added `SYCL_RT_WARNING_LEVEL` environment variable which controls the amount of warnings and performance hints the runtime library may print. [2741010]
- Added `tanh` (for floats/halfs) and `exp2` (for halfs) native definitions for CUDA backend. [250c498]
- Added `bf16` builtins for `fma`, `fmin`, `fmax` and `fabs` on CUDA backend. [62651dd]
- Added support for USM buffer location properties which allow specifying the memory location in which the device USM allocation should be placed. [12c988a]
- Added support for `buffer_location` property to the `sycl::buffer`. [9808525]
- Added `single_task` support for ESIMD_EMULATOR backend. [2331160]
- Added support for SVM 1,2,4-elements gather/scatter for ESIMD. [e200720]
- Added support for `bf16` builtins operating on storage types for CUDA backend. [413a9ef]
- Added support for `backend_version` device property for CUDA backend. [4b1a4bc]
- Added support for round-robin submissions to multiple compute CCS for the Level Zero backend. Disabled by default, can be controlled using `SYCL_PI_LEVEL_ZERO_USE_COMPUTE_ENGINE`. [a836c87]
- Added support for buffer migration for contexts with multiple devices in the Level Zero plugin. [7baf152]
- Added a mode where the Level Zero plugin uses immediate command-lists instead of standard command-lists. This mode is disabled by default and can be enabled using the `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` environment variable. [b9cb1d1]
- Added support for `sycl::get_native(sycl::buffer)` for OpenCL and CUDA backends. [8b3c8c4]
- Added reduction overloads accepting `span`. [863383b]
- Added LSC support for ESIMD_EMULATOR backend. [b78bf00]
- Added `half` type support for `__esimd_convertvector_to/from`. [0bfffd6]
- Added `buffer_allocator` SYCL 2020 conformant variant. [53430c8]
- Added support for the USM buffer location property in `malloc_shared`. [6e89821] [9f61c8e][8c4d9a5]
- Added support for the USM buffer location property in `malloc_host`. [2c7caab]
- Added experimental context and device interoperability support for CUDA. [f0df89a]
- Added support for memory intrinsics for the ESIMD_EMULATOR plugin. [1a8f501]
- Added support for named barrier APIs for ESIMD. [1df0038]
- Added support for DPAS API for ESIMD. [5881938]
- Added support for LSC memory access APIs for ESIMD. [4bd50e7]
- Added support for the `invoke_simd` feature. [4072557][8471ff3][8c7bb45][62afb59][3e1c1bf]
- Added support for `info::device::atomic64` for OpenCL and Level Zero backends. [8feb558]
- Added support for `sycl_ext_oneapi_usm_device_read_only` extension. [644c614][58c9d3a]
- Added support for mapping/unmapping operations for ESIMD_EMULATOR plugin. [bc0579a]
- Added support for `make_buffer` API for the Level Zero backend. [7c49984]
- Added interoperability support for HIP backend. [e06d1b5]
- Added missing `+-*/` operations for `half`. [059efbc]
- Introduced new environment variable `SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ` to control the max local memory allowed to be allocated per kernel on CUDA backend. [2e24304]
- Added `ext_intel_global_host_space` in accordance with `sycl_ext_intel_usm_address_spaces` extension. [7a2f44b]
- Added aspect for `bfloat16`. [f84fc32]
- Introduced "Intel math functions" device library with support of type cast util functions for float, double and integer type. [a310952]
- Added `bfloat16` support for `joint_matrix`. [6ac62ab]
Documentation
- Added `sycl_ext_oneapi_complex_algorithms` extension. [7ae7ca8]
- Added a design document for `sycl_ext_oneapi_device_global` extension. [8c22ef1]
- Added a design document for `sycl_ext_oneapi_properties` extension. [912572f]
- Added new `sycl_ext_oneapi_free_function_queries` proposal. [7a93a49]
- Added `sycl_ext_oneapi_group_load_store` extension. [85ccdc0]
- Added validation rules to the SPIR-V extension `SPV_INTEL_global_variable_decorations`. [dfaa070]
- Added `SYCL_INTEL_buffer_location` extension to support `buffer_location` property for USM allocations. [962417d] [36a9ee2]
- Added `sycl_ext_oneapi_named_sub_group_sizes` extension proposal which aims to simplify the process of using sub-groups. [4f3d7e1]
- Added experimental latency control API into `SYCL_INTEL_data_flow_pipes`. [5224f78]
- Added `sycl_ext_oneapi_auto_local_range` extension proposal. [cb4e702]
- Added SYCL 2020 spec constants design doc. [8ec9755]
- Added `sycl_ext_oneapi_queue_status_query` extension proposal. [b6143e5]
- Added initial version of `sycl_ext_oneapi_invoke_simd` and `sycl_ext_oneapi_uniform` extensions proposal. [a37ca84]
- Added the `sycl_ext_oneapi_annotated_arg` extension proposal for applying properties on kernel arguments. [caa696f]
- Added `sycl_ext_oneapi_cuda_async_barrier` extension for CUDA backend. [6770421]
- Added `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions into `sycl_ext_oneapi_bfloat16` extension. [c76ef5c]
- Added initial version of `sycl_ext_oneapi_root_group` extension proposal. [b59cd43]
Tools
- Implemented property set generation for device globals in the sycl-post-link. Added the `--device-gl...
DPC++ daily 2022-06-28
sycl-nightly/20220628 [SYCL] Reset signalled command list if there no available command lis…
DPC++ daily 2022-06-27
[SYCL][CUDA] Retain context with CUDA event interop (#6361) This PR fixes an issue found in the SYCL-CTS CUDA interop tests, where the context had not been properly retained when creating a native event with CUDA interop.
DPC++ daily 2022-06-26
[SYCL] June 2022 Release Notes (#6317)
* Release notes for commit range f34ba2c..4043dda
* Update known issues:
  1. cuda prefetch issue seems to be fixed by: https://github.com/intel/llvm/pull/5043
  2. Performance issues with assert seem to be fixed by: https://github.com/intel/llvm/pull/4505 https://github.com/intel/llvm/pull/4516
DPC++ daily 2022-06-25
[SYCL] June 2022 Release Notes (#6317)
* Release notes for commit range f34ba2c..4043dda
* Update known issues:
  1. cuda prefetch issue seems to be fixed by: https://github.com/intel/llvm/pull/5043
  2. Performance issues with assert seem to be fixed by: https://github.com/intel/llvm/pull/4505 https://github.com/intel/llvm/pull/4516
DPC++ daily 2022-06-24
[SYCL][PI][CUDA] Fix transfer stream reuse (#6354) Fixed a bug that can cause transfer stream to be reused for compute, messing up the synchronization.
DPC++ daily 2022-06-23
[SYCL] Removes more uses of OpenCL header definitions (#6328) This commit adds various PI definitions and replaces some uses of OpenCL header definitions from the SYCL runtime library. To further isolate the remaining uses of the OpenCL headers, the includes of cl.h are moved from common.hpp to the dependent files. Tests updates at: intel/llvm-test-suite#1061
DPC++ daily 2022-06-22
[ESIMD] Implement stateless memory accesses enforcement (#6287)
The driver option -f[no-]sycl-esimd-force-stateless-mem is added.
-fsycl-esimd-force-stateless-mem enables the automatic conversion of stateful memory accesses via SYCL accessors or surface-index to stateless within ESIMD kernels. It also disables those ESIMD intrinsics that use stateful accesses that cannot be converted to stateless. The option defines the macro __ESIMD_FORCE_STATELESS_MEM to map the calls of ESIMD API using accessors to calls of API using pointers. It also passes a switch to sycl-post-link to signal that it should ignore the buffer_t attribute and use svmptr_t.
-fno-sycl-esimd-force-stateless-mem tells the compiler not to convert stateful memory accesses to stateless. This is the default behavior.
Draft of the design document/proposal for this change-set: #6187
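The accessor-to-pointer mapping described in this entry can be pictured with a small preprocessor sketch. Everything here apart from the macro name `__ESIMD_FORCE_STATELESS_MEM` is hypothetical: `ToyAccessor`, `load_via_pointer`, and `load_via_accessor` are illustrative stand-ins, not the ESIMD API, and the code runs on the host only. It assumes the macro does nothing more than select a pointer-based code path at compile time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SYCL accessor: a base pointer plus an element count.
struct ToyAccessor {
    const float* base;
    std::size_t size;
    const float* get_pointer() const { return base; }
};

// Pointer-based ("stateless") load: addresses memory directly through a raw pointer.
inline float load_via_pointer(const float* ptr, std::size_t index) {
    return ptr[index];
}

// Accessor-based entry point. When the macro is defined, the call is
// forwarded to the pointer-based function; otherwise the toy "stateful"
// path reads through the accessor object itself.
inline float load_via_accessor(const ToyAccessor& acc, std::size_t index) {
    assert(index < acc.size && "index out of range");
#ifdef __ESIMD_FORCE_STATELESS_MEM
    return load_via_pointer(acc.get_pointer(), index);
#else
    return acc.base[index];
#endif
}

// Reports at compile time which path was selected.
constexpr bool uses_stateless_path() {
#ifdef __ESIMD_FORCE_STATELESS_MEM
    return true;
#else
    return false;
#endif
}
```

Both paths return the same value in this toy model; the point is only that the caller keeps using the accessor-style function while the macro decides what it expands to. With the real DPC++ driver the macro is defined by `-fsycl-esimd-force-stateless-mem`, as the entry states; on a plain host compiler the same effect can be imitated with `-D__ESIMD_FORCE_STATELESS_MEM`.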