Releases · IntelPython/dpctl

28 Feb 19:25

0.19.0

1336b31

v0.19.0 Latest

This release features official, out-of-the-box support for compiling dpctl for specified AMD GPU architectures, the addition of new function tensor.top_k, a radix-sort-based implementation of sorting functions, and improvements to interoperability with DLPack through tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice.

A number of adjustments were also made to improve performance of dpctl reductions (e.g., sum, min, max), accumulators (e.g., cumulative_sum, cumulative_logsumexp), and copy-and-cast operations.

Added

Support for compiling dpctl for specified AMD GPU architecture with use of CodePlay oneAPI plug-in gh-1731
Added tensor.top_k per Python Array API specification gh-1921
Added functions tensor.dldevice_to_sycl_device and tensor.sycl_device_to_dldevice for converting between DLPack and sycl devices, and a method get_device_id to dpctl.SyclDevice to improve interoperability with DLPack protocol gh-1953
Added DPCTL_OFFLOAD_COMPRESS cmake option (set to OFF by default) to toggle --offload-compress linker option when building dpctl gh-1961
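
To illustrate what a top-k selection such as tensor.top_k computes, here is a minimal pure-Python sketch for a 1-D sequence. The helper name top_k_reference, its largest flag, and the tie-breaking order are assumptions made for this illustration; this is not dpctl's implementation and it does not run on a SYCL device.

```python
def top_k_reference(values, k, largest=True):
    """Return (top_values, top_indices) for the k largest (or smallest) items.

    Hypothetical reference helper for a 1-D Python sequence; it only
    illustrates the result a top-k selection is expected to produce.
    """
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and len(values)")
    # Sort positions by value. Python's sort is stable even with
    # reverse=True, so equal values keep the lower index first
    # (an assumption of this sketch, not a documented dpctl guarantee).
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=largest)
    top = order[:k]
    return [values[i] for i in top], top


vals, idx = top_k_reference([3, 1, 4, 1, 5, 9, 2, 6], 3)
print(vals, idx)
```

The sketch returns both the selected values and their positions, which is the pair of results a top-k function is expected to provide.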

Changed

Improved performance of copy-and-cast operations from numpy.ndarray to tensor.usm_ndarray for contiguous inputs gh-1829
py_sort and py_argsort now throw py::value_error if inputs are not C-contiguous gh-1838
Improved performance of copying operation to C-/F-contig array, with optimization for batch of square matrices gh-1850
Improved performance of tensor.argsort function for all types gh-1859
Improved performance of tensor.sort and tensor.argsort for short arrays in the range [16, 64] elements gh-1866
Implemented radix sort algorithm to be used in dpt.sort and dpt.argsort gh-1867, gh-1883
Extended dpctl.SyclTimer with device_timer keyword, implementing different methods of collecting device times gh-1872
dpctl changed to see GPU devices out of the box in virtual environment on Windows gh-1922
Improved performance of tensor.cumulative_sum, tensor.cumulative_prod, tensor.cumulative_logsumexp as well as performance of boolean indexing gh-1923, gh-1942
Improved performance of tensor.min, tensor.max, tensor.logsumexp, tensor.reduce_hypot for floating point type arrays by at least 2x gh-1932, gh-1937
Updated Cython examples to use scikit-build gh-1935
Reduced binary size of _tensor_accumulation_impl by 13 MB gh-1957
Extended tensor.asarray to support objects that implement __usm_ndarray__ property to be interpreted as usm_ndarray objects gh-1959
tensor.usm_ndarray object disallows implicit conversions to NumPy array gh-1964
stream arguments in tensor.usm_ndarray methods now raise an error if stream is not a dpctl.SyclQueue gh-1969
dpctl initialization sets subprocess to use SPAWN method on Linux to enable gdb-oneapi to debug kernels submitted from Python applications gh-1971
Reduced binary size of _tensor_elementwise_impl gh-1976
Allow dpctl.SyclQueue.memcpy to and from multi-dimensional buffers gh-1985
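
For readers unfamiliar with radix sort, the following is a minimal sketch of a least-significant-digit radix sort on non-negative Python integers. It only illustrates the general technique; the function name radix_sort and the bits_per_pass parameter are assumptions of this example, and the kernels used by dpctl are not shown here.

```python
def radix_sort(values, bits_per_pass=8):
    """LSD radix sort for non-negative integers (illustrative sketch)."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch handles non-negative integers only")
    out = list(values)
    if not out:
        return out
    n_buckets = 1 << bits_per_pass
    mask = n_buckets - 1
    max_value = max(out)
    shift = 0
    # Process one digit (bits_per_pass bits) at a time, least significant
    # first. Appending to buckets in input order keeps each pass stable,
    # which is what makes the digit-by-digit approach correct.
    while (max_value >> shift) > 0:
        buckets = [[] for _ in range(n_buckets)]
        for v in out:
            buckets[(v >> shift) & mask].append(v)
        out = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return out


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

The number of passes depends on the key width rather than on comparisons between elements, which is why radix sort is often attractive for fixed-width data types.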

Fixed

Fixed a bug in tensor.roll for very large values of shift gh-1869
Fix for tensor.result_type when all inputs are Python built-in scalars gh-1877
Improved error in constructors tensor.full and tensor.full_like when provided a non-numeric fill value gh-1878
Added a check for pointer alignment when copying to C-contiguous memory gh-1890, gh-1891
Fixed dpctl installed into virtual environment not finding DPC++ runtime libraries by adding DPCTL_WITH_REDIST cmake option (set to OFF by default) gh-1893
Fixed incorrect result (issue gh-1901) in tensor.cumulative_sum and in advanced indexing gh-1902
Fixed __setitem__() for tensor.usm_ndarray when passed an empty boolean mask gh-1915
tensor.from_dlpack docstring now shows that return type can be NumPy array and stipulates when this will be the case gh-1919
Fixed docstring in helper class in DLPack tests gh-1920
Fixed a bug in tensor.astype where copy=False would not be respected for 1d arrays when order keyword is specified gh-1928
Replaced deprecated CL/sycl.hpp with recommended sycl/sycl.hpp in examples gh-1933
Fixed tensor.take_along_axis and tensor.put_along_axis raising an error for tensor.uint64 indices when given an array of dimension greater than 1 gh-1934
Fixed unexpected results of tensor.sum with a requested output type of bool gh-1958
Use std::move to avoid unnecessary copying of temporary in triul_ctor.cpp gh-1960
Make stream a keyword-only argument in tensor.usm_ndarray.to_device per requirement by array API specification gh-1966
Improve efficiency of copy implementation and avoid an unnecessary kernel invocation in tensor.argsort for 1d input gh-1967
Corrected uses of NumPy constructors with tensor.usm_ndarray inputs in test suite gh-1968
Fixed array API namespace inspection utilities showing complex128 as a valid dtype on devices without double precision and device keywords not working with dpctl.SyclQueue or filter strings gh-1979
Fixed a bug in test_sycl_device_interface.cpp which would cause compilation to fail with Clang version 20.0 gh-1989
Fixed memory leaks in smart-pointer-managed USM temporaries in synchronizing kernel calls gh-2002
UsmNDArray_MakeSimpleFromPtr and UsmNDArray_MakeFromPtr now raise an error when provided an invalid typenum before attempting to create the array gh-2003
Fixed typos in tensor.from_numpy and tensor.astype gh-2006
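
As an illustration of why very large shift values need care in a roll operation, here is a minimal pure-Python sketch that reduces the shift modulo the length before moving any elements. The helper roll below is hypothetical and operates on lists; it is not the code changed in gh-1869.

```python
def roll(values, shift):
    """Circularly shift a list by `shift` positions (to the right if positive)."""
    n = len(values)
    if n == 0:
        return []
    # Python's % returns a value in [0, n) even for negative or huge
    # shifts, so the slicing below never sees an out-of-range offset.
    s = shift % n
    return values[n - s:] + values[:n - s]


print(roll([1, 2, 3, 4, 5], 2))
print(roll([1, 2, 3, 4, 5], 10**18 + 2))
```

Both calls print the same result because the two shifts differ by a multiple of the sequence length.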

Maintenance

Revert pinning of cmake to 3.26 on Windows gh-1823
Update black version used in Python code style workflow gh-1828
Fixed CI/CD workflow for building conda packages on Windows gh-1831
Revert work-around in test_sycl_kernel_submit.py for problem in MKL 2024.2.0 gh-1836
Do not use Mambaforge variant of miniforge since it is deprecated gh-1844
Use pybind11=2.13.6 gh-1845
Remove unnecessary include in C++ header file gh-1846
Build translation unit "simplify_iteration_space.cpp" compiled multiple times as a static library gh-1847
Add instructions for installing dpctl from Intel PyPi channel gh-1860
Fix warnings when generating docs gh-1855, gh-1861
Align conda recipe with conda-forge's {{ stdlib("c") }} migration gh-1868
Add missing include of SYCL header to "math_utils.hpp" gh-1899
Add support of CV-qualifiers in is_complex<T> helper gh-1900
Tuning work for elementwise functions with modest performance gains (under 10%) gh-1889
Reduce binary ...

Contributors

sommerlukas

07 Dec 18:21

oleksandr-pavlyk

0.18.3

69be39d

v0.18.3

This is a bug fix release which supports use of dpctl in virtual environment on Windows, resolving gh-1745.

03 Dec 20:58

oleksandr-pavlyk

0.18.2

7bac769

v0.18.2

This is a bug-fix release, see https://github.com/IntelPython/dpctl/milestone/15.

It backports fixes for

tensor.result_type behavior for scalars (see gh-1874) and
errors when using dpctl in virtual environment on Linux (gh-1892).

Changes from PR gh-1899 were also backported.

14 Oct 11:56

oleksandr-pavlyk

0.18.1

5e5513f

v0.18.1

This is an incremental release in which only the installation instructions in the README were updated to reflect the change in location of the index with Python packages built by Intel(R) relative to the 0.18.0 release.

30 Sep 10:42

oleksandr-pavlyk

0.18.0

786365e

v0.18.0

This release reaches an important milestone of making offloading fully asynchronous.

Calls to dpctl.tensor submit tasks for execution to DPC++ runtime and return without waiting for execution of these tasks to finish.
The sequential semantics that a user expects from executing a Python script are nevertheless preserved.
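
The idea of submitting work without waiting, while still observing results in program order, can be sketched with the Python standard library alone. The example below uses a single-worker thread pool as a loose stand-in for an in-order queue; this analogy and the helper names scale and total are assumptions of the sketch, and it does not show how dpctl or the DPC++ runtime schedule tasks.

```python
from concurrent.futures import ThreadPoolExecutor

# A single worker thread executes tasks in submission order, loosely
# analogous to an in-order queue (an assumption made for illustration).
executor = ThreadPoolExecutor(max_workers=1)


def scale(data, factor):
    return [factor * v for v in data]


def total(data_future):
    # By the time this task runs, the earlier task has already finished,
    # because the single worker processes tasks one after another.
    return sum(data_future.result())


# Both submit() calls return immediately with a future object.
f1 = executor.submit(scale, [1, 2, 3], 10)
f2 = executor.submit(total, f1)

# Asking for the value is the only point where the caller waits.
result = f2.result()
print(result)
executor.shutdown()
```

The caller's code reads as a plain sequence of steps, yet waiting happens only when a value is actually requested.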

The full list of changes that went into this release is:

Added

Implement tensor.take_along_axis per Python Array API specification gh-1778
Implement tensor.put_along_axis to complement tensor.take_along_axis gh-1798
Support for 'device=tensor.kDLCPU' in tensor.from_dlpack function and tensor.usm_ndarray.__dlpack__ method gh-1781
Support DLPack on Windows gh-1746
Implement tensor.nextafter function per Python Array API specification gh-1730
Implement tensor.count_nonzero and tensor.diff functions from Python array API specification gh-1732, gh-1780
Add support for order="K" to *_like array creation functions, and change default order keyword value from 'C' to 'K' gh-1808
Support for 'max dimensions' in Array API capabilities info data gh-1774
Add support for device aspect 'emulated' gh-1691
dpctl::tensor::usm_memory class defined in dpctl4pybind11.hpp adds constructor to create Python USM memory objects viewing into existing USM allocations, which can be made by an external library gh-1782
Add support for COVERAGE build type in project's CMake script gh-1692
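
To illustrate the indexing pattern behind take_along_axis and put_along_axis, here is a minimal pure-Python sketch restricted to 2-D nested lists and the last axis. The helper names are hypothetical, and unlike an in-place array operation the put helper returns a modified copy; neither is dpctl's implementation.

```python
def take_along_last_axis(x, indices):
    """out[i][j] = x[i][indices[i][j]] for 2-D nested lists."""
    return [[row[j] for j in idx_row] for row, idx_row in zip(x, indices)]


def put_along_last_axis(x, indices, vals):
    """Return a copy of x with out[i][indices[i][j]] = vals[i][j]."""
    out = [list(row) for row in x]
    for out_row, idx_row, val_row in zip(out, indices, vals):
        for j, v in zip(idx_row, val_row):
            out_row[j] = v
    return out


x = [[10, 30, 20], [60, 40, 50]]
# Per-row sorting permutation, as an argsort along the last axis would give.
order = [sorted(range(len(row)), key=row.__getitem__) for row in x]
print(take_along_last_axis(x, order))
print(put_along_last_axis(x, [[0], [2]], [[-1], [-2]]))
```

A common use of this pattern is applying a per-row permutation, such as the result of an argsort, back to the data it was computed from.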

Changed

Change ownership of USM allocation by dpctl.memory objects, make executions of dpctl.tensor operations asynchronous gh-1705
Add support for Python scalars in tensor.where function gh-1719
Optimize division by Python scalar in statistical functions tensor.mean, tensor.std, tensor.var gh-1820
Use transcendental functions from sycl namespace instead of std namespace gh-1707
Changes for compatibility with recent NumPy in runtime environment gh-1735, gh-1772, gh-1804
Array creation function tensor.zeros to use asynchronous memset operation gh-1806
The setter of tensor.usm_ndarray.shape property now supports Python scalar value gh-1786
Use 'pyproject.toml' instead of 'setup.py' aligning with current packaging best practices gh-1660
No longer set SOVERSION property in DPCTLSyclInterface library on Linux gh-1773
Update version of 'pybind11' used gh-1758, gh-1812
Handle possible exceptions by usm_host_allocator used with std::vector gh-1791
Use dpctl::tensor::offset_utils::sycl_free_noexcept instead of sycl::free in host_task tasks associated with life-time management of temporary USM allocations gh-1797
Add "same_kind"-style casting for in-place mathematical operators of tensor.usm_ndarray gh-1827, gh-1830

Fixed

Fix setting of release variable in Sphinx config file gh-1685
Handle possible NULL return value from device aspect queries DPCTLDevice_GetMaxWorkGroupSize1d and DPCTLDevice_GetMaxWorkGroupSize2d gh-1690
Add license header to conda script files gh-1695
Fix tensor.round behavior on CUDA devices gh-1700
Add missing #include <sstream> gh-1701
Fix for issue 1724 gh-1728
Correct USM type for return array of tensor.extract function gh-1727
Fix for tensor.unique_all and tensor.unique_inverse to always return index arrays with default indexing data type gh-1741
Propagate read-only flag from __sycl_usm_array_interface__ in tensor.asarray function gh-1756
tensor.clip to handle Python scalars which are out of bound for the data type of integral array gh-1759
Avoid dead-locking by releasing GIL around blocking operations in libtensor gh-1753
Element-wise tensor.divide and comparison operations allow greater range of Python integer and integer array combinations gh-1771
Fix for unexpected behavior when using floating point types for array indexing gh-1792
Enable pytest --pyargs dpctl.tests gh-1833

Maintenance

Improve performance of test_sort_complex_fp_nan gh-1704
Improve exception wording raised by tensor.broadcast_arrays() gh-1720
Remove template keyword in method call of sycl::kernel_bundle gh-1726
Backport changelog edits from maintenance/0.17.x gh-1736
Replace uses of 'intel' channels in docs and readme file gh-1737
Update references to deprecated environment variable SYCL_DEVICE_FILTER gh-1740
Correction for installation instruction steps gh-1754
Fix for crash during testing with open source SYCL bundle by updating CPU RT library used gh-1762
Add missing include to fix build break with newer LLVM gh-1776
Add #include <utility> for definition of std::move used gh-1787
Change to CMake script to accommodate DPC++ transition from PI to UR architecture gh-1788
Document tensor._flags.Flags class gh-1794
Fix for unreferenced unreleased bug in copy-and-cast code logic gh-1799
Explicitly include headers used in C++ translation units implementing reduction operations gh-1802
Clean-up uses of Strided1DIndexer class gh-1805
Tweak to readability of C++ code implementing matrix-matrix multiplication gh-1810
Do not add sycl::event associated with compute task to vector of events representing execution of host_task gh-1807
Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on libze1 package which provides Level-Zero loader library gh-1801, gh-1840
Use dedicated type-support matrices for in-place element-wise binary operations gh-1816
Remove recommendation to install wheels from Anaconda PyPI index gh-1819
Removed use of post-link and pre-unlink conda scripts in dpctl gh-1821
Pin compiler used to build 0.18.0 version to 2025.0.0 gh-1822
A variety of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly:
gh-1686, gh-1688, gh-1697, gh-1698, gh-1703, gh-1702, gh-1709, gh-1712, gh-1713, gh-1722, gh-1725, gh-1729, gh-1733, [gh-1721](https...

14 Jul 13:51

oleksandr-pavlyk

0.17.0

a5c40d9

0.17.0

This release features an updated documentation web page at https://intelpython.github.io/dpctl/latest/index.html, adds cumulative reductions,
and complies with revision 2023.12 of the Python Array API specification.

Added

Added pybind11 caster for sycl::half to map to/from Python float to "dpctl4pybind11.hpp" header: gh-1655
Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
Implemented tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp: gh-1602
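
The three cumulative functions share one pattern: a running accumulation with a different binary operation. Here is a minimal pure-Python sketch on lists of finite floats using itertools.accumulate; the helper logaddexp is written out for this example, and none of this reflects how dpctl implements these functions on a device.

```python
import math
import operator
from itertools import accumulate


def logaddexp(a, b):
    """Compute log(exp(a) + exp(b)) for finite inputs without overflow."""
    hi, lo = (a, b) if a >= b else (b, a)
    # exp(lo - hi) is at most 1, so it cannot overflow.
    return hi + math.log1p(math.exp(lo - hi))


def cumulative_sum(xs):
    return list(accumulate(xs))


def cumulative_prod(xs):
    return list(accumulate(xs, operator.mul))


def cumulative_logsumexp(xs):
    return list(accumulate(xs, logaddexp))


print(cumulative_sum([1, 2, 3, 4]))
print(cumulative_prod([1, 2, 3, 4]))
print(cumulative_logsumexp([1000.0, 1000.0]))
```

The last call shows why a dedicated logsumexp accumulation is useful: evaluating exp(1000.0) directly would overflow, while the running formulation stays finite.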

Changed

Expanded documentation for dpctl: gh-1619
Expanded utils.intel_device_info functionality: gh-1656
Improved performance of elementwise operations: gh-1651
Efficiency improvement by avoiding unnecessary copying of sycl::queue: gh-1645
dpctl uses pybind11 2.12.0: gh-1640
Improved performance of tensor.reshape operation with order="F" when copying is needed, or requested: gh-1677

Fixed

Fixed initialization of byte type constants in dpctl_capi Python/C API loader class in "dpctl4pybind11.hpp": gh-1665
Fixed crash in tensor.sort reported for a CPU device and a CUDA device: gh-1676
Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
Support use of index arrays of different integral types in indexing operations: gh-47
Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
Corrected tensor.tile for scalar inputs and empty repetitions: gh-1628
Fixed support for out keyword in tensor.matmul: gh-1610
Fixed bug in basic slicing of empty arrays: gh-1680
Fixed bug in tensor.bitwise_invert for boolean input array: gh-1681
Fixed bug in tensor.repeat on zero-size input arrays: gh-1682

New Contributors

@bdmoore1 made their first contribution in #1659
@ekomarova made their first contribution in #1666

Full Changelog: https://github.com/IntelPython/dpctl/blob/master/CHANGELOG.md

Contributors

ekomarova and bdmoore1

11 Apr 01:25

oleksandr-pavlyk

0.16.1

1f13ce8

v0.16.1

This release includes bug fixes and provides a change needed by numba_dpex project to support dispatching kernels
consuming instances of sycl::local_accessor template type.

Changed

Changed behavior of dpctl.tensor.usm_ndarray.__dlpack_device__ method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: #1604
Array creation functions and the usm_ndarray constructor in dpctl.tensor submodule now use cached default-selected device to improve performance: #1606
Changed treatment of axis keyword for dpctl.tensor.tensordot and dpctl.tensor.vecdot to align with Python Array API 2023.12 specification: #1608
Changed implementation of DPCTLQueue_SubmitRange and DPCTLQueue_SubmitNDRange in DPCTLSyclInterface library to support sycl::local_accessor arguments needed by numba_dpex; changed the enum DPCTLKernelArgType to correspond to C++ disjoint types: #1609, #1611, #1612

Fixed

Fixed a crash on Windows platform during execution of getter of dpctl.SyclPlatform.default_context property: #1604
Fixed kernel submission error on NVidia CUDA GPUs during dpctl.tensor.matmul operation: #1605
Fixed corruption of context cache table entries: #1607
Fixed incorrect result from dpctl.tensor.tensordot reported in issue #1570: #1608
Fixed output of python -m dpctl --library to fix specified library name: #1615

28 Mar 02:59

oleksandr-pavlyk

0.16.0

6efb2c9

v0.16.0

This release is virtually identical to 0.15.1 as far as features are concerned.

This release is meant to be built with DPC++ 2024.1.0, which no longer supports older integrated Gen9 Intel GPUs, such as those that came with Intel Core 10th generation and older.

10 Feb 21:51

oleksandr-pavlyk

0.15.1

94fc707

v0.15.1

Summary

This release reaches the milestone of 100% compliance of dpctl.tensor functions with the Python Array API 2022.12 standard for the main namespace.

Added

Added reduction functions dpctl.tensor.min, dpctl.tensor.max, dpctl.tensor.argmin, dpctl.tensor.argmax, and dpctl.tensor.prod per Python Array API specifications: #1399
Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of dpctl.tensor.usm_ndarray type: #1431, #1447
Added new elementwise functions dpctl.tensor.cbrt, dpctl.tensor.rsqrt, dpctl.tensor.exp2, dpctl.tensor.copysign, dpctl.tensor.angle, and dpctl.tensor.reciprocal: #1443, #1474
Added statistical functions dpctl.tensor.mean, dpctl.tensor.std, dpctl.tensor.var per Python Array API specifications: #1465
Added sorting functions dpctl.tensor.sort and dpctl.tensor.argsort, and set functions dpctl.tensor.unique_values, dpctl.tensor.unique_counts, dpctl.tensor.unique_inverse, dpctl.tensor.unique_all: #1483
Added linear algebra functions from the Array API namespace dpctl.tensor.matrix_transpose, dpctl.tensor.matmul, dpctl.tensor.vecdot, and dpctl.tensor.tensordot: #1490, #1525, #1541
Added dpctl.tensor.clip function: #1444, #1505
Added custom reduction functions dpt.logsumexp (reduction using binary function dpctl.tensor.logaddexp), dpt.reduce_hypot (reduction using binary function dpctl.tensor.hypot): #1446
Added inspection API to query capabilities of Python Array API specification implementation: #1469
Support for compilation for NVIDIA(R) sycl target with use of CodePlay oneAPI plug-in: #1411, #1124
Added dpctl.utils.intel_device_info function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
Added support for two new device descriptors, dpctl.SyclDevice.max_mem_alloc_size and dpctl.SyclDevice.max_clock_frequency: #1530
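
The custom reductions listed above are described as reductions built from a binary function. That construction can be sketched in pure Python with functools.reduce; the helper names and the choice of identity values (0.0 for hypot, negative infinity for logaddexp) are assumptions of this example, which works on lists of finite floats and is not dpctl's implementation.

```python
import math
from functools import reduce


def logaddexp(a, b):
    """log(exp(a) + exp(b)), written to avoid overflow for large inputs."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def reduce_hypot(xs):
    # hypot(0, x) == |x|, so 0.0 acts as the identity element.
    return reduce(math.hypot, xs, 0.0)


def logsumexp(xs):
    # logaddexp(-inf, x) == x, so -inf acts as the identity element.
    return reduce(logaddexp, xs, -math.inf)


print(reduce_hypot([3.0, 4.0]))
print(logsumexp([0.0, 0.0]))
```

Folding with hypot yields the Euclidean norm of the inputs, and folding with logaddexp yields the logarithm of the sum of their exponentials.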

Changed

Functions dpctl.tensor.result_type and dpctl.tensor.can_cast became device-aware: #1488, #1473
Implementation of method dpctl.SyclEvent.wait_for changed to use sycl::event::wait instead of sycl::event::wait_and_throw: gh-1436
dpctl.tensor.astype was changed to support device keyword as per Python Array API specification: #1511
C++ header files in libtensor/include/kernels containing implementations of SYCL kernels no longer depends on "pybind11.h": #1516

Fixed

Fixed issues with dpctl.tensor.repeat support for axis keyword: #1427, #1433
Fix for bug in usm_ndarray.__setitem__ reported in gh-1503: #1504
Other bug fixes: #1485, #1477, #1512

29 Sep 16:06

oleksandr-pavlyk

0.15.0

5bd924e

v0.15.0

Summary

Release 0.15.0 represents a milestone: the dpctl.tensor.usm_ndarray object now implements all special Python operators except __matmul__ and __rmatmul__.

The dpctl.tensor submodule increases its pass rate on the array API conformance test suite to 81.8% (passed: 916, failed: 84, skipped: 119).

Details

Added

Added dpctl.tensor.floor, dpctl.tensor.ceil, dpctl.tensor.trunc elementwise functions.
Added dpctl.tensor.hypot, dpctl.tensor.logaddexp elementwise functions.
Added trigonometric (dpctl.tensor.sin, dpctl.tensor.cos, dpctl.tensor.tan) and hyperbolic (dpctl.tensor.sinh, dpctl.tensor.cosh, dpctl.tensor.tanh) elementwise functions and their inverses (dpctl.tensor.asin, dpctl.tensor.asinh, dpctl.tensor.acos, dpctl.tensor.acosh, dpctl.tensor.atan, dpctl.tensor.atanh).
Added dpctl.tensor.round function.
Added dpctl.tensor.sign and dpctl.tensor.remainder elementwise functions.
Added bitwise elementwise functions dpctl.tensor.bitwise_and, dpctl.tensor.bitwise_xor, dpctl.tensor.bitwise_or, dpctl.tensor.bitwise_invert
Added bitwise shift functions dpctl.tensor.bitwise_left_shift and dpctl.tensor.bitwise_right_shift.
Added dpctl.tensor.atan2 and dpctl.tensor.signbit elementwise functions.
Added dpctl.tensor.minimum and dpctl.tensor.maximum binary elementwise functions.
Supported equality checking and hashing for dpctl.SyclPlatform.
Implemented types property for all unary and binary elementwise functions #1361
Added dpctl.tensor.repeat and dpctl.tensor.tile functions.
Added dpctl.tensor.matrix_transpose function.
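
Since repeat and tile are easy to confuse, here is a minimal pure-Python sketch of the difference on 1-D lists with a single integer count. The helpers below are written for this illustration only and do not cover per-element repeat counts, multiple dimensions, or the axis keyword.

```python
def repeat(xs, repeats):
    """Repeat each element in place: [a, b] -> [a, a, b, b]."""
    return [x for x in xs for _ in range(repeats)]


def tile(xs, reps):
    """Concatenate copies of the whole sequence: [a, b] -> [a, b, a, b]."""
    return list(xs) * reps


print(repeat([1, 2, 3], 2))
print(tile([1, 2, 3], 2))
```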

Changed

Enabled support for Python arithmetic, in-place arithmetic, reflexive arithmetic, comparison, and bitwise operators for dpctl.tensor.usm_ndarray type #1324.
Removed dpctl.tensor.numpy_usm_shared obsolete class and associated tests which were being skipped #1310
Transitioned dpctl codebase to Cython 3.
Improved performance of boolean reduction functions dpctl.tensor.all and dpctl.tensor.any.
Improved performance of summation function dpctl.tensor.sum.
Improved in-place arithmetic operations for addition, subtraction and multiplication.
Updated codebase per SYCL-2020 intel/llvm compiler deprecation warnings.
Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
Removed deprecated DPCTLDevice_GetMaxWorkItemSizes function from the SyclInterface library.
Improved performance of dpctl.tensor.reshape in the case when a copy is being made.
Improved performance of dpctl.tensor.roll function.

Fixed

Fixed issues identified by Coverity security scans.
Fixed issues #1279, #1350, #1344, #1327, #1241, #1250, #1293.
