cuda.core: release notes update #519

Merged
merged 10 commits into from
Mar 15, 2025
26 changes: 24 additions & 2 deletions cuda_core/docs/source/release/0.2.0-notes.rst
@@ -27,8 +27,30 @@ New features
- Expose :class:`ObjectCode` as a public API, which allows loading cubins from memory or disk. For loading other kinds of code types, please continue using :class:`Program`.
- A C++ helper function ``get_cuda_native_handle()`` is provided in the new ``include/utility.cuh`` header to retrieve the underlying CUDA C objects (e.g., ``CUstream``) from a Python object returned by the ``.handle`` attribute (e.g., :attr:`Stream.handle`).
- For objects such as :class:`Program` and :class:`Linker` that could dispatch to different backends, a new ``.backend`` attribute is provided to query this information.
- Support CUDA event timing.
- An :class:`~_event.Event` may now be created without recording it to a :class:`~_stream.Stream` using the :meth:`Device.create_event` method.
- Support CUDA :class:`Event` timing. (#481, #498, #508)
- An :class:`Event` may now be created without recording it to a :class:`~_stream.Stream` using the :meth:`Device.create_event` method.
- :class:`Program` now supports the additional ``PTX`` code type. (#317)
- :meth:`Linker.link` exceptions now include the original error log. (#423)
- In a systematic sweep through the cuda.core implementations, many exception messages were made more consistent and informative. (#458)

New examples
------------
- ``jit_lto_fractal.py`` — Demonstrates just-in-time link-time optimization for fractal generation. (:class:`Device`, :class:`LaunchConfig`, :class:`Linker`, :class:`LinkerOptions`, :class:`Program`, :class:`ProgramOptions`) (#475)
- ``simple_multi_gpu_example.py`` — Example of using multiple GPUs. (:class:`Device`, :class:`Program`, :class:`LaunchConfig`) (#304)
- ``show_device_properties.py`` — Displays detailed device properties. (:class:`Device`) (#474)

Minor fixes and enhancements
----------------------------
- A dangling pointer problem in ``_linker.py`` was fixed. (#516)
- Add ``@functools.lru_cache`` decorator for :func:`get_binding_version`. (#512)
- Selected ``.decode()`` calls were changed to ``.decode("utf-8", errors="backslashreplace")`` so that decoding an error message cannot itself raise an exception and abort the process. (#510)
- The performance of :meth:`Device.compute_capability` was improved. (#459)
- The :class:`Program` constructor now issues a warning when falling back to :func:`cuLink`. (#315)
- To avoid deprecation warnings, the cuda.bindings imports in the cuda.core implementations were cleaned up. (#404)
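
The ``.decode("utf-8", errors="backslashreplace")`` change listed above can be illustrated with plain Python. This is a minimal sketch, not code from cuda.core: the byte string ``raw_log`` is a made-up stand-in for an error log that happens to contain a byte that is not valid UTF-8.

```python
# Hypothetical error log containing a byte (0xff) that is not valid UTF-8.
raw_log = b"error: bad byte \xff in log"

# A plain .decode() is strict by default and raises on the invalid byte.
try:
    raw_log.decode()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="backslashreplace", the invalid byte is rendered as an
# escape sequence instead of raising, so the message can still be shown.
safe_text = raw_log.decode("utf-8", errors="backslashreplace")
print(safe_text)
```

The readable part of the message survives, and the offending byte stays visible as an escape sequence rather than being silently dropped.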

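The effect of putting ``@functools.lru_cache`` on a zero-argument query function can be sketched as follows. The function ``get_version_stub`` and its return value are hypothetical and only stand in for an expensive version lookup; this is not the actual implementation of ``get_binding_version``.

```python
import functools

call_count = 0  # counts how often the function body actually runs


@functools.lru_cache
def get_version_stub():
    """Hypothetical stand-in for an expensive version query."""
    global call_count
    call_count += 1
    return (12, 8)  # made-up (major, minor) pair for illustration


first = get_version_stub()   # runs the body and caches the result
second = get_version_stub()  # served from the cache; body is not re-run
```

Since the function takes no arguments, the cache holds a single entry, so repeated calls cost little more than a dictionary lookup.
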
Test fixes
----------
- Clean up device initialization in some tests. (#507)

Limitations
-----------