
Commit 85da64b

Copilot, leofang, and kkraus14 authored
Implement release threshold configuration for DeviceMemoryResource performance optimization (#875)
* Initial plan
* Implement release threshold configuration for DeviceMemoryResource performance optimization
  Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
* Add performance demo for DeviceMemoryResource release threshold optimization
  Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
* Fix linting issues and format code with ruff
  Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
* Remove try-except wrapper and performance demo per code review feedback
  Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
* Add release note for DeviceMemoryResource performance optimization
  Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
* Add skip decorator for mempool support check in device memory test
  Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
* Address code review feedback: move skip logic, add docstring note, update release note
  Co-authored-by: kkraus14 <3665167+kkraus14@users.noreply.github.com>
* Remove verbose docstring Notes section per code review feedback
  Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Co-authored-by: kkraus14 <3665167+kkraus14@users.noreply.github.com>
1 parent ee16510 commit 85da64b


3 files changed: +45 additions, -1 deletion


cuda_core/cuda/core/experimental/_memory.py

Lines changed: 17 additions & 0 deletions
```diff
@@ -326,6 +326,23 @@ def __init__(self, device_id: int):
         self._handle = handle_return(driver.cuDeviceGetMemPool(device_id))
         self._dev_id = device_id
 
+        # Set a higher release threshold to improve performance when there are no active allocations.
+        # By default, the release threshold is 0, which means memory is immediately released back
+        # to the OS when there are no active suballocations, causing performance issues.
+        # Check current release threshold
+        current_threshold = handle_return(
+            driver.cuMemPoolGetAttribute(self._handle, driver.CUmemPool_attribute.CU_MEMPOOL_ATTR_RELEASE_THRESHOLD)
+        )
+        # If threshold is 0 (default), set it to maximum to retain memory in the pool
+        if int(current_threshold) == 0:
+            handle_return(
+                driver.cuMemPoolSetAttribute(
+                    self._handle,
+                    driver.CUmemPool_attribute.CU_MEMPOOL_ATTR_RELEASE_THRESHOLD,
+                    driver.cuuint64_t(0xFFFFFFFFFFFFFFFF),
+                )
+            )
+
     def allocate(self, size: int, stream: Stream = None) -> Buffer:
         """Allocate a buffer of the requested size.
 
```
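As a rough illustration of why the threshold matters, the toy model below simulates a pool's reservation accounting in pure Python. It is a sketch under simplifying assumptions, not CUDA's implementation: `ToyMemPool`, `run_workload`, and `reserve_calls` are hypothetical names, and the model trims exactly down to the threshold, whereas the real driver releases memory at its own granularity. It follows the documented rule for `CU_MEMPOOL_ATTR_RELEASE_THRESHOLD`: when the pool holds more reserved memory than the threshold, it tries to release the excess back to the OS at the next stream, event, or context synchronization.

```python
# Toy model of a memory pool's release threshold. Hypothetical names;
# this is NOT the CUDA driver's implementation, only its accounting idea.
UINT64_MAX = 0xFFFFFFFFFFFFFFFF  # the value the commit writes (2**64 - 1)


class ToyMemPool:
    def __init__(self, release_threshold: int = 0) -> None:
        self.release_threshold = release_threshold
        self.reserved = 0       # bytes currently held by the pool
        self.used = 0           # bytes handed out to live allocations
        self.reserve_calls = 0  # how often the pool had to grow its reservation

    def allocate(self, size: int) -> int:
        # Grow the reservation only when the idle part of the pool is too small.
        if self.reserved - self.used < size:
            self.reserved = self.used + size
            self.reserve_calls += 1
        self.used += size
        return size

    def free(self, size: int) -> None:
        self.used -= size

    def synchronize(self) -> None:
        # At a sync point, release idle memory held above the threshold,
        # never dropping below what live allocations still use.
        if self.reserved > self.release_threshold:
            self.reserved = max(self.used, self.release_threshold)


def run_workload(pool: ToyMemPool, iterations: int = 5, size: int = 1 << 20) -> int:
    # Allocate, free, then synchronize: no live allocations at each sync point.
    for _ in range(iterations):
        pool.free(pool.allocate(size))
        pool.synchronize()
    return pool.reserve_calls


print(run_workload(ToyMemPool(release_threshold=0)))           # 5
print(run_workload(ToyMemPool(release_threshold=UINT64_MAX)))  # 1
```

With the default threshold of 0, every synchronization with no live allocations empties the reservation, so each iteration has to grow the pool again; with `0xFFFFFFFFFFFFFFFF` (UINT64_MAX) the first reservation is kept and reused.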

cuda_core/docs/source/release/0.X.Y-notes.rst

Lines changed: 1 addition & 0 deletions
```diff
@@ -36,4 +36,5 @@ None.
 Fixes and enhancements
 ----------------------
 
+- Improved :class:`DeviceMemoryResource` allocation performance when there are no active allocations by setting a higher release threshold (addresses issue #771).
 - Fix :class:`LaunchConfig` grid unit conversion when cluster is set (addresses issue #867).
```

cuda_core/tests/test_memory.py

Lines changed: 27 additions & 1 deletion
```diff
@@ -10,7 +10,7 @@
 
 import pytest
 
-from cuda.core.experimental import Buffer, Device, MemoryResource
+from cuda.core.experimental import Buffer, Device, DeviceMemoryResource, MemoryResource
 from cuda.core.experimental._memory import DLDeviceType
 from cuda.core.experimental._utils.cuda_utils import handle_return
 
@@ -257,3 +257,29 @@ def test_buffer_dunder_dlpack_device_failure():
     buffer = dummy_mr.allocate(size=1024)
     with pytest.raises(BufferError, match=r"^buffer is neither device-accessible nor host-accessible$"):
         buffer.__dlpack_device__()
+
+
+def test_device_memory_resource_initialization():
+    """Test that DeviceMemoryResource can be initialized successfully.
+
+    This test verifies that the DeviceMemoryResource initializes properly,
+    including the release threshold configuration for performance optimization.
+    """
+    device = Device()
+    if not device.properties.memory_pools_supported:
+        pytest.skip("memory pools not supported")
+    device.set_current()
+
+    # This should succeed and configure the memory pool release threshold
+    mr = DeviceMemoryResource(device.device_id)
+
+    # Verify basic properties
+    assert mr.device_id == device.device_id
+    assert mr.is_device_accessible is True
+    assert mr.is_host_accessible is False
+
+    # Test allocation/deallocation works
+    buffer = mr.allocate(1024)
+    assert buffer.size == 1024
+    assert buffer.device_id == device.device_id
+    buffer.close()
```
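The new test checks construction, the accessibility flags, and one allocation, but it never reads the threshold back, so the conditional in `__init__` is only exercised indirectly. One GPU-free way to pin down that rule is to factor the decision into a pure function and unit-test it. `resolve_release_threshold` is a hypothetical helper, not part of `cuda.core`; the commit performs this check inline on the value returned by `cuMemPoolGetAttribute`.

```python
# Hypothetical helper (not part of cuda.core) isolating the decision the
# commit makes inline in DeviceMemoryResource.__init__.
UINT64_MAX = 0xFFFFFFFFFFFFFFFF  # 2**64 - 1, the value written by the commit


def resolve_release_threshold(current: int) -> int:
    """Return the release threshold to apply, given the pool's current value.

    The driver default of 0 is replaced by UINT64_MAX; any non-zero value
    is returned unchanged, matching the `if int(current_threshold) == 0` check.
    """
    current = int(current)
    return UINT64_MAX if current == 0 else current


def test_resolve_release_threshold():
    # The default (0) is raised to the maximum...
    assert resolve_release_threshold(0) == UINT64_MAX
    # ...while an already non-zero threshold is left alone.
    assert resolve_release_threshold(1 << 30) == 1 << 30
    assert resolve_release_threshold(UINT64_MAX) == UINT64_MAX
```

A test that needs the real pool could instead read the attribute back with `driver.cuMemPoolGetAttribute`, as the diff does, and skip when `memory_pools_supported` is false.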
