UCT/CUDA/CUDA_IPC: cuCtxPushCurrent before cuIpcCloseMemHandle. #10618

Merged
merged 1 commit into from
Apr 11, 2025

Contributor

@rakhmets rakhmets commented Apr 9, 2025

What?

Push the primary device context before calling cuIpcCloseMemHandle.

Why?

A warning is logged when no CUDA context is bound to the current CPU thread at the time a memory handle is closed:

cuda_ipc_cache.c:132  UCX  WARN  cuIpcCloseMemHandle( (CUdeviceptr)region->mapped_addr) failed: invalid device context
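
The push/close/pop ordering described above can be sketched without a GPU by mocking the driver calls. In the sketch below, `mock_ctx_push`, `mock_ctx_pop` and `mock_ipc_close` are hypothetical stand-ins for `cuCtxPushCurrent`, `cuCtxPopCurrent` and `cuIpcCloseMemHandle`; they model only the rule that closing a handle fails when no context is current. `close_with_ctx` is likewise an illustrative name, and none of this is the UCX code from the PR.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the CUDA driver API so the sketch runs
 * without a GPU. They model only "a context must be current". */
typedef enum {
    MOCK_OK = 0,
    MOCK_ERR_INVALID_CONTEXT,
    MOCK_ERR_INVALID_VALUE
} mock_result_t;

typedef struct { int device; } mock_ctx_t;

/* The real driver keeps one context stack per CPU thread; a single
 * static stack is enough for this single-threaded illustration. */
#define MOCK_CTX_STACK_MAX 8
static mock_ctx_t *ctx_stack[MOCK_CTX_STACK_MAX];
static int ctx_depth = 0;

static mock_result_t mock_ctx_push(mock_ctx_t *ctx)
{
    if (ctx == NULL || ctx_depth == MOCK_CTX_STACK_MAX) {
        return MOCK_ERR_INVALID_VALUE;
    }
    ctx_stack[ctx_depth++] = ctx;
    return MOCK_OK;
}

static mock_result_t mock_ctx_pop(void)
{
    if (ctx_depth == 0) {
        return MOCK_ERR_INVALID_CONTEXT;
    }
    ctx_depth--;
    return MOCK_OK;
}

/* Models the failure from the warning: no current context, no close. */
static mock_result_t mock_ipc_close(void *mapped_addr)
{
    (void)mapped_addr;
    return (ctx_depth > 0) ? MOCK_OK : MOCK_ERR_INVALID_CONTEXT;
}

/* Push the context, close the handle, and always pop afterwards so the
 * caller's context stack is left as it was found. */
static mock_result_t close_with_ctx(mock_ctx_t *primary_ctx, void *mapped_addr)
{
    mock_result_t status, pop_status;

    status = mock_ctx_push(primary_ctx);
    if (status != MOCK_OK) {
        return status;
    }

    status     = mock_ipc_close(mapped_addr);
    pop_status = mock_ctx_pop();
    return (status != MOCK_OK) ? status : pop_status;
}
```

Calling `mock_ipc_close` directly with an empty stack reproduces the "invalid device context" condition, while `close_with_ctx` succeeds and leaves the stack depth unchanged.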

Comment on lines 163 to 168
if (status == UCS_OK) {
- return UCT_CUDADRV_FUNC_LOG_WARN(cuMemAddressFree(
+ status = UCT_CUDADRV_FUNC_LOG_WARN(cuMemAddressFree(
(CUdeviceptr)region->mapped_addr, region->key.b_len));
}

return status;
Contributor

Maybe something like:

status = UCT_CUDADRV_FUNC_LOG_WARN(cuMemUnmap(...));
if (status != UCS_OK) {
    return status;
}

return UCT_CUDADRV_FUNC_LOG_WARN(cuMemAddressFree(...));

Contributor Author

Fixed.

ucs_status_t status;

status = uct_cuda_primary_ctx_retain(cuda_device, 1, &cuda_ctx);
Contributor

Minor: IMO there is no need to force here (pass 0 instead of 1), as the primary context is supposed to be active already.

Contributor Author

@rakhmets rakhmets Apr 10, 2025

Why was it forced in uct_cuda_ipc_open_memhandle_legacy?

Contributor

Probably a leftover from the time we thought we needed this forcing.

Contributor Author

Changed to 0.
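
To make the force discussion concrete, here is a minimal model of what a non-forced retain could look like. It assumes, which the thread implies but does not spell out, that the second argument of `uct_cuda_primary_ctx_retain` is a force flag: with 0 the call only succeeds if the primary context is already active, with 1 it activates the context if needed. All `fake_*` names are hypothetical and no CUDA or UCX call is used.

```c
#include <stddef.h>

/* Hypothetical model of one device's primary context (not a CUDA type). */
typedef struct {
    int active;   /* nonzero once some caller has activated the context */
    int refcount; /* number of outstanding retains */
} fake_primary_ctx_t;

typedef enum { FAKE_OK = 0, FAKE_ERR_NO_DEVICE } fake_status_t;

/* Assumed semantics: force == 0 only takes a reference on a context
 * that is already active; force != 0 activates it when necessary. */
static fake_status_t fake_primary_ctx_retain(fake_primary_ctx_t *dev, int force,
                                             fake_primary_ctx_t **ctx_p)
{
    if (!force && !dev->active) {
        return FAKE_ERR_NO_DEVICE;
    }

    dev->active = 1;
    dev->refcount++;
    *ctx_p = dev;
    return FAKE_OK;
}

/* Drop one reference; the model never deactivates the context. */
static void fake_primary_ctx_release(fake_primary_ctx_t *dev)
{
    if (dev->refcount > 0) {
        dev->refcount--;
    }
}
```

Under these assumptions, passing 0 on the close path is sufficient whenever the handle was opened while the primary context was active, which matches the reviewer's argument.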

brminich previously approved these changes Apr 10, 2025
@yosefe yosefe enabled auto-merge April 10, 2025 16:59
@rakhmets rakhmets force-pushed the topic/cuda-ipc-close-memhandle-fix branch from ed1dc49 to 9a52b3c Compare April 10, 2025 17:08
@yosefe yosefe merged commit d2bd26a into openucx:master Apr 11, 2025
151 checks passed
3 participants