[WIP][UR][CUDA][TEST] Add P2P initialization to multi-device test by kekaczma · Pull Request #21311 · intel/llvm

kekaczma · 2026-02-18T14:13:07Z

Initialize P2P access between device pairs in urEnqueueKernelLaunchIncrementMultiDeviceTest to enable cross-device USM memcpy operations on CUDA.

- Add urUsmP2PEnablePeerAccessExp calls in SetUp()
- Add urUsmP2PDisablePeerAccessExp calls in TearDown()
- Skip P2P for duplicate device handles (single-GPU case)
- Handle already-enabled and unsupported device pairs
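The pair selection in SetUp() can be sketched as follows, assuming device handles compare equal exactly when they refer to the same device. `DeviceHandle` and `peerPairsToEnable` are hypothetical names for illustration, not part of the test or the UR API; presumably each resulting pair would be passed to urUsmP2PEnablePeerAccessExp.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for ur_device_handle_t: an opaque pointer-like handle.
// In the duplicated single-GPU case, several logical entries share one handle.
using DeviceHandle = const void *;

// Returns the ordered pairs for which P2P access would be requested in
// SetUp(). Pairs whose two entries are the same handle are skipped, since a
// device does not need peer access to itself.
std::vector<std::pair<DeviceHandle, DeviceHandle>>
peerPairsToEnable(const std::vector<DeviceHandle> &devices) {
  std::vector<std::pair<DeviceHandle, DeviceHandle>> pairs;
  for (std::size_t i = 0; i < devices.size(); ++i) {
    for (std::size_t j = 0; j < devices.size(); ++j) {
      if (devices[i] == devices[j])
        continue; // covers i == j and repeated handles for one GPU
      pairs.emplace_back(devices[i], devices[j]);
    }
  }
  return pairs;
}
```

With two handles spread over three entries (`{a, b, a}`), the sketch yields four ordered pairs and never pairs `a` with itself.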

This is a test commit to validate the fix on multi-GPU hardware.

Fix the CUDA adapter to map P2P access errors correctly:

- Map CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED to UR_RESULT_ERROR_INVALID_OPERATION
- Map CUDA_ERROR_PEER_ACCESS_NOT_ENABLED to UR_RESULT_ERROR_INVALID_OPERATION

Initialize P2P access in urEnqueueKernelLaunchIncrementMultiDeviceTest:

- Add urUsmP2PEnablePeerAccessExp calls in SetUp() for cross-device memcpy
- Add urUsmP2PDisablePeerAccessExp calls in TearDown() for cleanup
- Skip P2P operations for duplicate device handles (single-GPU case)
- Accept INVALID_OPERATION for already-enabled or unsupported pairs

This fixes test failures on multi-GPU CUDA systems, where P2P must be explicitly enabled before cross-device USM memory operations.

Fixes #19033
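A minimal sketch of the error mapping and of the test-side acceptance check. `CuStatus` and `UrStatus` are hypothetical stand-ins for `CUresult` and `ur_result_t`, so the example compiles without the CUDA or UR headers; `ErrorUnknown` is a placeholder for whatever the adapter returns for other errors.

```cpp
#include <cassert>

// Hypothetical stand-ins for the relevant CUresult and ur_result_t values;
// the real enums live in cuda.h and ur_api.h and have many more members.
enum class CuStatus {
  Success,
  PeerAccessAlreadyEnabled, // CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED
  PeerAccessNotEnabled,     // CUDA_ERROR_PEER_ACCESS_NOT_ENABLED
  OtherError,
};

enum class UrStatus {
  Success,
  ErrorInvalidOperation, // UR_RESULT_ERROR_INVALID_OPERATION
  ErrorUnknown,          // placeholder for the adapter's generic fallback
};

// Maps a CUDA status to a UR status: both peer-access state errors become
// INVALID_OPERATION, as listed in the commit message.
UrStatus mapPeerAccessResult(CuStatus status) {
  switch (status) {
  case CuStatus::Success:
    return UrStatus::Success;
  case CuStatus::PeerAccessAlreadyEnabled:
  case CuStatus::PeerAccessNotEnabled:
    return UrStatus::ErrorInvalidOperation;
  case CuStatus::OtherError:
    break;
  }
  return UrStatus::ErrorUnknown;
}

// Test-side check: during SetUp() a result is acceptable if P2P was enabled,
// or if the pair was already enabled or unsupported (INVALID_OPERATION).
bool isAcceptableEnableResult(UrStatus status) {
  return status == UrStatus::Success ||
         status == UrStatus::ErrorInvalidOperation;
}
```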

Changes:

- Track enabled P2P pairs in the member variable enabledP2PPairs
- SetUp: only record pairs that this test instance successfully enabled (both directions returned SUCCESS)
- TearDown: disable P2P in both directions for those pairs, ignoring errors
- Remove the dependency on global P2P state shared between test instances

Works for both:

- 2 physical GPUs duplicated 4× (8 logical devices)
- 8 distinct physical GPUs

Fixes #19033
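The bookkeeping around enabledP2PPairs can be modelled with an in-memory stand-in for the driver's peer state. `FakePeerState`, `P2PFixtureSketch`, `Device`, and `Status` are hypothetical; the sketch assumes, as the description implies, that enabling an already-enabled direction reports INVALID_OPERATION.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Device = int; // hypothetical stand-in for ur_device_handle_t
using Pair = std::pair<Device, Device>;

enum class Status { Success, InvalidOperation };

// Hypothetical model of driver-wide peer state shared by all test instances.
struct FakePeerState {
  std::set<Pair> enabled; // directed (from, to) pairs currently enabled

  Status enable(Device from, Device to) {
    return enabled.insert(Pair{from, to}).second ? Status::Success
                                                 : Status::InvalidOperation;
  }
  Status disable(Device from, Device to) {
    return enabled.erase(Pair{from, to}) ? Status::Success
                                         : Status::InvalidOperation;
  }
};

// Only pairs this instance enabled in both directions are recorded, so
// tearDown() never disables access that someone else set up.
struct P2PFixtureSketch {
  FakePeerState &state;
  std::vector<Pair> enabledP2PPairs;

  void setUp(const std::vector<Device> &devices) {
    for (std::size_t i = 0; i < devices.size(); ++i) {
      for (std::size_t j = i + 1; j < devices.size(); ++j) {
        Device a = devices[i], b = devices[j];
        if (a == b)
          continue; // duplicate handle: same physical GPU
        Status ab = state.enable(a, b);
        Status ba = state.enable(b, a);
        if (ab == Status::Success && ba == Status::Success)
          enabledP2PPairs.push_back(Pair{a, b});
      }
    }
  }

  void tearDown() {
    for (const Pair &p : enabledP2PPairs) {
      (void)state.disable(p.first, p.second); // errors deliberately ignored
      (void)state.disable(p.second, p.first);
    }
    enabledP2PPairs.clear();
  }
};
```

Against `{0, 1, 0, 1, 0, 1, 0, 1}` (two GPUs duplicated 4×) the sketch records a single pair; against eight distinct devices it records all 28 unordered pairs. One consequence of recording a pair only when both directions succeed is that a direction enabled by this instance, whose reverse was already enabled elsewhere, stays enabled after TearDown().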

The CUDA adapter was using cuMemcpyAsync() for all USM memory copies, including cross-device copies. However, CUDA requires cuMemcpyPeerAsync() for peer-to-peer copies between different devices, even when P2P access is enabled via cuCtxEnablePeerAccess().

This change:

- Detects cross-device copies by querying CU_POINTER_ATTRIBUTE_CONTEXT for both the source and destination pointers
- Uses cuMemcpyPeerAsync() when the contexts differ (cross-device copy)
- Falls back to cuMemcpyAsync() for same-device or host-device copies

This fixes urEnqueueKernelLaunchIncrementMultiDeviceTest, which chains kernel launches and cross-device memcpy operations.

Fixes: #19033
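The copy-path selection in this commit can be sketched as a pure function. `owningContext` stands in for a CU_POINTER_ATTRIBUTE_CONTEXT query and, as a simplification, looks up exact base pointers in a map; `ContextId`, `CopyPath`, and `chooseCopyPath` are hypothetical names, not adapter code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins: a context id, and a lookup playing the role of the
// CU_POINTER_ATTRIBUTE_CONTEXT query. Host pointers have no owning context.
using ContextId = int;
constexpr ContextId kNoContext = -1;

enum class CopyPath { Peer, Regular }; // cuMemcpyPeerAsync vs cuMemcpyAsync

ContextId owningContext(const std::map<std::uintptr_t, ContextId> &owners,
                        std::uintptr_t ptr) {
  auto it = owners.find(ptr);
  return it == owners.end() ? kNoContext : it->second;
}

// Peer copy only when both pointers belong to device contexts and those
// contexts differ; everything else takes the regular path.
CopyPath chooseCopyPath(const std::map<std::uintptr_t, ContextId> &owners,
                        std::uintptr_t dst, std::uintptr_t src) {
  ContextId dstCtx = owningContext(owners, dst);
  ContextId srcCtx = owningContext(owners, src);
  if (dstCtx != kNoContext && srcCtx != kNoContext && dstCtx != srcCtx)
    return CopyPath::Peer;
  return CopyPath::Regular; // same device, or host <-> device
}
```

Note that when every allocation reports the same context, this check always selects the regular path, which is the limitation the next commit works around.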

In a single-context multi-device setup on CUDA, pointer attributes cannot reliably distinguish cross-device copies, because all allocations share the same CUDA context and may report device ordinal 0.

Solution: when the context has more than one device, try cuMemcpyPeerAsync for each device pair until one succeeds. Fall back to cuMemcpyAsync if none succeeds or if the context has a single device.

This is a workaround; a proper solution would track allocation metadata.
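The probing workaround can be sketched with the driver calls abstracted behind callbacks. `PeerCopyAttempt`, `RegularCopy`, `CopyOutcome`, and `copyWithPeerProbe` are hypothetical; the sketch assumes that pairs of identical devices are skipped and that a `true` return means the peer copy was accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins: Device for a device handle, and callbacks for the
// peer copy attempt (cuMemcpyPeerAsync) and the regular copy (cuMemcpyAsync).
using Device = int;
using PeerCopyAttempt = std::function<bool(Device dst, Device src)>;
using RegularCopy = std::function<void()>;

enum class CopyOutcome { PeerCopy, FallbackCopy };

// When the context spans more than one device, try every ordered (dst, src)
// pair of distinct devices until a peer copy succeeds; otherwise (single
// device, or no pair worked) perform the regular copy.
CopyOutcome copyWithPeerProbe(const std::vector<Device> &contextDevices,
                              const PeerCopyAttempt &tryPeer,
                              const RegularCopy &regularCopy) {
  if (contextDevices.size() > 1) {
    for (Device dst : contextDevices)
      for (Device src : contextDevices)
        if (dst != src && tryPeer(dst, src))
          return CopyOutcome::PeerCopy;
  }
  regularCopy();
  return CopyOutcome::FallbackCopy;
}
```

Injecting the copy calls as callbacks keeps the probing order testable without a multi-GPU machine.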

kekaczma changed the title ~~[UR][CUDA][TEST] Add P2P initialization to multi-device test~~ [WIP][UR][CUDA][TEST] Add P2P initialization to multi-device test Feb 18, 2026

kekaczma force-pushed the multi-device-test branch from 6adf0fa to bdf212d Compare February 18, 2026 15:46

kekaczma force-pushed the multi-device-test branch from bdf212d to 8218555 Compare February 18, 2026 16:50

kekaczma force-pushed the multi-device-test branch from eb4568d to 06a8429 Compare February 18, 2026 19:12

kekaczma wants to merge 4 commits into sycl from multi-device-test

1 participant
