Feat/multi qps per endpoint #318

avinashkethineedi · 2025-11-12T01:39:40Z

Motivation

This PR adds support for configurable QP allocation per PE in the GDA context.
It introduces environment variables to control the number of QPs for the default and user contexts.
Previously, the number of QPs per PE was fixed to 1.

Technical Details

Environment Variables

Added two new environment variables to configure QP allocation:
- ROCSHMEM_GDA_NUM_QPS_PER_PE_DEFAULT_CTX: Number of QPs per PE in the default context.
- ROCSHMEM_GDA_NUM_QPS_PER_PE_USR_CTX: Number of QPs per PE in each user context.
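
As a rough illustration of how such variables could be consumed on the host, here is a minimal sketch. The helper `read_qp_count` and the fallback of 1 (mirroring the previous fixed value) are assumptions for illustration, not code from this PR.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not from the PR): read a positive integer from the
// environment, falling back to `fallback` when the variable is unset,
// empty, non-numeric, or out of range.
inline int read_qp_count(const char* name, int fallback = 1) {
    assert(name != nullptr);
    const char* raw = std::getenv(name);
    if (raw == nullptr || *raw == '\0') return fallback;

    char* end = nullptr;
    long value = std::strtol(raw, &end, 10);
    // Reject trailing garbage, zero, negatives, and values that overflow int.
    if (*end != '\0' || value < 1 || value > INT_MAX) return fallback;
    return static_cast<int>(value);
}
```

With this sketch, `read_qp_count("ROCSHMEM_GDA_NUM_QPS_PER_PE_DEFAULT_CTX")` would return the configured value, or 1 when the variable is not set.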

Also introduced a `get_qp_index()` device method that computes the QP index from an atomic counter, giving round-robin access across the QPs of a PE.
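
The round-robin idea behind `get_qp_index()` can be sketched on the host with `std::atomic`. This is not the device code from the PR: the `PeQpCursor` type, the function name `get_qp_index_sketch`, and the flat `pe * num_qps_per_pe + slot` layout are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical host-side stand-in for the per-PE counter a context might hold.
struct PeQpCursor {
    std::atomic<unsigned> next{0};  // monotonically increasing ticket counter
};

// Each call takes one ticket and maps it onto [0, num_qps_per_pe).
// ASSUMPTION: QPs are stored in a flat array, num_qps_per_pe entries per PE.
inline int get_qp_index_sketch(PeQpCursor& cursor, int pe, int num_qps_per_pe) {
    assert(pe >= 0 && num_qps_per_pe > 0);
    unsigned ticket = cursor.next.fetch_add(1, std::memory_order_relaxed);
    int slot = static_cast<int>(ticket % static_cast<unsigned>(num_qps_per_pe));
    return pe * num_qps_per_pe + slot;
}
```

Successive calls for the same PE cycle through that PE's QPs in order and wrap around once every QP has been handed out.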

Test Plan

Verify correctness by initializing multiple contexts with varying environment variable values.
Verify functionality and correctness across all three NIC types: mlx, bnxt, and ionic.
Measure and validate performance improvements with configurable QP allocation under different workloads.

Test Result

Submission Checklist

Code compiles successfully
All relevant unit tests pass
Verified behavior under multiple context configurations
Verified functionality and performance improvements with mlx
Verified functionality and performance improvements with bnxt
Verified functionality and performance improvements with ionic


…d update RMA/atomic APIs

- Replaced per-thread atomic fetch with warp-synchronous logic using `__match_any_sync` and `__shfl_sync` to group threads targeting the same PE.
- Only the leader lane performs the atomic increment, reducing contention.
- Broadcasts the computed QP index to all participating lanes for efficiency.
- Updated RMA and atomic APIs to use the new warp-synchronized QP indexing.
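
The leader-election pattern described in this commit message can be illustrated with a host-side emulation. `__match_any_sync` and `__shfl_sync` are device intrinsics, so the sketch below replaces them with plain loops. The 8-lane width and the names `match_any_emulated`, `leader_of`, and `assign_qp_slots` are hypothetical. It shows the shape of the technique, not the PR's implementation.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kLanes = 8;  // illustrative width; real warps/wavefronts are wider

// Host emulation of what __match_any_sync reports for one lane: bit i is set
// when lane i targets the same PE as `lane`.
inline std::uint32_t match_any_emulated(const std::array<int, kLanes>& pe, int lane) {
    std::uint32_t mask = 0;
    for (int i = 0; i < kLanes; ++i) {
        if (pe[i] == pe[lane]) mask |= (1u << i);
    }
    return mask;
}

// Lowest set bit of a non-zero mask: the lane elected as group leader.
inline int leader_of(std::uint32_t mask) {
    assert(mask != 0);
    int lane = 0;
    while (((mask >> lane) & 1u) == 0) ++lane;
    return lane;
}

// Returns the QP slot each lane would use; counters[p] is PE p's counter.
inline std::array<int, kLanes> assign_qp_slots(
        const std::array<int, kLanes>& pe,
        std::vector<std::atomic<unsigned>>& counters,
        int num_qps_per_pe) {
    assert(num_qps_per_pe > 0);
    std::array<int, kLanes> slot{};
    // Step 1: only leaders touch the atomic counter, once per distinct PE.
    for (int lane = 0; lane < kLanes; ++lane) {
        assert(pe[lane] >= 0 &&
               static_cast<std::size_t>(pe[lane]) < counters.size());
        if (leader_of(match_any_emulated(pe, lane)) == lane) {
            unsigned ticket =
                counters[pe[lane]].fetch_add(1, std::memory_order_relaxed);
            slot[lane] = static_cast<int>(
                ticket % static_cast<unsigned>(num_qps_per_pe));
        }
    }
    // Step 2: every lane copies its leader's result (the __shfl_sync step).
    for (int lane = 0; lane < kLanes; ++lane) {
        slot[lane] = slot[leader_of(match_any_emulated(pe, lane))];
    }
    return slot;
}
```

In this emulation, three lanes targeting the same PE cost a single atomic increment instead of three, which is the contention reduction the commit message describes.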


avinashkethineedi added 5 commits November 11, 2025 18:50

Add environment variables for QPs per PE in default and user contexts

a262075

- `ROCSHMEM_GDA_NUM_QPS_PER_PE_DEFAULT_CTX` to control the number of QPs per PE in the default context.
- `ROCSHMEM_GDA_NUM_QPS_PER_PE_USR_CTX` to control the number of QPs per PE in each user context.

Compute total QPs per PE and overall QPs from environment variables.
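
The commit message does not spell out the arithmetic, so the following is only one plausible accounting. It assumes the per-PE total is the default-context count plus the user-context count multiplied by a maximum number of user contexts, and that the overall total scales with the number of PEs. `QpBudget`, `compute_qp_budget`, and the `max_user_ctxs` parameter are hypothetical names.

```cpp
#include <cassert>

// Hypothetical result type: QPs needed per PE and across all PEs.
struct QpBudget {
    int qps_per_pe;
    long total_qps;
};

// ASSUMED formula, not taken from the PR:
//   per PE = default_ctx_qps + usr_ctx_qps * max_user_ctxs
//   total  = per PE * num_pes
inline QpBudget compute_qp_budget(int default_ctx_qps, int usr_ctx_qps,
                                  int max_user_ctxs, int num_pes) {
    assert(default_ctx_qps > 0 && usr_ctx_qps > 0);
    assert(max_user_ctxs >= 0 && num_pes > 0);
    int per_pe = default_ctx_qps + usr_ctx_qps * max_user_ctxs;
    return QpBudget{per_pe, static_cast<long>(per_pe) * num_pes};
}
```

Under these assumptions, 2 default-context QPs, 4 QPs per user context, 3 user contexts, and 8 PEs would require 14 QPs per PE and 112 in total.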

32fb119

Update GDAContext to support multiple QPs per PE and dynamic QP indexing

acad370

- Added per-context QP allocation logic using environment variables
- Added `get_qp_index(int pe)` to compute QP index using atomic counter for round-robin access.

build: enable HIP warp sync builtins via compile definitions

514df7a

Yiltan mentioned this pull request Nov 13, 2025

[GDA] Alltoall optimization - single warp #319

Open

GDA: update put signal ops with new QP indexing and separate remote a…

4cea42d

…ddr calculation

avinashkethineedi marked this pull request as ready for review November 13, 2025 20:09

avinashkethineedi requested review from BKP, Yiltan, abouteiller, akolliasAMD, edgargabriel, gaoikawa and omor1 as code owners November 13, 2025 20:09
