Refactor WarpExchangeShfl #8183

Draft: bernhardmgruber wants to merge 2 commits into NVIDIA:main from bernhardmgruber:ref_WarpExchangeShfl


@bernhardmgruber (Contributor) commented Mar 26, 2026

  • SASS check

@bernhardmgruber requested a review from a team as a code owner on March 26, 2026, 09:57.
@github-project-automation (bot) moved this to Todo in CCCL on Mar 26, 2026.
@bernhardmgruber marked this pull request as draft on March 26, 2026, 09:57.
@copy-pr-bot (bot) commented Mar 26, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app (bot) moved this from Todo to In Review in CCCL on Mar 26, 2026.
@cccl-authenticator-app (bot) moved this from In Review to In Progress in CCCL on Mar 26, 2026.
@github-actions (Contributor) commented:

😬 CI Workflow Results

🟥 Finished in 1h 46m: Pass: 16%/249 | Total: 7d 07h | Max: 1h 45m | Hits: 72%/61152

See results here.

```cpp
CompileTimeArray<OutputT, 0, ITEMS_PER_THREAD> arr{input_items, output_items};
arr.Transpose(lane_id, member_mask);
InputT vals[ITEMS_PER_THREAD];
for (int i = 0; i < ITEMS_PER_THREAD; i++)
```
A contributor commented:

We commonly add `_CCCL_PRAGMA_UNROLL_FULL()` in front of such loops:

Suggested change:

```diff
-for (int i = 0; i < ITEMS_PER_THREAD; i++)
+_CCCL_PRAGMA_UNROLL_FULL()
+for (int i = 0; i < ITEMS_PER_THREAD; i++)
```
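As a rough illustration of what full unrolling buys, the sketch below hand-expands such a loop with a C++17 fold expression, assuming `_CCCL_PRAGMA_UNROLL_FULL()` requests complete unrolling of a loop whose trip count (`ITEMS_PER_THREAD`) is a compile-time constant. The helpers `copy_unrolled`, `load_items`, and `load_items_example` are hypothetical. In CUDA device code, constant subscripts like these typically let the compiler keep a per-thread array such as `vals` in registers instead of local memory.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hand-expanded equivalent of a fully unrolled loop (hypothetical helper):
//   _CCCL_PRAGMA_UNROLL_FULL()
//   for (int i = 0; i < ITEMS_PER_THREAD; i++) vals[i] = input_items[i];
// The fold expression emits one assignment per index, each with a
// compile-time constant subscript and no runtime loop counter.
template <typename T, std::size_t... Is>
void copy_unrolled(T* vals, const T* input_items, std::index_sequence<Is...>)
{
  ((vals[Is] = input_items[Is]), ...);
}

// Deduces ITEMS_PER_THREAD from the array bound.
template <int ITEMS_PER_THREAD, typename T>
void load_items(T (&vals)[ITEMS_PER_THREAD], const T (&input_items)[ITEMS_PER_THREAD])
{
  copy_unrolled(vals, input_items, std::make_index_sequence<ITEMS_PER_THREAD>{});
}

// Usage: copies four items element by element.
inline void load_items_example()
{
  int input_items[4] = {1, 2, 3, 4};
  int vals[4] = {};
  load_items(vals, input_items);
  assert(vals[0] == 1 && vals[3] == 4);
}
```

The pragma lets the compiler perform this expansion itself, so the source keeps the readable `for` loop.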


```cpp
transpose<ITEMS_PER_THREAD / 2>(lane_id, member_mask);

for (int i = 0; i < ITEMS_PER_THREAD; i++)
```
A contributor commented:

Suggested change:

```diff
-for (int i = 0; i < ITEMS_PER_THREAD; i++)
+_CCCL_PRAGMA_UNROLL_FULL()
+for (int i = 0; i < ITEMS_PER_THREAD; i++)
```
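The `transpose<ITEMS_PER_THREAD / 2>(lane_id, member_mask)` call appears to start a butterfly transpose built from XOR shuffles. The host-side model below is a sketch under stated assumptions, not CUB's implementation. It assumes that N lanes each hold N items (N a power of two), that stage `s` pairs lane `t` with lane `t ^ s`, and that stages run from `s = N / 2` down to 1. It emulates `__shfl_xor_sync` by having every lane publish one value and then read its partner's. The names `Tile`, `stage`, and `warp_transpose` are hypothetical; `xor_bit_set` mirrors the flag passed in the PR's `transpose_foreach`.

```cpp
#include <array>
#include <cassert>

// Hypothetical host-side model: N lanes each hold N items, and
// lanes[t][j] is item j of lane t. N is assumed to be a power of two.
template <int N>
using Tile = std::array<std::array<int, N>, N>;

// One butterfly stage: lane t exchanges with lane t ^ s. An item moves to the
// partner lane when its index's s bit differs from the lane's s bit.
template <int N>
void stage(Tile<N>& lanes, int s)
{
  for (int idx = 0; idx < N; ++idx)
  {
    if (idx & s)
    {
      continue; // visit only indices whose s bit is clear
    }
    assert(idx + s < N);
    std::array<int, N> sent{}; // value each lane hands to the shuffle
    for (int t = 0; t < N; ++t)
    {
      const bool xor_bit_set = (t & s) != 0;
      sent[t] = xor_bit_set ? lanes[t][idx] : lanes[t][idx + s];
    }
    for (int t = 0; t < N; ++t)
    {
      const int recv = sent[t ^ s]; // stands in for __shfl_xor_sync(mask, v, s)
      if (t & s)
      {
        lanes[t][idx] = recv;
      }
      else
      {
        lanes[t][idx + s] = recv;
      }
    }
  }
}

// Full transpose: stages s = N/2, N/4, ..., 1.
template <int N>
void warp_transpose(Tile<N>& lanes)
{
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
  for (int s = N / 2; s >= 1; s /= 2)
  {
    stage<N>(lanes, s);
  }
}
```

Each stage swaps item `(t, j)` with item `(t ^ s, j ^ s)` whenever bit `s` of `t` and of `j` differ. After all log2(N) stages, every item `(r, c)` therefore ends up at `(c, r)`, and the stage order does not affect the result.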

```cpp
, val{input_items[IDX]}
{}
constexpr int next_idx = IDX + 1 + ((IDX + 1) % NUM_ENTRIES == 0) * NUM_ENTRIES;
transpose_foreach<next_idx, NUM_ENTRIES>(vals, xor_bit_set, mask);
```
A contributor commented:

Maybe I am stupid, but how does this function terminate?

@bernhardmgruber (Contributor, Author) replied:

No, you are not. I indeed missed the termination condition. Thx!
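One possible way to add the missing termination is shown in the minimal host-side C++17 sketch below. It is not the code that landed in the PR. It assumes the array length is known at compile time, modeled here by a hypothetical extra template parameter `ITEMS` (in the PR, presumably `ITEMS_PER_THREAD`), and that `NUM_ENTRIES` is a power of two. The `visit` callback and the helper `visited_indices` are also hypothetical; `visit` stands in for the shuffle exchange of `vals[IDX]` and `vals[IDX + NUM_ENTRIES]`, which this sketch does not model. The `next_idx` formula is the one from the review thread: it walks 0, 1, ..., NUM_ENTRIES - 1, then skips the next NUM_ENTRIES indices, so it visits exactly the indices whose `NUM_ENTRIES` bit is clear.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the recursion under review, with a termination check.
template <int IDX, int NUM_ENTRIES, int ITEMS, typename F>
void transpose_foreach(F& visit)
{
  static_assert(IDX < ITEMS, "IDX must stay inside the array");
  visit(IDX, IDX + NUM_ENTRIES); // IDX and the slot it pairs with
  // Same step as in the PR: skip the block of indices whose NUM_ENTRIES bit is set.
  constexpr int next_idx = IDX + 1 + ((IDX + 1) % NUM_ENTRIES == 0) * NUM_ENTRIES;
  // Termination: without this check the recursion never ends.
  if constexpr (next_idx < ITEMS)
  {
    transpose_foreach<next_idx, NUM_ENTRIES, ITEMS>(visit);
  }
}

// Collects the visited indices, e.g. {0, 1, 4, 5} for ITEMS = 8, NUM_ENTRIES = 2.
template <int ITEMS, int NUM_ENTRIES>
std::vector<int> visited_indices()
{
  static_assert(NUM_ENTRIES > 0 && (NUM_ENTRIES & (NUM_ENTRIES - 1)) == 0,
                "NUM_ENTRIES must be a power of two");
  std::vector<int> out;
  auto record = [&out](int idx, int partner) {
    // The partner differs from idx only in the NUM_ENTRIES bit.
    assert(partner == (idx ^ NUM_ENTRIES));
    out.push_back(idx);
  };
  transpose_foreach<0, NUM_ENTRIES, ITEMS>(record);
  return out;
}
```

`if constexpr` works here because `next_idx` depends on the template parameters. The discarded recursive call is therefore never instantiated once `next_idx` reaches the end. Before C++17, an overload or partial-specialization base case would serve the same purpose.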


Labels: none yet
Project status (CCCL): In Progress
2 participants