@Yiltan (Collaborator) commented Nov 12, 2025

Motivation

  • Allow each thread to write to its own QP
  • Allow for the execution of alltoall/barrier in a single warp
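
To make the two goals above concrete, here is a minimal host-side sketch of how one warp could cover a linear alltoall when every lane owns its own QP. It is not the code from this PR: `FakeQP`, `kWarpSize`, and `alltoall_single_warp_sketch` are hypothetical names, the warp is simulated with a plain loop, and a memory copy stands in for the RDMA write that a real lane would post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a queue pair: it only records which peers
// were written to through it.
struct FakeQP {
  std::vector<int> posted_peers;
};

// Assumption: 64 lanes per warp (wavefront); other hardware uses 32.
constexpr int kWarpSize = 64;

// Simulates a single warp on PE `my_pe` running a linear alltoall.
// Lane l uses only qps[l] and serves peers l, l + kWarpSize, ...
// src holds npes blocks of nelems elements; block `peer` is delivered
// into block `my_pe` of that peer's destination buffer.
template <typename T>
void alltoall_single_warp_sketch(std::vector<std::vector<T>> &dst_of_pe,
                                 const std::vector<T> &src, int my_pe,
                                 int npes, int nelems,
                                 std::vector<FakeQP> &qps) {
  // On a GPU these lane iterations would execute concurrently.
  for (int lane = 0; lane < kWarpSize; ++lane) {
    for (int peer = lane; peer < npes; peer += kWarpSize) {
      for (int i = 0; i < nelems; ++i) {
        dst_of_pe[peer][my_pe * nelems + i] = src[peer * nelems + i];
      }
      // No other lane touches this QP, so no cross-lane locking is modeled.
      qps[lane].posted_peers.push_back(peer);
    }
  }
}
```

Because each lane posts only to its own QP, the lanes never contend for a shared queue, which is what lets a single warp drive the whole exchange in this sketch.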

Conflicts with #318; the two PRs need to be merged in some order.

const T *src, int nelems) {
alltoall_linear_thread_puts(team, dst, src, nelems);
if (gda_provider_ == GDAProvider::BNXT) {
alltoall_linear_thread_puts(team, dst, src, nelems);

Why is this not applicable for other NICs?

Collaborator Author

  • Should work on IONIC (untested, so I've left it out)
  • A fairly major PR is in the works for CX7. Adding the logic for multiple threads to each write into their own QP would create a major conflict for its authors, so it would be best to add this support once that PR is merged.

@gaoikawa changed the title from "[GDA] Alltoall optmization - single warp" to "[GDA] Alltoall optimization - single warp" on Nov 13, 2025