Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL Spec][Joint Matrix] Add a new overload for joint_matrix_apply to be able to return result into a different matrix #13153

Open
wants to merge 6 commits into
base: sycl
Choose a base branch
from

Conversation

dkhaldi
Copy link
Contributor

@dkhaldi dkhaldi commented Mar 25, 2024

Currently, CUDA code that use this pattern:
for (int i = 0; i < c_frag.num_elements; i++) {
c_frag.x[i] = alpha * acc_frag.x[i] + beta * c_frag.x[i];
}
cannot be migrated to SYCL joint matrix.
This added overload addresses this limitation.

…o be able to return result into a different matrix
@dkhaldi
Copy link
Contributor Author

dkhaldi commented Mar 29, 2024

@gmlueck, can you please review this?

sarnex pushed a commit that referenced this pull request Mar 29, 2024
…able to return result into a different matrix (#13151)

Currently, CUDA code that use this pattern:
  for (int i = 0; i < c_frag.num_elements; i++) {
    c_frag.x[i] = alpha * acc_frag.x[i] + beta * c_frag.x[i];
  }
cannot be migrated to SYCL joint matrix.
This added overload addresses this.
Spec API is added here #13153
joint_matrix_apply(sg, C, D, [=](const T &x, T &y) {
y = x * alpha;
});
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a problem with this API, but I wonder if it should be more general. For example:

  • Is there a reason that one matrix must be the "source" and the other the "destination"? Is it hard to support the case where the lambda writes to elements in both matrices?

  • I think it makes sense that Rows and Cols must be the same for both matrices. What about Use and Layout, though? Is there a reason these can't be different for the two matrices? What about the type T? Could that be different?

  • Is there a reason to limit this to just two matrices? You could imagine this API taking a parameter pack, allowing an arbitrary number of matrices.

Copy link
Contributor Author

@dkhaldi dkhaldi Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmlueck,

  • I relaxed the definition to not require read or write assumptions. Both matrices can be either read or written into.

  • I think it is safer to require the same use and layout. To change the use or layout, joint_matrix_copy can be used https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc#copy.
    T should be the same. This is because there are different casts that can be used depending on T. It is better to have the user specify the right cast in the lambda.

  • Using a parameter pack will require the lambda to also use a parameter pack. Also, how do we pack the wi_data elements and pass them to the lambda (see line 22 in https://godbolt.org/z/E1sT5oY3P).
    I don' think passing a variadic number will be useful as the user can implement any number of arguments with the two arguments version. As an optimization, we can also add a three arguments version to avoid using intermediate matrices in the case of ternary operation like C=A*B.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why it is "safer" to require use, layout, and T to be the same for the two matrices. In what sense is it safer? What use cases do you have in mind for the form that takes two matrices? For these use cases, will it always be the case that use, layout, and T are the same?

Maybe there is an implementation problem. The current implementation probably assumes that the distribution of matrix elements across work-items will be the same for the two matrices. For example, I suspect your implementation assumes that wi_data[j] for work-item I has the same Row / Col coordinate in both the A and B matrices. Does this assumption become invalidated if use, layout, or T are different for A and B?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, if use, layout and T are different, there is no guarantee work items will have same number of data.
Yes, the CUDA use cases suggest use, layout and T are the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

You should clarify the description:

  • Clarify that x is an element from jm0 and y is an element from jm1.
  • Clarify that x and y are guaranteed to have identical coordinates in their respective matrices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmlueck, as a correction, if use and layout are different, there is not guarantee of ownership.
So changing type should be fine.
I will relax the type restriction to enable the use cases where conversion with extra arguments is needed.

@AlexeySachkov
Copy link
Contributor

@dkhaldi, a friendly reminder about this PR. Without it, we have an undocumented public APIs in our implementation (#13151), which is never a good idea.

@dkhaldi
Copy link
Contributor Author

dkhaldi commented Sep 9, 2024

@dkhaldi, a friendly reminder about this PR. Without it, we have an undocumented public APIs in our implementation (#13151), which is never a good idea.

@AlexeySachkov, Yes it is on my TODO list. I will give it higher priority to work on it this week

`jm0` and `jm1` that have the same `use`, number of rows, number of
columns, and `layout`. `jm0` and `jm1` can be read-only, write-only,
or read and write arguments. The callable object must be invocable
with two parameters `x` and `y` of type `T&` where `x` is an element
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with two parameters `x` and `y` of type `T&` where `x` is an element
with two parameters `x` and `y` of types `T0&` and `T1&`, where `x` is an element

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants