[SYCL Spec][Joint Matrix] Add a new overload for joint_matrix_apply to be able to return result into a different matrix #13153
base: sycl
Conversation
sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc
@gmlueck, can you please review this?
…able to return result into a different matrix (#13151)

Currently, CUDA code that uses this pattern:

```
for (int i = 0; i < c_frag.num_elements; i++) {
  c_frag.x[i] = alpha * acc_frag.x[i] + beta * c_frag.x[i];
}
```

cannot be migrated to SYCL joint matrix. This added overload addresses this. The spec API is added in #13153.
```
joint_matrix_apply(sg, C, D, [=](const T &x, T &y) {
  y = x * alpha;
});
```
I don't see a problem with this API, but I wonder if it should be more general. For example:

- Is there a reason that one matrix must be the "source" and the other the "destination"? Is it hard to support the case where the lambda writes to elements in both matrices?
- I think it makes sense that `Rows` and `Cols` must be the same for both matrices. What about `Use` and `Layout`, though? Is there a reason these can't be different for the two matrices? What about the type `T`? Could that be different?
- Is there a reason to limit this to just two matrices? You could imagine this API taking a parameter pack, allowing an arbitrary number of matrices.
- I relaxed the definition to not require read or write assumptions. Both matrices can be either read or written into.
- I think it is safer to require the same `use` and `layout`. To change the `use` or `layout`, `joint_matrix_copy` can be used: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc#copy. `T` should be the same. This is because there are different casts that can be used depending on `T`. It is better to have the user specify the right cast in the lambda.
- Using a parameter pack will require the lambda to also use a parameter pack. Also, how do we pack the wi_data elements and pass them to the lambda (see line 22 in https://godbolt.org/z/E1sT5oY3P)? I don't think passing a variadic number of matrices will be useful, as the user can implement any number of arguments with the two-argument version. As an optimization, we can also add a three-argument version to avoid using intermediate matrices in the case of a ternary operation like C = A * B.

What do you think?
> - I think it is safer to require the same `use` and `layout`. To change the `use` or `layout`, `joint_matrix_copy` can be used: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc#copy. `T` should be the same. This is because there are different casts that can be used depending on `T`. It is better to have the user specify the right cast in the lambda.
I don't understand why it is "safer" to require `use`, `layout`, and `T` to be the same for the two matrices. In what sense is it safer? What use cases do you have in mind for the form that takes two matrices? For these use cases, will it always be the case that `use`, `layout`, and `T` are the same?

Maybe there is an implementation problem. The current implementation probably assumes that the distribution of matrix elements across work-items will be the same for the two matrices. For example, I suspect your implementation assumes that `wi_data[j]` for work-item `I` has the same Row / Col coordinate in both the A and B matrices. Does this assumption become invalidated if `use`, `layout`, or `T` are different for A and B?
Correct, if `use`, `layout`, and `T` are different, there is no guarantee that work-items will own the same number of elements.

Yes, the CUDA use cases suggest `use`, `layout`, and `T` are the same.
OK.

You should clarify the description:

- Clarify that `x` is an element from `jm0` and `y` is an element from `jm1`.
- Clarify that `x` and `y` are guaranteed to have identical coordinates in their respective matrices.
@gmlueck, as a correction: if `use` and `layout` are different, there is no guarantee of ownership. So changing the type should be fine. I will relax the type restriction to enable the use cases where a conversion with extra arguments is needed.

@AlexeySachkov, yes, it is on my TODO list. I will give it higher priority and work on it this week.
> `jm0` and `jm1` that have the same `use`, number of rows, number of
> columns, and `layout`. `jm0` and `jm1` can be read-only, write-only,
> or read and write arguments. The callable object must be invocable
> with two parameters `x` and `y` of type `T&` where `x` is an element
Suggested change:

```diff
-with two parameters `x` and `y` of type `T&` where `x` is an element
+with two parameters `x` and `y` of types `T0&` and `T1&`, where `x` is an element
```
Currently, CUDA code that uses this pattern:

```
for (int i = 0; i < c_frag.num_elements; i++) {
  c_frag.x[i] = alpha * acc_frag.x[i] + beta * c_frag.x[i];
}
```

cannot be migrated to SYCL joint matrix. This added overload addresses this limitation.
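A hedged sketch of how that loop might look after migration, using the overload added by this PR (assumes a sub-group `sg` and `joint_matrix` objects `acc_frag` and `c_frag` declared per the extension spec; this is a fragment against an experimental API, not a complete compilable kernel):

```
// Element-wise: y (an element of c_frag) becomes
// alpha * x (the corresponding element of acc_frag) + beta * y.
joint_matrix_apply(sg, acc_frag, c_frag, [=](const T &x, T &y) {
  y = alpha * x + beta * y;
});
```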