Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extension proposal for command-buffer internal buffer property #1233

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

linehill
Copy link

@linehill linehill commented Aug 22, 2024

This patch proposes a new buffer creation property for creating internal buffers for command-buffers.

HTML is rendered here.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@linehill
Copy link
Author

cc @pjaaskel.

Copy link
Contributor

@EwanC EwanC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//include::{generated}/api/api-dictionary.asciidoc[]
:source-highlighter: coderay

= cl_khr_command_buffer_internal_buffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having "buffer" twice in the name looks funny. Perhaps "data" or "storage"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also best not use 'khr' before it's khr. Let's use 'ext' or 'exp' for now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the extension.

command-buffers are not executed and reallocate them when needed.

* reduce memory usage by sharing data storage among the internal
buffers. In C analogy:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need the "C analogy" as the idea is pretty clear.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the analogy.

}
----
* fuse kernels together as intermediate results do not need to be
preserved. In C analogy:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. The idea is clear.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the analogy.

convolutionPlusReluKernel(in, w, out);
}
----

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps mention about the OpenVX counterpart, just for reference?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added reference to OpenVX.

is a buffer object or references a buffer object created with
*CL_MEM_COMMAND_BUFFER_INTERNAL_KHR* property.

// "references a buffer": E.g. sub-buffers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to allow sub-buffers to be created of these internal buffers? I wonder if there's a use case for that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, we want to because OpenVX supports sub-objects of virtual objects.


== Issues

. Should we add memory object query for returning the associated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, yes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added query.

*UNRESOLVED*
--

. Should we add a command-buffer query for returning total internal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps not. I'd leave this as an implementation internal.

@pjaaskel
Copy link
Contributor

pjaaskel commented Aug 22, 2024

Interesting proposal thanks, has similarities to the SYCL extension https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_graph_fusion.asciidoc

Good that you pointed the SYCL's graph extension out. I'm not familiar with it. So far we have looked mostly on ML/AI formats, OpenVX and CUDA/HIP (task) graphs. Do you think cmdbuf extended with this simple extension can be used to implement the SYCL graph and the fusion extension here? Do you see gaps that we cannot yet fill efficiently?

@EwanC
Copy link
Contributor

EwanC commented Aug 22, 2024

Good that you pointed the SYCL's graph extension out. I'm not familiar with it. So far we have looked mostly on ML/AI formats, OpenVX and CUDA/HIP (task) graphs. Do you think cmdbuf extended with this simple extension can be used to implement the SYCL graph and the fusion extension here? Do you see gaps that we cannot yet fill efficiently?

The graph fusion extension is based on the same underlying implementation this IWOCL talk https://www.youtube.com/watch?v=s39awJ_-W_k

@sommerlukas Do you have any insights into whether it would be feasible to layer Graph fusion ontop of this for buffer only use-cases.

@sommerlukas
Copy link

Good that you pointed the SYCL's graph extension out. I'm not familiar with it. So far we have looked mostly on ML/AI formats, OpenVX and CUDA/HIP (task) graphs. Do you think cmdbuf extended with this simple extension can be used to implement the SYCL graph and the fusion extension here? Do you see gaps that we cannot yet fill efficiently?

The graph fusion extension is based on the same underlying implementation this IWOCL talk https://www.youtube.com/watch?v=s39awJ_-W_k

@sommerlukas Do you have any insights into whether it would be feasible to layer Graph fusion ontop of this for buffer only use-cases.

I think there's definitely some overlap in ideas here. We have similar properties that in combination express similar semantics as the property proposed here.

In the prototype implementation, we also act on that property by internalizing the dataflow if the property is present. This corresponds to:

fuse kernels together as intermediate results do not need to be preserved.

I think the biggest difference is the lifetime of the buffer: Our extension and properties do not restrict the buffer's lifetime to the graph. They only indicate that the updates made to the buffer by kernels in the fused graph may not be observable to kernels outside the graph or on the host. That part is identical to what is proposed here.

However, we allow the buffer to be used outside the graph. So, while our properties imply a part of the semantics proposed here, I think additional properties (or other mechanisms) would be required on the SYCL side to apply the proposed property to a buffer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants