how do cl_mem_flags affect fills and copies #770
Comments
I disagree about some of these
|
I'm flexible as to the exact behavior. The thought experiment I've been going through is: If I have a memory object that was created with the flags
|
Who initializes the contents of a buffer if it's immutable, if neither the host nor the device can write to the buffer under any circumstance? My understanding is that the memory flags are mostly intended as hints about the physical location of the buffer. For example, device read-only can go into constant memory, host no-access can go into host-unmappable memory, etc. Usage hints about read/write only can also be used in mapping to configure hardware caching as appropriate. WriteBuffer/ReadBuffer are generally implemented through the copy engine, which, while arguably physically resident on the device, is completely distinct from the computational part of the device, which is what CL actually cares about as a standard (compute units and memory; everything else is irrelevant standard-wise). Access isn't a matter of who calls the API, but of who accesses the buffer data. So memory object copies do not care about the host access rules (the host isn't intended to see the data, even though the copy might actually involve a temporary transfer to the host, e.g. if the two memory objects are stored on different devices), and since the compute engine is not involved, they can write even to read-only memory objects. |
Good question. In my thought experiment it'd have to be initialized with
I think this is the crux of the problem: these flags are not defined precisely and we're trying to infer their meaning.
After typing this up I definitely don't think the device access flags are applicable to fills and copies, as described in my original proposal. I do think the host access flags could be applicable though, depending how the descriptions above are interpreted. |
The first at least would be incompatible with
I see what you mean better now. I think that the device ones are reasonably well-defined, but I do agree that the host ones could be clarified. My understanding is that the affected APIs are those that expose the buffer contents for direct read or write access from the host. APIs such as the copy functions are not affected, because the copy does not expose the buffer contents to the host (although the host may still be involved in the copy as an implementation detail, e.g. dev-to-dev copies between buffers stored on devices that cannot copy via DMA). Similarly, the fill operation isn't affected because again the filling is assumed to be done on device, at least to my understanding. My preference would therefore be to have a clarification of the wording for the standard rather than a possible change in behavior. |
This is somewhat secondary and I don't want to get too far out in the weeds, but why would
What would this look like exactly? In case it is helpful, here is my updated proposal if we decided to restrict fills and copies based on the host access flags:
|
They aren't incompatible now, but they would be in the vision you're exploring.
The text describing the memory flags and/or APIs should clarify that the host memory flags only refer to APIs that explicitly expose the memory object data to the host. This is in contrast to what you are proposing, which instead changes the current behavior. Note that the specification for filling operations is already clear about the lack of effect of memory access flags:
and ditto for clEnqueueFillImage. What we should have are similar explanatory texts for the copy APIs. I think that this has not been mentioned explicitly because it can be inferred by the lack of a host ptr argument in the APIs, but given your perplexity it might be appropriate to declare this explicitly, copying the blurb from the Fill commands. For SVM and USM I don't think host flags make sense, since their whole point is to provide shared host/device access to the resource. |
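To make the interpretation above concrete, here is a minimal sketch of the rule as I read it: the host access flags are checked only for commands that expose buffer contents to the host (read and write), while fills and copies pass regardless. It is a model for discussion, not code from any implementation. The `SKETCH_`/`sketch_` names are hypothetical, and the constants are local stand-ins whose values mirror `CL_MEM_HOST_WRITE_ONLY`, `CL_MEM_HOST_READ_ONLY`, `CL_MEM_HOST_NO_ACCESS` and `CL_INVALID_OPERATION`, so the snippet builds without an OpenCL SDK. Mapping commands are left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins so the sketch builds without an OpenCL SDK. The values
   mirror CL_MEM_HOST_* and CL_INVALID_OPERATION from CL/cl.h; the
   SKETCH_/sketch_ names themselves are hypothetical. */
typedef uint64_t sketch_mem_flags;
#define SKETCH_MEM_HOST_WRITE_ONLY ((sketch_mem_flags)1 << 7)
#define SKETCH_MEM_HOST_READ_ONLY  ((sketch_mem_flags)1 << 8)
#define SKETCH_MEM_HOST_NO_ACCESS  ((sketch_mem_flags)1 << 9)
#define SKETCH_SUCCESS             0
#define SKETCH_INVALID_OPERATION   (-59)

typedef enum {
    SKETCH_CMD_READ,  /* clEnqueueReadBuffer: contents land in a host pointer */
    SKETCH_CMD_WRITE, /* clEnqueueWriteBuffer: contents come from a host pointer */
    SKETCH_CMD_FILL,  /* clEnqueueFillBuffer: host supplies only a pattern */
    SKETCH_CMD_COPY   /* clEnqueueCopyBuffer: no host pointer at all */
} sketch_cmd;

/* Only read and write hand buffer contents to, or take them from, the host. */
int sketch_exposes_data_to_host(sketch_cmd cmd)
{
    return cmd == SKETCH_CMD_READ || cmd == SKETCH_CMD_WRITE;
}

/* Returns SKETCH_INVALID_OPERATION when the host access flags forbid cmd. */
int sketch_check_host_access(sketch_cmd cmd, sketch_mem_flags flags)
{
    sketch_mem_flags blocked;

    if (!sketch_exposes_data_to_host(cmd))
        return SKETCH_SUCCESS; /* fills and copies ignore the host flags */

    blocked = SKETCH_MEM_HOST_NO_ACCESS;
    blocked |= (cmd == SKETCH_CMD_READ) ? SKETCH_MEM_HOST_WRITE_ONLY
                                        : SKETCH_MEM_HOST_READ_ONLY;
    return (flags & blocked) ? SKETCH_INVALID_OPERATION : SKETCH_SUCCESS;
}

/* Usage: a host-no-access buffer rejects reads but still accepts fills. */
void sketch_demo(void)
{
    assert(sketch_check_host_access(SKETCH_CMD_READ, SKETCH_MEM_HOST_NO_ACCESS)
           == SKETCH_INVALID_OPERATION);
    assert(sketch_check_host_access(SKETCH_CMD_FILL, SKETCH_MEM_HOST_NO_ACCESS)
           == SKETCH_SUCCESS);
}
```

Under this reading, a buffer created with the host no-access flag can still be initialized by a fill or a copy, which would answer the "who initializes it" question without any change in behavior.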
Well that's a little embarrassing, I missed this line completely. 😢 I guess the currently documented behavior is intentional, at least for clEnqueueFillBuffer. This was before my involvement with OpenCL so I did a bit of spec archaeology to see if I could figure out the rationale behind this. I didn't find a clear answer, but here are my notes to save Khronos folks some time tracking this down in the future:
Since it seems like this behavior is intentional I suppose the only possible action would be to clarify that the memory access flags also do not apply to clEnqueueCopyBuffer. If this is sufficiently clear as-is, I'm fine closing this issue with no action required. Thank you @Oblomov for the insightful comments, as always! I learned something today. |
That's a very interesting archaeological find. It's interesting that neither the host nor device flags apply in the fill case, despite the fact that the fill operation would normally be implemented via a kernel. My guess would be that this is intentional, since otherwise it would require very inefficient workarounds to manage to do a buffer fill. It might be worth adding the same clarification text to the various copy APIs. |
For e.g. clEnqueueReadBuffer there is an error condition if the buffer being read from does not support the proper cl_mem_flags:
There are no such error conditions for fills and copies, however, and this seems like an omission. The lack of error conditions means that we cannot reliably allocate memory that is guaranteed to be immutable from the host or on the device.
Proposal:
(edit: this is the old proposal, see update below: #770 (comment))
For memory object fills (clEnqueueFillBuffer and clEnqueueFillImage), treat this as access from the host. Add an error condition if a fill is called on a memory object created with CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS.
For memory object copies (there are a lot of these! clEnqueueCopyBuffer, clEnqueueCopyImage, plus copies between buffers and images, and the rect variants), treat this as access from the device. Add an error condition if the copy source was created with CL_MEM_WRITE_ONLY, and an error condition if the copy destination was created with CL_MEM_READ_ONLY. Note that there isn't a standard CL_MEM_NO_ACCESS flag to indicate no access on the device, though there is a version added by an extension.
For SVM fills (clEnqueueSVMMemFill), there will be no error condition because SVM is always accessible on the host. Revisit if we change the error condition for memory object fills.
For SVM copies (clEnqueueSVMMemcpy), add error conditions similar to memory object copies. Revisit if we change the error condition for memory object copies.
If we go with the behavior above we will also want to update the descriptions for CL_MEM_READ_ONLY and CL_MEM_WRITE_ONLY because these flags will also affect copies and not just access "inside a kernel".
For completeness, USM allocations are currently not subject to additional cl_mem_flags, but we would want to add similar error conditions for USM fills and copies if we did eventually support cl_mem_flags for USM allocations.
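For illustration only, the checks in this (old) proposal for memory object fills and copies could be sketched as below. This is not current specification behavior. The `SKETCH_`/`sketch_` names are hypothetical, the constants are local stand-ins whose values mirror `CL_MEM_WRITE_ONLY`, `CL_MEM_READ_ONLY`, `CL_MEM_HOST_READ_ONLY` and `CL_MEM_HOST_NO_ACCESS`, and returning the value of `CL_INVALID_OPERATION` is an assumption made by analogy with clEnqueueReadBuffer, since the proposal does not name an error code.

```c
#include <stdint.h>

/* Local stand-ins so the sketch builds without an OpenCL SDK. The values
   mirror the constants in CL/cl.h; the SKETCH_/sketch_ names are hypothetical. */
typedef uint64_t sketch_mem_flags;
#define SKETCH_MEM_WRITE_ONLY      ((sketch_mem_flags)1 << 1)
#define SKETCH_MEM_READ_ONLY       ((sketch_mem_flags)1 << 2)
#define SKETCH_MEM_HOST_READ_ONLY  ((sketch_mem_flags)1 << 8)
#define SKETCH_MEM_HOST_NO_ACCESS  ((sketch_mem_flags)1 << 9)
#define SKETCH_SUCCESS             0
#define SKETCH_INVALID_OPERATION   (-59) /* assumed error code, see lead-in */

/* Proposed rule for fills: treat the fill as a write from the host. */
int sketch_proposed_fill_check(sketch_mem_flags dst_flags)
{
    const sketch_mem_flags blocked =
        SKETCH_MEM_HOST_READ_ONLY | SKETCH_MEM_HOST_NO_ACCESS;
    return (dst_flags & blocked) ? SKETCH_INVALID_OPERATION : SKETCH_SUCCESS;
}

/* Proposed rule for copies: treat the copy as a device read of the source
   followed by a device write to the destination. */
int sketch_proposed_copy_check(sketch_mem_flags src_flags,
                               sketch_mem_flags dst_flags)
{
    if (src_flags & SKETCH_MEM_WRITE_ONLY)
        return SKETCH_INVALID_OPERATION; /* device may not read the source */
    if (dst_flags & SKETCH_MEM_READ_ONLY)
        return SKETCH_INVALID_OPERATION; /* device may not write the destination */
    return SKETCH_SUCCESS;
}
```

For example, `sketch_proposed_copy_check(SKETCH_MEM_READ_ONLY, SKETCH_MEM_WRITE_ONLY)` succeeds, while swapping the two arguments fails on the source check.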