how do cl_mem_flags affect fills and copies #770

bashbaug · 2022-03-09T16:47:11Z

For clEnqueueReadBuffer, for example, there is an error condition if the buffer being read from was created with incompatible cl_mem_flags:

> CL_INVALID_OPERATION if clEnqueueReadBuffer is called on buffer which has been created with CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS.

There is no such error condition for fills and copies, however, and this seems like an omission. The lack of error conditions means that we cannot reliably allocate memory that is guaranteed to be immutable from the host or on the device.

Proposal:

(edit: this is the old proposal, see update below: #770 (comment))

- For memory object fills (clEnqueueFillBuffer and clEnqueueFillImage), treat this as access from the host. Add an error condition if a fill is called on a memory object created with CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS.
- For memory object copies (there are a lot of these! clEnqueueCopyBuffer, clEnqueueCopyImage, plus copies between buffers and images, and the rect variants), treat this as access from the device. Add an error condition if the copy source was created with CL_MEM_WRITE_ONLY, and an error condition if the copy destination was created with CL_MEM_READ_ONLY. Note that there isn't a standard CL_MEM_NO_ACCESS flag to indicate no access on the device, though there is a version added by an extension.
- For SVM fills (clEnqueueSVMMemFill), there will be no error condition because SVM is always accessible on the host. Revisit if we change the error condition for memory object fills.
- For SVM copies (clEnqueueSVMMemcpy), add error conditions similar to memory object copies. Revisit if we change the error condition for memory object copies.
- If we go with the behavior above we will also want to update the descriptions for CL_MEM_READ_ONLY and CL_MEM_WRITE_ONLY because these flags will also affect copies and not just access "inside a kernel".

For completeness, USM allocations are currently not subject to additional cl_mem_flags, but we would want to add similar error conditions for USM fills and copies if we did eventually support cl_mem_flags for USM allocations.

Oblomov · 2022-03-11T11:21:39Z

I disagree about some of these

- object fills do not involve host access; in fact, they are usually implemented as kernels or copies between device buffers; so the host access flag should not cause any error;
- I disagree that CL_MEM_READ_ONLY and CL_MEM_WRITE_ONLY should apply to memory copies as well; this would be a big change wrt previous versions of the standard, and one that would significantly (and negatively) affect the usability of the standard; it would also be irrational that you can update the buffer from host but not from device for a read-only one.

bashbaug · 2022-03-11T20:21:33Z

I'm flexible as to the exact behavior. The thought experiment I've been going through is: If I have a memory object that was created with the flags CL_MEM_READ_ONLY and CL_MEM_HOST_READ_ONLY can I guarantee the contents of the buffer are immutable? If the answer is "yes" then we need to clarify the error behavior somehow. If the answer is "no", then that seems a little confusing, but I suppose it could mean that the spec is fine as-is.

I agree that memory object fills typically do not involve host access, although writing to a memory object (via e.g. clEnqueueWriteBuffer) is also typically done on the device, yet an error condition is still returned if the buffer is created with CL_MEM_HOST_READ_ONLY.

In an extreme case, with the current spec I can write to a buffer byte-by-byte using memory object fills, even though I cannot write to the buffer using clEnqueueWriteBuffer, which probably isn't intended 😄.
Would it make sense if memory object copies also obeyed the host read-only or host write-only flags, similarly? This seemed like even more of a stretch because there isn't even a host_ptr argument to a memory object copy, but if we consider "access from the host" to mean "access via a host API" maybe this could work?

Oblomov · 2022-03-12T06:49:29Z

Who initializes the contents of a buffer if it's immutable, if neither the host nor the device can write to the buffer under any circumstance?

My understanding is that the memory flags are mostly intended as hints about the physical location of the buffer. For example, device read-only can go into constant memory, host no-access can go into host-unmappable memory, etc. Usage hints about read/write only can also be used in mapping to configure hardware caching as appropriate.

WriteBuffer/ReadBuffer are generally implemented through the copy engine, which, while arguably physically resident on the device, is completely distinct from the computational part of the device. That computational part is what CL actually cares about as a standard (compute units and memory; everything else is completely irrelevant standard-wise).

Access isn't a matter of who calls the API, but who accesses the buffer data. So memory object copies do not care about the host access rules (the host isn't intended to see the data, even though the copy might actually involve a temporary transfer to host, e.g. if the two memory objects are stored on different devices), but since the compute engine is not involved, they can write even to read-only memory-objects.

bashbaug · 2022-03-14T16:17:23Z

> Who initializes the contents of a buffer if it's immutable, if neither the host nor the device can write to the buffer under any circumstance?

Good question. In my thought experiment it'd have to be initialized with CL_MEM_COPY_HOST_PTR or perhaps CL_MEM_USE_HOST_PTR, if neither the host nor the device can write to the buffer.

> My understanding is that the memory flags are mostly intended as [...]

I think this is the crux of the problem: these flags are not defined precisely and we're trying to infer their meaning.

The device access flags all have text similar to "read and written by a kernel". This is reasonably well-defined, assuming we all agree what "by a kernel" means, and are OK that the device may access the memory object via non-kernel mechanisms that would not be subject to these flags.
The host access flags are especially loosey-goosey though:
- The host write-only flag says: "This flag specifies that the host will only write to the memory object (using OpenCL APIs that enqueue a write or a map for write)." - does this include a fill or a copy? Both of these APIs certainly "enqueue a write"...
- The host read-only flag says: "This flag specifies that the host will only read the memory object (using OpenCL APIs that enqueue a read or a map for read)." - does this include a copy? A copy command certainly "enqueues a read"...

After typing this up I definitely don't think the device access flags are applicable to fills and copies, as described in my original proposal. I do think the host access flags could be applicable though, depending how the descriptions above are interpreted.

Oblomov · 2022-03-15T11:04:21Z

> Good question. In my thought experiment it'd have to be initialized with CL_MEM_COPY_HOST_PTR or perhaps CL_MEM_USE_HOST_PTR, if neither the host nor the device can write to the buffer.

The first at least would be incompatible with CL_MEM_HOST_READ_ONLY, so CL_MEM_USE_HOST_PTR would be the only one that makes sense.

> I think this is the crux of the problem: these flags are not defined precisely and we're trying to infer their meaning.

I see what you mean better now. I think that the device ones are reasonably well-defined, but I do agree that the host ones could be clarified. My understanding is that the affected APIs are those that expose the buffer contents for direct read or write access from the host.

APIs such as the copy functions are not affected, because the copy does not expose the buffer contents to the host (although the host may still be involved in the copy as an implementation detail, e.g. dev-to-dev copies between buffers stored on devices that cannot copy via DMA).

Similarly, the fill operation isn't affected because again the filling is assumed to be done on device, at least to my understanding.

My preference would therefore be to have a clarification of the wording for the standard rather than a possible change in behavior.

bashbaug · 2022-03-16T04:49:23Z

> > Good question. In my thought experiment it'd have to be initialized with CL_MEM_COPY_HOST_PTR or perhaps CL_MEM_USE_HOST_PTR, if neither the host nor the device can write to the buffer.
>
> The first at least would be incompatible with CL_MEM_HOST_READ_ONLY, so CL_MEM_USE_HOST_PTR would be the only one that makes sense.

This is somewhat secondary and I don't want to get too far out in the weeds, but why would CL_MEM_COPY_HOST_PTR be incompatible with CL_MEM_HOST_READ_ONLY? I did a quick search for the spec for both flags and I don't see anything that would prohibit their interaction.

> My preference would therefore be to have a clarification of the wording for the standard rather than a possible change in behavior.

What would this look like exactly?

In case it is helpful, here is my updated proposal if we decided to restrict fills and copies based on the host access flags:

- For memory object fills (clEnqueueFillBuffer and clEnqueueFillImage), restrict access based on the host flags because these APIs "enqueue a write". Add an error condition if a fill is called on a memory object created with CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS.
- For memory object copies (clEnqueueCopyBuffer, clEnqueueCopyImage, plus copies between buffers and images, and the rect variants), also restrict access based on the host flags because these APIs "enqueue a read" from the source and "enqueue a write" to the destination. Add an error condition if the copy source was created with CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS, and an error condition if the copy destination was created with CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS.
- For SVM fills (clEnqueueSVMMemFill), there will be no new error condition because SVM is not subject to any host access flags.
- For SVM copies (clEnqueueSVMMemcpy), there will also be no new error condition because SVM is not subject to any host access flags.
- For completeness, USM allocations are currently not subject to any cl_mem_flags, but we may want to add similar error conditions for USM fills and copies if we did eventually support cl_mem_flags for USM allocations.
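
A minimal sketch of the checks this updated proposal would add, written as a plain C model rather than as real driver code. The flag and error values are local stand-ins that mirror the Khronos `CL/cl.h` header so the snippet compiles without an OpenCL SDK; `mem_flags`, `proposed_fill_check`, `proposed_copy_check`, and `proposal_self_check` are hypothetical names used only for illustration. This models the proposal as written in this comment, not the behavior of any existing implementation.

```c
#include <assert.h>

/* Local stand-ins mirroring CL/cl.h (assumption: no OpenCL SDK needed). */
typedef unsigned long long mem_flags;      /* stand-in for cl_mem_flags */
#define CL_MEM_HOST_WRITE_ONLY (1ULL << 7)
#define CL_MEM_HOST_READ_ONLY  (1ULL << 8)
#define CL_MEM_HOST_NO_ACCESS  (1ULL << 9)
#define CL_SUCCESS             0
#define CL_INVALID_OPERATION   (-59)

/* Proposed rule for fills: a fill "enqueues a write", so reject
 * destinations the host may not write. */
int proposed_fill_check(mem_flags dst)
{
    if (dst & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS))
        return CL_INVALID_OPERATION;
    return CL_SUCCESS;
}

/* Proposed rule for copies: "enqueue a read" from the source and
 * "enqueue a write" to the destination. */
int proposed_copy_check(mem_flags src, mem_flags dst)
{
    if (src & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS))
        return CL_INVALID_OPERATION;
    if (dst & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS))
        return CL_INVALID_OPERATION;
    return CL_SUCCESS;
}

/* A few spot checks of the model. */
void proposal_self_check(void)
{
    /* A host-read-only buffer could no longer be filled. */
    assert(proposed_fill_check(CL_MEM_HOST_READ_ONLY) == CL_INVALID_OPERATION);
    /* Copying from a host-read-only source into a host-write-only
     * destination would still be allowed. */
    assert(proposed_copy_check(CL_MEM_HOST_READ_ONLY,
                               CL_MEM_HOST_WRITE_ONLY) == CL_SUCCESS);
}
```

Under this model a buffer created with CL_MEM_HOST_READ_ONLY could be neither a fill target nor a copy destination, which is what would make the "immutable buffer" thought experiment hold for host-side APIs.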

Oblomov · 2022-03-16T10:41:03Z

> This is somewhat secondary and I don't want to get too far out in the weeds, but why would CL_MEM_COPY_HOST_PTR be incompatible with CL_MEM_HOST_READ_ONLY? I did a quick search for the spec for both flags and I don't see anything that would prohibit their interaction.

They aren't incompatible now, but they would be in the vision you're exploring.

> What would this look like exactly?

The text describing the memory flags and/or APIs should clarify that the host memory flags only refer to APIs that explicitly expose the memory object data to the host. This is in contrast to what you are proposing, which instead changes the current behavior. Note that the specification for filling operations is already clear about the lack of effect of memory access flags:

> The usage information which indicates whether the memory object can be read or written by a kernel and/or the host and is given by the cl_mem_flags argument value specified when buffer is created is ignored by clEnqueueFillBuffer.

and ditto for clEnqueueFillImage. What we should have are similar explanatory texts for the copy APIs. I think that this has not been mentioned explicitly because it can be inferred by the lack of a host ptr argument in the APIs, but given your perplexity it might be appropriate to declare this explicitly, copying the blurb from the Fill commands.
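
The interpretation described here (host access flags matter only for commands that expose the memory object's data to the host) can be sketched as a small C model. This is an illustrative assumption-laden sketch, not spec text: the `host_access` categories, `mem_flags`, `host_flag_check`, and `clarification_self_check` are hypothetical names, and the flag and error values are local stand-ins mirroring `CL/cl.h` so the snippet builds without an OpenCL SDK. Treating copies like fills reflects the clarification suggested in this comment.

```c
#include <assert.h>

/* Local stand-ins mirroring CL/cl.h (assumption: no OpenCL SDK needed). */
typedef unsigned long long mem_flags;      /* stand-in for cl_mem_flags */
#define CL_MEM_HOST_WRITE_ONLY (1ULL << 7)
#define CL_MEM_HOST_READ_ONLY  (1ULL << 8)
#define CL_MEM_HOST_NO_ACCESS  (1ULL << 9)
#define CL_SUCCESS             0
#define CL_INVALID_OPERATION   (-59)

/* Hypothetical classification of enqueue commands by how they expose
 * memory object data to the host. */
enum host_access {
    HOST_READS,        /* e.g. clEnqueueReadBuffer, map for read   */
    HOST_WRITES,       /* e.g. clEnqueueWriteBuffer, map for write */
    NO_HOST_EXPOSURE   /* fills; copies under the suggested wording */
};

int host_flag_check(enum host_access kind, mem_flags flags)
{
    switch (kind) {
    case HOST_READS:
        if (flags & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_NO_ACCESS))
            return CL_INVALID_OPERATION;
        return CL_SUCCESS;
    case HOST_WRITES:
        if (flags & (CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS))
            return CL_INVALID_OPERATION;
        return CL_SUCCESS;
    case NO_HOST_EXPOSURE:
        /* Usage flags are ignored: data never becomes visible to the host. */
        return CL_SUCCESS;
    }
    return CL_SUCCESS;
}

/* A few spot checks of the model. */
void clarification_self_check(void)
{
    /* Writing from the host to a host-read-only buffer is rejected... */
    assert(host_flag_check(HOST_WRITES, CL_MEM_HOST_READ_ONLY)
           == CL_INVALID_OPERATION);
    /* ...but filling the same buffer is not. */
    assert(host_flag_check(NO_HOST_EXPOSURE, CL_MEM_HOST_READ_ONLY)
           == CL_SUCCESS);
}
```

The asymmetry in the last two checks is exactly the case raised earlier in the thread: a buffer that rejects clEnqueueWriteBuffer can still be modified by a fill.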

For SVM and USM I don't think host flags make sense, since their whole point is to provide shared host/device access to the resource.

bashbaug · 2022-03-18T05:23:11Z

> This is in contrast to what you are proposing, which instead changes the current behavior. Note that the specification for filling operations is already clear about the lack of effect of memory access flags:
>
> > The usage information which indicates whether the memory object can be read or written by a kernel and/or the host and is given by the cl_mem_flags argument value specified when buffer is created is ignored by clEnqueueFillBuffer.

Well that's a little embarrassing, I missed this line completely. 😢

I guess the currently documented behavior is intentional, at least for clEnqueueFillBuffer.

This was before my involvement with OpenCL so I did a bit of spec archaeology to see if I could figure out the rationale behind this. I didn't find a clear answer, but here are my notes to save Khronos folks some time tracking this down in the future:

- The "buffer fill" feature was proposed in Bugzilla 6810. The feature proposal didn't describe any interactions with the host access flags, though this isn't terribly surprising, since both features were added to OpenCL 1.2 around the same time.
- The "host access flags" feature was proposed in Bugzilla 5963. Interestingly, in the discussion these flags were purely hints, but the final writeup included the error behavior for clEnqueueReadBuffer|Image, clEnqueueWriteBuffer|Image, and clEnqueueMapBuffer|Image.
- The text describing the (lack of) interaction between clEnqueueFillBuffer and the host access flags was added in revision 09 of the OpenCL 1.2 spec, dated August 30, 2011. There's a brief mention of discussion in the August 23, 2011 teleconference notes indicating that there was confusion why clEnqueueWriteBuffer was treated differently than clEnqueueFillBuffer, but no notes describing why the final resolution was that the access flags do not apply to clEnqueueFillBuffer.

Since it seems like this behavior is intentional I suppose the only possible action would be to clarify that the memory access flags also do not apply to clEnqueueCopyBuffer. If this is sufficiently clear as-is, I'm fine closing this issue with no action required.

Thank you @Oblomov for the insightful comments, as always! I learned something today.

Oblomov · 2022-03-18T10:55:56Z

That's a very interesting archaeological find. It's interesting that neither the host nor device flags apply in the fill case, despite the fact that the fill operation would normally be implemented via a kernel. My guess would be that this is intentional, since otherwise doing a buffer fill would require very inefficient workarounds.

It might be worth adding the same clarification text to the various copy APIs.

bashbaug added the "OpenCL API Spec" label (issues related to the OpenCL API specification) Mar 9, 2022

bashbaug mentioned this issue Mar 27, 2024

The current specification makes it impossible to support a fully read-only image format (and related cl_mem_flags issues) #1110

Open

bashbaug linked a pull request Aug 18, 2024 that will close this issue

clarify cl_mem_flags to not affect copies #1230

Open
