
Fix reorder ops in cpu implementation with cl_mem #28286

Open · wants to merge 3 commits into master
Conversation


@WeldonWangwang WeldonWangwang commented Jan 7, 2025

Details:

  • Add a buffer_ptr() API to the gpu_buffer struct
  • Mark the output lock in the reorder CPU implementation as read_write, so that a Result node (of reorder type) can be written through cl_mem

When the HETERO:GPU.0,GPU.1 pipeline-parallel mode splits the llama_v2 model, the __module.model.layers.0.self_attn/prim::ListConstruct_3 node is marked as part of a shape_of subgraph and its result is passed to the next device, so this node and its Result node can only get CPU implementations.

At the inference stage, when preparing outputs for Result_44438, the B580 dGPU uses only cl_mem in

tensor_type = TensorType::BT_BUF_INTERNAL;

so the reorder CPU impl cannot write its result into cl_mem.

So we have two solutions:

  1. This PR: mark the reorder output_lock in the CPU impl as read_write so it can write to cl_mem
  2. PR #28476: skip mark_node for Result nodes of reorder type

Tickets:

@WeldonWangwang WeldonWangwang requested review from a team as code owners January 7, 2025 03:58
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Jan 7, 2025
src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.hpp (outdated, resolved)
@@ -56,7 +56,7 @@ struct reorder_impl : public typed_primitive_impl<reorder> {
     auto output_mem_ptr = instance.output_memory_ptr();

     cldnn::mem_lock<uint8_t, mem_lock_type::read> input_lock(input_mem_ptr, stream);
-    cldnn::mem_lock<uint8_t, mem_lock_type::read> output_lock(output_mem_ptr, stream);
+    cldnn::mem_lock<uint8_t, mem_lock_type::read_write> output_lock(output_mem_ptr, stream);
Contributor:
The output memory update is the issue this PR solves; could you describe the detailed reason?

Contributor:

Who allocates the cl_mem output for this node, and why is the CPU impl chosen?
By the way, this line will break if the output is usm_device; it is unsafe to change it to read_write directly here.

Contributor (author):

Sure, River, I will add some test cases to explain this.

Contributor:

> usm_device

If it is usm_device, mem_lock itself performs the copy, right? But we need logic to write the data back to usm_device memory when needed.

Contributor:

@WeldonWangwang, could you please update access modifiers for the other CPU impls as well?

Contributor (author):

@sshlyapn Sure, will add it.

@@ -39,6 +39,9 @@ struct gpu_buffer : public lockable_gpu_mem, public memory {
         assert(0 == _lock_count);
         return _buffer;
     }
+    void* buffer_ptr() const override {
+        return _buffer.get();
+    }
Contributor:

I believe it's not needed?

Contributor (author):

Hi Sergey, it is not strictly needed; this API is only used in a debug message:

GPU_DEBUG_TRACE_DETAIL << internal_name << " with index " << output_idx << " prepare output: " << output_memory->buffer_ptr() << std::endl;

Without this override the pointer prints as 0, for example:

GPU_Debug: sync_infer_request.cpp:989:prepare_output: result:Result_51154 with index 9 prepare output: 0

so I added it.
