Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update access modifiers for CPU impls #28286

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/plugins/intel_gpu/src/graph/impls/cpu/reorder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ struct reorder_impl : public typed_primitive_impl<reorder> {
auto output_mem_ptr = instance.output_memory_ptr();

cldnn::mem_lock<uint8_t, mem_lock_type::read> input_lock(input_mem_ptr, stream);
cldnn::mem_lock<uint8_t, mem_lock_type::read> output_lock(output_mem_ptr, stream);
cldnn::mem_lock<uint8_t, mem_lock_type::read_write> output_lock(output_mem_ptr, stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output memory will be updated as the reason solved by this PR, could you descript the detail reason?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who allocate cl_mem output for this node? and why cpu impl is choosen?
btw, this line will break if output is usm_device, it's unsafe to directly change to read_write here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, River, I will add some test case to explain this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usm_device

If it is usm_device, it will do copying operation by mem_lock itself, right? But we need logic to write back to usm_device memory if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WeldonWangwang, could you please update access modifiers for the other CPU impls as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sshlyapn Sure, will add it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the memory is used for readonly, why change them to read_write?


input_host_tensors.push_back(make_tensor(params->input_layouts[0], input_lock.data()));
output_host_tensors.push_back(make_tensor(params->output_layouts[0], output_lock.data()));
Expand Down
3 changes: 3 additions & 0 deletions src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,9 @@ struct gpu_buffer : public lockable_gpu_mem, public memory {
assert(0 == _lock_count);
return _buffer;
}
void* buffer_ptr() const override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's not needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Sergey, it's not needed, this API just used in debug messages here:

GPU_DEBUG_TRACE_DETAIL << internal_name << " with index " << output_idx << " prepare output: " << output_memory->buffer_ptr() << std::endl;

The ptr would be 0 without this API, so I added it.
such as: GPU_Debug: sync_infer_request.cpp:989:prepare_output: result:Result_51154 with index 9 prepare output: 0

return get_buffer().get();
}

event::ptr copy_from(stream& stream, const void* data_ptr, size_t src_offset, size_t dst_offset, size_t size, bool blocking) override;
event::ptr copy_from(stream& stream, const memory& src_mem, size_t src_offset, size_t dst_offset, size_t size, bool blocking) override;
Expand Down
Loading