
[NFCI][SYCL] Copy handler::MQueue to handler_impl #18830

Merged
aelovikov-intel merged 2 commits into intel:sycl from copy-MQueue on Jun 6, 2025

Conversation

aelovikov-intel
Contributor

Ideally, it should be just moved there, but doing the whole move in one PR would be harder to review as it would require multiple internal API changes to accept queue_impl by raw ptr/ref instead of std::shared_ptr<queue_impl> value/ref. As such, this PR does the following:

  • Store a std::variant of queue/graph instead of just the graph, as the two are mutually exclusive.
  • Store them by reference (through std::reference_wrapper, as std::variant can't store raw references directly) instead of std::shared_ptr<graph_impl>. The reason is that the object a handler is submitted to is guaranteed to be alive for the whole lifetime of the handler, so we don't need to extend the lifetime of the queue/graph. Corresponding changes for handler::MQueue have been implemented recently already (although via std::shared_ptr<queue_impl> & and not a raw reference) but are limited to preview-only. Do the same for graphs here, essentially.
  • Update all uses of the old handler_impl::MGraph data member, as they need to go through the new getters accessing the std::variant described above.
  • Update some of the direct usages of handler::MQueue that don't require API refactoring elsewhere. The remaining uses are left to the subsequent PR(s).
  • We'll probably need to keep the handler::MQueue initialized properly even after the move is complete and all internal SYCL RT accesses are through handler_impl as some direct handler::MQueue accesses might have been inlined into the users' applications (I'd be especially worried about reductions).
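
To make the storage scheme in the list above concrete, here is a minimal sketch, not the code from this PR. The `*_sketch` types and the getter names are hypothetical stand-ins; only the `MQueueOrGraph` member name and the `std::variant` of `std::reference_wrapper` idea come from the PR itself.

```cpp
#include <cassert>
#include <functional>
#include <variant>

// Hypothetical stand-ins for detail::queue_impl and
// ext::oneapi::experimental::detail::graph_impl.
struct queue_impl_sketch {};
struct graph_impl_sketch {};

class handler_impl_sketch {
public:
  // A handler is submitted either to a queue or to a graph, never both.
  handler_impl_sketch(queue_impl_sketch &Queue) : MQueueOrGraph{Queue} {}
  handler_impl_sketch(graph_impl_sketch &Graph) : MQueueOrGraph{Graph} {}

  // Non-owning getters (names are assumptions): nullptr means the other
  // alternative is the active one.
  queue_impl_sketch *get_queue_or_null() {
    auto *Ref =
        std::get_if<std::reference_wrapper<queue_impl_sketch>>(&MQueueOrGraph);
    return Ref ? &Ref->get() : nullptr;
  }
  graph_impl_sketch *get_graph_or_null() {
    auto *Ref =
        std::get_if<std::reference_wrapper<graph_impl_sketch>>(&MQueueOrGraph);
    return Ref ? &Ref->get() : nullptr;
  }

  // Convenience accessor for code paths that only make sense for queues.
  queue_impl_sketch &get_queue() {
    queue_impl_sketch *Queue = get_queue_or_null();
    assert(Queue && "handler was submitted to a graph, not a queue");
    return *Queue;
  }

private:
  // No ownership: the queue/graph is assumed to outlive the handler.
  std::variant<std::reference_wrapper<queue_impl_sketch>,
               std::reference_wrapper<graph_impl_sketch>>
      MQueueOrGraph;
};
```

Because the variant holds references rather than `std::shared_ptr`, constructing or destroying the handler in this sketch never touches a reference count.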

@@ -14,15 +14,6 @@
namespace sycl {
inline namespace _V1 {
namespace detail {
device getDeviceFromHandler(handler &cgh) {
Contributor Author

This is exported, I've moved it to handler.cpp.

@aelovikov-intel
Contributor Author

Just for a bit of extra context, #18767 is a draft/wip/poc showing something close to the targeted state.

Contributor

@maarquitos14 maarquitos14 left a comment

Overall, it looks good to me. Just a couple of nits, and a couple of questions:

  • What is the motivation behind these changes?
  • Is this not breaking ABI?

@@ -43,13 +43,15 @@ inline namespace _V1 {

namespace detail {

#ifdef __INTEL_PREVIEW_BREAKING_CHANGES
// TODO: Check if two ABI exports below are still necessary.
Contributor

Should we address this here or in the future?

Contributor Author

IMO, during the ABI breaking window would be most effective, hence the preview guards around the comment.

Contributor

@maarquitos14 maarquitos14 Jun 6, 2025

The question was not very precise, sorry. Let me clarify: should we try to address this within the guards already, so it's ready and tested for the next ABI breaking window?

Contributor Author

I don't think it's necessary and I'd rather focus on more complicated things (like changes in this PR) first. Having single cleanup for trivial cases like this later, after I do most of the hard refactoring, is my current plan.

range<2> ItemLimit = Dev.get_info<info::device::max_work_item_sizes<2>>() *
Dev.get_info<info::device::max_compute_units>();
return id<2>{std::min(ItemLimit[0], Height), std::min(ItemLimit[1], Width)};
}

// TODO: do we need this still?
Contributor

Should we address this here or in the future?

Contributor Author

Likewise.

@aelovikov-intel
Contributor Author

aelovikov-intel commented Jun 6, 2025

What is the motivation behind these changes?

Looks like a much clearer design to me. If properties are mutually exclusive, then it should be a std::variant. Having all the data in impl is also just the right thing to do. Also, the ownership bit for the Graph from the description comment.

Is this not breaking ABI?

Don't see how it would...

@aelovikov-intel
Contributor Author

@intel/sycl-graphs-reviewers , could you take a look before EOD in your TZ? I was hoping I'd be able to proceed with subsequent PRs during my working hours today.

@maarquitos14
Contributor

maarquitos14 commented Jun 6, 2025

Looks like a much clearer design to me. If properties are mutually exclusive, then it should be a std::variant. Having all the data in impl is also just the right thing to do. Also, the ownership bit for the Graph from the description comment.

I agree. Just wanted to know if this was also expected to have any performance implications, as we are making some efforts in that direction.

Is this not breaking ABI?

Don't see how it would...

Sorry, I mixed up the title of this PR with the draft PR you linked, so in my head this was already effectively moving, not copying.

@aelovikov-intel
Contributor Author

Just wanted to know if this was also expected to have any performance implications, as we are making some efforts in that direction.

We've had multiple PRs that were eliminating unnecessary shared_ptr copies on hot paths. With my refactoring I'm trying to approach this issue on the design/architecture level by using raw ptr/ref by default and only doing explicit ref count increments via shared_from_this. In that sense I believe it should be neutral/slightly positive. The only extra cost I see is two memory accesses (via extra impl indirection) vs one, but a) it's very cheap; b) that's the price of the pImpl idiom and ABI compatibility well justifies that kind of overhead.
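
As a rough illustration of the "raw ref by default, explicit increment via shared_from_this" approach described above, consider the sketch below. The type and function names are hypothetical and not part of the SYCL runtime; it only assumes the object is owned by a `std::shared_ptr` somewhere up the call chain.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an impl class that is always owned by a
// std::shared_ptr at its creation site.
struct queue_impl_sketch : std::enable_shared_from_this<queue_impl_sketch> {
  int submitted = 0;
};

// Hot path: borrows the queue by reference, so no atomic ref-count
// increment/decrement happens per call.
inline void submit_sketch(queue_impl_sketch &Queue) { ++Queue.submitted; }

// Rare path: the callee really needs to extend the lifetime, so it asks for
// shared ownership explicitly and pays for the increment only here.
inline std::shared_ptr<queue_impl_sketch>
retain_sketch(queue_impl_sketch &Queue) {
  // Precondition: the object must already be managed by a shared_ptr.
  assert(!Queue.weak_from_this().expired());
  return Queue.shared_from_this();
}
```

With this shape, a caller holding `std::shared_ptr<queue_impl_sketch> Q` would call `submit_sketch(*Q)` on the hot path and use `retain_sketch(*Q)` only where an owning copy has to be stored.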

Contributor

@EwanC EwanC left a comment

One meaningful comment about handler_impl(ext::oneapi::experimental::detail::graph_impl &Graph) usage with __INTEL_PREVIEW_BREAKING_CHANGES. All other comments are superficial nitpicks.

Comment on lines +40 to +41
handler_impl(ext::oneapi::experimental::detail::graph_impl &Graph)
: MQueueOrGraph{Graph} {}
Contributor

I don't think we have any %if preview-breaking-changes-supported Graph E2E tests to catch this. When we call this constructor under __INTEL_PREVIEW_BREAKING_CHANGES (https://github.com/intel/llvm/blob/sycl/sycl/source/detail/graph_impl.cpp#L507), it will pass a std::shared_ptr<ext::oneapi::experimental::detail::graph_impl> from shared_from_this(). Can that compile after this change?

Contributor Author

source/detail is compiled unconditionally in two modes, because we provide two variants of libsycl.so/libsycl-preview.so. All existing E2E tests are run in preview mode in this nightly task as well:

- name: Preview mode on SPR/PVC
  runner: '["Linux", "pvc"]'
  image_options: -u 1001 --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path --privileged --cap-add SYS_ADMIN
  target_devices: level_zero:gpu
  extra_lit_opts: --param test-preview-mode=True

That said, I'll take another look locally to be 100% sure, but I wouldn't expect any failures here.

Contributor

cool, in that case I've approved the PR, but if you could double check locally that would be great.

Contributor Author

This is already modified as part of this PR - see changes in the above file (graph_impl.cpp).
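
For readers following this thread, the compile-time concern can be reproduced in isolation. The sketch below uses hypothetical `*_sketch` types, not the real runtime classes. It checks that a constructor taking `graph_impl &` accepts a dereferenced object but rejects a `std::shared_ptr`, which is why a call site that previously passed `shared_from_this()` has to change along with the constructor. How graph_impl.cpp was actually updated is not shown here; passing `*this` is an assumption.

```cpp
#include <functional>
#include <memory>
#include <type_traits>
#include <variant>

// Hypothetical stand-ins for the real impl classes.
struct queue_impl_sketch {};
struct graph_impl_sketch : std::enable_shared_from_this<graph_impl_sketch> {};

struct handler_impl_sketch {
  handler_impl_sketch(queue_impl_sketch &Queue) : MQueueOrGraph{Queue} {}
  handler_impl_sketch(graph_impl_sketch &Graph) : MQueueOrGraph{Graph} {}

  std::variant<std::reference_wrapper<queue_impl_sketch>,
               std::reference_wrapper<graph_impl_sketch>>
      MQueueOrGraph;
};

// Passing the object itself (e.g. *this inside a member function) compiles.
static_assert(
    std::is_constructible_v<handler_impl_sketch, graph_impl_sketch &>);

// Passing the result of shared_from_this() does not: there is no conversion
// from std::shared_ptr<graph_impl_sketch> to graph_impl_sketch &.
static_assert(!std::is_constructible_v<handler_impl_sketch,
                                       std::shared_ptr<graph_impl_sketch>>);
```

A stale call site of this kind would therefore fail at build time in preview mode rather than at run time, which matches the point above that source/detail is compiled in both modes.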

@aelovikov-intel aelovikov-intel merged commit 0dff0ff into intel:sycl Jun 6, 2025
24 checks passed
@aelovikov-intel aelovikov-intel deleted the copy-MQueue branch June 6, 2025 16:58
aelovikov-intel added a commit to aelovikov-intel/llvm that referenced this pull request Jun 6, 2025
aelovikov-intel added a commit that referenced this pull request Jun 9, 2025
Initially started in #18830
Subsequent PRs before this final one:

#18794
#18834
#18748