[PJRT:GPU] Treat GPU collective memory space as device memory space #16921
Conversation
Hi @zhenying-liu
Force-pushed bfc99d7 to c37f245
Thanks. Resolved conflicts.
…ory space Imported from GitHub PR openxla/xla#16921 Enhance Collective Memory Handling when nccl user buffers are enabled. - Treat output collective memory space as device memory. - Utilize allreduce HLO in a unit test to verify the collective memory space. - Replace hardcoded collective memory space color (1) with a constant. - Collective memory allocation and deallocation based on the proper memory space. Copybara import of the project: -- c37f245ec719c505a69eb3fcdbe6cd37070ae9bf by Jane Liu <[email protected]>: [PJRT:GPU] Treat GPU collective memory space as device memory space Merging this change closes #16921 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output c37f245ec719c505a69eb3fcdbe6cd37070ae9bf PiperOrigin-RevId: 672618973
Any chance we could separate out the XLA and PJRT changes? I'm somewhat new to that code, but hardcoding memory spaces in layout.h seems strange; is that the usual approach?
Is it possible to describe, in general terms rather than specific fixes, what the change is doing? Is it fixing a crash? Is it a performance improvement? If it's a bullet list of five things, maybe it should be five PRs?
It's defined in XLA, so it's harder to separate the XLA and PJRT changes. But I can try my best to split them.
There is a crash when using NCCL user buffers (--xla_gpu_enable_nccl_user_buffers=true). The error message is: "INVALID_ARGUMENT: Unexpected memory space 1 in output layout". It happens in pjrt_stream_executor_client.cc:MemoryKindFromSimpleShape().
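The failure mode can be sketched in isolation. Below is a simplified, self-contained stand-in for the dispatch in MemoryKindFromSimpleShape(); the constant names and values here are assumptions mirroring xla/layout.h, not the actual XLA code:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical memory-space colors mirroring xla/layout.h; the real
// definitions live in XLA and may differ.
constexpr int64_t kDefaultMemorySpace = 0;
constexpr int64_t kCollectiveMemorySpace = 1;  // color used by NCCL user buffers
constexpr int64_t kHostMemorySpace = 5;

// Simplified stand-in for MemoryKindFromSimpleShape(): before this PR, any
// color other than default/host fell through to the INVALID_ARGUMENT error.
std::string MemoryKindBeforeFix(int64_t memory_space) {
  switch (memory_space) {
    case kDefaultMemorySpace:
      return "device";
    case kHostMemorySpace:
      return "pinned_host";
    default:
      throw std::invalid_argument("Unexpected memory space " +
                                  std::to_string(memory_space) +
                                  " in output layout");
  }
}
```

With --xla_gpu_enable_nccl_user_buffers=true, collective outputs carry color 1, so the default branch fires and the client errors out even though the underlying buffer is ordinary device memory.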
This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
This is a regression fix. The Google JAX team wants to make a release, so merging this soon would be great.
Force-pushed c37f245 to 155ff9d
…ory space Imported from GitHub PR openxla/xla#16921 This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true. Clarify GPU collective memory space to be a constant in XLA and use it consistently. Copybara import of the project: -- c37f245ec719c505a69eb3fcdbe6cd37070ae9bf by Jane Liu <[email protected]>: [PJRT:GPU] Treat GPU collective memory space as device memory space Merging this change closes #16921 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output c37f245ec719c505a69eb3fcdbe6cd37070ae9bf PiperOrigin-RevId: 672618973
xla/layout.h (outdated)
@@ -404,6 +404,7 @@ class Layout {
   static constexpr int64_t kDefaultMemorySpace = 0;
   static constexpr int64_t kGenericFastMemorySpace = 1;
   static constexpr int64_t kHostMemorySpace = 5;
   static constexpr int64_t kCollectiveMemorySpace = 6;
Does this enum have to live in layout.h? Maybe we could move it to a GPU-specific file? After all, GPU has to not collide only with GPU. Also, having to recompile XLA on any change to this enum does not sound great.
@cheshire At the same time, it would be good if the constants did not collide, so I would prefer to have all the constants in one place. How about creating a new header file just for the memory space constants?
@jaro-sevcik Actually, why not? If another device D has different memory spaces using the same numbers, why does it matter? If they are allowed to collide, it means two backends can develop separately, which in general seems good. I guess here the "implementation details" need to be present at the API boundary, which makes this difficult.
Looking at the usage, conceptually, it seems like the design could be improved. The buffers allocated by XLA are internal and should not be exposed at the API boundary (PjRT), but here they are. I guess ideally that logic should be moved out of PjRT into XLA.
@cheshire Conceptually, what you suggest sounds consistent. I would be concerned that someone takes a layout from an array, tries to use it for some other device, and gets surprising results without any warning, but perhaps that's already not possible, because one cannot really sneak memory spaces into a layout.
In practice, this PR is holding up a JAX release, so we should decide what we want soon. Are you proposing to undo the move/rename of the kCollectiveMemorySpaceColor constant and the change of its value to 6 (without worrying about the collective space id 1 colliding with kGenericFastMemorySpace), and just keep the changes to xla/pjrt/pjrt_stream_executor_client.cc and xla/pjrt/gpu/se_gpu_pjrt_client_test.cc?
In practice, this PR is holding up a JAX release
Is it? What I understand from @hawkinsp is that some combination of flags became slower, which is not a property a release has to uphold?
@cheshire I have the smallest change to xla/pjrt/pjrt_stream_executor_client.cc and xla/pjrt/gpu/se_gpu_pjrt_client_test.cc ready. With that one, I have to use kGenericFastMemorySpace for the collective space id 1. If so, we can fix the regression now and clean up the definition later.
@cheshire Please take another look at the code review as I changed the code as mentioned above.
…ory space Imported from GitHub PR openxla/xla#16921 This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true. Clarify GPU collective memory space to be a constant in XLA and use it consistently. Copybara import of the project: -- dc967765fb26629c600e7d4820a50f8d1a3b8e9a by Jane Liu <[email protected]>: [PJRT:GPU] Treat GPU collective memory space as device memory space -- 155ff9d4307e42546053d07d5fe5cb491b45e18c by Jane Liu <[email protected]>: put the header file in the alphabetical order Merging this change closes #16921 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output 155ff9d4307e42546053d07d5fe5cb491b45e18c PiperOrigin-RevId: 672618973
EXPECT_EQ(memory_kinds[0].size(), 1);
EXPECT_EQ(memory_kinds[0][0], "device");

TF_ASSERT_OK_AND_ASSIGN(
What happens if we execute it without the changes in this PR?
It will hit the assertion failure in pjrt_stream_executor_client.cc:MemoryKindFromSimpleShape():
INVALID_ARGUMENT: Unexpected memory space 1 in output layout
Force-pushed 155ff9d to 1b73040
…ory space Imported from GitHub PR #16921 This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true. Return device memory space when collective memory space is used as an output on GPU. Copybara import of the project: -- 1b73040 by Jane Liu <[email protected]>: Treat collective memory space as device memory space when using as an output Merging this change closes #16921 FUTURE_COPYBARA_INTEGRATE_REVIEW=#16921 from zhenying-liu:nccl-buffer-output 1b73040 PiperOrigin-RevId: 672618973
This approved PR is failing in Google internal import.
Merging this PR is blocked on a compile error. I wonder whether this actually compiles on your end?
auto device = client->addressable_devices()[0];
TF_EXPECT_OK(device->default_memory_space());
TF_ASSIGN_OR_RETURN(
void function 'TestBody' should not return a value [-Wreturn-mismatch]
1239 | TF_ASSIGN_OR_RETURN(
Thanks for pointing this out. Sorry for the mistake. Fixed.
Force-pushed 1b73040 to b5e43d6
…ory space Imported from GitHub PR #16921 This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true. Return device memory space when collective memory space is used as an output on GPU. Copybara import of the project: -- 8113e6f by Jane Liu <[email protected]>: Treat collective memory space as device memory space when using as an output -- b5e43d6 by Jane Liu <[email protected]>: fix the test Merging this change closes #16921 FUTURE_COPYBARA_INTEGRATE_REVIEW=#16921 from zhenying-liu:nccl-buffer-output b5e43d6 PiperOrigin-RevId: 672618973
…ory space Imported from GitHub PR openxla/xla#16921 This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true. Return device memory space when collective memory space is used as an output on GPU. Copybara import of the project: -- 8113e6fbe23d5902ecdd406793555c602c1b7f81 by Jane Liu <[email protected]>: Treat collective memory space as device memory space when using as an output -- b5e43d6455adc49f5ac99a9a9e95cf495eb46170 by Jane Liu <[email protected]>: fix the test Merging this change closes #16921 PiperOrigin-RevId: 674073363
This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return device memory space when collective memory space is used as an output on GPU.
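The eventual fix amounts to folding the collective color into the device branch of the memory-kind mapping on the PJRT side. A minimal self-contained sketch of that mapping follows; the names and color values are assumptions for illustration, not the actual XLA code:

```cpp
#include <cstdint>
#include <string>

// Hypothetical memory-space colors; the real definitions live in XLA.
constexpr int64_t kDefaultMemorySpace = 0;
constexpr int64_t kCollectiveMemorySpace = 1;  // NCCL user-buffer color
constexpr int64_t kHostMemorySpace = 5;

// After the fix: an output carrying the collective color is reported as
// plain device memory, since the underlying allocation is device memory
// either way; only the allocator that produced it differs.
std::string MemoryKindAfterFix(int64_t memory_space) {
  switch (memory_space) {
    case kDefaultMemorySpace:
    case kCollectiveMemorySpace:
      return "device";
    case kHostMemorySpace:
      return "pinned_host";
    default:
      return "unknown";
  }
}
```

This keeps the change local to the PJRT client, matching the reviewers' preference for the smallest fix that unblocks the release, and defers any relocation of the memory-space constants to a later cleanup.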