
[PJRT:GPU] Treat GPU collective memory space as device memory space #16921

Closed
wants to merge 2 commits

Conversation

@zhenying-liu (Contributor) commented Sep 9, 2024

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return the device memory space when the collective memory space is used for an output on GPU.

@NaiyerRizz commented:

Hi @zhenying-liu,
It seems there are some conflicts in the branch; can you please resolve them?
Thank you

@zhenying-liu (Contributor, Author) commented:

> Hi @zhenying-liu, it seems there are some conflicts in the branch; can you please resolve them? Thank you

Thanks. Resolved the conflicts.

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 9, 2024
…ory space

Imported from GitHub PR openxla/xla#16921

Enhance collective memory handling when NCCL user buffers are enabled:
- Treat the output collective memory space as device memory.
- Use an all-reduce HLO in a unit test to verify the collective memory space.
- Replace the hardcoded collective memory space color (1) with a named constant.
- Base collective memory allocation and deallocation on the proper memory space.
Copybara import of the project:

--
c37f245ec719c505a69eb3fcdbe6cd37070ae9bf by Jane Liu <[email protected]>:

[PJRT:GPU] Treat GPU collective memory space as device memory space

Merging this change closes #16921

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output c37f245ec719c505a69eb3fcdbe6cd37070ae9bf
PiperOrigin-RevId: 672618973
@cheshire (Member) commented Sep 9, 2024

Any chance we could separate out XLA and PJRT changes?

I'm somewhat new to that code, but hardcoding memory spaces in layout.h seems strange, is that the usual approach?

@cheshire (Member) commented Sep 9, 2024

Is it possible to describe in general terms, not in specific fixes, what the change is doing? Is it fixing a crash? Is it performance improvement? If it's a bullet list of five things, maybe it should be 5 PRs?

@zhenying-liu (Contributor, Author) commented:
> Any chance we could separate out XLA and PJRT changes?
>
> I'm somewhat new to that code, but hardcoding memory spaces in layout.h seems strange, is that the usual approach?

The constant is defined in XLA, so it's hard to fully separate the XLA and PJRT changes, but I'll try my best to split them.
Defining the memory spaces in layout.h is the best approach I can think of; defining them in different files could cause conflicts. Currently both the fast memory space and the collective memory space have the value 1.

@zhenying-liu (Contributor, Author) commented:

> Is it possible to describe in general terms, not in specific fixes, what the change is doing? Is it fixing a crash? Is it performance improvement? If it's a bullet list of five things, maybe it should be 5 PRs?

There is a crash when using NCCL user buffers (--xla_gpu_enable_nccl_user_buffers=true). The error message is "INVALID_ARGUMENT: Unexpected memory space 1 in output layout", raised in pjrt_stream_executor_client.cc:MemoryKindFromSimpleShape().
JAX doesn't support any memory kind other than device and pinned_host, and in this case the output memory kind is neither. Our strategy is to report the collective memory space as the device memory space to satisfy JAX, while inside XLA we keep using the collective memory space so that the NCCL memory allocator handles allocation and deallocation. The existing memory space color for the collective memory space is 1, which conflicts with Layout::kGenericFastMemorySpace, so we define a constant kCollectiveMemorySpace in layout.h with the unique value 6 and use it wherever needed.
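The mapping described above can be sketched as a stand-alone function. This is illustrative only, not the actual MemoryKindFromSimpleShape() implementation: the constant values mirror the ones discussed in this PR, and MemoryKindForSpace is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Hypothetical memory-space colors mirroring the values discussed in
// this PR; the real constants live in xla/layout.h.
constexpr int64_t kDefaultMemorySpace = 0;     // plain device memory
constexpr int64_t kHostMemorySpace = 5;        // pinned host memory
constexpr int64_t kCollectiveMemorySpace = 6;  // NCCL user-buffer memory

// Sketch of the idea behind the fix: JAX only understands "device" and
// "pinned_host", so the collective space is reported as "device" at the
// PjRT boundary, while XLA keeps using the collective color internally
// for allocation and deallocation.
std::string MemoryKindForSpace(int64_t memory_space) {
  switch (memory_space) {
    case kHostMemorySpace:
      return "pinned_host";
    case kCollectiveMemorySpace:  // treated as device at the API boundary
    case kDefaultMemorySpace:
    default:
      return "device";
  }
}
```

Before the fix, the collective color reached this mapping unhandled and triggered the INVALID_ARGUMENT error quoted above.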

@nouiz (Contributor) commented Sep 9, 2024

> Is it possible to describe in general terms, not in specific fixes, what the change is doing? Is it fixing a crash? Is it performance improvement? If it's a bullet list of five things, maybe it should be 5 PRs?

> There is a crash when using NCCL user buffer: --xla_gpu_enable_nccl_user_buffers=true. The error message is: "INVALID_ARGUMENT: Unexpected memory space 1 in output layout". It happened in pjrt_stream_executor_client.cc:MemoryKindFromSimpleShape(). Jax doesn't support any memory kind other than device and pinned_host. In this case, the output memory kind is neither device nor pinned_host. Our strategy is to treat CollectiveMemorySpace as device memory space to please Jax. But inside XLA, we keep using collective memory space to call nccl memory allocator to handle allocation/deallocation. The existing memory space color for collective memory space is 1, conflicting with Layout::kGenericFastMemorySpace. So we define a constant kCollectiveMemorySpace in layout.h with a unique value 6, and use this constant whenever it is needed.

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true
@zhenying-liu, can you update the description to say that?

@nouiz (Contributor) commented Sep 9, 2024

> Is it possible to describe in general terms, not in specific fixes, what the change is doing? Is it fixing a crash? Is it performance improvement? If it's a bullet list of five things, maybe it should be 5 PRs?

This is a regression fix. The Google JAX team wants to make a release, so merging this soon would be great.
@hawkinsp approved the changes, so the PJRT part should be good.
Splitting it into multiple PRs from outside would take extra time, and that competes with doing a JAX release very soon, so I'll let @cheshire and @hawkinsp discuss that. Having a regression in the public version isn't great.

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 9, 2024
…ory space

Imported from GitHub PR openxla/xla#16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Clarify GPU collective memory space to be a constant in XLA and use it consistently.
Copybara import of the project:

--
c37f245ec719c505a69eb3fcdbe6cd37070ae9bf by Jane Liu <[email protected]>:

[PJRT:GPU] Treat GPU collective memory space as device memory space

Merging this change closes #16921

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output c37f245ec719c505a69eb3fcdbe6cd37070ae9bf
PiperOrigin-RevId: 672618973
@NaiyerRizz self-assigned this Sep 10, 2024
xla/layout.h Outdated
@@ -404,6 +404,7 @@ class Layout {
static constexpr int64_t kDefaultMemorySpace = 0;
static constexpr int64_t kGenericFastMemorySpace = 1;
static constexpr int64_t kHostMemorySpace = 5;
static constexpr int64_t kCollectiveMemorySpace = 6;
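The collision being fixed can be expressed as a compile-time check. This is an illustrative sketch: the constant values mirror the diff above, but the static_asserts are not part of the PR.

```cpp
#include <cstdint>

// The memory-space colors as they appear in the diff above.
constexpr int64_t kDefaultMemorySpace = 0;
constexpr int64_t kGenericFastMemorySpace = 1;
constexpr int64_t kHostMemorySpace = 5;
constexpr int64_t kCollectiveMemorySpace = 6;

// Before this change, the collective color was hardcoded as 1, which
// aliased kGenericFastMemorySpace; the value 6 restores uniqueness.
static_assert(kCollectiveMemorySpace != kGenericFastMemorySpace,
              "collective color must not alias the fast-memory color");
static_assert(kCollectiveMemorySpace != kDefaultMemorySpace &&
                  kCollectiveMemorySpace != kHostMemorySpace,
              "collective color must be unique among memory spaces");
```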
Member:

Does this enum have to live in layout.h? Maybe we could move it to a GPU-specific file? After all, GPU only has to avoid collisions with GPU. Also, having to recompile XLA on any change to this enum does not sound great.

@jaro-sevcik (Contributor) commented Sep 10, 2024:

@cheshire At the same time, it would be good if the constants did not collide, so I would prefer to have all the constants in one place. How about creating a new header file just for the memory space constants?

Member:

@jaro-sevcik Actually, why not? If another device D has different memory spaces using the same numbers, why does it matter? If they are allowed to collide, two backends can develop separately, which in general seems good. I guess the "implementation details" here need to be present at the API boundary, which makes it difficult.

Looking at the usage, conceptually the design could be improved. The buffers allocated by XLA are internal and should not be exposed at the API boundary (PjRT), but here they are. Ideally that logic would move out of PjRT into XLA.

Contributor:

@cheshire Conceptually what you suggest sounds consistent. I would be concerned that someone could take a layout from an array, try to use it for some other device, and get surprising results without any warning, but perhaps that's already impossible because one cannot really sneak memory spaces into a layout.

In practice, this PR is holding up a JAX release, so we should decide what we want soon. Are you proposing to undo the move/rename of the kCollectiveMemorySpaceColor constant and the change of its value to 6 (without worrying about the collective space id 1 colliding with kGenericFastMemorySpace), and to keep just the changes to xla/pjrt/pjrt_stream_executor_client.cc and xla/pjrt/gpu/se_gpu_pjrt_client_test.cc?

Member:

> In practice, this PR is holding up a JAX release

Is it? What I understand from @hawkinsp is that some combination of flags became slower, which is not a property a release has to uphold?

Contributor Author:

@cheshire I have the smallest change ready, touching only xla/pjrt/pjrt_stream_executor_client.cc and xla/pjrt/gpu/se_gpu_pjrt_client_test.cc. With that version I have to keep kGenericFastMemorySpace for the collective space id 1. This lets us fix the regression now and clean up the definition later.

Contributor Author:

@cheshire Please take another look at the code review, as I changed the code as mentioned above.

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 10, 2024
…ory space

Imported from GitHub PR openxla/xla#16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Clarify GPU collective memory space to be a constant in XLA and use it consistently.
Copybara import of the project:

--
dc967765fb26629c600e7d4820a50f8d1a3b8e9a by Jane Liu <[email protected]>:

[PJRT:GPU] Treat GPU collective memory space as device memory space

--
155ff9d4307e42546053d07d5fe5cb491b45e18c by Jane Liu <[email protected]>:

put the header file in the alphabetical order

Merging this change closes #16921

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output 155ff9d4307e42546053d07d5fe5cb491b45e18c
PiperOrigin-RevId: 672618973
EXPECT_EQ(memory_kinds[0].size(), 1);
EXPECT_EQ(memory_kinds[0][0], "device");

TF_ASSERT_OK_AND_ASSIGN(
Member:

What happens if we execute it without the changes in this PR?

Contributor Author:

It fails with an assertion in pjrt_stream_executor_client.cc:MemoryKindFromSimpleShape():
INVALID_ARGUMENT: Unexpected memory space 1 in output layout

copybara-service bot pushed a commit that referenced this pull request Sep 11, 2024
…ory space

Imported from GitHub PR #16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return device memory space when collective memory space is used as an output on GPU.
Copybara import of the project:

--
1b73040 by Jane Liu <[email protected]>:

Treat collective memory space as device memory space when using as an output

Merging this change closes #16921

FUTURE_COPYBARA_INTEGRATE_REVIEW=#16921 from zhenying-liu:nccl-buffer-output 1b73040
PiperOrigin-RevId: 672618973
@nouiz (Contributor) commented Sep 11, 2024

This approved PR is failing in the Google-internal import.
Who can look at it? @xla-rotation

@akuegel (Member) left a comment:

Merging this PR is blocked on a compile error. I wonder whether this actually compiles on your end?


auto device = client->addressable_devices()[0];
TF_EXPECT_OK(device->default_memory_space());
TF_ASSIGN_OR_RETURN(
Member:
void function 'TestBody' should not return a value [-Wreturn-mismatch]
1239 | TF_ASSIGN_OR_RETURN(

Contributor Author:

Thanks for pointing this out, and sorry for the mistake. Fixed.
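To see why the diagnostic above fires, here is a simplified, self-contained sketch. The macros below are hypothetical stand-ins for TF_ASSIGN_OR_RETURN and TF_ASSERT_OK_AND_ASSIGN (the real ones are more elaborate), but they show the structural difference: one propagates errors by returning a value, which is ill-formed in a void gtest TestBody, while the assert flavor aborts instead of returning.

```cpp
#include <cassert>
#include <optional>

// Minimal stand-in for a StatusOr-like wrapper.
template <typename T>
struct StatusOr {
  std::optional<T> value;
  bool ok() const { return value.has_value(); }
};

// TF_ASSIGN_OR_RETURN-style: propagates the error by *returning a
// value*, which is ill-formed inside `void TestBody()` and triggers
// the -Wreturn-mismatch error quoted in the review comment.
#define ASSIGN_OR_RETURN(lhs, expr)     \
  auto lhs##_or = (expr);               \
  if (!lhs##_or.ok()) return lhs##_or;  \
  auto lhs = *lhs##_or.value;

// TF_ASSERT_OK_AND_ASSIGN-style: aborts the test instead of returning
// a value, so it is legal in a void function.
#define ASSERT_OK_AND_ASSIGN(lhs, expr) \
  auto lhs##_or = (expr);               \
  assert(lhs##_or.ok());                \
  auto lhs = *lhs##_or.value;

StatusOr<int> MakeValue() { return {42}; }

void TestBody() {
  // ASSIGN_OR_RETURN(v, MakeValue());   // would not compile here
  ASSERT_OK_AND_ASSIGN(v, MakeValue());  // fine in a void test body
  assert(v == 42);
}
```

This is why the fix swaps the macro rather than changing the test's return type.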

copybara-service bot pushed a commit that referenced this pull request Sep 12, 2024
…ory space

Imported from GitHub PR #16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return device memory space when collective memory space is used as an output on GPU.
Copybara import of the project:

--
8113e6f by Jane Liu <[email protected]>:

Treat collective memory space as device memory space when using as an output

--
b5e43d6 by Jane Liu <[email protected]>:

fix the test

Merging this change closes #16921

FUTURE_COPYBARA_INTEGRATE_REVIEW=#16921 from zhenying-liu:nccl-buffer-output b5e43d6
PiperOrigin-RevId: 672618973
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 12, 2024
…ory space

Imported from GitHub PR openxla/xla#16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return device memory space when collective memory space is used as an output on GPU.
Copybara import of the project:

--
8113e6fbe23d5902ecdd406793555c602c1b7f81 by Jane Liu <[email protected]>:

Treat collective memory space as device memory space when using as an output

--
b5e43d6455adc49f5ac99a9a9e95cf495eb46170 by Jane Liu <[email protected]>:

fix the test

Merging this change closes #16921

Reverts 65811e5

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output b5e43d6455adc49f5ac99a9a9e95cf495eb46170
PiperOrigin-RevId: 672618973
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 13, 2024
…ory space

Imported from GitHub PR openxla/xla#16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return device memory space when collective memory space is used as an output on GPU.
Copybara import of the project:

--
8113e6fbe23d5902ecdd406793555c602c1b7f81 by Jane Liu <[email protected]>:

Treat collective memory space as device memory space when using as an output

--
b5e43d6455adc49f5ac99a9a9e95cf495eb46170 by Jane Liu <[email protected]>:

fix the test

Merging this change closes #16921

PiperOrigin-RevId: 674073363
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 13, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output b5e43d6455adc49f5ac99a9a9e95cf495eb46170
PiperOrigin-RevId: 672205509
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 13, 2024
Reverts 5adafde

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output b5e43d6455adc49f5ac99a9a9e95cf495eb46170
PiperOrigin-RevId: 674073765
7 participants