
[PJRT:GPU] Treat GPU collective memory space as device memory space #16921

Closed
wants to merge 2 commits

Conversation

@zhenying-liu (Contributor) commented Sep 9, 2024

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return the device memory space when the collective memory space is used for an output on GPU.

@NaiyerRizz commented:

Hi @zhenying-liu,
It seems there are some conflicts in the branch; can you please resolve them?
Thank you

@zhenying-liu (Contributor, Author) commented:

> Hi @zhenying-liu, it seems there are some conflicts in the branch; can you please resolve them? Thank you

Thanks. Resolved the conflicts.

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 9, 2024
…ory space

Imported from GitHub PR openxla/xla#16921

Enhance collective memory handling when NCCL user buffers are enabled:
- Treat the output collective memory space as device memory.
- Use an all-reduce HLO in a unit test to verify the collective memory space.
- Replace the hardcoded collective memory space color (1) with a named constant.
- Base collective memory allocation and deallocation on the proper memory space.
Copybara import of the project:

--
c37f245ec719c505a69eb3fcdbe6cd37070ae9bf by Jane Liu <[email protected]>:

[PJRT:GPU] Treat GPU collective memory space as device memory space

Merging this change closes #16921

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output c37f245ec719c505a69eb3fcdbe6cd37070ae9bf
PiperOrigin-RevId: 672618973
@cheshire (Member) commented Sep 9, 2024

Any chance we could separate out XLA and PJRT changes?

I'm somewhat new to that code, but hardcoding memory spaces in layout.h seems strange, is that the usual approach?

@cheshire (Member) commented Sep 9, 2024

Is it possible to describe in general terms, not in specific fixes, what the change is doing? Is it fixing a crash? Is it performance improvement? If it's a bullet list of five things, maybe it should be 5 PRs?

@zhenying-liu (Contributor, Author) commented:
> Any chance we could separate out XLA and PJRT changes?
>
> I'm somewhat new to that code, but hardcoding memory spaces in layout.h seems strange, is that the usual approach?

The constant is defined in XLA, so it's hard to fully separate the XLA and PJRT changes, but I'll try my best to split them.
Defining the memory spaces in layout.h is the best approach I can think of; defining them in different files could cause conflicts. Currently both the fast memory space and the collective memory space have the value 1.

@zhenying-liu (Contributor, Author) commented:

> Is it possible to describe in general terms, not in specific fixes, what the change is doing? Is it fixing a crash? Is it performance improvement? If it's a bullet list of five things, maybe it should be 5 PRs?

There is a crash when using NCCL user buffers (--xla_gpu_enable_nccl_user_buffers=true). The error message is "INVALID_ARGUMENT: Unexpected memory space 1 in output layout", raised in pjrt_stream_executor_client.cc:MemoryKindFromSimpleShape().
JAX doesn't support any memory kind other than device and pinned_host, and in this case the output memory kind is neither. Our strategy is to report the collective memory space as the device memory space to satisfy JAX, while inside XLA we keep using the collective memory space so that the NCCL memory allocator handles allocation and deallocation. The existing memory space color for the collective memory space is 1, which conflicts with Layout::kGenericFastMemorySpace, so we define a constant kCollectiveMemorySpace in layout.h with the unique value 6 and use it wherever needed.
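The mapping described above can be sketched as a stand-alone function. This is illustrative only, not the actual MemoryKindFromSimpleShape() implementation: the constant values mirror the ones discussed in this PR, and MemoryKindForSpace is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Hypothetical memory-space colors mirroring the values discussed in
// this PR; the real constants live in xla/layout.h.
constexpr int64_t kDefaultMemorySpace = 0;     // plain device memory
constexpr int64_t kHostMemorySpace = 5;        // pinned host memory
constexpr int64_t kCollectiveMemorySpace = 6;  // NCCL user-buffer memory

// Sketch of the idea behind the fix: JAX only understands "device" and
// "pinned_host", so the collective space is reported as "device" at the
// PjRT boundary, while XLA keeps using the collective color internally
// for allocation and deallocation.
std::string MemoryKindForSpace(int64_t memory_space) {
  switch (memory_space) {
    case kHostMemorySpace:
      return "pinned_host";
    case kCollectiveMemorySpace:  // treated as device at the API boundary
    case kDefaultMemorySpace:
    default:
      return "device";
  }
}
```

Before the fix, the collective color reached this mapping unhandled and triggered the INVALID_ARGUMENT error quoted above.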

@nouiz (Contributor) commented Sep 9, 2024

> Is it possible to describe in general terms, not in specific fixes, what the change is doing? Is it fixing a crash? Is it performance improvement? If it's a bullet list of five things, maybe it should be 5 PRs?

> There is a crash when using NCCL user buffer: --xla_gpu_enable_nccl_user_buffers=true. The error message is: "INVALID_ARGUMENT: Unexpected memory space 1 in output layout". It happened in pjrt_stream_executor_client.cc:MemoryKindFromSimpleShape(). Jax doesn't support any memory kind other than device and pinned_host. In this case, the output memory kind is neither device nor pinned_host. Our strategy is to treat CollectiveMemorySpace as device memory space to please Jax. But inside XLA, we keep using collective memory space to call nccl memory allocator to handle allocation/deallocation. The existing memory space color for collective memory space is 1, conflicting with Layout::kGenericFastMemorySpace. So we define a constant kCollectiveMemorySpace in layout.h with a unique value 6, and use this constant whenever it is needed.

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true
@zhenying-liu, can you update the description to say that?

@nouiz (Contributor) commented Sep 9, 2024

> Is it possible to describe in general terms, not in specific fixes, what the change is doing? Is it fixing a crash? Is it performance improvement? If it's a bullet list of five things, maybe it should be 5 PRs?

This is a regression fix. The Google JAX team wants to make a release, so merging this soon would be great.
@hawkinsp approved the changes, so the PJRT part should be good.
Splitting it into multiple PRs from outside would take extra time, and that competes with doing a JAX release very soon, so I'll let @cheshire and @hawkinsp discuss that. Having a regression in the public version isn't great.

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 9, 2024
…ory space

Imported from GitHub PR openxla/xla#16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Clarify GPU collective memory space to be a constant in XLA and use it consistently.
Copybara import of the project:

--
c37f245ec719c505a69eb3fcdbe6cd37070ae9bf by Jane Liu <[email protected]>:

[PJRT:GPU] Treat GPU collective memory space as device memory space

Merging this change closes #16921

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output c37f245ec719c505a69eb3fcdbe6cd37070ae9bf
PiperOrigin-RevId: 672618973
@NaiyerRizz self-assigned this Sep 10, 2024
xla/layout.h Outdated
@@ -404,6 +404,7 @@ class Layout {
static constexpr int64_t kDefaultMemorySpace = 0;
static constexpr int64_t kGenericFastMemorySpace = 1;
static constexpr int64_t kHostMemorySpace = 5;
static constexpr int64_t kCollectiveMemorySpace = 6;
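The collision being fixed can be expressed as a compile-time check. This is an illustrative sketch: the constant values mirror the diff above, but the static_asserts are not part of the PR.

```cpp
#include <cstdint>

// The memory-space colors as they appear in the diff above.
constexpr int64_t kDefaultMemorySpace = 0;
constexpr int64_t kGenericFastMemorySpace = 1;
constexpr int64_t kHostMemorySpace = 5;
constexpr int64_t kCollectiveMemorySpace = 6;

// Before this change, the collective color was hardcoded as 1, which
// aliased kGenericFastMemorySpace; the value 6 restores uniqueness.
static_assert(kCollectiveMemorySpace != kGenericFastMemorySpace,
              "collective color must not alias the fast-memory color");
static_assert(kCollectiveMemorySpace != kDefaultMemorySpace &&
                  kCollectiveMemorySpace != kHostMemorySpace,
              "collective color must be unique among memory spaces");
```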
Member:

Does this enum have to live in layout.h? Maybe we could move it to a GPU-specific file? After all, GPU only has to avoid collisions with GPU. Also, having to recompile XLA on any change to this enum does not sound great.

@jaro-sevcik (Contributor) commented Sep 10, 2024:

@cheshire At the same time, it would be good if the constants did not collide, so I would prefer to have all the constants in one place. How about creating a new header file just for the memory space constants?

Member:

@jaro-sevcik Actually, why not? If another device D has different memory spaces using the same numbers, why does it matter? If they are allowed to collide, two backends can develop separately, which in general seems good. I guess the "implementation details" here need to be present at the API boundary, which makes it difficult.

Looking at the usage, conceptually the design could be improved. The buffers allocated by XLA are internal and should not be exposed at the API boundary (PjRT), but here they are. Ideally that logic would move out of PjRT into XLA.

Contributor:

@cheshire Conceptually what you suggest sounds consistent. I would be concerned that someone could take a layout from an array, try to use it for some other device, and get surprising results without any warning, but perhaps that's already impossible because one cannot really sneak memory spaces into a layout.

In practice, this PR is holding up a JAX release, so we should decide what we want soon. Are you proposing to undo the move/rename of the kCollectiveMemorySpaceColor constant and the change of its value to 6 (without worrying about the collective space id 1 colliding with kGenericFastMemorySpace), and to keep just the changes to xla/pjrt/pjrt_stream_executor_client.cc and xla/pjrt/gpu/se_gpu_pjrt_client_test.cc?

Member:

> In practice, this PR is holding up a JAX release

Is it? What I understand from @hawkinsp is that some combination of flags became slower, which is not a property a release has to uphold?

Contributor Author:

@cheshire I have the smallest change ready, touching only xla/pjrt/pjrt_stream_executor_client.cc and xla/pjrt/gpu/se_gpu_pjrt_client_test.cc. With that version I have to keep kGenericFastMemorySpace for the collective space id 1. This lets us fix the regression now and clean up the definition later.

Contributor Author:

@cheshire Please take another look at the code review, as I changed the code as mentioned above.

copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 10, 2024
…ory space

Imported from GitHub PR openxla/xla#16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Clarify GPU collective memory space to be a constant in XLA and use it consistently.
Copybara import of the project:

--
dc967765fb26629c600e7d4820a50f8d1a3b8e9a by Jane Liu <[email protected]>:

[PJRT:GPU] Treat GPU collective memory space as device memory space

--
155ff9d4307e42546053d07d5fe5cb491b45e18c by Jane Liu <[email protected]>:

put the header file in the alphabetical order

Merging this change closes #16921

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output 155ff9d4307e42546053d07d5fe5cb491b45e18c
PiperOrigin-RevId: 672618973
EXPECT_EQ(memory_kinds[0].size(), 1);
EXPECT_EQ(memory_kinds[0][0], "device");

TF_ASSERT_OK_AND_ASSIGN(
Member:

What happens if we execute it without the changes in this PR?

Contributor Author:

It fails with an assertion in pjrt_stream_executor_client.cc:MemoryKindFromSimpleShape():
INVALID_ARGUMENT: Unexpected memory space 1 in output layout

copybara-service bot pushed a commit that referenced this pull request Sep 11, 2024
…ory space

Imported from GitHub PR #16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return device memory space when collective memory space is used as an output on GPU.
Copybara import of the project:

--
1b73040 by Jane Liu <[email protected]>:

Treat collective memory space as device memory space when using as an output

Merging this change closes #16921

FUTURE_COPYBARA_INTEGRATE_REVIEW=#16921 from zhenying-liu:nccl-buffer-output 1b73040
PiperOrigin-RevId: 672618973
@nouiz (Contributor) commented Sep 11, 2024

This approved PR is failing in the Google-internal import.
Who can look at it? @xla-rotation

@akuegel (Member) left a comment:

Merging this PR is blocked on a compile error. I wonder whether this actually compiles on your end?


auto device = client->addressable_devices()[0];
TF_EXPECT_OK(device->default_memory_space());
TF_ASSIGN_OR_RETURN(
Member:
void function 'TestBody' should not return a value [-Wreturn-mismatch]
1239 | TF_ASSIGN_OR_RETURN(

Contributor Author:

Thanks for pointing this out, and sorry for the mistake. Fixed.
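To see why the diagnostic above fires, here is a simplified, self-contained sketch. The macros below are hypothetical stand-ins for TF_ASSIGN_OR_RETURN and TF_ASSERT_OK_AND_ASSIGN (the real ones are more elaborate), but they show the structural difference: one propagates errors by returning a value, which is ill-formed in a void gtest TestBody, while the assert flavor aborts instead of returning.

```cpp
#include <cassert>
#include <optional>

// Minimal stand-in for a StatusOr-like wrapper.
template <typename T>
struct StatusOr {
  std::optional<T> value;
  bool ok() const { return value.has_value(); }
};

// TF_ASSIGN_OR_RETURN-style: propagates the error by *returning a
// value*, which is ill-formed inside `void TestBody()` and triggers
// the -Wreturn-mismatch error quoted in the review comment.
#define ASSIGN_OR_RETURN(lhs, expr)     \
  auto lhs##_or = (expr);               \
  if (!lhs##_or.ok()) return lhs##_or;  \
  auto lhs = *lhs##_or.value;

// TF_ASSERT_OK_AND_ASSIGN-style: aborts the test instead of returning
// a value, so it is legal in a void function.
#define ASSERT_OK_AND_ASSIGN(lhs, expr) \
  auto lhs##_or = (expr);               \
  assert(lhs##_or.ok());                \
  auto lhs = *lhs##_or.value;

StatusOr<int> MakeValue() { return {42}; }

void TestBody() {
  // ASSIGN_OR_RETURN(v, MakeValue());   // would not compile here
  ASSERT_OK_AND_ASSIGN(v, MakeValue());  // fine in a void test body
  assert(v == 42);
}
```

This is why the fix swaps the macro rather than changing the test's return type.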

copybara-service bot pushed a commit that referenced this pull request Sep 12, 2024
…ory space

Imported from GitHub PR #16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return device memory space when collective memory space is used as an output on GPU.
Copybara import of the project:

--
8113e6f by Jane Liu <[email protected]>:

Treat collective memory space as device memory space when using as an output

--
b5e43d6 by Jane Liu <[email protected]>:

fix the test

Merging this change closes #16921

FUTURE_COPYBARA_INTEGRATE_REVIEW=#16921 from zhenying-liu:nccl-buffer-output b5e43d6
PiperOrigin-RevId: 672618973
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 12, 2024
…ory space

Imported from GitHub PR openxla/xla#16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return device memory space when collective memory space is used as an output on GPU.
Copybara import of the project:

--
8113e6fbe23d5902ecdd406793555c602c1b7f81 by Jane Liu <[email protected]>:

Treat collective memory space as device memory space when using as an output

--
b5e43d6455adc49f5ac99a9a9e95cf495eb46170 by Jane Liu <[email protected]>:

fix the test

Merging this change closes #16921

Reverts 65811e5

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output b5e43d6455adc49f5ac99a9a9e95cf495eb46170
PiperOrigin-RevId: 672618973
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 13, 2024
…ory space

Imported from GitHub PR openxla/xla#16921

This is a regression fix when using --xla_gpu_enable_nccl_user_buffers=true.
Return device memory space when collective memory space is used as an output on GPU.
Copybara import of the project:

--
8113e6fbe23d5902ecdd406793555c602c1b7f81 by Jane Liu <[email protected]>:

Treat collective memory space as device memory space when using as an output

--
b5e43d6455adc49f5ac99a9a9e95cf495eb46170 by Jane Liu <[email protected]>:

fix the test

Merging this change closes #16921

PiperOrigin-RevId: 674073363
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 13, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output b5e43d6455adc49f5ac99a9a9e95cf495eb46170
PiperOrigin-RevId: 672205509
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 13, 2024
Reverts 5adafde

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16921 from zhenying-liu:nccl-buffer-output b5e43d6455adc49f5ac99a9a9e95cf495eb46170
PiperOrigin-RevId: 674073765
7 participants