-
Notifications
You must be signed in to change notification settings - Fork 74
Minor cleanup to HostIrEvaluator #5750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
!test |
|
Review updated until commit 33b4043 Description
|
| Relevant files | |||||
|---|---|---|---|---|---|
| Enhancement |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 No relevant tests |
| 🔒 No security concerns identified |
| ⚡ Recommended focus areas for review |
Potential Behavioral Change
communicator_->is_available() check. This could potentially change behavior when the communicator exists but is not available. The new code assumes that if communicator is not null, it should be used for device ID, regardless of availability status. |
Greptile SummaryThis PR performs minor cleanup to the HostIrEvaluator class, with changes focused on code style consistency and simplification. The main changes include:
All changes appear to be intentional cleanup with minimal functional impact. The previous comments raised concerns about the Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant caller as Caller
participant constructor as HostIrEvaluator::ctor
participant validate as validate()
participant getCUDAStream as getCUDAStream()
participant pool as c10::cuda::getStreamFromPool()
caller->>constructor: new HostIrEvaluator(container, communicator)
constructor->>constructor: Set my_local_device_index_<br/>(communicator ? communicator->local_rank() : 0)
constructor->>constructor: Set device_index<br/>(communicator ? communicator->deviceId() : 0)
constructor->>validate: validate()
validate->>validate: Check requested_n_gpus
alt requested_n_gpus > 1
validate->>validate: NVF_CHECK(communicator != nullptr)
validate->>validate: NVF_CHECK(communicator->is_available())
else requested_n_gpus == 1
validate->>validate: return early
end
constructor->>constructor: Initialize default stream
caller->>getCUDAStream: getCUDAStream(stream)
getCUDAStream->>pool: getStreamFromPool()<br/>(no device index)
pool->>getCUDAStream: CUDAStream (uses current device context)
getCUDAStream->>caller: return CUDAStream
|
| (communicator_ != nullptr && communicator_->is_available()) | ||
| ? communicator_->deviceId() | ||
| : 0; | ||
| communicator_ == nullptr ? 0 : communicator_->deviceId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deviceId() returns 0 when the communicator isn't available:
Fuser/csrc/multidevice/communicator.cpp
Line 179 in d395676
| rank_(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 2 comments
csrc/host_ir/evaluator.cpp
Outdated
| auto [it, inserted] = | ||
| streams_.try_emplace(stream_key, c10::cuda::getStreamFromPool()); | ||
| return it->second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The refactored getStreamFromPool() call is missing the device index parameter that was previously being passed. The old code explicitly specified the device index to ensure the stream was created on the correct device:
c10::cuda::getStreamFromPool(/*isHighPriority=*/false, static_cast<c10::DeviceIndex>(i))The new code calls it without arguments:
c10::cuda::getStreamFromPool()In multi-device scenarios, this could cause streams to be created on the wrong device, which would lead to incorrect device placement of operations and potential crashes. The device index calculation (communicator_->deviceId()) was intentionally removed, but without it, the stream pool may not use the correct device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default behavior is OK -- it'll use the device associated with the default stream.
|
!test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 files reviewed, 2 comments
csrc/host_ir/evaluator.cpp
Outdated
| auto [it, inserted] = | ||
| streams_.try_emplace(stream_key, c10::cuda::getStreamFromPool()); | ||
| return it->second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The getStreamFromPool() call is now missing the device index parameter. Previously: getStreamFromPool(/*isHighPriority=*/false, static_cast<c10::DeviceIndex>(i)) where i was the device ID. Without explicitly specifying the device, the stream pool may not create streams on the correct device in multi-device scenarios, relying instead on implicit device context which may be unset or incorrect.
|
!test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 files reviewed, 2 comments
| const DeviceIdxType device_index = | ||
| (communicator_ != nullptr && communicator_->is_available()) | ||
| ? communicator_->deviceId() | ||
| : 0; | ||
| communicator_ == nullptr ? 0 : communicator_->deviceId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Changed from (communicator_ != nullptr && communicator_->is_available()) to just communicator_ == nullptr. This removes the availability check, but the validate() method at line 131 explicitly requires communicator_->is_available() when requested_n_gpus > 1. If a communicator exists but is unavailable, calling deviceId() on it may return stale or incorrect device information, causing streams to be created on the wrong device.
| auto it = streams_.find(stream_key); | ||
| if (it == streams_.end()) { | ||
| it = streams_.emplace(stream_key, c10::cuda::getStreamFromPool()).first; | ||
| } | ||
| return streams_.at(stream_key); | ||
| return it->second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The getStreamFromPool() call now lacks an explicit device index parameter. Previously, it passed communicator_->deviceId() to ensure streams were created on the correct device. Without it, the function relies on the implicit current CUDA device context. In multi-device scenarios, if the current context differs from the intended device, streams will be created on the wrong device. The validate() method ensures communicator availability for multi-GPU setups, so the previous safety check pattern should be preserved here to match that requirement.
|
!test |
No description provided.