Skip to content

Conversation

@wujingyue
Copy link
Collaborator

No description provided.

@wujingyue
Copy link
Collaborator Author

!test

@wujingyue wujingyue requested a review from Priya2698 January 3, 2026 02:49
@github-actions
Copy link

github-actions bot commented Jan 3, 2026

Review updated until commit 33b4043

Description

  • Simplified conditional logic in HostIrEvaluator constructor for device index initialization

  • Removed unnecessary is_available() checks in device index calculations

  • Streamlined getCUDAStream method by removing device index specification and using iterator approach

  • Removed unnecessary include of "exceptions.h" from communicator.h

Changes walkthrough

Relevant files
Enhancement
evaluator.cpp
Simplify conditional logic and stream handling                     

csrc/host_ir/evaluator.cpp

  • Simplified my_local_device_index_ initialization using explicit
    nullptr check
  • Removed is_available() check from device_index calculation in
    constructor
  • Streamlined getCUDAStream method by removing device index logic and
    using iterator approach
  • Improved code readability by using more concise conditional
    expressions
  • +8/-13   
    communicator.h
    Remove unnecessary include                                                             

    csrc/multidevice/communicator.h

    • Removed unnecessary include of "exceptions.h" header file
    +0/-1     

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review
    Potential Behavioral Change

    The device index calculation in the constructor has been simplified by removing the communicator_->is_available() check. This could potentially change behavior when the communicator exists but is not available. The new code assumes that if communicator is not null, it should be used for device ID, regardless of availability status.

    communicator_ == nullptr ? 0 : communicator_->deviceId();
    Stream Device Assignment

    The getCUDAStream method now creates streams without explicitly specifying the device index, unlike the previous implementation that passed the device index to getStreamFromPool. This could potentially change which device the streams are created on, especially in multi-device scenarios.

    it = streams_.emplace(stream_key, c10::cuda::getStreamFromPool()).first;

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Jan 3, 2026

    Greptile Summary

    This PR performs minor cleanup to the HostIrEvaluator class, with changes focused on code style consistency and simplification. The main changes include:

    • Refactored ternary operators for consistency (lines 49-50, 53-54): Changed from condition ? expr : 0 to condition == nullptr ? 0 : expr pattern
    • Removed is_available() check from device index computation in constructor (line 54): The removal is safe because the validate() method (called immediately after in constructor) guarantees that if requested_n_gpus > 1, the communicator is available. For single-GPU scenarios, the check was unnecessary.
    • Simplified getCUDAStream() method (lines 150-154): Refactored to use iterator's emplace for cleaner insertion, and removed explicit device index parameter from getStreamFromPool() call. Per previous review feedback, the default behavior is confirmed to be safe.
    • Removed unused include of exceptions.h from communicator.h header file

    All changes appear to be intentional cleanup with minimal functional impact. The previous comments raised concerns about the is_available() removal and device parameter handling, but these have been addressed by the developer with explicit confirmation on the safety of the default stream behavior.

    Confidence Score: 4/5

    • This PR is safe to merge with minor considerations around the implicit device context handling in streams.
    • Score reflects that these are minor cleanup changes with low functional risk. The removal of is_available() check is safe due to the validate() method's guarantees. The removal of explicit device index parameter from getStreamFromPool() has been confirmed by previous developer review. The main uncertainty is whether the implicit current CUDA device context is always correctly set when getCUDAStream() is called in all code paths, but this appears to have been verified in prior reviews.
    • No files require special attention. The changes are straightforward refactorings with safety implications already addressed in previous review discussions.

    Important Files Changed

    Filename Overview
    csrc/host_ir/evaluator.cpp Refactored ternary operators for consistency and simplified getCUDAStream method. Removed is_available() check when computing device index in constructor (safe because validate() guarantees availability for multi-GPU cases, and single-GPU cases don't require the check). Removed explicit device parameter from getStreamFromPool() call, relying on implicit current CUDA device context. Changes are mostly style improvements with senior developer confirmation on stream behavior.
    csrc/multidevice/communicator.h Removed unused #include directive for exceptions.h. This is a straightforward cleanup with no functional impact.

    Sequence Diagram

    sequenceDiagram
        participant caller as Caller
        participant constructor as HostIrEvaluator::ctor
        participant validate as validate()
        participant getCUDAStream as getCUDAStream()
        participant pool as c10::cuda::getStreamFromPool()
    
        caller->>constructor: new HostIrEvaluator(container, communicator)
        constructor->>constructor: Set my_local_device_index_<br/>(communicator ? communicator->local_rank() : 0)
        constructor->>constructor: Set device_index<br/>(communicator ? communicator->deviceId() : 0)
        constructor->>validate: validate()
        validate->>validate: Check requested_n_gpus
        alt requested_n_gpus > 1
            validate->>validate: NVF_CHECK(communicator != nullptr)
            validate->>validate: NVF_CHECK(communicator->is_available())
        else requested_n_gpus == 1
            validate->>validate: return early
        end
        constructor->>constructor: Initialize default stream
        
        caller->>getCUDAStream: getCUDAStream(stream)
        getCUDAStream->>pool: getStreamFromPool()<br/>(no device index)
        pool->>getCUDAStream: CUDAStream (uses current device context)
        getCUDAStream->>caller: return CUDAStream
    
    Loading

    (communicator_ != nullptr && communicator_->is_available())
    ? communicator_->deviceId()
    : 0;
    communicator_ == nullptr ? 0 : communicator_->deviceId();
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    deviceId() returns 0 when the communicator isn't available:

    @wujingyue wujingyue changed the title Minor cleanup to Communicator Minor cleanup to HostIrEvaluator Jan 4, 2026
    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    3 files reviewed, 2 comments

    Edit Code Review Agent Settings | Greptile

    Comment on lines 150 to 152
    auto [it, inserted] =
    streams_.try_emplace(stream_key, c10::cuda::getStreamFromPool());
    return it->second;
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    logic: The refactored getStreamFromPool() call is missing the device index parameter that was previously being passed. The old code explicitly specified the device index to ensure the stream was created on the correct device:

    c10::cuda::getStreamFromPool(/*isHighPriority=*/false, static_cast<c10::DeviceIndex>(i))

    The new code calls it without arguments:

    c10::cuda::getStreamFromPool()

    In multi-device scenarios, this could cause streams to be created on the wrong device, which would lead to incorrect device placement of operations and potential crashes. The device index calculation (communicator_-&gt;deviceId()) was intentionally removed, but without it, the stream pool may not use the correct device.

    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    The default behavior is OK -- it'll use the device associated with the default stream.

    @wujingyue
    Copy link
    Collaborator Author

    !test

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    2 files reviewed, 2 comments

    Edit Code Review Agent Settings | Greptile

    Comment on lines 150 to 152
    auto [it, inserted] =
    streams_.try_emplace(stream_key, c10::cuda::getStreamFromPool());
    return it->second;
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    logic: The getStreamFromPool() call is now missing the device index parameter. Previously: getStreamFromPool(/*isHighPriority=*/false, static_cast<c10::DeviceIndex>(i)) where i was the device ID. Without explicitly specifying the device, the stream pool may not create streams on the correct device in multi-device scenarios, relying instead on implicit device context which may be unset or incorrect.

    @wujingyue
    Copy link
    Collaborator Author

    !test

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    2 files reviewed, 2 comments

    Edit Code Review Agent Settings | Greptile

    Comment on lines 53 to +54
    const DeviceIdxType device_index =
    (communicator_ != nullptr && communicator_->is_available())
    ? communicator_->deviceId()
    : 0;
    communicator_ == nullptr ? 0 : communicator_->deviceId();
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    logic: Changed from (communicator_ != nullptr && communicator_->is_available()) to just communicator_ == nullptr. This removes the availability check, but the validate() method at line 131 explicitly requires communicator_->is_available() when requested_n_gpus > 1. If a communicator exists but is unavailable, calling deviceId() on it may return stale or incorrect device information, causing streams to be created on the wrong device.

    Comment on lines +150 to +154
    auto it = streams_.find(stream_key);
    if (it == streams_.end()) {
    it = streams_.emplace(stream_key, c10::cuda::getStreamFromPool()).first;
    }
    return streams_.at(stream_key);
    return it->second;
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    logic: The getStreamFromPool() call now lacks an explicit device index parameter. Previously, it passed communicator_->deviceId() to ensure streams were created on the correct device. Without it, the function relies on the implicit current CUDA device context. In multi-device scenarios, if the current context differs from the intended device, streams will be created on the wrong device. The validate() method ensures communicator availability for multi-GPU setups, so the previous safety check pattern should be preserved here to match that requirement.

    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue wujingyue merged commit a19cc32 into main Jan 5, 2026
    56 of 57 checks passed
    @wujingyue wujingyue deleted the wjy/comm branch January 5, 2026 20:27
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants