Skip to content

Conversation

Copilot
Copy link
Contributor

@Copilot Copilot AI commented Oct 6, 2025

Replaced hardcoded CU values with dynamic detection:

  • Add helper functions to iris/hip.py for computing default SM values based on CU count
  • Update examples/08_gemm_atomics_all_reduce/benchmark.py to use dynamic defaults
  • Update examples/09_gemm_one_shot_all_reduce/benchmark.py to use dynamic defaults
  • Update examples/07_gemm_all_scatter/benchmark.py to use dynamic defaults
  • Update examples/10_gemm_all_scatter_wg_specialization/benchmark.py to use dynamic defaults
  • Update examples/11_gemm_all_scatter_producer_consumer/benchmark.py to use dynamic defaults
  • Update examples/12_gemm_all_scatter_bulk_synchronous/benchmark.py to use dynamic defaults
  • Update examples/benchmark/bench_all_shapes.py to use dynamic defaults
  • Update benchmark/examples/benchmark_all_gather_gemm_pull.py to use dynamic defaults
  • Update benchmark/examples/benchmark_all_gather_gemm_push.py to use dynamic defaults
  • Update scripts/link_bandwidth.py to use dynamic defaults
  • Test changes with linting and formatting
  • Verify examples syntax is valid
  • Address review feedback: Inline logic at call sites instead of using helper function
  • Fix all_reduce SM calculation: use simple power-of-2 formula
  • Use torch.cuda API instead of iris custom function, move math import to top
  • Update args dict with auto-detected values for correct logging
  • Remove unused get_cu_count imports from benchmark files
  • Hoist repeated power-of-2 calculation to avoid duplication

Summary

Eliminated code duplication by hoisting the power-of-2 calculation:

  • example 11 (producer_consumer): Calculated next_pow2 once and reused for both gemm_sms and comm_sms
  • example 12 (bulk_synchronous): Calculated next_pow2 once and reused for both gemm_sms and comm_sms

Before:

if args["gemm_sms"] is None:
    args["gemm_sms"] = 2 ** int(math.log2(cu_count)) if cu_count > 0 else 1
if args["comm_sms"] is None:
    args["comm_sms"] = 2 ** int(math.log2(cu_count)) if cu_count > 0 else 1

After:

next_pow2 = 2 ** int(math.log2(cu_count)) if cu_count > 0 else 1

if args["gemm_sms"] is None:
    args["gemm_sms"] = next_pow2
if args["comm_sms"] is None:
    args["comm_sms"] = next_pow2  # or cu_count - next_pow2 for example 11

All command-line defaults remain None (auto-detected), with manual override still available via CLI arguments.

Original prompt

This section details on the original issue you should resolve

<issue_title>Do not use hardcoded number of CUs</issue_title>
<issue_description>The code base has hardcoded values from number of CUs (or SMs) of 304 and other places where we define gemm_sms to be the next smaller power of 2 number (e.g., 256 When the number of CUs is 304 then comm_sms is 48).

All these numbers need to be programmatically set (even for default argument values). Use the hip module to get the number of CUs.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #197

💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.

@Copilot Copilot AI changed the title [WIP] Do not use hardcoded number of CUs Replace hardcoded CU/SM values with dynamic detection using HIP runtime Oct 6, 2025
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 00:38
Copilot finished work on behalf of mawad-amd October 6, 2025 00:38
@Copilot Copilot AI requested a review from mawad-amd October 6, 2025 00:50
Copilot finished work on behalf of mawad-amd October 6, 2025 00:50
@Copilot Copilot AI requested a review from mawad-amd October 8, 2025 21:02
Copilot finished work on behalf of mawad-amd October 8, 2025 21:02
@Copilot Copilot AI changed the title Replace hardcoded CU/SM values with dynamic detection using HIP runtime Replace hardcoded CU/SM values with dynamic detection using PyTorch Oct 8, 2025
@Copilot Copilot AI requested a review from mawad-amd October 8, 2025 21:32
Copilot finished work on behalf of mawad-amd October 8, 2025 21:32
@Copilot Copilot AI requested a review from mawad-amd October 8, 2025 21:48
Copilot finished work on behalf of mawad-amd October 8, 2025 21:48
Copilot finished work on behalf of mawad-amd October 8, 2025 21:51
@mawad-amd mawad-amd marked this pull request as ready for review October 8, 2025 22:02
@mawad-amd mawad-amd requested review from neoblizz and BKP as code owners October 8, 2025 22:02
@Copilot Copilot AI review requested due to automatic review settings October 8, 2025 22:02
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces hardcoded CU (Compute Unit) and SM (Streaming Multiprocessor) values with dynamic detection using PyTorch's CUDA API. Instead of using fixed values like 304 CUs or 288 SMs, the code now auto-detects the GPU's compute capabilities at runtime.

Key changes:

  • Updated all benchmark scripts to use torch.cuda.get_device_properties().multi_processor_count for dynamic CU detection
  • Changed default argument values from hardcoded numbers to None with auto-detection fallbacks
  • Added proper math calculations for SM allocation (e.g., next smaller power of 2 for workgroup specialization)

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
scripts/link_bandwidth.py Added dynamic CU detection with fallback to hardcoded MI300 default
examples/benchmark/bench_all_shapes.py Enhanced CU detection with PyTorch API and partition-based fallbacks
examples/12_gemm_all_scatter_bulk_synchronous/benchmark.py Updated to use auto-detected SM values with math-based calculations
examples/11_gemm_all_scatter_producer_consumer/benchmark.py Added dynamic SM detection with leftover calculation for comm_sms
examples/10_gemm_all_scatter_wg_specialization/benchmark.py Updated to auto-detect total and GEMM SMs using power-of-2 logic
examples/09_gemm_one_shot_all_reduce/benchmark.py Replaced hardcoded SM values with dynamic detection
examples/08_gemm_atomics_all_reduce/benchmark.py Updated to use auto-detected SM values for all-reduce operations
examples/07_gemm_all_scatter/benchmark.py Added dynamic CU detection for persistent GEMM algorithm
benchmark/examples/benchmark_all_gather_gemm_push.py Updated to auto-detect SM count with proper args dict updating
benchmark/examples/benchmark_all_gather_gemm_pull.py Added dynamic SM detection with fallback to provided values

@Copilot Copilot AI requested a review from mawad-amd October 8, 2025 22:06
Copilot finished work on behalf of mawad-amd October 8, 2025 22:06
@mawad-amd mawad-amd merged commit 4b181a3 into main Oct 9, 2025
17 checks passed
@mawad-amd mawad-amd deleted the copilot/fix-1c88c8c0-e834-48ff-9193-74430e6066a2 branch October 9, 2025 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not use hardcoded number of CUs
2 participants