Replace hardcoded CU/SM values with dynamic detection using PyTorch #198

Copilot · 2025-10-06T00:21:22Z

Replaced hardcoded CU values with dynamic detection:

Summary

Eliminated code duplication by hoisting the power-of-2 calculation:

example 11 (producer_consumer): Calculated next_pow2 once and reused for both gemm_sms and comm_sms
example 12 (bulk_synchronous): Calculated next_pow2 once and reused for both gemm_sms and comm_sms

Before:

if args["gemm_sms"] is None:
    args["gemm_sms"] = 2 ** int(math.log2(cu_count)) if cu_count > 0 else 1
if args["comm_sms"] is None:
    args["comm_sms"] = 2 ** int(math.log2(cu_count)) if cu_count > 0 else 1

After:

next_pow2 = 2 ** int(math.log2(cu_count)) if cu_count > 0 else 1

if args["gemm_sms"] is None:
    args["gemm_sms"] = next_pow2
if args["comm_sms"] is None:
    args["comm_sms"] = next_pow2  # or cu_count - next_pow2 for example 11

All command-line defaults remain None (auto-detected), with manual override still available via CLI arguments.

Original prompt

This section details on the original issue you should resolve

<issue_title>Do not use hardcoded number of CUs</issue_title>
<issue_description>The code base has hardcoded values from number of CUs (or SMs) of 304 and other places where we define gemm_sms to be the next smaller power of 2 number (e.g., 256 When the number of CUs is 304 then comm_sms is 48).

All these numbers need to be programmatically set (even for default argument values). Use the hip module to get the number of CUs.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #197

💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.

Co-authored-by: mawad-amd <[email protected]>

iris/hip.py

…ction Co-authored-by: mawad-amd <[email protected]>

examples/08_gemm_atomics_all_reduce/benchmark.py

… comm Co-authored-by: mawad-amd <[email protected]>

examples/07_gemm_all_scatter/benchmark.py

examples/08_gemm_atomics_all_reduce/benchmark.py

examples/11_gemm_all_scatter_producer_consumer/benchmark.py

examples/12_gemm_all_scatter_bulk_synchronous/benchmark.py

examples/benchmark/bench_all_shapes.py

scripts/link_bandwidth.py

… to top Co-authored-by: mawad-amd <[email protected]>

benchmark/examples/benchmark_all_gather_gemm_pull.py

benchmark/examples/benchmark_all_gather_gemm_push.py

Co-authored-by: mawad-amd <[email protected]>

…66a2

examples/12_gemm_all_scatter_bulk_synchronous/benchmark.py

Copilot

Pull Request Overview

This PR replaces hardcoded CU (Compute Unit) and SM (Streaming Multiprocessor) values with dynamic detection using PyTorch's CUDA API. Instead of using fixed values like 304 CUs or 288 SMs, the code now auto-detects the GPU's compute capabilities at runtime.

Key changes:

Updated all benchmark scripts to use torch.cuda.get_device_properties().multi_processor_count for dynamic CU detection
Changed default argument values from hardcoded numbers to None with auto-detection fallbacks
Added proper math calculations for SM allocation (e.g., next smaller power of 2 for workgroup specialization)

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
scripts/link_bandwidth.py	Added dynamic CU detection with fallback to hardcoded MI300 default
examples/benchmark/bench_all_shapes.py	Enhanced CU detection with PyTorch API and partition-based fallbacks
examples/12_gemm_all_scatter_bulk_synchronous/benchmark.py	Updated to use auto-detected SM values with math-based calculations
examples/11_gemm_all_scatter_producer_consumer/benchmark.py	Added dynamic SM detection with leftover calculation for comm_sms
examples/10_gemm_all_scatter_wg_specialization/benchmark.py	Updated to auto-detect total and GEMM SMs using power-of-2 logic
examples/09_gemm_one_shot_all_reduce/benchmark.py	Replaced hardcoded SM values with dynamic detection
examples/08_gemm_atomics_all_reduce/benchmark.py	Updated to use auto-detected SM values for all-reduce operations
examples/07_gemm_all_scatter/benchmark.py	Added dynamic CU detection for persistent GEMM algorithm
benchmark/examples/benchmark_all_gather_gemm_push.py	Updated to auto-detect SM count with proper args dict updating
benchmark/examples/benchmark_all_gather_gemm_pull.py	Added dynamic SM detection with fallback to provided values

Co-authored-by: mawad-amd <[email protected]>

…66a2

Initial plan

a3249a4

Copilot AI assigned Copilot and mawad-amd Oct 6, 2025

Copilot started work on behalf of mawad-amd October 6, 2025 00:22 View session

Add dynamic CU count detection and helper functions

af9d39c

Co-authored-by: mawad-amd <[email protected]>

Copilot AI changed the title ~~[WIP] Do not use hardcoded number of CUs~~ Replace hardcoded CU/SM values with dynamic detection using HIP runtime Oct 6, 2025

Copilot AI requested a review from mawad-amd October 6, 2025 00:38

Copilot finished work on behalf of mawad-amd October 6, 2025 00:38

mawad-amd reviewed Oct 6, 2025

View reviewed changes

iris/hip.py Outdated Show resolved Hide resolved

Copilot started work on behalf of mawad-amd October 6, 2025 00:44 View session

Inline SM calculation logic at call sites instead of using helper fun…

9381677

…ction Co-authored-by: mawad-amd <[email protected]>

Copilot AI requested a review from mawad-amd October 6, 2025 00:50

Copilot finished work on behalf of mawad-amd October 6, 2025 00:50

mawad-amd reviewed Oct 8, 2025

View reviewed changes

examples/08_gemm_atomics_all_reduce/benchmark.py Outdated Show resolved Hide resolved

Copilot started work on behalf of mawad-amd October 8, 2025 20:59 View session

Fix all_reduce SM calculation: use next power of 2 for GEMM, rest for…

0de4d43

… comm Co-authored-by: mawad-amd <[email protected]>

Copilot AI requested a review from mawad-amd October 8, 2025 21:02

Copilot finished work on behalf of mawad-amd October 8, 2025 21:02

mawad-amd requested changes Oct 8, 2025

View reviewed changes

Copilot started work on behalf of mawad-amd October 8, 2025 21:25 View session

Use torch.cuda API instead of iris.hip.get_cu_count, move math import…

0012367

… to top Co-authored-by: mawad-amd <[email protected]>

Copilot AI changed the title ~~Replace hardcoded CU/SM values with dynamic detection using HIP runtime~~ Replace hardcoded CU/SM values with dynamic detection using PyTorch Oct 8, 2025

Copilot AI requested a review from mawad-amd October 8, 2025 21:32

Copilot finished work on behalf of mawad-amd October 8, 2025 21:32

mawad-amd reviewed Oct 8, 2025

View reviewed changes

benchmark/examples/benchmark_all_gather_gemm_pull.py Show resolved Hide resolved

Copilot started work on behalf of mawad-amd October 8, 2025 21:44 View session

mawad-amd reviewed Oct 8, 2025

View reviewed changes

benchmark/examples/benchmark_all_gather_gemm_pull.py Outdated Show resolved Hide resolved

mawad-amd reviewed Oct 8, 2025

View reviewed changes

benchmark/examples/benchmark_all_gather_gemm_push.py Outdated Show resolved Hide resolved

Update run_args dict with auto-detected num_sms for correct logging

95c653b

Co-authored-by: mawad-amd <[email protected]>

Copilot AI requested a review from mawad-amd October 8, 2025 21:48

Copilot finished work on behalf of mawad-amd October 8, 2025 21:48

Copilot started work on behalf of mawad-amd October 8, 2025 21:49 View session

Remove unused get_cu_count imports from benchmark files

71404e4

Co-authored-by: mawad-amd <[email protected]>

Copilot finished work on behalf of mawad-amd October 8, 2025 21:51

Merge branch 'main' into copilot/fix-1c88c8c0-e834-48ff-9193-74430e60…

ed1809f

…66a2

mawad-amd reviewed Oct 8, 2025

View reviewed changes

examples/12_gemm_all_scatter_bulk_synchronous/benchmark.py Outdated Show resolved Hide resolved

mawad-amd approved these changes Oct 8, 2025

View reviewed changes

mawad-amd marked this pull request as ready for review October 8, 2025 22:02

mawad-amd requested review from neoblizz and BKP as code owners October 8, 2025 22:02

Copilot AI review requested due to automatic review settings October 8, 2025 22:02

Copilot AI reviewed Oct 8, 2025

View reviewed changes

Copilot started work on behalf of mawad-amd October 8, 2025 22:02 View session

Hoist repeated power-of-2 calculation to avoid duplication

ad29338

Co-authored-by: mawad-amd <[email protected]>

Copilot AI requested a review from mawad-amd October 8, 2025 22:06

Copilot finished work on behalf of mawad-amd October 8, 2025 22:06

mawad-amd approved these changes Oct 8, 2025

View reviewed changes

Merge branch 'main' into copilot/fix-1c88c8c0-e834-48ff-9193-74430e60…

6951c96

…66a2

mawad-amd merged commit 4b181a3 into main Oct 9, 2025
17 checks passed

mawad-amd deleted the copilot/fix-1c88c8c0-e834-48ff-9193-74430e6066a2 branch October 9, 2025 20:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Replace hardcoded CU/SM values with dynamic detection using PyTorch #198

Replace hardcoded CU/SM values with dynamic detection using PyTorch #198

Uh oh!

Copilot AI commented Oct 6, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Replace hardcoded CU/SM values with dynamic detection using PyTorch #198

Replace hardcoded CU/SM values with dynamic detection using PyTorch #198

Uh oh!

Conversation

Copilot AI commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Comments on the Issue (you are @copilot in this section)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI commented Oct 6, 2025 •

edited

Loading