@0ax1 (Contributor) commented Jan 16, 2026

Example benchmark run on an NVIDIA A10 GPU:

FoR_cuda/u32_FoR/1K     time:   [6.2240 µs 6.2360 µs 6.2574 µs]
                        thrpt:  [609.63 MiB/s 611.72 MiB/s 612.90 MiB/s]
                 change:
                        time:   [−0.1225% +0.7492% +1.6114%] (p = 0.12 > 0.05)
                        thrpt:  [−1.5859% −0.7436% +0.1226%]
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
FoR_cuda/u32_FoR/10K    time:   [8.4275 µs 8.4335 µs 8.4386 µs]
                        thrpt:  [4.4146 GiB/s 4.4173 GiB/s 4.4204 GiB/s]
                 change:
                        time:   [−0.2470% −0.0898% +0.0725%] (p = 0.31 > 0.05)
                        thrpt:  [−0.0724% +0.0899% +0.2476%]
                        No change in performance detected.
FoR_cuda/u32_FoR/100K   time:   [11.292 µs 11.324 µs 11.370 µs]
                        thrpt:  [32.765 GiB/s 32.899 GiB/s 32.991 GiB/s]
                 change:
                        time:   [−4.0893% −3.6758% −3.2679%] (p = 0.00 < 0.05)
                        thrpt:  [+3.3783% +3.8161% +4.2637%]
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high severe
FoR_cuda/u32_FoR/1M     time:   [15.445 µs 15.526 µs 15.615 µs]
                        thrpt:  [238.57 GiB/s 239.94 GiB/s 241.19 GiB/s]
                 change:
                        time:   [−1.8313% −1.2258% −0.6705%] (p = 0.00 < 0.05)
                        thrpt:  [+0.6751% +1.2410% +1.8655%]
                        Change within noise threshold.
FoR_cuda/u32_FoR/10M    time:   [180.16 µs 180.58 µs 180.79 µs]
                        thrpt:  [206.05 GiB/s 206.30 GiB/s 206.78 GiB/s]
                 change:
                        time:   [+1.0296% +1.4123% +1.6948%] (p = 0.00 < 0.05)
                        thrpt:  [−1.6666% −1.3926% −1.0191%]
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low severe
FoR_cuda/u32_FoR/100M   time:   [1.7089 ms 1.7110 ms 1.7139 ms]
                        thrpt:  [217.36 GiB/s 217.73 GiB/s 218.00 GiB/s]
                 change:
                        time:   [−0.3698% −0.0661% +0.3273%] (p = 0.74 > 0.05)
                        thrpt:  [−0.3263% +0.0662% +0.3711%]
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
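
For anyone cross-checking the numbers above, the reported throughput appears to be input bytes divided by measured time. The sketch below is not taken from the benchmark code in this PR. It assumes that the size labels are decimal element counts (`1M` = 1,000,000), that each `u32` element counts as 4 bytes, and that GiB means 2^30 bytes; the helper name `throughput_gib_per_s` is hypothetical.

```rust
/// Bytes per element for the `u32` inputs named in the benchmark IDs.
const BYTES_PER_U32: f64 = 4.0;

/// Convert an element count and a measured time into GiB/s (1 GiB = 2^30 bytes).
/// Hypothetical helper for illustration; not part of the PR.
fn throughput_gib_per_s(elements: u64, seconds: f64) -> f64 {
    let bytes = elements as f64 * BYTES_PER_U32;
    bytes / seconds / (1024.0 * 1024.0 * 1024.0)
}

fn main() {
    // Median times copied from the run above. Reading "1M" and "100M" as
    // decimal element counts is an assumption.
    let cases = [
        ("1M", 1_000_000u64, 15.526e-6),
        ("100M", 100_000_000u64, 1.7110e-3),
    ];
    for (label, elements, seconds) in cases {
        println!(
            "{label}: {:.2} GiB/s",
            throughput_gib_per_s(elements, seconds)
        );
    }
}
```

Under these assumptions the computed values agree with the reported medians (239.94 GiB/s for 1M, 217.73 GiB/s for 100M) to two decimal places.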

Signed-off-by: Alexander Droste <[email protected]>
@0ax1 added the chore label (release label indicating a trivial change) Jan 16, 2026
Signed-off-by: Alexander Droste <[email protected]>
0ax1 added 2 commits January 16, 2026 15:58
Signed-off-by: Alexander Droste <[email protected]>
Signed-off-by: Alexander Droste <[email protected]>
@0ax1 enabled auto-merge (squash) January 16, 2026 15:59

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.86%. Comparing base (12ba988) to head (a3bfa65).
⚠️ Report is 1 commit behind head on develop.

@0ax1 merged commit 5f72ad2 into develop Jan 16, 2026
47 of 48 checks passed
@0ax1 deleted the ad/cuda-bench-infra branch January 16, 2026 16:14