
@wsmoses (Contributor) commented Jul 1, 2025

Now that @vchuravy has fixed the GPUCompiler compat.

@wsmoses requested a review from maleadt on July 1, 2025 at 23:54

@github-actions (bot) left a comment


CUDA.jl Benchmarks

| Benchmark suite | Current: 2afe6c6 (ns) | Previous: bb0a27f (ns) | Ratio (Current / Previous) |
|---|---|---|---|
| latency/precompile | 56628328674.5 | 56531370914 | 1.00 |
| latency/ttfp | 8465367383.5 | 8348294090 | 1.01 |
| latency/import | 4530861281 | 4485476728 | 1.01 |
| integration/volumerhs | 9606804.5 | 9611273 | 1.00 |
| integration/byval/slices=1 | 146932 | 146807 | 1.00 |
| integration/byval/slices=3 | 426049 | 426057.5 | 1.00 |
| integration/byval/reference | 145114 | 145125 | 1.00 |
| integration/byval/slices=2 | 286406 | 286538 | 1.00 |
| integration/cudadevrt | 103457 | 103588 | 1.00 |
| kernel/indexing | 14186 | 14206 | 1.00 |
| kernel/indexing_checked | 15006.5 | 14985 | 1.00 |
| kernel/occupancy | 673.343949044586 | 669.9113924050633 | 1.01 |
| kernel/launch | 2168.6666666666665 | 2201.5555555555557 | 0.99 |
| kernel/rand | 17115 | 15084 | 1.13 |
| array/reverse/1d | 20177.5 | 20085 | 1.00 |
| array/reverse/2dL_inplace | 66812 | 66806.5 | 1.00 |
| array/reverse/1dL | 70343 | 70229 | 1.00 |
| array/reverse/2d | 21840 | 21647 | 1.01 |
| array/reverse/1d_inplace | 11477 | 9989.5 | 1.15 |
| array/reverse/2d_inplace | 13255 | 13425 | 0.99 |
| array/reverse/2dL | 73906 | 73806 | 1.00 |
| array/reverse/1dL_inplace | 66901 | 66788 | 1.00 |
| array/copy | 20827 | 21013 | 0.99 |
| array/iteration/findall/int | 156962 | 158131 | 0.99 |
| array/iteration/findall/bool | 139393.5 | 139780 | 1.00 |
| array/iteration/findfirst/int | 161049 | 161255 | 1.00 |
| array/iteration/findfirst/bool | 161777.5 | 162292 | 1.00 |
| array/iteration/scalar | 71086 | 74249 | 0.96 |
| array/iteration/logical | 214952 | 215568.5 | 1.00 |
| array/iteration/findmin/1d | 50399 | 50621 | 1.00 |
| array/iteration/findmin/2d | 96313 | 96593 | 1.00 |
| array/reductions/reduce/Int64/1d | 43812 | 43717 | 1.00 |
| array/reductions/reduce/Int64/dims=1 | 45159.5 | 47315 | 0.95 |
| array/reductions/reduce/Int64/dims=2 | 61368 | 61512 | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 89073 | 89114 | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87814 | 88127 | 1.00 |
| array/reductions/reduce/Float32/1d | 36495 | 38052 | 0.96 |
| array/reductions/reduce/Float32/dims=1 | 42038 | 42212.5 | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 59995 | 59909 | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52466 | 52382 | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 72317 | 72498 | 1.00 |
| array/reductions/mapreduce/Int64/1d | 43193 | 43848 | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 47375 | 55194.5 | 0.86 |
| array/reductions/mapreduce/Int64/dims=2 | 61631 | 61712 | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 89051 | 88933 | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88042 | 88319 | 1.00 |
| array/reductions/mapreduce/Float32/1d | 37264.5 | 37106 | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 52288.5 | 41792 | 1.25 |
| array/reductions/mapreduce/Float32/dims=2 | 60482 | 60056 | 1.01 |
| array/reductions/mapreduce/Float32/dims=1L | 53010 | 52707 | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 72741 | 72500 | 1.00 |
| array/broadcast | 20175 | 20185 | 1.00 |
| array/copyto!/gpu_to_gpu | 11462 | 13288 | 0.86 |
| array/copyto!/cpu_to_gpu | 214887 | 216328 | 0.99 |
| array/copyto!/gpu_to_cpu | 283937 | 284828 | 1.00 |
| array/accumulate/Int64/1d | 124699 | 124343 | 1.00 |
| array/accumulate/Int64/dims=1 | 83475 | 83433 | 1.00 |
| array/accumulate/Int64/dims=2 | 158043.5 | 157726 | 1.00 |
| array/accumulate/Int64/dims=1L | 1708810 | 1710225.5 | 1.00 |
| array/accumulate/Int64/dims=2L | 966321.5 | 966516 | 1.00 |
| array/accumulate/Float32/1d | 109353 | 109351 | 1.00 |
| array/accumulate/Float32/dims=1 | 80459 | 80425 | 1.00 |
| array/accumulate/Float32/dims=2 | 147423.5 | 147561 | 1.00 |
| array/accumulate/Float32/dims=1L | 1618073 | 1619125 | 1.00 |
| array/accumulate/Float32/dims=2L | 697852.5 | 698195 | 1.00 |
| array/construct | 1300.7 | 1266.3 | 1.03 |
| array/random/randn/Float32 | 47962.5 | 45130 | 1.06 |
| array/random/randn!/Float32 | 25010 | 25086 | 1.00 |
| array/random/rand!/Int64 | 27304 | 27356 | 1.00 |
| array/random/rand!/Float32 | 8783 | 9035 | 0.97 |
| array/random/rand/Int64 | 29896 | 29946 | 1.00 |
| array/random/rand/Float32 | 13150 | 13310 | 0.99 |
| array/permutedims/4d | 59830 | 59701 | 1.00 |
| array/permutedims/2d | 54018 | 53857.5 | 1.00 |
| array/permutedims/3d | 54666 | 54922.5 | 1.00 |
| array/sorting/1d | 2756855 | 2758155 | 1.00 |
| array/sorting/by | 3343979.5 | 3344572 | 1.00 |
| array/sorting/2d | 1080937.5 | 1081253.5 | 1.00 |
| cuda/synchronization/stream/auto | 1021.9 | 1040.7 | 0.98 |
| cuda/synchronization/stream/nonblocking | 7116.9 | 7682 | 0.93 |
| cuda/synchronization/stream/blocking | 814.989247311828 | 807.7010309278351 | 1.01 |
| cuda/synchronization/context/auto | 1161.5 | 1187.8 | 0.98 |
| cuda/synchronization/context/nonblocking | 8780.9 | 8672.3 | 1.01 |
| cuda/synchronization/context/blocking | 895.1458333333334 | 915.4772727272727 | 0.98 |

This comment was automatically generated by a workflow using github-action-benchmark.

@maleadt (Member) commented Jul 2, 2025

Enzyme CI fails.

@maleadt added the labels "ci" (Everything related to continuous integration.) and "needs changes" (Changes are needed.) on Jul 2, 2025

@wsmoses (Contributor, Author) commented Jul 2, 2025

@vchuravy, it looks like your fix missed the tape_type function?

extensions/enzyme: Error During Test at /var/lib/buildkite-agent/builds/gpuci-3/julialang/cuda-dot-jl/test/extensions/enzyme.jl:51
  Got exception outside of a @test
  UndefVarError: `codegen` not defined in `Enzyme.Compiler`
  Stacktrace:
    [1] #91
      @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/Enzyme.jl:1311
    [2] #JuliaContext#152
      @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/Ecaql/src/driver.jl:34
    [3] JuliaContext
      @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/packages/GPUCompiler/Ecaql/src/driver.jl:25 [inlined]
    [4] tape_type
      @ ~/.cache/julia-buildkite-plugin/depots/3cc01fab-3357-4a7a-9294-cde2d3115a97/dev/Enzyme/src/Enzyme.jl:1310 [inlined]

