Update to new alloc cache interface. #2614

Merged: maleadt merged 1 commit into master from tb/alloc_cache on Jan 15, 2025

Conversation

@maleadt (Member) commented Jan 10, 2025

@maleadt marked this pull request as ready for review January 15, 2025 09:27

@github-actions bot (Contributor) left a comment

Some suggestions could not be made:

  • src/array.jl
    • lines 88-89

Comment on lines +75 to +80

Suggested change
  data = GPUArrays.cached_alloc((CuArray, device(), M, bufsize)) do
      DataRef(pool_free, pool_alloc(M, bufsize))
  end
- obj = new{T,N,M}(data, maxsize, 0, dims)
+ obj = new{T, N, M}(data, maxsize, 0, dims)
  finalizer(unsafe_free!, obj)
  return obj

@maleadt merged commit 8460cf8 into master Jan 15, 2025
2 of 3 checks passed
@maleadt deleted the tb/alloc_cache branch January 15, 2025 10:55

codecov bot commented Jan 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.59%. Comparing base (fe0419a) to head (eb33d4e).
Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2614       +/-   ##
===========================================
+ Coverage    5.96%   73.59%   +67.62%     
===========================================
  Files         157      157               
  Lines       15038    15230      +192     
===========================================
+ Hits          897    11208    +10311     
+ Misses      14141     4022    -10119     

@github-actions bot (Contributor) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: eb33d4e | Previous: 774abc6 | Ratio |
|---|---|---|---|
| latency/precompile | 45397878341 ns | 45532671418 ns | 1.00 |
| latency/ttfp | 6420318339.5 ns | 6382276443.5 ns | 1.01 |
| latency/import | 3049558823 ns | 3039078540.5 ns | 1.00 |
| integration/volumerhs | 9567331 ns | 9567627 ns | 1.00 |
| integration/byval/slices=1 | 146661 ns | 146713 ns | 1.00 |
| integration/byval/slices=3 | 425651 ns | 425286 ns | 1.00 |
| integration/byval/reference | 144728 ns | 144622 ns | 1.00 |
| integration/byval/slices=2 | 286144 ns | 286077 ns | 1.00 |
| integration/cudadevrt | 103442 ns | 103283 ns | 1.00 |
| kernel/indexing | 13963 ns | 14073 ns | 0.99 |
| kernel/indexing_checked | 15101 ns | 15126 ns | 1.00 |
| kernel/occupancy | 708.4836601307189 ns | 710.5460992907801 ns | 1.00 |
| kernel/launch | 2105.8 ns | 2120.3 ns | 0.99 |
| kernel/rand | 16810 ns | 14743 ns | 1.14 |
| array/reverse/1d | 19782 ns | 19325.5 ns | 1.02 |
| array/reverse/2d | 23929 ns | 24669 ns | 0.97 |
| array/reverse/1d_inplace | 10064.333333333334 ns | 10913.666666666666 ns | 0.92 |
| array/reverse/2d_inplace | 11161 ns | 11253 ns | 0.99 |
| array/copy | 20853 ns | 20229 ns | 1.03 |
| array/iteration/findall/int | 158429 ns | 157863.5 ns | 1.00 |
| array/iteration/findall/bool | 138836.5 ns | 138404.5 ns | 1.00 |
| array/iteration/findfirst/int | 154009 ns | 153375 ns | 1.00 |
| array/iteration/findfirst/bool | 155150 ns | 154273 ns | 1.01 |
| array/iteration/scalar | 78337 ns | 75697 ns | 1.03 |
| array/iteration/logical | 214802.5 ns | 212853.5 ns | 1.01 |
| array/iteration/findmin/1d | 41477 ns | 41543 ns | 1.00 |
| array/iteration/findmin/2d | 94799 ns | 93933.5 ns | 1.01 |
| array/reductions/reduce/1d | 41864 ns | 35999 ns | 1.16 |
| array/reductions/reduce/2d | 46498 ns | 41907.5 ns | 1.11 |
| array/reductions/mapreduce/1d | 39069 ns | 33891.5 ns | 1.15 |
| array/reductions/mapreduce/2d | 51362.5 ns | 41528 ns | 1.24 |
| array/broadcast | 21291 ns | 21376 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 13431 ns | 11516 ns | 1.17 |
| array/copyto!/cpu_to_gpu | 212369 ns | 210665 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 244285.5 ns | 243223.5 ns | 1.00 |
| array/accumulate/1d | 108232 ns | 108164 ns | 1.00 |
| array/accumulate/2d | 80093 ns | 79823.5 ns | 1.00 |
| array/construct | 1251.1 ns | 1284.3 ns | 0.97 |
| array/random/randn/Float32 | 49579 ns | 49740 ns | 1.00 |
| array/random/randn!/Float32 | 26496 ns | 26117 ns | 1.01 |
| array/random/rand!/Int64 | 27075 ns | 27030 ns | 1.00 |
| array/random/rand!/Float32 | 8643.333333333334 ns | 8836.333333333334 ns | 0.98 |
| array/random/rand/Int64 | 29873 ns | 37762.5 ns | 0.79 |
| array/random/rand/Float32 | 13113 ns | 13046 ns | 1.01 |
| array/permutedims/4d | 67087.5 ns | 66810 ns | 1.00 |
| array/permutedims/2d | 56669 ns | 56518 ns | 1.00 |
| array/permutedims/3d | 59144 ns | 59273.5 ns | 1.00 |
| array/sorting/1d | 2932542 ns | 2933200.5 ns | 1.00 |
| array/sorting/by | 3499198 ns | 3500043 ns | 1.00 |
| array/sorting/2d | 1085381 ns | 1084935 ns | 1.00 |
| cuda/synchronization/stream/auto | 1060.9 ns | 1035.9 ns | 1.02 |
| cuda/synchronization/stream/nonblocking | 6515.8 ns | 6536.8 ns | 1.00 |
| cuda/synchronization/stream/blocking | 815.8941176470588 ns | 791.2244897959183 ns | 1.03 |
| cuda/synchronization/context/auto | 1192.1 ns | 1182.9 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 6709.4 ns | 6769.6 ns | 0.99 |
| cuda/synchronization/context/blocking | 921.525 ns | 915.2666666666667 ns | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.
