Update to new alloc cache interface. #2614
Branch history: e6ae1b8 to f631729, then f631729 to eb33d4e.
Some suggestions could not be made:
- src/array.jl
  - lines 88-89

    data = GPUArrays.cached_alloc((CuArray, device(), M, bufsize)) do
        DataRef(pool_free, pool_alloc(M, bufsize))
    end
    obj = new{T,N,M}(data, maxsize, 0, dims)
    finalizer(unsafe_free!, obj)
    return obj
Suggested change:

     data = GPUArrays.cached_alloc((CuArray, device(), M, bufsize)) do
         DataRef(pool_free, pool_alloc(M, bufsize))
     end
    -obj = new{T,N,M}(data, maxsize, 0, dims)
    +obj = new{T, N, M}(data, maxsize, 0, dims)
     finalizer(unsafe_free!, obj)
     return obj
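For readers unfamiliar with the pattern in the snippet above, here is a minimal sketch of a key-based allocation cache driven by a `do` block. It is a toy illustration only, not the GPUArrays.jl implementation: `ToyCache`, `toy_cached_alloc`, and `release!` are hypothetical names, the cache is passed explicitly (the call in this PR takes only a key), and plain CPU vectors stand in for GPU buffers.

```julia
# Toy sketch of a key-based allocation cache. NOT the GPUArrays.jl implementation;
# all names below are hypothetical and exist only for illustration.

struct ToyCache
    free::Dict{Any, Vector{Any}}   # key => buffers available for reuse
    busy::Dict{Any, Vector{Any}}   # key => buffers currently handed out
end
ToyCache() = ToyCache(Dict{Any, Vector{Any}}(), Dict{Any, Vector{Any}}())

# Return a free buffer stored under `key`, or call `alloc()` when none exists.
# `alloc` comes first so the function can be called with `do` syntax.
function toy_cached_alloc(alloc, cache::ToyCache, key)
    pool = get!(() -> Any[], cache.free, key)
    buf = isempty(pool) ? alloc() : pop!(pool)
    push!(get!(() -> Any[], cache.busy, key), buf)
    return buf
end

# Mark every handed-out buffer as reusable again.
function release!(cache::ToyCache)
    for (key, bufs) in cache.busy
        append!(get!(() -> Any[], cache.free, key), bufs)
        empty!(bufs)
    end
    return cache
end

cache = ToyCache()
key = (Vector{UInt8}, 1024)        # analogous to (CuArray, device(), M, bufsize)

a = toy_cached_alloc(cache, key) do
    Vector{UInt8}(undef, 1024)     # cache miss: the do block allocates
end

release!(cache)

b = toy_cached_alloc(cache, key) do
    error("not reached: a free buffer already exists for this key")
end

@assert a === b                    # the second request reused the first buffer
```

The key tuple plays the same role as `(CuArray, device(), M, bufsize)` in the snippet: two requests are interchangeable only when every component of the key matches, so the `do` block runs only on a cache miss.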
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files: Coverage Diff

| | master | #2614 | +/- |
|---|---|---|---|
| Coverage | 5.96% | 73.59% | +67.62% |
| Files | 157 | 157 | |
| Lines | 15038 | 15230 | +192 |
| Hits | 897 | 11208 | +10311 |
| Misses | 14141 | 4022 | -10119 |
CUDA.jl Benchmarks
| Benchmark suite | Current: eb33d4e | Previous: 774abc6 | Ratio |
|---|---|---|---|
| latency/precompile | 45397878341 ns | 45532671418 ns | 1.00 |
| latency/ttfp | 6420318339.5 ns | 6382276443.5 ns | 1.01 |
| latency/import | 3049558823 ns | 3039078540.5 ns | 1.00 |
| integration/volumerhs | 9567331 ns | 9567627 ns | 1.00 |
| integration/byval/slices=1 | 146661 ns | 146713 ns | 1.00 |
| integration/byval/slices=3 | 425651 ns | 425286 ns | 1.00 |
| integration/byval/reference | 144728 ns | 144622 ns | 1.00 |
| integration/byval/slices=2 | 286144 ns | 286077 ns | 1.00 |
| integration/cudadevrt | 103442 ns | 103283 ns | 1.00 |
| kernel/indexing | 13963 ns | 14073 ns | 0.99 |
| kernel/indexing_checked | 15101 ns | 15126 ns | 1.00 |
| kernel/occupancy | 708.4836601307189 ns | 710.5460992907801 ns | 1.00 |
| kernel/launch | 2105.8 ns | 2120.3 ns | 0.99 |
| kernel/rand | 16810 ns | 14743 ns | 1.14 |
| array/reverse/1d | 19782 ns | 19325.5 ns | 1.02 |
| array/reverse/2d | 23929 ns | 24669 ns | 0.97 |
| array/reverse/1d_inplace | 10064.333333333334 ns | 10913.666666666666 ns | 0.92 |
| array/reverse/2d_inplace | 11161 ns | 11253 ns | 0.99 |
| array/copy | 20853 ns | 20229 ns | 1.03 |
| array/iteration/findall/int | 158429 ns | 157863.5 ns | 1.00 |
| array/iteration/findall/bool | 138836.5 ns | 138404.5 ns | 1.00 |
| array/iteration/findfirst/int | 154009 ns | 153375 ns | 1.00 |
| array/iteration/findfirst/bool | 155150 ns | 154273 ns | 1.01 |
| array/iteration/scalar | 78337 ns | 75697 ns | 1.03 |
| array/iteration/logical | 214802.5 ns | 212853.5 ns | 1.01 |
| array/iteration/findmin/1d | 41477 ns | 41543 ns | 1.00 |
| array/iteration/findmin/2d | 94799 ns | 93933.5 ns | 1.01 |
| array/reductions/reduce/1d | 41864 ns | 35999 ns | 1.16 |
| array/reductions/reduce/2d | 46498 ns | 41907.5 ns | 1.11 |
| array/reductions/mapreduce/1d | 39069 ns | 33891.5 ns | 1.15 |
| array/reductions/mapreduce/2d | 51362.5 ns | 41528 ns | 1.24 |
| array/broadcast | 21291 ns | 21376 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 13431 ns | 11516 ns | 1.17 |
| array/copyto!/cpu_to_gpu | 212369 ns | 210665 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 244285.5 ns | 243223.5 ns | 1.00 |
| array/accumulate/1d | 108232 ns | 108164 ns | 1.00 |
| array/accumulate/2d | 80093 ns | 79823.5 ns | 1.00 |
| array/construct | 1251.1 ns | 1284.3 ns | 0.97 |
| array/random/randn/Float32 | 49579 ns | 49740 ns | 1.00 |
| array/random/randn!/Float32 | 26496 ns | 26117 ns | 1.01 |
| array/random/rand!/Int64 | 27075 ns | 27030 ns | 1.00 |
| array/random/rand!/Float32 | 8643.333333333334 ns | 8836.333333333334 ns | 0.98 |
| array/random/rand/Int64 | 29873 ns | 37762.5 ns | 0.79 |
| array/random/rand/Float32 | 13113 ns | 13046 ns | 1.01 |
| array/permutedims/4d | 67087.5 ns | 66810 ns | 1.00 |
| array/permutedims/2d | 56669 ns | 56518 ns | 1.00 |
| array/permutedims/3d | 59144 ns | 59273.5 ns | 1.00 |
| array/sorting/1d | 2932542 ns | 2933200.5 ns | 1.00 |
| array/sorting/by | 3499198 ns | 3500043 ns | 1.00 |
| array/sorting/2d | 1085381 ns | 1084935 ns | 1.00 |
| cuda/synchronization/stream/auto | 1060.9 ns | 1035.9 ns | 1.02 |
| cuda/synchronization/stream/nonblocking | 6515.8 ns | 6536.8 ns | 1.00 |
| cuda/synchronization/stream/blocking | 815.8941176470588 ns | 791.2244897959183 ns | 1.03 |
| cuda/synchronization/context/auto | 1192.1 ns | 1182.9 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 6709.4 ns | 6769.6 ns | 0.99 |
| cuda/synchronization/context/blocking | 921.525 ns | 915.2666666666667 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
JuliaGPU/GPUArrays.jl#583