RFC: Use non-blocking device side pointer mode in CUBLAS, with fallbacks #2616
base: master

Conversation
I can also add some more …

Sample speedup:

julia> using CUDA, CUDA.CUBLAS, LinearAlgebra;

julia> n = Int(2^26);

julia> X = CUDA.rand(Float64, n);

julia> res = CuRef{Float64}(0.0);

# do some precompilation runs first

julia> @time CUBLAS.nrm2(n, X, res);
  0.000104 seconds (18 allocations: 288 bytes)

julia> @time CUBLAS.nrm2(n, X);
  0.001564 seconds (73 allocations: 3.094 KiB)
CUDA.jl Benchmarks

| Benchmark suite | Current: bcd41c1 | Previous: b2ee7e7 | Ratio |
|---|---|---|---|
| latency/precompile | 45400882457 ns | 45217092321.5 ns | 1.00 |
| latency/ttfp | 6362368507.5 ns | 6288023843 ns | 1.01 |
| latency/import | 2992161692 ns | 2960039735 ns | 1.01 |
| integration/volumerhs | 9549189 ns | 9567878 ns | 1.00 |
| integration/byval/slices=1 | 147270 ns | 147185 ns | 1.00 |
| integration/byval/slices=3 | 425490 ns | 425353 ns | 1.00 |
| integration/byval/reference | 145273 ns | 145019 ns | 1.00 |
| integration/byval/slices=2 | 286524.5 ns | 286047 ns | 1.00 |
| integration/cudadevrt | 103712.5 ns | 103238 ns | 1.00 |
| kernel/indexing | 14257 ns | 14079 ns | 1.01 |
| kernel/indexing_checked | 14898 ns | 14738 ns | 1.01 |
| kernel/occupancy | 662.1470588235294 ns | 666.3333333333334 ns | 0.99 |
| kernel/launch | 2179.9 ns | 2076.6 ns | 1.05 |
| kernel/rand | 18285 ns | 15822 ns | 1.16 |
| array/reverse/1d | 19572 ns | 19722 ns | 0.99 |
| array/reverse/2d | 24145 ns | 24464 ns | 0.99 |
| array/reverse/1d_inplace | 10991 ns | 10896 ns | 1.01 |
| array/reverse/2d_inplace | 13408 ns | 13220 ns | 1.01 |
| array/copy | 20991 ns | 20968 ns | 1.00 |
| array/iteration/findall/int | 155783.5 ns | 155503.5 ns | 1.00 |
| array/iteration/findall/bool | 134484 ns | 134149 ns | 1.00 |
| array/iteration/findfirst/int | 151117 ns | 154020 ns | 0.98 |
| array/iteration/findfirst/bool | 153022 ns | 153902 ns | 0.99 |
| array/iteration/scalar | 64967 ns | 61626 ns | 1.05 |
| array/iteration/logical | 202762 ns | 202679 ns | 1.00 |
| array/iteration/findmin/1d | 39782 ns | 38469 ns | 1.03 |
| array/iteration/findmin/2d | 93779.5 ns | 94001 ns | 1.00 |
| array/reductions/reduce/1d | 30108 ns | 40656 ns | 0.74 |
| array/reductions/reduce/2d | 41762.5 ns | 47582.5 ns | 0.88 |
| array/reductions/mapreduce/1d | 29407 ns | 36366 ns | 0.81 |
| array/reductions/mapreduce/2d | 51008.5 ns | 41825.5 ns | 1.22 |
| array/broadcast | 20869 ns | 20720 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 11648.5 ns | 13420 ns | 0.87 |
| array/copyto!/cpu_to_gpu | 207275 ns | 210000 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 242298 ns | 243040 ns | 1.00 |
| array/accumulate/1d | 108424 ns | 109139 ns | 0.99 |
| array/accumulate/2d | 80163 ns | 80526 ns | 1.00 |
| array/construct | 1245.4 ns | 1236.9 ns | 1.01 |
| array/random/randn/Float32 | 49050.5 ns | 43772 ns | 1.12 |
| array/random/randn!/Float32 | 26735 ns | 26309 ns | 1.02 |
| array/random/rand!/Int64 | 27181 ns | 27062 ns | 1.00 |
| array/random/rand!/Float32 | 8873.333333333334 ns | 8747.666666666666 ns | 1.01 |
| array/random/rand/Int64 | 38147 ns | 29796 ns | 1.28 |
| array/random/rand/Float32 | 13215 ns | 12927 ns | 1.02 |
| array/permutedims/4d | 60895 ns | 61532 ns | 0.99 |
| array/permutedims/2d | 55164 ns | 55321 ns | 1.00 |
| array/permutedims/3d | 56079 ns | 56155 ns | 1.00 |
| array/sorting/1d | 2776444 ns | 2775720.5 ns | 1.00 |
| array/sorting/by | 3367937.5 ns | 3367449 ns | 1.00 |
| array/sorting/2d | 1085162 ns | 1084582 ns | 1.00 |
| cuda/synchronization/stream/auto | 1036.1 ns | 1035 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 6479.6 ns | 6321.6 ns | 1.02 |
| cuda/synchronization/stream/blocking | 806.9677419354839 ns | 803.8529411764706 ns | 1.00 |
| cuda/synchronization/context/auto | 1171.5 ns | 1169.6 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 6715 ns | 6548.6 ns | 1.03 |
| cuda/synchronization/context/blocking | 892.7058823529412 ns | 901.9777777777778 ns | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
Yeah, should one of us open an issue?

> On Sat, Jan 11, 2025, Tim Besard commented in lib/cublas/wrappers.jl:
>
>     function scal!(n::Integer, alpha::Number, x::StridedCuVecOrDenseMat{Float16})
>     -    α = convert(Float32, alpha)
>     -    cublasScalEx(handle(), n, Ref{Float32}(α), Float32, x, Float16, stride(x, 1), Float32)
>     +    α = CuRef{Float32}( convert(Float32, alpha) )
>
> We should improve CuRef so that it can be constructed identically to Ref. Ref{T}(x) doing an implicit convert is pretty convenient.
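For reference, the Base behaviour being alluded to — `Ref{T}(x)` converting its argument on construction (standard Julia, shown here purely for illustration):

```julia
julia> Ref{Float32}(1)      # Int argument is converted implicitly
Base.RefValue{Float32}(1.0f0)

julia> Ref{Float32}(2)[]    # dereferencing yields the converted value
2.0f0
```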
Is the test failure something I've done? Seems GPUArrays related.
OK, I think this is ready for review!
I am not qualified to review, but certainly interested in the outcome. Will the non-blocking methods only accept CuRef objects for the scalar input or output quantities, or also zero-dimensional arrays (i.e. CuArray{T,0})?
For now only CuRef, but these are easy to create (it's exported by CUDA.jl). I think one can also create them without a copy from a regular CuArray?
You can create a …
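For illustration, a minimal sketch of the zero-copy construction being discussed — assuming the wrapping `CuRefArray` constructor in CUDA.jl (a sketch, not code from this PR):

```julia
using CUDA

A = CUDA.zeros(Float32, 4)

# Wrap an existing element of A; no new allocation or copy is made,
# the ref simply carries the array and an index.
r = CUDA.CuRefArray(A, 1)   # refers to A[1]

r[]   # fetching the value downloads the scalar (synchronizes)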
I wonder if we should also improve CuRef to initialize its memory by calling fill instead of memcpy: when calling memcpy, the copy likely won't be truly asynchronous (that would require pinned memory). But if we call fill, which should be possible for most scalars, the argument is passed by value and I think the call will complete asynchronously.
Something to investigate!
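A minimal sketch contrasting the two initialization strategies (the helper functions are hypothetical; `fill!` and `copyto!` are existing CUDA.jl/Base operations):

```julia
using CUDA

# memcpy-style: upload from (unpinned) host memory; the copy may block
# until the host buffer can be safely reused.
function scalar_via_memcpy(::Type{T}, val) where {T}
    buf = CuArray{T}(undef, 1)
    copyto!(buf, [convert(T, val)])   # host-to-device memcpy
    return buf
end

# fill-style: the value travels by value as a kernel argument, so the
# call can complete asynchronously without pinned host memory.
function scalar_via_fill(::Type{T}, val) where {T}
    buf = CuArray{T}(undef, 1)
    fill!(buf, convert(T, val))       # device-side fill
    return buf
end
```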
lib/cublas/wrappers.jl (Outdated)

    α = convert(T, alpha)
    gpu_α = CuRef{T}(α)
The converts can go, CuRef does that for you:

julia> CuRef{Float32}(1)
CUDA.CuRefArray{Float32, CuArray{Float32, 1, CUDA.DeviceMemory}}(Float32[1.0], 1)
    α = convert(T, alpha)
    gpu_α = CuRef{T}(α)
    scal!(n, gpu_α, x)
    synchronize()
Why the synchronization? The only way to see the changes by this call is to fetch memory, which is a synchronizing operation.
For scal! and other functions which don't return a scalar result, I added this to keep the previous behaviour (so that the entire call is synchronous). I'll remove the sync for things like nrm2! that return a scalar which is copied back anyway.
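To illustrate the point about fetching, a short sketch using this PR's nrm2 method (the implicit synchronization on scalar download is standard CUDA.jl behaviour):

```julia
using CUDA, CUDA.CUBLAS

n = 2^20
X = CUDA.rand(Float64, n)
res = CuRef{Float64}(0.0)

CUBLAS.nrm2(n, X, res)   # enqueued on the stream; returns immediately
val = res[]              # downloading the scalar waits for the result
```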
Your PR requires formatting changes to meet the project's style guidelines. Click here to view the suggested changes.

diff --git a/lib/cublas/wrappers.jl b/lib/cublas/wrappers.jl
index 24f414af0..ea213ff66 100644
--- a/lib/cublas/wrappers.jl
+++ b/lib/cublas/wrappers.jl
@@ -115,8 +115,9 @@ for (fname, fname_64, elty) in ((:cublasDscal_v2, :cublasDscal_v2_64, :Float64),
(:cublasCscal_v2, :cublasCscal_v2_64, :ComplexF32))
@eval begin
function scal!(n::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
- x::StridedCuVecOrDenseMat{$elty}) where {M <: AbstractMemory}
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ x::StridedCuVecOrDenseMat{$elty}
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, alpha, x, stride(x, 1))
else
@@ -147,8 +148,9 @@ for (fname, fname_64, elty, celty) in ((:cublasCsscal_v2, :cublasCsscal_v2_64, :
(:cublasZdscal_v2, :cublasZdscal_v2_64, :Float64, :ComplexF64))
@eval begin
function scal!(n::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
- x::StridedCuVecOrDenseMat{$celty}) where {M<:AbstractMemory}
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ x::StridedCuVecOrDenseMat{$celty}
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, alpha, x, stride(x, 1))
else
@@ -190,9 +192,9 @@ for (jname, fname, fname_64, elty) in ((:dot, :cublasDdot_v2, :cublasDdot_v2_64,
@eval begin
function $jname(n::Integer,
x::StridedCuVecOrDenseMat{$elty},
- y::StridedCuVecOrDenseMat{$elty},
- result::CuRefArray{$elty, CuVector{$elty, M}},
- ) where {M<:AbstractMemory}
+ y::StridedCuVecOrDenseMat{$elty},
+ result::CuRefArray{$elty, CuVector{$elty, M}},
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, x, stride(x, 1), y, stride(y, 1), result)
else
@@ -236,7 +238,7 @@ function dotu(
return result[]
end
-function dot(n::Integer, x::StridedCuVecOrDenseMat{Float16}, y::StridedCuVecOrDenseMat{Float16}, result::CuRefArray{Float16, CuVector{Float16, M}}) where {M<:AbstractMemory}
+function dot(n::Integer, x::StridedCuVecOrDenseMat{Float16}, y::StridedCuVecOrDenseMat{Float16}, result::CuRefArray{Float16, CuVector{Float16, M}}) where {M <: AbstractMemory}
cublasDotEx(handle(), n, x, Float16, stride(x, 1), y, Float16, stride(y, 1), result, Float16, Float32)
return result
end
@@ -263,7 +265,7 @@ for (fname, fname_64, elty, ret_type) in ((:cublasDnrm2_v2, :cublasDnrm2_v2_64,
function nrm2(n::Integer,
X::StridedCuVecOrDenseMat{$elty},
result::CuRefArray{$ret_type, CuVector{$ret_type, M}},
- ) where {M<:AbstractMemory}
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, X, stride(X, 1), result)
else
@@ -339,9 +341,10 @@ for (fname, fname_64, elty) in ((:cublasDaxpy_v2, :cublasDaxpy_v2_64, :Float64),
(:cublasCaxpy_v2, :cublasCaxpy_v2_64, :ComplexF32))
@eval begin
function axpy!(n::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
dx::StridedCuVecOrDenseMat{$elty},
- dy::StridedCuVecOrDenseMat{$elty}) where {M <: AbstractMemory}
+ dy::StridedCuVecOrDenseMat{$elty}
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, alpha, dx, stride(dx, 1), dy, stride(dy, 1))
else
@@ -399,9 +402,9 @@ for (fname, fname_64, elty, cty, sty) in (
function rot!(n::Integer,
x::StridedCuVecOrDenseMat{$elty},
y::StridedCuVecOrDenseMat{$elty},
- c::CuRefArray{$cty, CuVector{$cty, M}},
- s::CuRefArray{$sty, CuVector{$sty, M}},
- ) where {M <: AbstractMemory}
+ c::CuRefArray{$cty, CuVector{$cty, M}},
+ s::CuRefArray{$sty, CuVector{$sty, M}},
+ ) where {M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, x, stride(x, 1), y, stride(y, 1), c, s)
else
@@ -472,9 +475,9 @@ for (fname, fname_64, elty) in ((:cublasIdamax_v2, :cublasIdamax_v2_64, :Float64
(:cublasIcamax_v2, :cublasIcamax_v2_64, :ComplexF32))
@eval begin
function iamax(n::Integer,
- dx::StridedCuVecOrDenseMat{$elty},
- result::CuRefArray{Ti, CuVector{Ti, M}},
- ) where {Ti <: Integer, M <: AbstractMemory}
+ dx::StridedCuVecOrDenseMat{$elty},
+ result::CuRefArray{Ti, CuVector{Ti, M}},
+ ) where {Ti <: Integer, M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, dx, stride(dx, 1), result)
else
@@ -493,9 +496,9 @@ for (fname, fname_64, elty) in ((:cublasIdamin_v2, :cublasIdamin_v2_64, :Float64
(:cublasIcamin_v2, :cublasIcamin_v2_64, :ComplexF32))
@eval begin
function iamin(n::Integer,
- dx::StridedCuVecOrDenseMat{$elty},
- result::CuRefArray{Ti, CuVector{Ti, M}},
- ) where {Ti <: Integer, M <: AbstractMemory}
+ dx::StridedCuVecOrDenseMat{$elty},
+ result::CuRefArray{Ti, CuVector{Ti, M}},
+ ) where {Ti <: Integer, M <: AbstractMemory}
if CUBLAS.version() >= v"12.0"
$fname_64(handle(), n, dx, stride(dx, 1), result)
else
@@ -529,11 +532,12 @@ for (fname, fname_64, elty) in ((:cublasDgemv_v2, :cublasDgemv_v2_64, :Float64),
(:cublasCgemv_v2, :cublasCgemv_v2_64, :ComplexF32))
@eval begin
function gemv!(trans::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
# handle trans
m,n = size(A)
# check dimensions
@@ -558,10 +562,10 @@ function gemv!(trans::Char, alpha::Number, A::StridedCuMatrix{T}, x::StridedCuVe
synchronize()
return y
end
-function gemv(trans::Char, alpha::CuRefArray{T, CuVector{T, M}}, A::StridedCuMatrix{T}, x::StridedCuVector{T}) where {T, M<:AbstractMemory}
+function gemv(trans::Char, alpha::CuRefArray{T, CuVector{T, M}}, A::StridedCuMatrix{T}, x::StridedCuVector{T}) where {T, M <: AbstractMemory}
return gemv!(trans, alpha, A, x, CuRef{T}(zero(T)), similar(x, size(A, (trans == 'N' ? 1 : 2))))
end
-function gemv(trans::Char, alpha::Number, A::StridedCuMatrix{T}, x::StridedCuVector{T}) where T
+function gemv(trans::Char, alpha::Number, A::StridedCuMatrix{T}, x::StridedCuVector{T}) where {T}
gemv!(trans, alpha, A, x, zero(T), similar(x, size(A, (trans == 'N' ? 1 : 2))))
end
# should this be async?
@@ -579,12 +583,12 @@ for (fname, fname_64, eltyin, eltyout, eltyconst) in (
)
@eval begin
function gemv_batched!(trans::Char,
- alpha::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
- A::Vector{<:StridedCuMatrix{$eltyin}},
- x::Vector{<:StridedCuVector{$eltyin}},
- beta::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
- y::Vector{<:StridedCuVector{$eltyout}}
- ) where {M<:AbstractMemory}
+ alpha::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
+ A::Vector{<:StridedCuMatrix{$eltyin}},
+ x::Vector{<:StridedCuVector{$eltyin}},
+ beta::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
+ y::Vector{<:StridedCuVector{$eltyout}}
+ ) where {M <: AbstractMemory}
if length(A) != length(x) || length(A) != length(y)
throw(DimensionMismatch("Lengths of inputs must be the same"))
end
@@ -615,13 +619,13 @@ for (fname, fname_64, eltyin, eltyout, eltyconst) in (
y
end
function gemv_batched!(
- trans::Char,
- alpha::Number,
- A::Vector{<:StridedCuMatrix{$eltyin}},
- x::Vector{<:StridedCuVector{$eltyin}},
- beta::Number,
- y::Vector{<:StridedCuVector{$eltyout}}
- )
+ trans::Char,
+ alpha::Number,
+ A::Vector{<:StridedCuMatrix{$eltyin}},
+ x::Vector{<:StridedCuVector{$eltyin}},
+ beta::Number,
+ y::Vector{<:StridedCuVector{$eltyout}}
+ )
gpu_α = CuRef{$eltyconst}(alpha)
gpu_β = CuRef{$eltyconst}(beta)
y = gemv_batched!(trans, gpu_α, A, x, gpu_β, y)
@@ -641,12 +645,12 @@ for (fname, fname_64, eltyin, eltyout, eltyconst) in (
)
@eval begin
function gemv_strided_batched!(trans::Char,
- alpha::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
- A::AbstractArray{$eltyin, 3},
- x::AbstractArray{$eltyin, 2},
- beta::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
- y::AbstractArray{$eltyout, 2}
- ) where {M<:AbstractMemory}
+ alpha::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
+ A::AbstractArray{$eltyin, 3},
+ x::AbstractArray{$eltyin, 2},
+ beta::CuRefArray{$eltyconst, CuVector{$eltyconst, M}},
+ y::AbstractArray{$eltyout, 2}
+ ) where {M <: AbstractMemory}
if size(A, 3) != size(x, 2) || size(A, 3) != size(y, 2)
throw(DimensionMismatch("Batch sizes must be equal for all inputs"))
end
@@ -671,13 +675,13 @@ for (fname, fname_64, eltyin, eltyout, eltyconst) in (
y
end
function gemv_strided_batched!(
- trans::Char,
- alpha::Number,
- A::AbstractArray{$eltyin, 3},
- x::AbstractArray{$eltyin, 2},
- beta::Number,
- y::AbstractArray{$eltyout, 2}
- )
+ trans::Char,
+ alpha::Number,
+ A::AbstractArray{$eltyin, 3},
+ x::AbstractArray{$eltyin, 2},
+ beta::Number,
+ y::AbstractArray{$eltyout, 2}
+ )
gpu_α = CuRef{$eltyconst}(alpha)
gpu_β = CuRef{$eltyconst}(beta)
y = gemv_strided_batched!(trans, gpu_α, A, x, gpu_β, y)
@@ -697,11 +701,12 @@ for (fname, fname_64, elty) in ((:cublasDgbmv_v2, :cublasDgbmv_v2_64, :Float64),
m::Integer,
kl::Integer,
ku::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
n = size(A,2)
# check dimensions
length(x) == (trans == 'N' ? n : m) && length(y) == (trans == 'N' ? m : n) || throw(DimensionMismatch(""))
@@ -716,16 +721,17 @@ for (fname, fname_64, elty) in ((:cublasDgbmv_v2, :cublasDgbmv_v2_64, :Float64),
end
y
end
- function gbmv!(trans::Char,
- m::Integer,
- kl::Integer,
- ku::Integer,
- alpha::Number,
- A::StridedCuMatrix{$elty},
- x::StridedCuVector{$elty},
- beta::Number,
- y::StridedCuVector{$elty}
- )
+ function gbmv!(
+ trans::Char,
+ m::Integer,
+ kl::Integer,
+ ku::Integer,
+ alpha::Number,
+ A::StridedCuMatrix{$elty},
+ x::StridedCuVector{$elty},
+ beta::Number,
+ y::StridedCuVector{$elty}
+ )
gpu_α = CuRef{$elty}(alpha)
gpu_β = CuRef{$elty}(beta)
@@ -735,8 +741,10 @@ for (fname, fname_64, elty) in ((:cublasDgbmv_v2, :cublasDgbmv_v2_64, :Float64),
end
end
end
-function gbmv(trans::Char, m::Integer, kl::Integer, ku::Integer, alpha::CuRefArray{T, CuVector{T, M}},
- A::StridedCuMatrix{T}, x::StridedCuVector{T}) where {T, M<:AbstractMemory}
+function gbmv(
+ trans::Char, m::Integer, kl::Integer, ku::Integer, alpha::CuRefArray{T, CuVector{T, M}},
+ A::StridedCuMatrix{T}, x::StridedCuVector{T}
+ ) where {T, M <: AbstractMemory}
# TODO: fix gbmv bug in julia
n = size(A, 2)
leny = trans == 'N' ? m : n
@@ -759,11 +767,12 @@ for (fname, fname_64, elty) in ((:cublasDspmv_v2, :cublasDspmv_v2_64, :Float64),
(:cublasSspmv_v2, :cublasSspmv_v2_64, :Float32))
@eval begin
function spmv!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
AP::StridedCuVector{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
n = round(Int, (sqrt(8*length(AP))-1)/2)
if n != length(x) || n != length(y) throw(DimensionMismatch("")) end
incx = stride(x,1)
@@ -777,21 +786,24 @@ for (fname, fname_64, elty) in ((:cublasDspmv_v2, :cublasDspmv_v2_64, :Float64),
end
end
end
-function spmv!(uplo::Char,
- alpha::Number,
- AP::StridedCuVector{T},
- x::StridedCuVector{T},
- beta::Number,
- y::StridedCuVector{T}
- ) where {T}
+function spmv!(
+ uplo::Char,
+ alpha::Number,
+ AP::StridedCuVector{T},
+ x::StridedCuVector{T},
+ beta::Number,
+ y::StridedCuVector{T}
+ ) where {T}
gpu_α = CuRef{T}(alpha)
gpu_β = CuRef{T}(beta)
y = spmv!(uplo, gpu_α, AP, x, gpu_β, y)
synchronize()
return y
end
-function spmv(uplo::Char, alpha::CuRefArray{T, CuVector{T, M}},
- AP::StridedCuVector{T}, x::StridedCuVector{T}) where {T, M<:AbstractMemory}
+function spmv(
+ uplo::Char, alpha::CuRefArray{T, CuVector{T, M}},
+ AP::StridedCuVector{T}, x::StridedCuVector{T}
+ ) where {T, M <: AbstractMemory}
return spmv!(uplo, alpha, AP, x, CuRef{T}(zero(T)), similar(x))
end
function spmv(uplo::Char, alpha::Number,
@@ -810,11 +822,12 @@ for (fname, fname_64, elty) in ((:cublasDsymv_v2, :cublasDsymv_v2_64, :Float64),
# Note that the complex symv are not BLAS but auiliary functions in LAPACK
@eval begin
function symv!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
if m != n throw(DimensionMismatch("Matrix A is $m by $n but must be square")) end
if m != length(x) || m != length(y) throw(DimensionMismatch("")) end
@@ -847,7 +860,7 @@ end
function symv(
uplo::Char, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, x::StridedCuVector{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return symv!(uplo, alpha, A, x, CuRef{T}(zero(T)), similar(x))
end
function symv(uplo::Char, alpha::Number,
@@ -864,11 +877,12 @@ for (fname, fname_64, elty) in ((:cublasZhemv_v2, :cublasZhemv_v2_64, :ComplexF6
(:cublasChemv_v2, :cublasChemv_v2_64, :ComplexF32))
@eval begin
function hemv!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
# TODO: fix dimension check bug in julia
m, n = size(A)
if m != n throw(DimensionMismatch("Matrix A is $m by $n but must be square")) end
@@ -902,7 +916,7 @@ end
function hemv(
uplo::Char, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, x::StridedCuVector{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return hemv!(uplo, alpha, A, x, CuRef{T}(zero(T)), similar(x))
end
function hemv(uplo::Char, alpha::Number, A::StridedCuMatrix{T},
@@ -922,11 +936,12 @@ for (fname, fname_64, elty) in ((:cublasDsbmv_v2, :cublasDsbmv_v2_64, :Float64),
@eval begin
function sbmv!(uplo::Char,
k::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
#if m != n throw(DimensionMismatch("Matrix A is $m by $n but must be square")) end
if !(1<=(1+k)<=n) throw(DimensionMismatch("Incorrect number of bands")) end
@@ -962,7 +977,7 @@ end
function sbmv(
uplo::Char, k::Integer, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, x::StridedCuVector{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return sbmv!(uplo, k, alpha, A, x, CuRef{T}(zero(T)), similar(x))
end
function sbmv(uplo::Char, k::Integer, alpha::Number,
@@ -981,11 +996,12 @@ for (fname, fname_64, elty) in ((:cublasZhbmv_v2, :cublasZhbmv_v2_64, :ComplexF6
@eval begin
function hbmv!(uplo::Char,
k::Integer,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
x::StridedCuVector{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- y::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ y::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
if !(1<=(1+k)<=n) throw(DimensionMismatch("Incorrect number of bands")) end
if m < 1+k throw(DimensionMismatch("Array A has fewer than 1+k rows")) end
@@ -1020,7 +1036,7 @@ end
function hbmv(
uplo::Char, k::Integer, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, x::StridedCuVector{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return hbmv!(uplo, k, alpha, A, x, CuRef{T}(zero(T)), similar(x))
end
function hbmv(uplo::Char, k::Integer, alpha::Number,
@@ -1168,10 +1184,11 @@ for (fname, fname_64, elty) in ((:cublasDger_v2, :cublasDger_v2_64, :Float64),
(:cublasCgerc_v2, :cublasCgerc_v2_64, :ComplexF32))
@eval begin
function ger!(
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
x::StridedCuVector{$elty},
y::StridedCuVector{$elty},
- A::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ A::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
m == length(x) || throw(DimensionMismatch(""))
n == length(y) || throw(DimensionMismatch(""))
@@ -1204,9 +1221,10 @@ for (fname, fname_64, elty) in ((:cublasDspr_v2, :cublasDspr_v2_64, :Float64),
(:cublasSspr_v2, :cublasSspr_v2_64, :Float32))
@eval begin
function spr!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
x::StridedCuVector{$elty},
- AP::StridedCuVector{$elty}) where {M<:AbstractMemory}
+ AP::StridedCuVector{$elty}
+ ) where {M <: AbstractMemory}
n = round(Int, (sqrt(8*length(AP))-1)/2)
length(x) == n || throw(DimensionMismatch("Length of vector must be the same as the matrix dimensions"))
incx = stride(x,1)
@@ -1238,9 +1256,10 @@ for (fname, fname_64, elty) in ((:cublasDsyr_v2, :cublasDsyr_v2_64, :Float64),
(:cublasCsyr_v2, :cublasCsyr_v2_64, :ComplexF32))
@eval begin
function syr!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
x::StridedCuVector{$elty},
- A::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ A::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
m == n || throw(DimensionMismatch("Matrix A is $m by $n but must be square"))
length(x) == n || throw(DimensionMismatch("Length of vector must be the same as the matrix dimensions"))
@@ -1274,9 +1293,10 @@ for (fname, fname_64, elty, relty) in (
)
@eval begin
function her!(uplo::Char,
- alpha::CuRefArray{$relty, CuVector{$relty, M}},
+ alpha::CuRefArray{$relty, CuVector{$relty, M}},
x::StridedCuVector{$elty},
- A::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ A::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
m == n || throw(DimensionMismatch("Matrix A is $m by $n but must be square"))
length(x) == n || throw(DimensionMismatch("Length of vector must be the same as the matrix dimensions"))
@@ -1308,11 +1328,11 @@ for (fname, fname_64, elty) in ((:cublasZher2_v2, :cublasZher2_v2_64, :ComplexF6
(:cublasCher2_v2, :cublasCher2_v2_64, :ComplexF32))
@eval begin
function her2!(uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
- x::StridedCuVector{$elty},
- y::StridedCuVector{$elty},
- A::StridedCuMatrix{$elty}
- ) where {M<:AbstractMemory}
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ x::StridedCuVector{$elty},
+ y::StridedCuVector{$elty},
+ A::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(A)
m == n || throw(DimensionMismatch("Matrix A is $m by $n but must be square"))
length(x) == n || throw(DimensionMismatch("Length of vector must be the same as the matrix dimensions"))
@@ -1352,11 +1372,12 @@ for (fname, fname_64, elty) in ((:cublasDgemm_v2, :cublasDgemm_v2_64, :Float64),
@eval begin
function gemm!(transA::Char,
transB::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
B::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuVecOrMat{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuVecOrMat{$elty}
+ ) where {M <: AbstractMemory}
m = size(A, transA == 'N' ? 1 : 2)
k = size(A, transA == 'N' ? 2 : 1)
n = size(B, transB == 'N' ? 2 : 1)
@@ -1393,7 +1414,7 @@ end
function gemm(
transA::Char, transB::Char, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuVecOrMat{T}, B::StridedCuVecOrMat{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return gemm!(
transA, transB, alpha, A, B, CuRef(zero(T)),
similar(
@@ -1493,10 +1514,10 @@ function gemmExComputeType(TA, TB, TC, m, k, n)
end
function gemmEx!(transA::Char, transB::Char,
- @nospecialize(alpha::CuRefArray),
+ @nospecialize(alpha::CuRefArray),
@nospecialize(A::StridedCuVecOrMat),
@nospecialize(B::StridedCuVecOrMat),
- @nospecialize(beta::CuRefArray),
+ @nospecialize(beta::CuRefArray),
@nospecialize(C::StridedCuVecOrMat);
algo::cublasGemmAlgo_t=CUBLAS_GEMM_DEFAULT)
m = size(A, transA == 'N' ? 1 : 2)
@@ -1551,10 +1572,10 @@ end
# TODO for device mode pointers
function gemmBatchedEx!(transA::Char, transB::Char,
- @nospecialize(alpha::CuRefArray),
+ @nospecialize(alpha::CuRefArray),
@nospecialize(A::Vector{<:StridedCuVecOrMat}),
@nospecialize(B::Vector{<:StridedCuVecOrMat}),
- @nospecialize(beta::CuRefArray),
+ @nospecialize(beta::CuRefArray),
@nospecialize(C::Vector{<:StridedCuVecOrMat});
algo::cublasGemmAlgo_t=CUBLAS_GEMM_DEFAULT)
if length(A) != length(B) || length(A) != length(C)
@@ -1622,11 +1643,11 @@ function gemmBatchedEx!(
end
function gemmStridedBatchedEx!(
- transA::Char, transB::Char,
- @nospecialize(alpha::CuRefArray),
+ transA::Char, transB::Char,
+ @nospecialize(alpha::CuRefArray),
@nospecialize(A::AbstractArray{Ta, 3}),
@nospecialize(B::AbstractArray{Tb, 3}),
- @nospecialize(beta::CuRefArray),
+ @nospecialize(beta::CuRefArray),
@nospecialize(C::AbstractArray{Tc, 3});
algo::cublasGemmAlgo_t=CUBLAS_GEMM_DEFAULT) where {Ta, Tb, Tc}
if size(A, 3) != size(B, 3) || size(A, 3) != size(C, 3)
@@ -1865,11 +1886,12 @@ for (fname, fname_64, elty) in ((:cublasDgemmBatched, :cublasDgemmBatched_64, :F
@eval begin
function gemm_batched!(transA::Char,
transB::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::Vector{<:StridedCuMatrix{$elty}},
B::Vector{<:StridedCuMatrix{$elty}},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::Vector{<:StridedCuMatrix{$elty}}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::Vector{<:StridedCuMatrix{$elty}}
+ ) where {M <: AbstractMemory}
if length(A) != length(B) || length(A) != length(C)
throw(DimensionMismatch(""))
end
@@ -1948,11 +1970,12 @@ for (fname, fname_64, elty) in ((:cublasDgemmStridedBatched, :cublasDgemmStrided
@eval begin
function gemm_strided_batched!(transA::Char,
transB::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::AbstractArray{$elty, 3}, # allow PermutedDimsArray
B::AbstractArray{$elty, 3},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::AbstractArray{$elty, 3}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::AbstractArray{$elty, 3}
+ ) where {M <: AbstractMemory}
m = size(A, transA == 'N' ? 1 : 2)
k = size(A, transA == 'N' ? 2 : 1)
n = size(B, transB == 'N' ? 2 : 1)
@@ -2031,11 +2054,12 @@ for (fname, fname_64, elty) in ((:cublasDsymm_v2, :cublasDsymm_v2_64, :Float64),
@eval begin
function symm!(side::Char,
uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
B::StridedCuMatrix{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
k, nA = size(A)
if k != nA throw(DimensionMismatch("Matrix A must be square")) end
m = side == 'L' ? k : size(B,1)
@@ -2073,7 +2097,7 @@ end
function symm(
side::Char, uplo::Char, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, B::StridedCuMatrix{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
return symm!(side, uplo, alpha, A, B, CuRef{T}(zero(T)), similar(B))
end
function symm(side::Char, uplo::Char, alpha::Number,
@@ -2093,10 +2117,11 @@ for (fname, fname_64, elty) in ((:cublasDsyrk_v2, :cublasDsyrk_v2_64, :Float64),
@eval begin
function syrk!(uplo::Char,
trans::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
mC, n = size(C)
if mC != n throw(DimensionMismatch("C must be square")) end
nn = size(A, trans == 'N' ? 1 : 2)
@@ -2146,11 +2171,12 @@ for (fname, fname_64, elty) in ((:cublasDsyrkx, :cublasDsyrkx_64, :Float64),
@eval begin
function syrkx!(uplo::Char,
trans::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
B::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
mC, n = size(C)
if mC != n throw(DimensionMismatch("C must be square")) end
nn = size(A, trans == 'N' ? 1 : 2)
@@ -2186,7 +2212,7 @@ end
function syrkx(
uplo::Char, trans::Char, alpha::CuRefArray{T, CuVector{T, M}}, A::StridedCuVecOrMat{T},
beta::CuRefArray{T, CuVector{T}}, B::StridedCuVecOrMat{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
n = size(A, trans == 'N' ? 1 : 2)
return syrkx!(uplo, trans, alpha, A, B, beta, similar(A, (n, n)))
end
@@ -2205,11 +2231,12 @@ for (fname, fname_64, elty) in ((:cublasZhemm_v2, :cublasZhemm_v2_64, :ComplexF6
@eval begin
function hemm!(side::Char,
uplo::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
B::StridedCuMatrix{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
mA, nA = size(A)
m, n = size(B)
mC, nC = size(C)
@@ -2247,7 +2274,7 @@ end
function hemm(
uplo::Char, trans::Char, alpha::CuRefArray{T, CuVector{T, M}},
A::StridedCuMatrix{T}, B::StridedCuMatrix{T}
- ) where {T, M<:AbstractMemory}
+ ) where {T, M <: AbstractMemory}
m, n = size(B)
return hemm!(uplo, trans, alpha, A, B, CuRef{T}(zero(T)), similar(B, (m, n)))
end
@@ -2268,10 +2295,11 @@ for (fname, fname_64, elty, relty) in (
@eval begin
function herk!(uplo::Char,
trans::Char,
- alpha::CuRefArray{$relty, CuVector{$relty, M}},
+ alpha::CuRefArray{$relty, CuVector{$relty, M}},
A::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$relty, CuVector{$relty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$relty, CuVector{$relty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
mC, n = size(C)
if mC != n throw(DimensionMismatch("C must be square")) end
nn = size(A, trans == 'N' ? 1 : 2)
@@ -2305,7 +2333,7 @@ for (fname, fname_64, elty, relty) in (
trans::Char,
alpha::CuRefArray{$relty, CuVector{$relty, M}},
A::StridedCuVecOrMat{$elty}
- ) where {M<:AbstractMemory}
+ ) where {M <: AbstractMemory}
n = size(A, trans == 'N' ? 1 : 2)
return herk!(uplo, trans, alpha, A, CuRef{$relty}(zero($relty)), similar(A, (n, n)))
end
@@ -2327,11 +2355,12 @@ for (fname, fname_64, elty) in ((:cublasDsyr2k_v2, :cublasDsyr2k_v2_64, :Float64
@eval begin
function syr2k!(uplo::Char,
trans::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
B::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
# TODO: check size of B in julia (syr2k!)
m, n = size(C)
if m != n throw(DimensionMismatch("C must be square")) end
@@ -2386,7 +2415,7 @@ function syr2k(uplo::Char,
B::StridedCuVecOrMat)
T = eltype(A)
n = size(A, trans == 'N' ? 1 : 2)
- syr2k!(uplo, trans, convert(T, alpha), A, B, zero(T), similar(A, T, (n, n)))
+ return syr2k!(uplo, trans, convert(T, alpha), A, B, zero(T), similar(A, T, (n, n)))
end
function syr2k(uplo::Char, trans::Char, A::StridedCuVecOrMat, B::StridedCuVecOrMat)
syr2k(uplo, trans, one(eltype(A)), A, B)
@@ -2400,11 +2429,12 @@ for (fname, fname_64, elty, relty) in (
@eval begin
function her2k!(uplo::Char,
trans::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
B::StridedCuVecOrMat{$elty},
- beta::CuRefArray{$relty, CuVector{$relty, M}},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ beta::CuRefArray{$relty, CuVector{$relty, M}},
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
# TODO: check size of B in julia (her2k!)
m, n = size(C)
if m != n throw(DimensionMismatch("C must be square")) end
@@ -2447,7 +2477,7 @@ for (fname, fname_64, elty, relty) in (
alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuVecOrMat{$elty},
B::StridedCuVecOrMat{$elty},
- ) where {M<:AbstractMemory}
+ ) where {M <: AbstractMemory}
n = size(A, trans == 'N' ? 1 : 2)
return her2k!(uplo, trans, alpha, A, B, CuRef{$relty}(zero($relty)), similar(A, (n, n)))
end
@@ -2477,10 +2507,11 @@ for (mmname, smname, elty) in
uplo::Char,
transa::Char,
diag::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
B::StridedCuMatrix{$elty},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(B)
mA, nA = size(A)
# TODO: clean up error messages
@@ -2499,9 +2530,10 @@ for (mmname, smname, elty) in
uplo::Char,
transa::Char,
diag::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
- B::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ B::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
m, n = size(B)
mA, nA = size(A)
# TODO: clean up error messages
@@ -2564,9 +2596,10 @@ for (fname, fname_64, elty) in ((:cublasDtrsmBatched, :cublasDtrsmBatched_64, :F
uplo::Char,
transa::Char,
diag::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::Vector{<:StridedCuMatrix{$elty}},
- B::Vector{<:StridedCuMatrix{$elty}}) where {M<:AbstractMemory}
+ B::Vector{<:StridedCuMatrix{$elty}}
+ ) where {M <: AbstractMemory}
if length(A) != length(B)
throw(DimensionMismatch(""))
end
@@ -2620,11 +2653,12 @@ for (fname, fname_64, elty) in ((:cublasDgeam, :cublasDgeam_64, :Float64),
@eval begin
function geam!(transa::Char,
transb::Char,
- alpha::CuRefArray{$elty, CuVector{$elty, M}},
+ alpha::CuRefArray{$elty, CuVector{$elty, M}},
A::StridedCuMatrix{$elty},
- beta::CuRefArray{$elty, CuVector{$elty, M}},
+ beta::CuRefArray{$elty, CuVector{$elty, M}},
B::StridedCuMatrix{$elty},
- C::StridedCuMatrix{$elty}) where {M<:AbstractMemory}
+ C::StridedCuMatrix{$elty}
+ ) where {M <: AbstractMemory}
mA, nA = size(A)
mB, nB = size(B)
m, n = size(C)
@@ -2860,8 +2894,9 @@ for (fname, elty) in ((:cublasDgetriBatched, :Float64),
end
function getri_batched!(n, Aptrs::CuVector{CuPtr{$elty}},
- lda, Cptrs::CuVector{CuPtr{$elty}},ldc,
- pivotArray::CuArray{Cint})
+ lda, Cptrs::CuVector{CuPtr{$elty}}, ldc,
+ pivotArray::CuArray{Cint}
+ )
batchSize = length(Aptrs)
info = CuArray{Cint}(undef, batchSize)
$fname(handle(), n, Aptrs, lda, pivotArray, Cptrs, ldc, info, batchSize)
CI failures seem relevant. Feel free to ignore the formatter; I made it less spammy 😉
I really do not know what is up with the 1.11 failure, it looks …
Rebase to get rid of CI failures?
Yep, next on my to-do list.
Gotta admit I'm a bit mystified here, as I cannot reproduce these. If I run only the …
Attempting to address #2571

I've set the pointer mode to "device side" during handle creation. Since gemmGroupedBatched doesn't support device side pointer mode, it won't be usable. One workaround for this would be to add a new function to create a handle with host side mode, or to add the pointer mode as an optional kwarg to handle(). Very open to feedback on this.

I've set this up so that users can supply CuRefs of the appropriate result type to the level 1 functions for results. If that's not provided, the functions execute as they do today (synchronously). Similarly, for functions taking alpha or beta scalar arguments, if the user provides a CuRef (actually a CuRefArray), the functions will execute asynchronously and return instantly. If the user provides a Number, the behaviour is unchanged from today. I'm not married to this design and it can certainly be changed.

cc @Jutho
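For concreteness, a short sketch of the two call styles described above (a minimal example assuming the method signatures added in this PR):

```julia
using CUDA, CUDA.CUBLAS

x = CUDA.rand(Float32, 1024)

# Number scalar: behaviour unchanged, the call completes synchronously.
CUBLAS.scal!(length(x), 2.0f0, x)

# CuRef scalar: device-side pointer mode, the call returns immediately.
α = CuRef{Float32}(2.0f0)
CUBLAS.scal!(length(x), α, x)

# Level 1 result: a preallocated CuRef output makes nrm2 non-blocking.
res = CuRef{Float32}(0.0f0)
CUBLAS.nrm2(length(x), x, res)
val = res[]   # fetching the scalar synchronizes
```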