Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polluted result using CUDA 3D rfft in Float32 data type #326

Open
doraemonho opened this issue Jul 16, 2022 · 28 comments
Open

Polluted result using CUDA 3D rfft in Float32 data type #326

doraemonho opened this issue Jul 16, 2022 · 28 comments

Comments

@doraemonho
Copy link
Contributor

Hi, I am working on the 3D RFFT but find out the result of rfft using CUDA is a bit weird in Float32 .

I define a function F(k) in k-space as a gaussian function like e^{-(k-k0)^2}e^{-i\phi}, where k0 is some number and phi is a random number array between 0 and 2pi. Here is the implementation

function GetFk(grid;kf = 2,σ²= 1)
    kx,ky,kz = grid.kr,grid.l,grid.m;
    k⁻¹  = @. √(grid.invKrsq);
    k    = @. √(grid.Krsq);
    Fk   = @. √(exp(-(k.-kf)^2/σ²)/2/π)*k⁻¹;
    randN = typeof(Fk) <: Array ? Base.rand : CUDA.rand;
    eⁱᶿ¹ = exp.(im*randN(T,GPUgrid.nkr,GPUgrid.nl,GPUgrid.nm)*2*π);
    return Fk.*eⁱᶿ¹;
end

If I do a rfft on this function, the result of CUDA rfft will overlay a moiré like pattern like below:
image

Here is the implemention:

using PyPlot,CUDA, FourierFlows
using FFTW
using LinearAlgebra: mul!, ldiv!

T = Float32;
GPUgrid = ThreeDGrid(GPU(),128,2π;T=T);
GFkh = GetFk(GPUgrid;kf = 2,σ²= 1);

GFk  = CUDA.zeros(T,GPUgrid.nx,GPUgrid.ny,GPUgrid.nz);
ldiv!(GFk, GPUgrid.rfftplan, deepcopy(GFkh));

CPUgrid = ThreeDGrid(CPU(),128,2π;T=T);
CFkh = Array(GFkh);
CFk  = zeros(T,GPUgrid.nx,GPUgrid.ny,GPUgrid.nz);
ldiv!(CFk, CPUgrid.rfftplan, deepcopy(CFkh));

figure(figsize = (16,8))
subplot(121);title("GPU T =$T",size=16)
imshow(Array(GFk[:,:,1]));
subplot(122);title("CPU T =$T",size=16)
imshow(Array(CFk[:,:,1]))

However, if you change T from Float32 to Float64, the result would become normal again:
image

The problem itself is rare and would happen in this kind of function.

@navidcy
Copy link
Member

navidcy commented Jul 17, 2022

Hi @doraemonho — Nice! I’ll try to have a look at it soon.

@navidcy
Copy link
Member

navidcy commented Jul 17, 2022

Inside the GetFk function the GPUgrid is called instead the grid from the function’s argument. Is that the prob?

@doraemonho
Copy link
Contributor Author

doraemonho commented Jul 17, 2022

Thanks for taking a look!

Inside the GetFk function the GPUgrid is called instead the grid from the function’s argument. Is that the prob?

I rewrtie the function a bit to define the GPUgrid inside the function (showed below), but the result doesn't change.

function GetFk(N::Int,Lx::Number;T = Float32, dev = GPU(), kf = 2,σ²= 1)
    Fgrid    = ThreeDGrid(GPU(),N,Lx;T=T);
    kx,ky,kz = Fgrid.kr,Fgrid.l,Fgrid.m;
    k⁻¹  = @. √(Fgrid.invKrsq);
    k    = @. √(Fgrid.Krsq);
    Fk   = @. √(exp(-(k.-kf)^2/σ²)/2/π)*k⁻¹;
    randN = typeof(Fk) <: Array ? Base.rand : CUDA.rand;
    eⁱᶿ¹ = exp.(im*randN(T,Fgrid.nkr,Fgrid.nl,Fgrid.nm)*2*π);
    return Fk.*eⁱᶿ¹,Fgrid;
end

T = Float32;
N = 128
Lx= 2π
GFkh,GPUgrid = GetFk(N,Lx;T=T,kf = 8,σ²= 1);

GFk  = CUDA.zeros(T,GPUgrid.nx,GPUgrid.ny,GPUgrid.nz);
ldiv!(GFk, GPUgrid.rfftplan, deepcopy(GFkh));

CPUgrid = ThreeDGrid(CPU(),N,Lx;T=T);
CFkh = Array(GFkh);
CFk  = zeros(T,GPUgrid.nx,GPUgrid.ny,GPUgrid.nz);
ldiv!(CFk, CPUgrid.rfftplan, deepcopy(CFkh));

figure(figsize = (16,8))
subplot(121);title("GPU T =$T",size=16)
imshow(Array(GFk[:,:,1]));
subplot(122);title("CPU T =$T",size=16)
imshow(Array(CFk[:,:,1]))

@navidcy
Copy link
Member

navidcy commented Jul 17, 2022

I’ll have a closer look in the morning from my laptop

@doraemonho
Copy link
Contributor Author

doraemonho commented Jul 17, 2022

It occurs to me that, the root of this problem is coming from the fft plan itself.
Now, I don't even use grid fft plan itself but define a rfft plan using FFTW, I can reproduce the polluted result

T = Float32;
GFkh_32,GPUgrid = GetFk(N,Lx;T=T,kf = 8,σ²= 1);
GFk32  = CUDA.zeros(T,GPUgrid.nx,GPUgrid.ny,GPUgrid.nz);
GPUrfftplan_F32 = plan_rfft(GFk32);
ldiv!(GFk32, GPUrfftplan_F32, deepcopy(GFkh_32));


figure(figsize = (16,8))
subplot(121);title("GPU T =$T",size=16)
imshow(Array(GFk32[:,:,1]));

T = Float64;
GFkh_64,GPUgrid = GetFk(N,Lx;T=T,kf = 8,σ²= 1);
GFk64  = CUDA.zeros(T,GPUgrid.nx,GPUgrid.ny,GPUgrid.nz);
GPUrfftplan_F64 = plan_rfft(GFk64);
ldiv!(GFk64, GPUrfftplan_F64, deepcopy(GFkh_64));

subplot(122);title("GPU T =$T",size=16)
imshow(Array(GFk64[:,:,1]))

image

I am not sure if this is a FFTW.jl 's or CUDA.jl 's problem.

@doraemonho
Copy link
Contributor Author

Quick Update here
After a long trial and error, it seems that I am finally arriving to the root of problem.
It may has something to do with the fft plan of 2^n 3D real transform.
If you try switch the resolution of any 2^n 3D Map (128^3 in our case) to other even number (130^3).
The discrepancy between CPU and GPU rfft would vanish.
I suspect there may have a bug on the algorithm of implementing 2^n FP32 rfft but I am not sure if this is a FFTW problem or CUDA problem

@navidcy
Copy link
Member

navidcy commented Jul 18, 2022

Can we make sure it's not a plotting issue?

I tried to do some FFTs back and forth and couldn't find any disagreement between different data types. Have a look:

Here's an example just with FFTW and CUDA

CUDA + FFTW

using CUDA
using FFTW
using LinearAlgebra: mul!, ldiv!
using Test

for T in [Float64, Float32]
    nx, ny, nz = 6, 8, 10
    nk, nl, nm = Int(nx//2)+1, ny, nz

    a = randn(T, (nx, ny, nz))
    a_cu = CuArray(a)

    ah = randn(Complex{T}, (nk, nl, nm))
    ah_cu = CuArray(ah)

    plan = plan_rfft(a)
    plan_cu = plan_rfft(a_cu)

    bh = zeros(Complex{T}, (nk, nl, nm))
    bh_cu = CuArray(bh)

    c = zeros(T, (nx, ny, nz))
    c_cu = CuArray(c)

    mul!(bh, plan, a)
    ldiv!(c, plan, deepcopy(bh))

    mul!(bh_cu, plan_cu, a_cu)
    ldiv!(c_cu, plan_cu, deepcopy(bh_cu))

    @testset "CPU test for $T" begin
        @test bh == rfft(a)
        @test c == irfft(bh, nx)
    end

    @testset "GPU test for $T" begin
        @test bh_cu == rfft(a_cu)
        @test c_cu == irfft(bh_cu, nx)
    end
end

and here's an example with CUDA + FourierFlows

CUDA + FourierFlows

using CUDA
using LinearAlgebra: mul!, ldiv!
using FourierFlows
using Test

for T in [Float64, Float32]
    nx, ny, nz = 6, 8, 10

    grid = ThreeDGrid(nx, nx, ny, ny, nz, nz; T)
    grid_cu = ThreeDGrid(GPU(), nx, nx, ny, ny, nz, nz; T)

    a = randn(T, (grid.nx, grid.ny, grid.nz))
    a_cu = CuArray(a)

    ah = randn(Complex{eltype(grid)}, (grid.nkr, grid.nl, grid.nm))
    ah_cu = CuArray(ah)

    plan = grid.rfftplan
    plan_cu = grid_cu.rfftplan

    bh = zeros(Complex{eltype(grid)}, (grid.nkr, grid.nl, grid.nm))
    bh_cu = CuArray(bh)

    c = zeros(eltype(grid), (grid.nx, grid.ny, grid.nz))
    c_cu = CuArray(c)

    mul!(bh, plan, a)
    ldiv!(c, plan, deepcopy(bh))

    mul!(bh_cu, plan_cu, a_cu)
    ldiv!(c_cu, plan_cu, deepcopy(bh_cu))

    @testset "CPU test for $T" begin
        @test bh == rfft(a)
        @test c == irfft(bh, nx)
    end

    @testset "GPU test for $T" begin
        @test bh_cu == rfft(a_cu)
        @test c_cu == irfft(bh_cu, nx)
    end
end

All test pass for me. Same if I choose nx = ny = nz = 128.

@doraemonho
Copy link
Contributor Author

Can we make sure it's not a plotting issue?

I tried to do some FFTs back and forth and couldn't find any disagreement between different data types. Have a look:

Here's an example just with FFTW and CUDA

CUDA + FFTW
and here's an example with CUDA + FourierFlows

CUDA + FourierFlows
All test pass for me. Same if I choose nx = ny = nz = 128.

Thanks for checking!
Same result for me but I think this problem would only occurs if we satisfy the following condition :

  1. Doing rfft from k-space to real space
  2. The function that we are transferring is special ( like the spectral filtered e^{i\phi} k^{-2} )

I quite certain the problem is not coming the plot as the radial spectrum of FP32-CUDA found a spike at large k number.
I cannot check the result using your method as the FFT result is a little different (~1e-18) between GPU and CPU result.
However, I come up a alternative by using \approx instead of == inside the test. The Float64 is fine but fail in Float32.

using CUDA
using FFTW
using LinearAlgebra: mul!, ldiv!
using FourierFlows
using Test

nx, ny, nz = 128, 128, 128;       # define size of the array
θ = rand(θ,div(nx,2)+1, ny, nz)*2π; # define \theta \in [0,2\pi]

for T in [Float64,Float32]
    grid = ThreeDGrid(nx, nx, ny, ny, nz, nz; T)             
    grid_cu = ThreeDGrid(GPU(), nx, nx, ny, ny, nz, nz; T)

    θT  = convert(Array{T,3},θ)
    eⁱᶿk⁻² = @. exp.(im*θT)*grid.invKrsq;     #define the spectral filtered function in CPU
    eⁱᶿk⁻²_cu = CuArray(eⁱᶿk⁻²);               #define the spectral filtered function in GPU

    plan = grid.rfftplan
    plan_cu = grid_cu.rfftplan

    c = zeros(eltype(grid), (grid.nx, grid.ny, grid.nz))
    c_cu = CuArray(c);
    
    ldiv!(c, plan, deepcopy(eⁱᶿk⁻²))    
    ldiv!(c_cu, plan_cu, deepcopy(eⁱᶿk⁻²_cu))
    c_cu = Array(c_cu); #move the result from GPU-> CPU
    
    test_title = "eⁱᶿk⁻² CPU & GPUtest for " * string(T)

    @testset "$test_title" begin
        #Compare the result using ≈
        @test c_cu    c
    end 
end

@navidcy
Copy link
Member

navidcy commented Jul 21, 2022

Can you print the types of eⁱᶿk⁻² and eⁱᶿk⁻²_cu to make sure they are correct type?

@navidcy
Copy link
Member

navidcy commented Jul 21, 2022

Before and after the conversion to Array for eⁱᶿk⁻²_cu

@doraemonho
Copy link
Contributor Author

Before and after the conversion to Array for eⁱᶿk⁻²_cu

do you mean before/after the rift or before the testing process?
This is my type checking code and the result, as you can see, FP64 is fine but FP32 just fail

using CUDA
using FFTW
using LinearAlgebra: mul!, ldiv!
using FourierFlows
using Test

nx, ny, nz = 128, 128, 128;       # define size of the array
θ = rand(div(nx,2)+1, ny, nz)*2π; # define \theta \in [0,2\pi]

for T in [Float64,Float32]
    grid = ThreeDGrid(nx, nx, ny, ny, nz, nz; T)             
    grid_cu = ThreeDGrid(GPU(), nx, nx, ny, ny, nz, nz; T)

    θT  = convert(Array{T,3},θ)
    eⁱᶿk⁻² = @. exp.(im*θT)*grid.invKrsq;     #define the spectral filtered function in CPU
    eⁱᶿk⁻²_cu = CuArray(eⁱᶿk⁻²);              #define the spectral filtered function in GPU
    
    println("before the rfft")
    @show typeof(eⁱᶿk⁻²)
    @show typeof(eⁱᶿk⁻²_cu)
    
    plan = grid.rfftplan
    plan_cu = grid_cu.rfftplan

    c = zeros(eltype(grid), (grid.nx, grid.ny, grid.nz))
    c_cu = CuArray(c);
    
    
    ldiv!(c, plan, deepcopy(eⁱᶿk⁻²))    
    ldiv!(c_cu, plan_cu, deepcopy(eⁱᶿk⁻²_cu))
    c_cu = Array(c_cu); #move the result from GPU-> CPU
    
    
    println("before the test")
    @show typeof(c)
    @show typeof(c_cu)
    
    test_title = "eⁱᶿk⁻² CPU & GPU test for " * string(T)

    @testset "$test_title" begin
        #Compare the result using ≈
        @test c_cu    c
    end 
end

image

@navidcy
Copy link
Member

navidcy commented Aug 29, 2022

@doraemonho sorry for slacking... I went out on leave and now slowly coming back.

Did you have any progress on this or is it still outstanding?

@doraemonho
Copy link
Contributor Author

@doraemonho sorry for slacking... I went out on leave and now slowly coming back.

Did you have any progress on this or is it still outstanding?

@navidcy no worries, I still have this issue.
The way of circumventing it is to change the resolution from 2^n to 10*n or switch to FP64 calculation. The problem only occurs in this kind of function

@doraemonho
Copy link
Contributor Author

Just an update from what we were : -)
This may be a CuFFT bug that would occur after the version CuFFT v.10.3.
I reported it to NVIDIA and just heard back from them. They confirmed the issue and their CuFFT team is taking a look into it right now.
Not sure when will be solved but I will keep updating when I get back from them

@navidcy
Copy link
Member

navidcy commented Nov 8, 2022

@doraemonho thanks for this!

@nvlcambier
Copy link

Hi there! I'm working on cuFFT and following up from JuliaGPU/CUDA.jl#1656 (internal NVIDIA bug# 3847913).

Sorry to hijack your issues - I thought it would be easier and faster to chat here directly.
Let me know if you'd prefer to chat somewhere else (either JuliaGPU/CUDA.jl#1656 or in the bug ticket).

Do you have more information about what exactly is the input to the complex-to-real transform that is being done by cuFFT ?

The reason why this matters is the following.

When doing a DFT of length N with a real input, if x is the output and N is even:

Given this, in practice, real-to-complex transforms work on N real inputs and produce N/2+1 complex outputs. The complex output is such that the first two properties above are respected.

Complex-to-real transforms take N/2+1 complex input (y) and produce N real output and assume that

  • y[0] is real;
  • y[N/2] is real
    in memory.

One might assume that imag(y[0]) or imag(y[N/2]) are not even read by the kernels.
But in some (very few) case it is, because it can improve performance. This is usually not a problem because the input to a complex-to-real FFT is usually the output of a real-to-complex FFT, which satisfies this property.

What I suspect is that your input doesn't satisfy one of the above properties.
I've done some profiling of the Julia code and in the nx=128, FP32 case, I see GPU kernels which assume the above properties.

In JuliaGPU/CUDA.jl#1656 you mentionned that

  • This doesn't happen for FP64 --> this is because FP64 transforms don't use this algorithm
  • This doesn't happen for nx=130 --> this is because only powers of 2 use this algorithm

I've attached a cpp repro that shows this: when the last entry of the N/2+1 complex input to the complex-to-real FFT is not real (i.e., the imaginary part is non zero), in some cases (powers of 2 with enough batches on new-enough GPUs), the output will be wrong. If you compile and run it, for instance with CUDA 11.8, you will see

Symmetry error: 2.402546e-07
Error C2R output vs C2C output (with mistake ? 0): 1.307564e-07
Last value of input_c2r set to {6.389053,5.000000} instead of {6.389053, 0}
Error C2R output vs C2C output (with mistake ? 1): 1.566945e-02

I realize this is a subtle issue. It might also have nothing to do with this, but I thought this was worth a shot.

Let me know if this helps,

repro.tar.gz

@doraemonho
Copy link
Contributor Author

doraemonho commented Nov 11, 2022

Hi there! I'm working on cuFFT and following up from [JuliaGPU/CUDA.jl#1656] .....

Hi, thanks for checking it out!
As the problem started from this thread, I think we may better chat here?

Do you have more information about what exactly is the input to the complex-to-real transform that is being done by cuFFT?

The input is just a random phase function $e^{i\phi}k^{-2}$. The first term is just Euler's formula with some random phase angle bounded between 0 and $2\pi$, and $k^{-2}$ can be referred to $(\sqrt{3}\times N\times x_i )^{-2}$, where $x_i$ is discrete coordinate number bound between 0 and 1. This is one classical way of injecting turbulence energy in fluid simulation.

We always set $k^{-2} =0$ when $x_i = 0$ to avoid singularity but other terms are nonzeros and remain imaginary. So I think
y[0] = 0 doesn't hold in our case. I work out a test of setting $y[0] = 0$ and it works! (Code showed at bottom)
image

I have one more question to ask. I was trying CuFFT v.10.3.0 for the same setup but the result was fine. However, the symptom only occurs in the newer version of CuFFT library. Are complex to real transform algorithm changes during the recent version?

using CUDA
using FFTW
using Random
using LinearAlgebra: mul!, ldiv!
using Test


function getk⁻²_and_rfftplan(dev::Dev; 
                             nx = 64, ny = nx, nz = nx, Lx = 2π, Ly = Lx, Lz = Lx,
                             T = Float32, nthreads=Sys.CPU_THREADS, effort=FFTW.MEASURE) where Dev
    ArrayType = dev == "CPU" ? Array : CuArray;
    
    # Define the rfft parameters
    nk = nx
    nl = ny
    nm = nz
    nkr = Int(nx/2 + 1)  
    # Wavenubmer
    k = ArrayType{T}(reshape( fftfreq(nx, 2π/Lx*nx), (nk, 1, 1)))
    l = ArrayType{T}(reshape( fftfreq(ny, 2π/Ly*ny), (1, nl, 1)))
    m = ArrayType{T}(reshape( fftfreq(nz, 2π/Lz*nz), (1, 1, nm)))
    kr = ArrayType{T}(reshape(rfftfreq(nx, 2π/Lx*nx), (nkr, 1, 1)))

       Ksq = @. k^2 + l^2 + m^2
    invKsq = @. 1 / Ksq
    CUDA.@allowscalar  invKsq[1, 1, 1] = 0;

       Krsq = @. kr^2 + l^2 + m^2
    invKrsq = @. 1 / Krsq
    CUDA.@allowscalar invKrsq[1, 1, 1] = 0;

    FFTW.set_num_threads(nthreads);
    rfftplan = plan_rfft(ArrayType{T, 3}(undef, nx, ny, nz))
    
    return invKrsq,rfftplan
end

# Set up the initial condition
Random.seed!(1234);        # Set up the random seed
θ = rand(div(nx,2)+1, ny, nz)*2π; # define \theta \in [0,2\pi]
nx = ny = nz= 128
T = Float32

  k⁻²,  rfftplan = getk⁻²_and_rfftplan("CPU"; nx = nx, Lx = 2π, ny = ny, nz = ny, T =  T)             
k⁻²_c,rfftplan_c = getk⁻²_and_rfftplan("GPU"; nx = nx, Lx = 2π, ny = ny, nz = ny, T =  T)

θT  = im.*convert(Array{T,3},θ)
eⁱᶿk⁻² = @. exp.(θT)*k⁻²;        #define the spectral filtered function in CPU
eⁱᶿk⁻²_cu = CuArray(eⁱᶿk⁻²);     #define the spectral filtered function in GPU

c_b4_cleaning = CUDA.zeros(T, nx, ny, nz);
c_af_cleaning = CUDA.zeros(T, nx, ny, nz);

# Work out the rfft before cleaning 0 terms
ldiv!(c_b4_cleaning, rfftplan_c, deepcopy(eⁱᶿk⁻²_cu));

#Clean y[N/2] terms (index 0 = 1 julia)
@. eⁱᶿk⁻²_cu[1,:,:] .= 0;
ldiv!(c_af_cleaning, rfftplan_c, deepcopy(eⁱᶿk⁻²_cu));

c_b4_cleaning = Array(c_b4_cleaning); #move the result from GPU-> CPU
c_af_cleaning = Array(c_af_cleaning); #move the result from GPU-> CPU

figure(figsize=(12,6))
subplot("121");
imshow(c_b4_cleaning[:,:,100]);title("Before cleaning y[1,:,:]")
subplot("122");
imshow(c_af_cleaning[:,:,100]);title("After cleaning y[1,:,:]")

@nvlcambier
Copy link

nvlcambier commented Nov 11, 2022

Are complex to real transform algorithm changes during the recent version?

Yes, there are occasional changes. I doesn't surprise me that you didn't see this behavior in older versions.

I work out a test of setting and it works! (Code showed at bottom)

Awesome!

If that solves your problem, we'd appreciate if you can follow up on the Bug ticket.

In think in summary

  • If you want to do a complex-to-real transform, ensure the first and last element of the complex input of length (N/2)/1 are real, otherwise the result is essentially undefined;
  • If you can't have those elements real, I would just do a complex-to-complex with a complex input of length N. But then the output won't necessarily be real.

@doraemonho
Copy link
Contributor Author

Yes, there are occasional changes. I doesn't surprise me that you didn't see this behavior in older versions.

Thanks for the information! I appreciate your help. I would follow the ticket then.

@navidcy would you think we should keep this issue open or just close it?

@navidcy
Copy link
Member

navidcy commented Nov 14, 2022

Let’s keep it open for now so that people can see that there is an issue. Perhaps we should issue a warning if someone makes a ThreeDGrid w Float32 on GPU. The issue is only for 3D, right?

@doraemonho
Copy link
Contributor Author

doraemonho commented Nov 14, 2022

Let’s keep it open for now so that people can see that there is an issue. Perhaps we should issue a warning if someone makes a ThreeDGrid w Float32 on GPU. The issue is only for 3D, right?

From the description, it seems to apply to all cases(1D/2D/3D) as long as the Fourier transformed $x_{1}$ and $x_{N/2+1}$ is not 0 before they do the Complex to Real transform.

@nvlcambier
Copy link

This applies to 1D, 2D and 3D yes.

If the imaginary part of x[0] and x[N/2+1] is not zero, computing a complex-to-real DFT doesn't really make sense since the output of the DFT isn't guaranteed to be real. If you still want to compute the DFT of such signal, I would suggest to do a complex-to-complex DFT. But the output might not be real.

FYI, this is also documented in section 2.3 of cuFFT's online documentation

https://docs.nvidia.com/cuda/cufft/index.html#fft-types
Screen Shot 2022-11-14 at 8 26 58 AM

@navidcy
Copy link
Member

navidcy commented Nov 16, 2022

I'm confused a bit.

If we start with a real signal and Fourier transform it then the components that correspond to 0-th frequency
and to $N/2+1$-frequency aren't they real?

@navidcy
Copy link
Member

navidcy commented Nov 16, 2022

$ julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.2 (2022-09-29)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> using LinearAlgebra

julia> using CUDA

julia> using CUDA.CUFFT

julia> a = cu(rand(8))
8-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 0.4507753
 0.06592999
 0.96525353
 0.6843841
 0.94825345
 0.046085697
 0.38072583
 0.8913641

julia> rfft(a)
5-element CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}:
   4.4327717f0 + 0.0f0im
 -0.33708918f0 - 0.4522028f0im
 0.053049427f0 + 1.4637324f0im
 -0.65786713f0 + 0.71685266f0im
   1.0572441f0 + 0.0f0im

julia> ah =complex(cu(zeros(5)))
5-element CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}:
 0.0f0 + 0.0f0im
 0.0f0 + 0.0f0im
 0.0f0 + 0.0f0im
 0.0f0 + 0.0f0im
 0.0f0 + 0.0f0im

julia> rfftplan = plan_rfft(CuArray{Float32, 1}(undef, 8))
CUFFT real-to-complex forward plan for 8-element CuArray of Float32

julia> mul!(ah, rfftplan, a)
5-element CuArray{ComplexF32, 1, CUDA.Mem.DeviceBuffer}:
   4.4327717f0 + 0.0f0im
 -0.33708918f0 - 0.4522028f0im
 0.053049427f0 + 1.4637324f0im
 -0.65786713f0 + 0.71685266f0im
   1.0572441f0 + 0.0f0im

julia> b = deepcopy(a*0)
8-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

julia> ldiv!(b, rfftplan, ah)
8-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 0.45077527
 0.06593
 0.9652535
 0.68438405
 0.9482534
 0.046085723
 0.38072583
 0.89136404

@navidcy
Copy link
Member

navidcy commented Nov 16, 2022

(same results in 2D and 3D...)

@navidcy
Copy link
Member

navidcy commented Nov 16, 2022

OK, I see... the issue comes when we generate the ah and not take ah as the rFFT of a real array. That's perfectly reasonable and indeed it's something expected.

But why that's not an issue with Float64?

@nvlcambier
Copy link

Because cuFFT FP64 transforms use (slightly) different algorithms, so the issue is not visible.

@navidcy
Copy link
Member

navidcy commented Jul 10, 2023

I guess it seems that a warning should at least be displayed when user does give an N/2-Fourier component with non-zero imaginary part? I guess by CUDA.jl?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants