Add Enzyme sum derivatives #2471

Merged: 3 commits into JuliaGPU:master on Sep 4, 2024

Conversation

@wsmoses (Contributor) commented Aug 21, 2024

No description provided.

@wsmoses force-pushed the rsum branch 2 times, most recently from 098f13d to c0bade1, on August 22, 2024 02:13
@wsmoses (Contributor, Author) commented Aug 22, 2024

@maleadt, mind giving this a review?

All tests pass.

@maleadt (Member) commented Aug 23, 2024

I can't review this; I don't have any Enzyme.jl experience, so the code looks foreign to me.

@maleadt requested a review from vchuravy on August 23, 2024 05:02
@jgreener64 commented

With this PR, Enzyme main (44febc), GPUCompiler 0.27.1, and Julia 1.10.3, I get:

using Enzyme, CUDA
Enzyme.API.runtimeActivity!(true)
f(x, y) = sum(x .+ y)
x = CuArray(rand(5))
y = CuArray(rand(5))
dx = CuArray([1.0, 0.0, 0.0, 0.0, 0.0])
autodiff(Forward, f, Duplicated, Duplicated(x, dx), Const(y))
ERROR: Enzyme execution failed.
Enzyme: incorrect return type of prima/shadow forward custom rule - Duplicated{Float64} Type[Const{typeof(Core.kwcall)}, Const{typeof(GPUArrays._mapreduce)}, Const{typeof(identity)}, Const{typeof(Base.add_sum)}, Duplicated{CuArray{Float64, 1, CUDA.DeviceMemory}}] want just shadow type Duplicated{Float64} found Tuple{Float64, Float64}
Stacktrace:
 [1] mapreduce
   @ ~/.julia/packages/GPUArrays/qt4ax/src/host/mapreduce.jl:28
 [2] _sum
   @ ./reducedim.jl:1015
 [3] _sum
   @ ./reducedim.jl:1014
 [4] sum
   @ ./reducedim.jl:1010
 [5] f
   @ ./REPL[3]:1

Stacktrace:
  [1] mapreduce
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/mapreduce.jl:28 [inlined]
  [2] _sum
    @ ./reducedim.jl:1015 [inlined]
  [3] _sum
    @ ./reducedim.jl:1014 [inlined]
  [4] sum
    @ ./reducedim.jl:1010 [inlined]
  [5] f
    @ ./REPL[3]:1 [inlined]
  [6] fwddiffejulia_f_1404wrap
    @ ./REPL[3]:0
  [7] macro expansion
    @ ~/.julia/dev/Enzyme/src/compiler.jl:7099 [inlined]
  [8] enzyme_call
    @ ~/.julia/dev/Enzyme/src/compiler.jl:6708 [inlined]
  [9] ForwardModeThunk
    @ ~/.julia/dev/Enzyme/src/compiler.jl:6588 [inlined]
 [10] autodiff
    @ ~/.julia/dev/Enzyme/src/Enzyme.jl:437 [inlined]
 [11] autodiff(::ForwardMode{FFIABI, false}, ::typeof(f), ::Type{Duplicated}, ::Duplicated{CuArray{…}}, ::Const{CuArray{…}})
    @ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:332
 [12] top-level scope
    @ REPL[7]:1
Some type information was truncated. Use `show(err)` to see complete types.

And for reverse mode:

autodiff(Reverse, f, Active, Duplicated(x, dx), Const(y))
ERROR: 
No create nofree of empty function (jl_gc_safe_enter) jl_gc_safe_enter)
 at context:   call fastcc void @julia__launch_configuration_979_26387([2 x i64]* noalias nocapture nofree noundef nonnull writeonly sret([2 x i64]) align 8 dereferenceable(16) %7, i64 noundef signext 0, { i64, {} addrspace(10)* } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(32) %45) #707, !dbg !1091 (julia__launch_configuration_979_26387)

Stacktrace:
 [1] launch_configuration
   @ ~/.julia/dev/CUDA/lib/cudadrv/occupancy.jl:56
 [2] #launch_heuristic#1204
   @ ~/.julia/dev/CUDA/src/gpuarrays.jl:22
 [3] launch_heuristic
   @ ~/.julia/dev/CUDA/src/gpuarrays.jl:15
 [4] _copyto!
   @ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:78
 [5] copyto!
   @ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:44
 [6] copy
   @ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:29
 [7] materialize
   @ ./broadcast.jl:903
 [8] f
   @ ./REPL[3]:1


Stacktrace:
  [1] launch_configuration
    @ ~/.julia/dev/CUDA/lib/cudadrv/occupancy.jl:56 [inlined]
  [2] #launch_heuristic#1204
    @ ~/.julia/dev/CUDA/src/gpuarrays.jl:22 [inlined]
  [3] launch_heuristic
    @ ~/.julia/dev/CUDA/src/gpuarrays.jl:15 [inlined]
  [4] _copyto!
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:78 [inlined]
  [5] copyto!
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:44 [inlined]
  [6] copy
    @ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:29 [inlined]
  [7] materialize
    @ ./broadcast.jl:903 [inlined]
  [8] f
    @ ./REPL[3]:1 [inlined]
  [9] diffejulia_f_24841wrap
    @ ./REPL[3]:0
 [10] macro expansion
    @ ~/.julia/dev/Enzyme/src/compiler.jl:7099 [inlined]
 [11] enzyme_call
    @ ~/.julia/dev/Enzyme/src/compiler.jl:6708 [inlined]
 [12] CombinedAdjointThunk
    @ ~/.julia/dev/Enzyme/src/compiler.jl:6585 [inlined]
 [13] autodiff
    @ ~/.julia/dev/Enzyme/src/Enzyme.jl:320 [inlined]
 [14] autodiff(::ReverseMode{…}, ::typeof(f), ::Type{…}, ::Duplicated{…}, ::Const{…})
    @ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:332
 [15] top-level scope
    @ REPL[10]:1
Some type information was truncated. Use `show(err)` to see complete types.

@wsmoses (Contributor, Author) commented Sep 2, 2024

Yeah @jgreener64, this does so for sums of arrays, not for broadcasted objects yet (so you'll need to actually materialize the .+ result first).
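
For reference, a minimal sketch of that workaround, based on the repro above (the names h, z, and dz are illustrative, not from the PR): materialize the broadcast into a concrete CuArray outside the differentiated function, then differentiate the sum of that array.

using Enzyme, CUDA

h(z) = sum(z)  # sum over a plain CuArray, which this PR covers

x = CuArray(rand(5))
y = CuArray(rand(5))
z = x .+ y     # materialize the .+ result up front, outside the differentiated function
dz = CuArray([1.0, 0.0, 0.0, 0.0, 0.0])  # tangent seed for the first element

autodiff(Forward, h, Duplicated, Duplicated(z, dz))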

@wsmoses (Contributor, Author) commented Sep 4, 2024

@maleadt, now that CI is happy again, can this get a merge?

@maleadt merged commit d72cdaa into JuliaGPU:master on Sep 4, 2024
1 check passed
@maleadt added the enhancement and extensions labels on Sep 4, 2024
@wsmoses deleted the rsum branch on September 28, 2024 02:46