@yolhan83 yolhan83 commented Aug 19, 2025

This PR adds mean and var to AK. I also added placeholder files for implementing the rest; if you don't think they should be there, don't hesitate to say so.

PS: to implement median, we should first implement partialsort (and even multi-dimensional partialsort), hoping the sort pros here can do it.
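For reference, a minimal CPU-only sketch of how a median falls out of a partial sort, using Base's partialsort. The name median_via_partialsort is hypothetical and not part of this PR or of AcceleratedKernels.jl; NaN handling and the multi-dimensional case are left out.

```julia
using Statistics  # only used to cross-check against Statistics.median

# Hypothetical helper, for illustration only.
function median_via_partialsort(v::AbstractVector{<:Real})
    n = length(v)
    n == 0 && throw(ArgumentError("median of an empty collection is undefined"))
    if isodd(n)
        # Odd length: the median is the single middle order statistic.
        return float(partialsort(v, (n + 1) ÷ 2))
    else
        # Even length: average the two middle order statistics.
        lo, hi = partialsort(v, (n ÷ 2):(n ÷ 2 + 1))
        return (lo + hi) / 2
    end
end

x = [7, 1, 5, 3, 9, 2]
@assert median_via_partialsort(x) ≈ median(x)
```

Only the middle one or two order statistics are needed, which is why a partial sort is enough and a full sort can be avoided.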

@yolhan83
Author

Note: some backends don't support Float64, so we need to differ from Statistics in how integers are promoted.

@yolhan83
Author

Could we have a preferred_float(backend)?
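To make the question concrete, here is a rough sketch of what such a hook could look like. Everything in it is an assumption for illustration: the backend types are stand-ins rather than real KernelAbstractions backends, and neither preferred_float nor result_eltype exists in any package as far as this thread shows.

```julia
# Stand-in backend types, hypothetical and for illustration only.
abstract type HypotheticalBackend end
struct HypotheticalCPU   <: HypotheticalBackend end
struct HypotheticalNoF64 <: HypotheticalBackend end  # a backend without Float64 support

# Default to Float64 (what Statistics promotes integers to); override per backend.
preferred_float(::HypotheticalBackend) = Float64
preferred_float(::HypotheticalNoF64)   = Float32

# Element type of a mean/var result: floats keep their type,
# integers are promoted to the backend's preferred float.
result_eltype(backend, ::Type{T}) where {T<:AbstractFloat} = T
result_eltype(backend, ::Type{T}) where {T<:Integer}       = preferred_float(backend)
```

With this, result_eltype(HypotheticalNoF64(), Int64) gives Float32 instead of the Float64 that Statistics would pick, while float inputs are left alone.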

@yolhan83
Author

OneAPI 1.10 seems unrelated; var is extremely slow on Metal?

@yolhan83
Author

Metal perf fixed

@yolhan83
Author

Sorry for all the commits, I think it's ok now.

@yolhan83
Author

yolhan83 commented Aug 19, 2025

CPU var is really slow, mainly due to

for sl in eachslice(src,dims=dims)
    sl .-= selectdim(m,dims,1)
end

being slow when the reduced dimension is large.
1- Is there a better way to write it on CPU (even a kernel would be better here)?
2- Is there a way to avoid it when dealing with arrays of more than 3 dimensions (CPU or GPU)?
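One possible answer to question 1, sketched on the CPU with plain arrays: since m has length 1 along dims, a single broadcast should do the same centering as the slice loop. The array sizes below are arbitrary, and whether this is actually faster would need benchmarking.

```julia
using Statistics

src  = rand(Float32, 4, 1000, 3)
dims = 2
m    = mean(src; dims=dims)          # size (4, 1, 3): length 1 along `dims`

# Slice-by-slice centering, as in the loop above.
a = copy(src)
for sl in eachslice(a, dims=dims)
    sl .-= selectdim(m, dims, 1)
end

# Single fused broadcast: `m` is expanded along `dims` automatically.
b = copy(src)
b .-= m

@assert a == b

# Unbiased variance from the centered data, cross-checked against Statistics.var.
v = sum(abs2, b; dims=dims) ./ (size(src, dims) - 1)
@assert v ≈ var(src; dims=dims)
```

The broadcast form does not depend on the number of array dimensions, so it also touches on question 2 for the CPU case.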

@yolhan83
Author

yolhan83 commented Aug 19, 2025

OK, the only thing missing is nice CPU handling; GPU supports any dimension now (even though it feels bad not to use shared memory for this problem, we access m far too much).

@yolhan83
Author

yolhan83 commented Aug 19, 2025

var is still an issue on CPU:
[Benchmark plots: statistics_var and statistics_mean for Float32, Int64, UInt32]

CUDA looks really nice:
[Benchmark plots: statistics_var and statistics_mean for Float32, Int64, UInt32]

@Moelf

Moelf commented Aug 19, 2025

fix #62 I guess

@yolhan83
Author

Yes, trying to sort out the bad CPU var performance before it's ok to review.

@yolhan83
Author

Getting better:
[Benchmark plots: statistics_var for Float32, Int64, UInt32]

@yolhan83
Author

yolhan83 commented Aug 19, 2025

Updated benchmark:
CPU: [Benchmark plots: statistics_var for Float32, Int64, UInt32]
GPU: [Benchmark plots: statistics_var for Float32, Int64, UInt32]

@yolhan83
Author

ok now ready to review :)

test/runtests.jl Outdated
include("accumulate.jl")
include("predicates.jl")
include("binarysearch.jl")
include("binarysearch.jl")

Not removing the last newline character of the file (here and everywhere else) would be nice.

Author

ah yeah

If you look at https://github.com/JuliaGPU/AcceleratedKernels.jl/pull/64/files you'll easily spot other files with a missing ending newline (GitHub highlights them with a special symbol). I believe VS Code by default does this idiotic thing of removing the last newline, which makes it very easy to guess that someone is using that editor. I hope there's an option to stop it doing stupid things.

Author

No more symbols. I also removed the unimplemented statistics files for now; I think we will add them as they are implemented.


3 participants