-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make recode! type stable #407
make recode! type stable #407
Conversation
Test failures are surely unrelated |
Bump. This is a big performance improvement at no cost - could anyone take a look? |
@nalimilan should probably approve it. I looked at the code and I left one comment where I was not 100% sure that the change is OK. In other places it looks OK. |
src/recode.jl
Outdated
for j in 1:length(opt_pairs) | ||
p = opt_pairs[j] | ||
for j in 1:length(pairs) | ||
p = optimize_pair(pairs[j]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks problematic when pairs[j]
is an array as it will create a Set
for each item. Have you investigated why map(optimize_pair, pairs)
isn't inferred correctly? In theory it should, and I think I had carefully checked performance when I wrote that function so I'm surprised it's not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pairs[j]
is never an array, it is always a Pair
. I just got rid of the map
and moved it inside the loop, but it still iterates over the exact same thing, so I don't think that should ever be problematic.
I was also surprised that map(optimize_pair, pairs)
wasn't inferred, some change in the compiler must have triggered this. I stumbled into this easy fix, so then I didn't investigate further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean when the first element is an array to that the call hits optimize_pair(pair::Pair{<:AbstractArray})
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that currently we create a Set
once for each pair, and with this change we would create it for each pair for each entry in the input array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it's in the nested loop. I don't know how I missed that.
Okay I dove into this a little more and was able to solve some problems but not all. I now moved bound checking and converting arrays to sets into a seperate function which then calls This already does a lot, but is still type unstable if the pairs have different types (e.g. both scalars and arrays). I'm not really sure how to solve that. EDIT: It doesn't seem to be type stable now anyway. I'm running this using CategoricalArrays
A = rand(1:100, 1000, 1000)
dest = copy(A)
pairs_range = (((i*10-9):(i*10)) => i for i in 1:9)
pairs_scalar = collect(i+90 => i for i in 1:9)
@time recode!(dest, A, pairs_range...);
@time recode!(dest, A, pairs_scalar...);
@time recode!(dest, A, pairs_scalar..., pairs_range...); and getting
on the first run and
on the second. I for the life of me can't understand why the performance of the second call get worse after compilation. |
The tests fail because the changes to |
Why do you need it to return |
Regarding the origin of the issue, what surprises me is that I don't see any type instability using @code_warntype recode!(similar(A, Int), A, nothing, 'b' => 1, 'a' => 2, 'c' => 3) So it's really hard to understand what's going on. |
Because I tried to collapse this
I tried the same things and am also pretty clueless at this point. Sometimes it seems to be type stable, and then when I run the same code again it allocates. |
OK. And does the approach you tried at ef10966 work? That sounds like the most acceptable fix. |
It does not, unfortunately. The only thing I've found that makes this type stable is to pass I've also been playing around with a different implementation that uses more multiple dispatch and no This already works nicely. function _myrecode!(dest::AbstractArray{T}, src::AbstractArray, default, pairsfirst::Tuple, pairlast::Tuple) where {T}
@inbounds for i in eachindex(dest)
dest[i] = recode_element(T, src[i], default, pairsfirst, T.(pairlast))
end
end
function recode_element(::Type{T}, x, default, recode_from, recode_to) where T
j = findfirst(y -> isequal(x, y) || recode_in(x,y), recode_from)
isnothing(j) ? recode_fallback(T, x, default) : T(recode_to[j])
end
recode_fallback(::Type{T}, x::Missing, default) where T =
throw(MissingException("missing value found, but dest does not support them: recode them to a supported value"))
recode_fallback(::Type{T}, x::Missing, default) where T >: Missing = missing
recode_fallback(::Type{T}, x, default) where T = T(default)
function recode_fallback(::Type{T}, x, ::Nothing) where T
try
T(x)
catch err
isa(err, MethodError) || rethrow(err)
throw(ArgumentError("cannot `convert` value $(repr(x)) (of type $(typeof(x))) to type of recoded levels ($T). " *
"This will happen with recode() when not all original levels are recoded " *
"(i.e. some are preserved) and their type is incompatible with that of recoded levels."))
end
return T(x)
end |
I'd say it's OK, anyway it's not more costly for the compiler than a single tuple of pairs.
I'm not sure what's the advantage of using multiple dispatch rather than the current code TBH. |
True. Okay, this is already much better, but still not totally fixed if One thing I found out is that using CategoricalArrays
src = rand(1:20, 1000, 1000)
dest = copy(src)
destcat = CategoricalArray{Int}(undef, size(src))
srccat = categorical(src)
pairs = (i => i+1 for i in 1:8)
pairs_array = ([11,12] => 2, [13,14] => 3)
@time recode!(dest, src, nothing, pairs..., pairs_array...);
@time recode!(destcat, src, nothing, pairs..., pairs_array...);
@time recode!(dest, srccat, nothing, pairs..., pairs_array...); After compilation gives
|
If we want to avoid breaking any tests we can define |
What do you mean? It always returns a
Unfortunately I don't think this is correct in general I think, for example |
Actually after asking on Slack this is a known issue (probably a combination of JuliaLang/julia#32834 and JuliaLang/julia#54390). This is due to to use of varargs, which are inferred but not fully specialized on. Regarding the issue with j = @inline findfirst(y -> isequal(x, y) || recode_in(x, y), recode_from) Could you try this instead of changing the definition of |
This works. I wonder if this whole PR can be replaced by just that one |
Okay, it seems that this works, but not for julia 1.0. Do you know how to make it work there? Or maybe it would be okay to require 1.6? |
Yes I think at this point we can require Julia 1.6. |
This use of |
So I though that this
was going to be the same as
But apparently the former allocates whereas the latter doesn't. But using |
Sorry for the delay. I'm not too happy about requiring Julia 1.8, but you can use Compat.jl >= 3.37 for that. |
Are you still interested in finishing this? |
Sorry, it slipped through with all of the things to finish before the holidays. But yes I'll finish this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It's close. So this is enough to fix performance in all problematic cases?
src/recode.jl
Outdated
vals = convert.(T, recode_to) | ||
vals = default === nothing ? vals : (vals..., default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change really make a difference for performance? It would seem it matters more to do it for pairmap
. It would also be worth checking that performance doesn't degrade when the tuple gets long, e.g. with 10 pairs.
Also if you do this the same change should be applied to other methods.
In terms of implementation, for type stability, don't change the type of vals
. Also, default
needs to be converted too.
vals = convert.(T, recode_to) | |
vals = default === nothing ? vals : (vals..., default) | |
if default === nothing | |
vals = convert.(T, recode_to) | |
else | |
vals = convert.(T, (recode_to..., default)) | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really remember why I made this change so I'll just revert.
Thanks for those comments! I think I have incorporated all of them. All of the examples I gave in this thread are now type-stable and much faster! Let's see if tests pass (on my machine they do). |
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Thanks! The performance improvement is impressive and the code is even a bit cleaner now. |
This whole PR is net negative 1 line of code somehow :). Thanks for reviewing and helping to figure out the |
This PR improves the type-stability of
recode!
, providing a huge performance improvement, in particular when recoding normal (not categorical) arrays.On this MWE
Before:
After:
For a ~200 times speed-up