Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make recode! type stable #407

Merged
merged 17 commits into from
Jan 3, 2025

Conversation

tiemvanderdeure
Copy link
Contributor

This PR improves the type-stability of recode!, providing a huge performance improvement, in particular when recoding normal (not categorical) arrays.

On this MWE

A = (rand('a':'c', 100_000))
@profview recode(A, 'b' => 1, 'a' => 2, 'c' => 3)

Before:
billede
After:
billede

For a ~200 times speed-up

@tiemvanderdeure
Copy link
Contributor Author

Test failures are surely unrelated

@tiemvanderdeure
Copy link
Contributor Author

Bump. This is a big performance improvement at no cost - could anyone take a look?

src/recode.jl Show resolved Hide resolved
@bkamins
Copy link
Member

bkamins commented Nov 19, 2024

@nalimilan should probably approve it.

I looked at the code and I left one comment where I was not 100% sure that the change is OK. In other places it looks OK.

src/recode.jl Outdated
for j in 1:length(opt_pairs)
p = opt_pairs[j]
for j in 1:length(pairs)
p = optimize_pair(pairs[j])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks problematic when pairs[j] is an array as it will create a Set for each item. Have you investigated why map(optimize_pair, pairs) isn't inferred correctly? In theory it should, and I think I had carefully checked performance when I wrote that function so I'm surprised it's not the case.

Copy link
Contributor Author

@tiemvanderdeure tiemvanderdeure Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pairs[j] is never an array, it is always a Pair. I just got rid of the map and moved it inside the loop, but it still iterates over the exact same thing, so I don't think that should ever be problematic.

I was also surprised that map(optimize_pair, pairs) wasn't inferred, some change in the compiler must have triggered this. I stumbled into this easy fix, so then I didn't investigate further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean when the first element is an array to that the call hits optimize_pair(pair::Pair{<:AbstractArray}).

Copy link
Member

@nalimilan nalimilan Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that currently we create a Set once for each pair, and with this change we would create it for each pair for each entry in the input array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's in the nested loop. I don't know how I missed that.

@tiemvanderdeure
Copy link
Contributor Author

tiemvanderdeure commented Nov 20, 2024

Okay I dove into this a little more and was able to solve some problems but not all.

I now moved bound checking and converting arrays to sets into a seperate function which then calls _recode! which has the rest of the code. I also added some more dispatches to recode_in as this was type unstable for scalars even after this change.

This already does a lot, but is still type unstable if the pairs have different types (e.g. both scalars and arrays). I'm not really sure how to solve that.

EDIT: It doesn't seem to be type stable now anyway. I'm running this

using CategoricalArrays
A = rand(1:100, 1000, 1000)
dest = copy(A)
pairs_range = (((i*10-9):(i*10)) => i for i in 1:9)
pairs_scalar = collect(i+90 => i for i in 1:9)
@time recode!(dest, A, pairs_range...);
@time recode!(dest, A, pairs_scalar...);
@time recode!(dest, A, pairs_scalar..., pairs_range...);

and getting

  0.312937 seconds (192.78 k allocations: 10.059 MiB, 72.57% compilation time)
  0.091321 seconds (43.72 k allocations: 2.161 MiB, 73.41% compilation time)
  1.531255 seconds (5.70 M allocations: 92.121 MiB, 6.26% gc time, 11.30% compilation time)

on the first run and

  0.053362 seconds (78 allocations: 4.766 KiB)
  0.392593 seconds (89.91 k allocations: 1.373 MiB)
  0.792653 seconds (5.58 M allocations: 85.128 MiB, 1.24% gc time)

on the second. I for the life of me can't understand why the performance of the second call get worse after compilation.

@tiemvanderdeure
Copy link
Contributor Author

The tests fail because the changes to recode_in mean that recode_in(missing, missing) is now true. I don't know if recode_in is considered internal? It's not an ideal solution, in any case.

@nalimilan
Copy link
Member

The tests fail because the changes to recode_in mean that recode_in(missing, missing) is now true. I don't know if recode_in is considered internal? It's not an ideal solution, in any case.

Why do you need it to return true rather than false as before?

@nalimilan
Copy link
Member

Regarding the origin of the issue, what surprises me is that I don't see any type instability using @descend or a direct invocation:

@code_warntype recode!(similar(A, Int), A, nothing, 'b' => 1, 'a' => 2, 'c' => 3)

So it's really hard to understand what's going on.

@tiemvanderdeure
Copy link
Contributor Author

Why do you need it to return true rather than false as before?

Because I tried to collapse this x ≅ p.first || recode_in(x, p.first) into just one function, since we don't always need to do both checks anyway and this was the line that was marked type unstable. But it doesn't seem to work anyway.

So it's really hard to understand what's going on.

I tried the same things and am also pretty clueless at this point. Sometimes it seems to be type stable, and then when I run the same code again it allocates.

@nalimilan
Copy link
Member

OK. And does the approach you tried at ef10966 work? That sounds like the most acceptable fix.

@tiemvanderdeure
Copy link
Contributor Author

And does the approach you tried at ef10966 work?

It does not, unfortunately.

The only thing I've found that makes this type stable is to pass first.(pairs) and last.(pairs each as their own Tuple. This even works for mixed types. The downside is that this scales poorly with the number of pairs, but I don't know how big of an issue that is in practice?

I've also been playing around with a different implementation that uses more multiple dispatch and no @goto, but I can't quite figure out how to make it work with the categorical arrays.

This already works nicely.

function _myrecode!(dest::AbstractArray{T}, src::AbstractArray, default, pairsfirst::Tuple, pairlast::Tuple) where {T}
    @inbounds for i in eachindex(dest)
        dest[i] = recode_element(T, src[i], default, pairsfirst, T.(pairlast))
    end
end

function recode_element(::Type{T}, x, default, recode_from, recode_to) where T
    j = findfirst(y -> isequal(x, y) || recode_in(x,y), recode_from)
    isnothing(j) ? recode_fallback(T, x, default) : T(recode_to[j])
end

recode_fallback(::Type{T}, x::Missing, default) where T = 
    throw(MissingException("missing value found, but dest does not support them: recode them to a supported value"))
recode_fallback(::Type{T}, x::Missing, default) where T >: Missing = missing
recode_fallback(::Type{T}, x, default) where T = T(default)
function recode_fallback(::Type{T}, x, ::Nothing) where T
    try
        T(x)
    catch err
        isa(err, MethodError) || rethrow(err)
        throw(ArgumentError("cannot `convert` value $(repr(x)) (of type $(typeof(x))) to type of recoded levels ($T). " *
                            "This will happen with recode() when not all original levels are recoded " *
                            "(i.e. some are preserved) and their type is incompatible with that of recoded levels."))
    end
    return T(x)
end

@nalimilan
Copy link
Member

The only thing I've found that makes this type stable is to pass first.(pairs) and last.(pairs each as their own Tuple. This even works for mixed types. The downside is that this scales poorly with the number of pairs, but I don't know how big of an issue that is in practice?

I'd say it's OK, anyway it's not more costly for the compiler than a single tuple of pairs.

I've also been playing around with a different implementation that uses more multiple dispatch and no @goto, but I can't quite figure out how to make it work with the categorical arrays.

I'm not sure what's the advantage of using multiple dispatch rather than the current code TBH. @goto is fine in simple cases like this one.

@tiemvanderdeure
Copy link
Contributor Author

tiemvanderdeure commented Nov 22, 2024

I'd say it's OK, anyway it's not more costly for the compiler than a single tuple of pairs.

True.

Okay, this is already much better, but still not totally fixed if src is categorical and dest is not.

One thing I found out is that recode_in wasn't type stable with the dispatch on Set if it is defined as y in set. I also added some dispatches for identical types, as we then don't have to loop (the isequal check should always be enough) and it also allocated for some types (such as Char).

using CategoricalArrays
src = rand(1:20, 1000, 1000)
dest = copy(src)
destcat = CategoricalArray{Int}(undef, size(src))
srccat = categorical(src)
pairs = (i => i+1 for i in 1:8)
pairs_array = ([11,12] => 2, [13,14] => 3)

@time recode!(dest, src, nothing, pairs..., pairs_array...);
@time recode!(destcat, src, nothing, pairs..., pairs_array...);
@time recode!(dest, srccat, nothing, pairs..., pairs_array...);

After compilation gives

  0.033817 seconds (34 allocations: 2.266 KiB)

  0.049040 seconds (80 allocations: 8.812 KiB)

  0.340654 seconds (2.95 M allocations: 118.264 MiB, 59.14% gc time)

@tiemvanderdeure
Copy link
Contributor Author

If we want to avoid breaking any tests we can define recode_in(x::T, y::T) where T = x == y instead, even though it adds some unnecessary overhead.

@nalimilan
Copy link
Member

One thing I found out is that recode_in wasn't type stable with the dispatch on Set if it is defined as y in set.

What do you mean? It always returns a Bool AFAICT.

I also added some dispatches for identical types, as we then don't have to loop (the isequal check should always be enough) and it also allocated for some types (such as Char).

Unfortunately I don't think this is correct in general I think, for example Any[1] in Any[[1]]. Why does the current code allocate for Char?

@tiemvanderdeure
Copy link
Contributor Author

Honestly I'm just going off trial-and-error here, I don't understand the compiler well enough to make sense of this. Some of these results I'm getting seem inconsistent so there might be some heuristics at play that I'm not so familiar with (the compiler giving up when Tuples become bigger than a certain number or something).

There are other things I don't understand like findfirst being much faster than the for loop.

Any[1] in Any[[1]]

This won't be a problem currently because Any[[1]] will be converted to a Set. I can't think of any cases where this would be a problem and the only test that fails is the one that explicitly tests that recode_in(1,1) == true. That said I'm sure there must be a better solution out there, it's just been a surprisingly though nut to crack and this is just the solution I found.

To reproduce the issues that I was running into:

using CategoricalArrays
A = (rand('a':'z', 1000, 1000))
recode(A, nothing, 'e' => 3, ['a', 'b', 'c'] => 1, ['x', 'y'] => 2)
@profview recode(A, nothing, 'e' => 3, ['a', 'b', 'c'] => 1, ['x', 'y'] => 2)

With @inline recode_in(x, set::Set) = x in set and/or without @inline recode_in(::T, ::T) where T = false, this gives a flame graph like this:
billede

With both of those changes it's completely blue.

@nalimilan
Copy link
Member

Actually after asking on Slack this is a known issue (probably a combination of JuliaLang/julia#32834 and JuliaLang/julia#54390). This is due to to use of varargs, which are inferred but not fully specialized on.

Regarding the issue with recode_in (which is related to JuliaLang/julia#56587), it seems that allocations can be fixed using this pattern:

j = @inline findfirst(y -> isequal(x, y) || recode_in(x, y), recode_from)

Could you try this instead of changing the definition of recode_in?

@tiemvanderdeure
Copy link
Contributor Author

This works. I wonder if this whole PR can be replaced by just that one @inline

@tiemvanderdeure
Copy link
Contributor Author

Okay, it seems that this works, but not for julia 1.0. Do you know how to make it work there? Or maybe it would be okay to require 1.6?

@nalimilan
Copy link
Member

Yes I think at this point we can require Julia 1.6.

@tiemvanderdeure
Copy link
Contributor Author

This use of @inline requires 1.8, actually. So we need to define a separate function

https://docs.julialang.org/en/v1/base/base/#Base.@inline

@tiemvanderdeure
Copy link
Contributor Author

So I though that this

@inline findfirstrecode(x, recode_from) = findfirst(y -> isequal(x, y) || recode_in(x,y), recode_from)

was going to be the same as

j = @inline findfirst(y -> isequal(x, y) || recode_in(x, y), recode_from)

But apparently the former allocates whereas the latter doesn't. But using @inline as in the latter requires julia 1.8. Would requiring julia 1.8 be okay with you @nalimilan ?

@nalimilan
Copy link
Member

Sorry for the delay. I'm not too happy about requiring Julia 1.8, but you can use Compat.jl >= 3.37 for that.

@nalimilan
Copy link
Member

Are you still interested in finishing this?

@tiemvanderdeure
Copy link
Contributor Author

Sorry, it slipped through with all of the things to finish before the holidays. But yes I'll finish this!

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It's close. So this is enough to fix performance in all problematic cases?

src/recode.jl Outdated Show resolved Hide resolved
src/recode.jl Show resolved Hide resolved
src/recode.jl Outdated Show resolved Hide resolved
src/recode.jl Outdated Show resolved Hide resolved
src/recode.jl Outdated
Comment on lines 108 to 109
vals = convert.(T, recode_to)
vals = default === nothing ? vals : (vals..., default)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change really make a difference for performance? It would seem it matters more to do it for pairmap. It would also be worth checking that performance doesn't degrade when the tuple gets long, e.g. with 10 pairs.

Also if you do this the same change should be applied to other methods.

In terms of implementation, for type stability, don't change the type of vals. Also, default needs to be converted too.

Suggested change
vals = convert.(T, recode_to)
vals = default === nothing ? vals : (vals..., default)
if default === nothing
vals = convert.(T, recode_to)
else
vals = convert.(T, (recode_to..., default))
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really remember why I made this change so I'll just revert.

src/recode.jl Outdated Show resolved Hide resolved
src/recode.jl Outdated Show resolved Hide resolved
src/recode.jl Outdated Show resolved Hide resolved
src/recode.jl Show resolved Hide resolved
src/recode.jl Outdated Show resolved Hide resolved
@tiemvanderdeure
Copy link
Contributor Author

Thanks for those comments! I think I have incorporated all of them. All of the examples I gave in this thread are now type-stable and much faster!

Let's see if tests pass (on my machine they do).

src/recode.jl Outdated Show resolved Hide resolved
src/recode.jl Outdated Show resolved Hide resolved
src/recode.jl Outdated Show resolved Hide resolved
src/recode.jl Outdated Show resolved Hide resolved
src/recode.jl Show resolved Hide resolved
@nalimilan nalimilan merged commit 3e0d056 into JuliaData:master Jan 3, 2025
15 of 18 checks passed
@nalimilan
Copy link
Member

Thanks! The performance improvement is impressive and the code is even a bit cleaner now.

@tiemvanderdeure
Copy link
Contributor Author

This whole PR is net negative 1 line of code somehow :). Thanks for reviewing and helping to figure out the @inline trick, I at least learned something new there!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants