Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make recode! type stable #407

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

tiemvanderdeure
Copy link

This PR improves the type-stability of recode!, providing a huge performance improvement, in particular when recoding normal (not categorical) arrays.

On this MWE

A = (rand('a':'c', 100_000))
@profview recode(A, 'b' => 1, 'a' => 2, 'c' => 3)

Before:
billede
After:
billede

For a ~200 times speed-up

@tiemvanderdeure
Copy link
Author

Test failures are surely unrelated

@tiemvanderdeure
Copy link
Author

Bump. This is a big performance improvement at no cost - could anyone take a look?

src/recode.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member

bkamins commented Nov 19, 2024

@nalimilan should probably approve it.

I looked at the code and I left one comment where I was not 100% sure that the change is OK. In other places it looks OK.

src/recode.jl Outdated
for j in 1:length(opt_pairs)
p = opt_pairs[j]
for j in 1:length(pairs)
p = optimize_pair(pairs[j])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks problematic when pairs[j] is an array as it will create a Set for each item. Have you investigated why map(optimize_pair, pairs) isn't inferred correctly? In theory it should, and I think I had carefully checked performance when I wrote that function so I'm surprised it's not the case.

Copy link
Author

@tiemvanderdeure tiemvanderdeure Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pairs[j] is never an array, it is always a Pair. I just got rid of the map and moved it inside the loop, but it still iterates over the exact same thing, so I don't think that should ever be problematic.

I was also surprised that map(optimize_pair, pairs) wasn't inferred, some change in the compiler must have triggered this. I stumbled into this easy fix, so then I didn't investigate further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean when the first element is an array to that the call hits optimize_pair(pair::Pair{<:AbstractArray}).

Copy link
Member

@nalimilan nalimilan Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that currently we create a Set once for each pair, and with this change we would create it for each pair for each entry in the input array.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's in the nested loop. I don't know how I missed that.

@tiemvanderdeure
Copy link
Author

tiemvanderdeure commented Nov 20, 2024

Okay I dove into this a little more and was able to solve some problems but not all.

I now moved bound checking and converting arrays to sets into a seperate function which then calls _recode! which has the rest of the code. I also added some more dispatches to recode_in as this was type unstable for scalars even after this change.

This already does a lot, but is still type unstable if the pairs have different types (e.g. both scalars and arrays). I'm not really sure how to solve that.

EDIT: It doesn't seem to be type stable now anyway. I'm running this

using CategoricalArrays
A = rand(1:100, 1000, 1000)
dest = copy(A)
pairs_range = (((i*10-9):(i*10)) => i for i in 1:9)
pairs_scalar = collect(i+90 => i for i in 1:9)
@time recode!(dest, A, pairs_range...);
@time recode!(dest, A, pairs_scalar...);
@time recode!(dest, A, pairs_scalar..., pairs_range...);

and getting

  0.312937 seconds (192.78 k allocations: 10.059 MiB, 72.57% compilation time)
  0.091321 seconds (43.72 k allocations: 2.161 MiB, 73.41% compilation time)
  1.531255 seconds (5.70 M allocations: 92.121 MiB, 6.26% gc time, 11.30% compilation time)

on the first run and

  0.053362 seconds (78 allocations: 4.766 KiB)
  0.392593 seconds (89.91 k allocations: 1.373 MiB)
  0.792653 seconds (5.58 M allocations: 85.128 MiB, 1.24% gc time)

on the second. I for the life of me can't understand why the performance of the second call get worse after compilation.

@tiemvanderdeure
Copy link
Author

The tests fail because the changes to recode_in mean that recode_in(missing, missing) is now true. I don't know if recode_in is considered internal? It's not an ideal solution, in any case.

@nalimilan
Copy link
Member

The tests fail because the changes to recode_in mean that recode_in(missing, missing) is now true. I don't know if recode_in is considered internal? It's not an ideal solution, in any case.

Why do you need it to return true rather than false as before?

@nalimilan
Copy link
Member

Regarding the origin of the issue, what surprises me is that I don't see any type instability using @descend or a direct invocation:

@code_warntype recode!(similar(A, Int), A, nothing, 'b' => 1, 'a' => 2, 'c' => 3)

So it's really hard to understand what's going on.

@tiemvanderdeure
Copy link
Author

Why do you need it to return true rather than false as before?

Because I tried to collapse this x ≅ p.first || recode_in(x, p.first) into just one function, since we don't always need to do both checks anyway and this was the line that was marked type unstable. But it doesn't seem to work anyway.

So it's really hard to understand what's going on.

I tried the same things and am also pretty clueless at this point. Sometimes it seems to be type stable, and then when I run the same code again it allocates.

@nalimilan
Copy link
Member

OK. And does the approach you tried at ef10966 work? That sounds like the most acceptable fix.

@tiemvanderdeure
Copy link
Author

And does the approach you tried at ef10966 work?

It does not, unfortunately.

The only thing I've found that makes this type stable is to pass first.(pairs) and last.(pairs each as their own Tuple. This even works for mixed types. The downside is that this scales poorly with the number of pairs, but I don't know how big of an issue that is in practice?

I've also been playing around with a different implementation that uses more multiple dispatch and no @goto, but I can't quite figure out how to make it work with the categorical arrays.

This already works nicely.

function _myrecode!(dest::AbstractArray{T}, src::AbstractArray, default, pairsfirst::Tuple, pairlast::Tuple) where {T}
    @inbounds for i in eachindex(dest)
        dest[i] = recode_element(T, src[i], default, pairsfirst, T.(pairlast))
    end
end

function recode_element(::Type{T}, x, default, recode_from, recode_to) where T
    j = findfirst(y -> isequal(x, y) || recode_in(x,y), recode_from)
    isnothing(j) ? recode_fallback(T, x, default) : T(recode_to[j])
end

recode_fallback(::Type{T}, x::Missing, default) where T = 
    throw(MissingException("missing value found, but dest does not support them: recode them to a supported value"))
recode_fallback(::Type{T}, x::Missing, default) where T >: Missing = missing
recode_fallback(::Type{T}, x, default) where T = T(default)
function recode_fallback(::Type{T}, x, ::Nothing) where T
    try
        T(x)
    catch err
        isa(err, MethodError) || rethrow(err)
        throw(ArgumentError("cannot `convert` value $(repr(x)) (of type $(typeof(x))) to type of recoded levels ($T). " *
                            "This will happen with recode() when not all original levels are recoded " *
                            "(i.e. some are preserved) and their type is incompatible with that of recoded levels."))
    end
    return T(x)
end

@nalimilan
Copy link
Member

The only thing I've found that makes this type stable is to pass first.(pairs) and last.(pairs each as their own Tuple. This even works for mixed types. The downside is that this scales poorly with the number of pairs, but I don't know how big of an issue that is in practice?

I'd say it's OK, anyway it's not more costly for the compiler than a single tuple of pairs.

I've also been playing around with a different implementation that uses more multiple dispatch and no @goto, but I can't quite figure out how to make it work with the categorical arrays.

I'm not sure what's the advantage of using multiple dispatch rather than the current code TBH. @goto is fine in simple cases like this one.

@tiemvanderdeure
Copy link
Author

tiemvanderdeure commented Nov 22, 2024

I'd say it's OK, anyway it's not more costly for the compiler than a single tuple of pairs.

True.

Okay, this is already much better, but still not totally fixed if src is categorical and dest is not.

One thing I found out is that recode_in wasn't type stable with the dispatch on Set if it is defined as y in set. I also added some dispatches for identical types, as we then don't have to loop (the isequal check should always be enough) and it also allocated for some types (such as Char).

using CategoricalArrays
src = rand(1:20, 1000, 1000)
dest = copy(src)
destcat = CategoricalArray{Int}(undef, size(src))
srccat = categorical(src)
pairs = (i => i+1 for i in 1:8)
pairs_array = ([11,12] => 2, [13,14] => 3)

@time recode!(dest, src, nothing, pairs..., pairs_array...);
@time recode!(destcat, src, nothing, pairs..., pairs_array...);
@time recode!(dest, srccat, nothing, pairs..., pairs_array...);

After compilation gives

  0.033817 seconds (34 allocations: 2.266 KiB)

  0.049040 seconds (80 allocations: 8.812 KiB)

  0.340654 seconds (2.95 M allocations: 118.264 MiB, 59.14% gc time)

@tiemvanderdeure
Copy link
Author

If we want to avoid breaking any tests we can define recode_in(x::T, y::T) where T = x == y instead, even though it adds some unnecessary overhead.

@nalimilan
Copy link
Member

One thing I found out is that recode_in wasn't type stable with the dispatch on Set if it is defined as y in set.

What do you mean? It always returns a Bool AFAICT.

I also added some dispatches for identical types, as we then don't have to loop (the isequal check should always be enough) and it also allocated for some types (such as Char).

Unfortunately I don't think this is correct in general I think, for example Any[1] in Any[[1]]. Why does the current code allocate for Char?

@tiemvanderdeure
Copy link
Author

Honestly I'm just going off trial-and-error here, I don't understand the compiler well enough to make sense of this. Some of these results I'm getting seem inconsistent so there might be some heuristics at play that I'm not so familiar with (the compiler giving up when Tuples become bigger than a certain number or something).

There are other things I don't understand like findfirst being much faster than the for loop.

Any[1] in Any[[1]]

This won't be a problem currently because Any[[1]] will be converted to a Set. I can't think of any cases where this would be a problem and the only test that fails is the one that explicitly tests that recode_in(1,1) == true. That said I'm sure there must be a better solution out there, it's just been a surprisingly though nut to crack and this is just the solution I found.

To reproduce the issues that I was running into:

using CategoricalArrays
A = (rand('a':'z', 1000, 1000))
recode(A, nothing, 'e' => 3, ['a', 'b', 'c'] => 1, ['x', 'y'] => 2)
@profview recode(A, nothing, 'e' => 3, ['a', 'b', 'c'] => 1, ['x', 'y'] => 2)

With @inline recode_in(x, set::Set) = x in set and/or without @inline recode_in(::T, ::T) where T = false, this gives a flame graph like this:
billede

With both of those changes it's completely blue.

@nalimilan
Copy link
Member

Actually after asking on Slack this is a known issue (probably a combination of JuliaLang/julia#32834 and JuliaLang/julia#54390). This is due to to use of varargs, which are inferred but not fully specialized on.

Regarding the issue with recode_in (which is related to JuliaLang/julia#56587), it seems that allocations can be fixed using this pattern:

j = @inline findfirst(y -> isequal(x, y) || recode_in(x, y), recode_from)

Could you try this instead of changing the definition of recode_in?

@tiemvanderdeure
Copy link
Author

This works. I wonder if this whole PR can be replaced by just that one @inline

@tiemvanderdeure
Copy link
Author

Okay, it seems that this works, but not for julia 1.0. Do you know how to make it work there? Or maybe it would be okay to require 1.6?

@nalimilan
Copy link
Member

Yes I think at this point we can require Julia 1.6.

@tiemvanderdeure
Copy link
Author

This use of @inline requires 1.8, actually. So we need to define a separate function

https://docs.julialang.org/en/v1/base/base/#Base.@inline

@tiemvanderdeure
Copy link
Author

So I though that this

@inline findfirstrecode(x, recode_from) = findfirst(y -> isequal(x, y) || recode_in(x,y), recode_from)

was going to be the same as

j = @inline findfirst(y -> isequal(x, y) || recode_in(x, y), recode_from)

But apparently the former allocates whereas the latter doesn't. But using @inline as in the latter requires julia 1.8. Would requiring julia 1.8 be okay with you @nalimilan ?

@nalimilan
Copy link
Member

Sorry for the delay. I'm not too happy about requiring Julia 1.8, but you can use Compat.jl >= 3.37 for that.

@nalimilan
Copy link
Member

Are you still interested in finishing this?

@tiemvanderdeure
Copy link
Author

Sorry, it slipped through with all of the things to finish before the holidays. But yes I'll finish this!

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It's close. So this is enough to fix performance in all problematic cases?

@@ -1,3 +1,4 @@
import Compat
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put this in src/CategoricalArrays.jl with other packages imports.

dest[i] = p.second
@goto nextitem
end
j = Compat.@inline findfirst(y -> isequal(x, y) || recode_in(x,y), recode_from)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
j = Compat.@inline findfirst(y -> isequal(x, y) || recode_in(x,y), recode_from)
# we use isequal and recode_in because we cannot really
# distinguish scalars from collections
j = Compat.@inline findfirst(y -> isequal(x, y) || recode_in(x, y), recode_from)

j = Compat.@inline findfirst(y -> isequal(x, y) || recode_in(x,y), recode_from)
if j !== nothing
dest[i] = recode_to[j]
@goto nextitem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT you can drop this @goto now and just to elseif ismissing(x), etc.

end

opt_pairs = map(optimize_pair, pairs)
function _recode!(dest::CategoricalArray{T, <:Any, R}, src::AbstractArray, default::Any, pairs) where {T, R}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function _recode!(dest::CategoricalArray{T, <:Any, R}, src::AbstractArray, default::Any, pairs) where {T, R}
function _recode!(dest::CategoricalArray{T, <:Any, R}, src::AbstractArray, default::Any,
pairs::NTuple{<:Any, Pair}) where {T, R}

Comment on lines +108 to +109
vals = convert.(T, recode_to)
vals = default === nothing ? vals : (vals..., default)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change really make a difference for performance? It would seem it matters more to do it for pairmap. It would also be worth checking that performance doesn't degrade when the tuple gets long, e.g. with 10 pairs.

Also if you do this the same change should be applied to other methods.

In terms of implementation, for type stability, don't change the type of vals. Also, default needs to be converted too.

Suggested change
vals = convert.(T, recode_to)
vals = default === nothing ? vals : (vals..., default)
if default === nothing
vals = convert.(T, recode_to)
else
vals = convert.(T, (recode_to..., default))
end

pairmap = [ismissing(v) ? 0 : get(dest.pool, v) for v in vals]
defaultref = default === nothing || ismissing(default) ? 0 : get(dest.pool, default)
pairmap = [ismissing(v) ? zero(R) : get(dest.pool, v) for v in vals]
defaultref = default === nothing ? nothing : ismissing(default) ? 0 : get(dest.pool, default)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of this change? It introduces a type instability, even it's a small type union.

Comment on lines +171 to +172
function _recode!(dest::CategoricalArray{T, N, R}, src::CategoricalArray,
default::Any, pairs::Tuple) where {T, N, R<:Integer}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function _recode!(dest::CategoricalArray{T, N, R}, src::CategoricalArray,
default::Any, pairs::Tuple) where {T, N, R<:Integer}
function _recode!(dest::CategoricalArray{T, N, R}, src::CategoricalArray,
default::Any, pairs::NTuple{<:Any, Pair}) where {T, N, R<:Integer}

@@ -233,7 +228,7 @@ function recode!(dest::CategoricalArray{T, N, R}, src::CategoricalArray,
@inbounds for (i, l) in enumerate(srclevels)
for j in 1:length(pairs)
p = pairs[j]
if l ≅ p.first || recode_in(l, p.first)
if l ≅ p.first ||recode_in(l, p.first)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if l p.first ||recode_in(l, p.first)
if l p.first || recode_in(l, p.first)

end

function _recode!(dest::CategoricalArray{T, N, R}, src::CategoricalArray,
default::Any, pairs::Tuple) where {T, N, R<:Integer}
vals = T[p.second for p in pairs]
if default === nothing
srclevels = levels(src)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below there's a firsts variable which should be renamed to recode_from and moved to the top of the function for consistency.

_recode!(dest, src, default, opt_pairs)
end

function _recode!(dest::AbstractArray{T}, src::AbstractArray, default, pairs) where {T}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity (another solution would be to use varargs/splatting again):

Suggested change
function _recode!(dest::AbstractArray{T}, src::AbstractArray, default, pairs) where {T}
function _recode!(dest::AbstractArray{T}, src::AbstractArray, default,
pairs::NTuple{<:Any, Pair}) where {T}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants