Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix ChainedVector constructor performance #103

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

JoaoAparicio
Copy link

Currently the ChainedVector constructor has bad performance if the args have empty entries.

import Random
struct MyType
    x::Float64
    y::Float64
end

x1 = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(1:6)] for _ in 1:1_000_000];
@time ChainedVector(x1);  # 0.017 secs

x2 = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(0:5)] for _ in 1:1_000_000];
@time ChainedVector(x2);  # 24 secs

Note that the difference comes only from the fact that x2 might have empty entries whereas x1 doesn't.

The problem is that the function cleanup! has runtime proportional to the number of elements times the number of elements which are empty vectors.

function cleanup_original!(arrays, inds)
    @assert length(arrays) == length(inds)
    for i = length(arrays):-1:1
        if !isassigned(arrays, i) || length(arrays[i]) == 0
            deleteat!(arrays, i)
            deleteat!(inds, i)
        end
    end
    return
end

function cleanup_new!(arrays, inds)
    @assert length(arrays) == length(inds)
    mask_it = Base.Iterators.filter(i->isassigned(arrays, i) && length(arrays[i])>0, 1:length(arrays))
    n = 0
    for k in mask_it
        n += 1
        k > n || continue
        arrays[n] = arrays[k]
        inds[n] = inds[k]
    end
    resize!(arrays, n)
    resize!(inds, n)
    return
end

We can check that the new version is faster and returns the same result

x = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(0:5)] for _ in 1:1_000_000];
arrays = x;
inds = Vector{Int}(undef, length(arrays));

x1 = copy(x);
inds1 = copy(inds);
x2 = copy(x);
inds2 = copy(inds);

@time cleanup_original!(x1, inds1)  # 20 secs
@time cleanup_new!(x2, inds2)  # 0.04 secs

all(x1 .== x2)  # true
all(inds1 .== inds2)  # true

Currently the ChainedVector constructor has bad performance if the args have empty entries.

```
import Random
struct MyType
    x::Float64
    y::Float64
end

x1 = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(1:6)] for _ in 1:1_000_000];
@time ChainedVector(x1);  # 0.017 secs

x2 = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(0:5)] for _ in 1:1_000_000];
@time ChainedVector(x2);  # 24 secs
```

Note that the difference comes only from the fact that x2 might have empty entries
whereas x1 doesn't.

The problem is that the function cleanup! has runtime proportional to the number of
elements times the number of elements which are empty vectors.

```
function cleanup_original!(arrays, inds)
    @Assert length(arrays) == length(inds)
    for i = length(arrays):-1:1
        if !isassigned(arrays, i) || length(arrays[i]) == 0
            deleteat!(arrays, i)
            deleteat!(inds, i)
        end
    end
    return
end

function cleanup_new!(arrays, inds)
    @Assert length(arrays) == length(inds)
    mask_it = Base.Iterators.filter(i->isassigned(arrays, i) && length(arrays[i])>0, 1:length(arrays))
    n = 0
    for k in mask_it
        n += 1
        k > n || continue
        arrays[n] = arrays[k]
        inds[n] = inds[k]
    end
    resize!(arrays, n)
    resize!(inds, n)
    return
end
```

We can check that the new version is faster and returns the same result

```
x = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(0:5)] for _ in 1:1_000_000];
arrays = x;
inds = Vector{Int}(undef, length(arrays));

x1 = copy(x);
inds1 = copy(inds);
x2 = copy(x);
inds2 = copy(inds);

@time cleanup_original!(x1, inds1)  # 20 secs
@time cleanup_new!(x2, inds2)  # 0.04 secs

all(x1 .== x2)  # true
all(inds1 .== inds2)  # true
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant