Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix ChainedVector constructor performance #103

Merged
merged 1 commit into from
Jul 7, 2024

Commits on Jul 7, 2024

  1. Bugfix ChainedVector constructor performance

    Currently the ChainedVector constructor has bad performance if the args have empty entries.
    
    ```
    import Random
    struct MyType
        x::Float64
        y::Float64
    end
    
    x1 = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(1:6)] for _ in 1:1_000_000];
    @time ChainedVector(x1);  # 0.017 secs
    
    x2 = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(0:5)] for _ in 1:1_000_000];
    @time ChainedVector(x2);  # 24 secs
    ```
    
    Note that the difference comes only from the fact that x2 might have empty entries
    whereas x1 doesn't.
    
    The problem is that the function cleanup! has runtime proportional to the number of
    elements times the number of elements which are empty vectors.
    
    ```
    function cleanup_original!(arrays, inds)
        @Assert length(arrays) == length(inds)
        for i = length(arrays):-1:1
            if !isassigned(arrays, i) || length(arrays[i]) == 0
                deleteat!(arrays, i)
                deleteat!(inds, i)
            end
        end
        return
    end
    
    function cleanup_new!(arrays, inds)
        @Assert length(arrays) == length(inds)
        mask_it = Base.Iterators.filter(i->isassigned(arrays, i) && length(arrays[i])>0, 1:length(arrays))
        n = 0
        for k in mask_it
            n += 1
            k > n || continue
            arrays[n] = arrays[k]
            inds[n] = inds[k]
        end
        resize!(arrays, n)
        resize!(inds, n)
        return
    end
    ```
    
    We can check that the new version is faster and returns the same result
    
    ```
    x = [ MyType[MyType(rand(), rand()) for _ in 1:Random.rand(0:5)] for _ in 1:1_000_000];
    arrays = x;
    inds = Vector{Int}(undef, length(arrays));
    
    x1 = copy(x);
    inds1 = copy(inds);
    x2 = copy(x);
    inds2 = copy(inds);
    
    @time cleanup_original!(x1, inds1)  # 20 secs
    @time cleanup_new!(x2, inds2)  # 0.04 secs
    
    all(x1 .== x2)  # true
    all(inds1 .== inds2)  # true
    ```
    JoaoAparicio authored and quinnj committed Jul 7, 2024
    Configuration menu
    Copy the full SHA
    ae1085f View commit details
    Browse the repository at this point in the history