Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating ics on grid for basins of attraction is super slow for higher dimensions #76

Open
KalelR opened this issue Jun 6, 2023 · 17 comments

Comments

@KalelR
Copy link
Contributor

KalelR commented Jun 6, 2023

Hey

I need to calculate basins of attraction for a high-dimensional system (100+), but I noticed that basins_of_attraction is super slow for these cases because of the creation of the initial conditions on a grid. Especifically
A = StateSpaceSet([generate_ic_on_grid(grid, i) for i in vec(I)])

is very slow for higher dimensions (was taking more than an hour to run...).

I think there are two reasons: evaluating the CartesianIndices becomes slow, and also maybe because generate_ic_on_grid allocates StaticVectors.

What would be a good solution for this?

Thanks a lot!

@awage
Copy link
Contributor

awage commented Jun 6, 2023

I wouldn't use basins_of_attraction for this purpose. Use directly the mapper:

lab = zeros(Int32, N)
for i in 1:N
      lab[i] = mapper( ic_generator())
end

The ic_generator can be a function or an array on the region you want to explore. The memory consumption would be restricted to the sparse matrix. For Kuramotos it works well.

A more condensend version: lab = [ mapper(ic_gen()) for i in 1:N]

@KalelR
Copy link
Contributor Author

KalelR commented Jun 6, 2023

Ah, nice idea! But in this case I need to use AttractorsViaFeaturizing since the dimension is so high.

@awage
Copy link
Contributor

awage commented Jun 6, 2023

So the problem is just initial condition caching? Because I assume you have a low dimensional feature space.

@KalelR
Copy link
Contributor Author

KalelR commented Jun 6, 2023

So the problem is just initial condition caching?

Yup. This line
A = StateSpaceSet([generate_ic_on_grid(grid, i) for i in vec(I)])

takes way too long just to create the ics when the dimension of the grid is high (100+).
I could of course just create A how I want for my specific problem, but it would be great if there was a simple and general way to create the ics that also works well in higher dims.

Because I assume you have a low dimensional feature space.

Yup, basins_fractions works great once we have the ics.

@awage
Copy link
Contributor

awage commented Jun 6, 2023

What about a special sampler with an internal queue:

sampler() = generate_ic_on_grid(grid, pop!(I))

I am not sure it is possible, I am just throwing an idea

@Datseris
Copy link
Member

Datseris commented Jun 6, 2023

It is because of static arrays.

We need to allow StateSpaceSet to work with normal vectors as well. Someone needs to do a PR in StateSpaceSets.jl.

And once this PR is merged, to allow for a keyword, or some heuristic, to use vectors if dimension is very high.

@Datseris
Copy link
Member

Datseris commented Jun 6, 2023

BTW I had this problem elsewhere as well, that StateSpaceSet.jl doesn't work with vectors in ComplexityMeasures.jl.

@KalelR
Copy link
Contributor Author

KalelR commented Jun 6, 2023

It is because of static arrays.

Partly, but indexing the cartesian indices is also very slow. For example this code:

basin_lengths = ones(Int64, 500)
basin_lengths[1:2] .= 500
basins = zeros(Int32, Tuple(basin_lengths));
I = CartesianIndices(basins);
[i for i in vec(I)]

just hangs on my computer when I try it.

So I'm not sure if

sampler() = generate_ic_on_grid(grid, pop!(I))

would do it, unfortunately.

@Datseris
Copy link
Member

Datseris commented Jun 6, 2023

but indexing the cartesian indices is also very slow

This cannot be possible, because indexing the cartesian indices is as fast as indexing a linear range 1:N with N the total number of points. You are not measuring the time it takes to index the cartesian indices, you are measuring the time it takes to (1) transform a cartesian index to a vector and (2) collect all those vectors to another vector.

@Datseris
Copy link
Member

Datseris commented Jun 6, 2023

Also, your code gives me infinite stack overflow messages well before I go into the cartesian indices :D already from basins = .

@Datseris
Copy link
Member

Datseris commented Jun 6, 2023

 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] =
Internal error: stack overflow in type inference of unsafe_view(Array{Int32, 500}, Base.UnitRange{Int64}, Base.UnitRange{Int64}, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, I
...
, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64, Int64).
This might be caused by recursion over very long tuples or argument lists.
Internal error: stack overflow in type inference of (::Type{Base.SubArray{T, N, P, I, L} where L where I where P where N where T})(Base.IndexCartesian, Array{Int32, 500}, Tuple{Base.UnitRange{Int64}, Base.UnitRange{Int64}, Vararg{Int64, 498}}, Tuple{Bool, Bool}).

Internal error: stack overflow in type inference of print_matrix(Base.IOContext{Base.TTY}, Base.SubArray{Int32, 2, Array{Int32, 500}, Tuple{Base.UnitRange{Int64}, Base.UnitRange{Int64}, Vararg{Int64, 498}}, false}).
This might be caused by recursion over very long tuples or argument lists.
Internal error: stack overflow in type inference of isassigned(Base.SubArray{Int32, 2, Array{Int32, 500}, Tuple{Base.UnitRange{Int64}, Base.UnitRange{Int64}, Vararg{Int64, 498}}, false}, Int64, Int64).
This might be caused by recursion over very long tuples or argument lists.

@Datseris
Copy link
Member

Datseris commented Jun 6, 2023

Ah no sorry this is because of my printing hack :D

@Datseris
Copy link
Member

Datseris commented Jun 6, 2023

Okay, now I finally understand the problem. "Indexing the cartesian indices" means to obtain the indices themselves. And yes, indeed, simply doing I[1] is exceptionally slow.

julia> using BenchmarkTools

julia> @btime I[1];
  19.697 ms (252812 allocations: 7.85 MiB)

@awage
Copy link
Contributor

awage commented Jun 6, 2023

Here you are creating a monster array: [i for i in vec(I)]

@Datseris
Copy link
Member

Datseris commented Jun 6, 2023

@KalelR you should open an issue at the main Julia repo that Cartesian Index access becomes slow and allocates in high dimensional arrays. This is too deeply internal of Julia for us to do anything.

@Datseris
Copy link
Member

Datseris commented Jun 6, 2023

Actually, I don't think anyone can do anything. The same limitation that comes from regular tuples goes into Cartesian Indices as well. Tuples are limited to 32 "dimensions" to be fast if I recall correctly. We would need to use Vectors to index things above some dimension.

@KalelR
Copy link
Contributor Author

KalelR commented Jun 16, 2023

Indeed, I haven't found a way to fix it using CartesianIndices, but Vectors do work great. Since typically we are working on a 2D projection of the basins, there are only two important dimensions, and the ics on others (which could be a lot) are fixed anyway. So a solution that's working for me now is to use CartesianIndices only for the dimensions have length > 1 (corresponding to the dimensions of the ics that are varying), and then build the grid from those indices. Then I put the grid of varying ics together with the fixed ics to construct the vector of the full ics.

It works great for my use-case, but I would like to test it further in other cases, I just don't have time at the moment. I'll put the code here anyway in case anyone is interested:

function generate_ics_on_grid(grid::NTuple{B, T}) where {B, T}
    grid_lengths = length.(grid)
    idxs_varying_dims = findall(len->len>1, grid_lengths)
    reduced_grid = grid[idxs_varying_dims]
    I_reduced = CartesianIndices(length.(reduced_grid))
    A_reduced = [generate_ic_on_grid(reduced_grid, i) for i in vec(I_reduced)]
    
    ics_fixed = [grid_dim[1] for grid_dim in grid]
    A = _expand_A(A_reduced, ics_fixed, idxs_varying_dims)
    return Dataset(A)
end

function _expand_A(vec_reduced, ic_fixed, idxs_varying_dims) 
    vec = [deepcopy(ic_fixed) for _ in vec_reduced]
    for i in eachindex(vec)
        vec[i][idxs_varying_dims] .= vec_reduced[i]
    end
    return vec 
end

Oh, and using regular vectors when the dimension gets beyond a value (in my case, 30) also helps:

function generate_ic_on_grid(grid::NTuple{B, T}, ind) where {B, T}
    mx_dimension_sparse = 30
    if B < mx_dimension_sparse 
        return _generate_ic_on_grid(grid, ind)
    else
        return _generate_ic_on_grid_vec(grid, ind)
    end 
end

function _generate_ic_on_grid_vec(grid::NTuple{B, T}, ind) where {B, T}
    gens = [(grid[k][ind[k]]) for k=1:B]
    return gens
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants