
Parallel version of AttractorsViaRecurrences #1

Open
awage opened this issue Sep 28, 2022 · 13 comments
Labels
hard, performance (numeric performance and optimizing it)

Comments

@awage
Contributor

awage commented Sep 28, 2022

One of the main drawbacks of AttractorsViaRecurrences for mapping initial conditions to attractors is that it is sequential: the mapper takes one initial condition at a time and maps it to an attractor.

I have an idea to get a partial answer to the problem for high-dimensional systems. When computing basin fractions or basins of attraction, a large number of samples is necessary. So the idea is the following:

  • The system caches N trajectories computed in parallel, each integrated from Ttr to Tmax.
  • The mapper processes each trajectory sequentially using the cached points until it finds the attractor or the basin. If the mapper reaches Tmax and needs more points, an integrator takes over and feeds the mapper with new points.
  • Once all the trajectories in the cache have been processed, we start the process again.

Benefit: Speed!
Drawback: We compute the whole trajectory, but the mapper may discard a lot of the computed points if the trajectory reaches an attractor before Tmax.

I think it would be useful for very large systems. I don't think it would need too much additional code, maybe two or three new functions; a rough sketch follows below.
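Roughly, in pseudocode (`process_cached!` is a made-up placeholder for the part of the recurrence algorithm that would consume the cached points; the `trajectory(ds, T, u0; Ttr, Δt)` call is the standard one):

    # Pseudocode sketch of the caching idea. `process_cached!` is a hypothetical
    # helper: it would run the recurrence algorithm on the pre-computed points
    # and return `nothing` if it runs out of them before converging.
    function labels_via_cached_trajectories(ds, mapper, ics; Ttr = 100.0, Tmax = 1000.0, Δt = 0.1)
        # 1. cache the trajectories, integrated in parallel from Ttr to Tmax
        #    (one deepcopy of the system per thread so the integrators don't clash)
        systems = [deepcopy(ds) for _ in 1:Threads.nthreads()]
        cache = Vector{Any}(undef, length(ics))
        Threads.@threads for i in eachindex(ics)
            cache[i] = trajectory(systems[Threads.threadid()], Tmax, ics[i]; Ttr, Δt)
        end
        # 2. the mapper processes each cached trajectory sequentially
        labels = zeros(Int, length(ics))
        for i in eachindex(ics)
            l = process_cached!(mapper, cache[i])    # hypothetical helper
            # 3. if Tmax was not enough, fall back to the usual serial integration
            labels[i] = l === nothing ? mapper(ics[i]) : l
        end
        return labels
    end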

@Datseris
Member

I am not even sure this is "guaranteed" to lead to better performance. The parallelization of evolving trajectories doesn't scale as you'd think, i.e., as if it did as many trajectories in parallel as there are available cores. It scales much worse. And we add the allocation step of saving all these trajectories. The worst part is, you'd have to choose some Tmax, and if it is not enough, the algorithm goes back to the existing serial model, so you don't get the parallelization benefit for long anyway. And if you choose Tmax ultra large, you may save too many points, even many more than you need.

I'd say only a working implementation can be proof of whether this is indeed faster or not.

@awage
Contributor Author

awage commented Sep 29, 2022

Ok, forget this option for the moment.

I have another solution, but it is riskier.

We can launch instances of the mapper, each with a copy of the sparse matrix. There is a data race condition to manage: the critical point is when a new attractor has been found. You need to do the following:

  • Lock the thread and kill the other instances.
  • Register the attractor points and write them into the cells of the "master" sparse matrix.
  • Relaunch the killed processes from the beginning with the refreshed sparse matrix.

You have the problem of memory allocation each time you launch a new instance of the mapper. Also, everything related to data synchronization and multithreading can be a real pain in the neck. There are locks available in the language, but it seems the situation around the @atomic macro is still evolving. A toy sketch of the locking pattern is below.
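Something like this (a toy illustration of the locking pattern only, not of the actual mapper internals; the Dict and the random test are stand-ins):

    # Toy illustration of the locking pattern; the shared Dict stands in for the
    # "master" sparse matrix and the random test for "the mapper found a new attractor".
    master_lock = ReentrantLock()
    master_attractors = Dict{Int, Vector{Float64}}()

    Threads.@threads for k in 1:100
        u = rand(2)                # stands in for processing one initial condition
        found_new = rand() < 0.05  # stands in for detecting a new attractor
        if found_new
            lock(master_lock) do
                new_id = length(master_attractors) + 1
                master_attractors[new_id] = u   # register it in the master storage
                # here the other instances would be restarted with the refreshed matrix
            end
        end
    end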

@awage
Contributor Author

awage commented Sep 29, 2022

OK, forget about the threads. I have a better idea.

We can launch several mappers with separate memory spaces in parallel. Obviously they will detect the same attractors several times. BUT!! You made a nice program that matches the attractors from different sets, so we can leverage it to bring all the pieces together.

So we can launch all the mappers in parallel without the data race problem. All we need to do is match the attractors and the labels afterwards, and you have already made the tools to do this.

@Datseris
Member

Yes, sounds good. Once again, we still need to test how much speedup this actually gives, which isn't clear! I guess if you have access to 20 or 50 cores the speedup will be huge.

(Also, we have a long list of TODOs for the paper. Personally, I have no idea what to prioritize. Getting an even faster algorithm is definitely cool, but publishing it is cool as well :D :D :D )

@Datseris
Member

Actually, @awage, I'm not sure. The bottleneck of the algorithm is finding the attractors, not converging to existing ones. It seems to me that with this last proposal we repeat the most expensive step of the algorithm...?

@awage
Contributor Author

awage commented Sep 29, 2022

I am certain it will be a benefit for huge systems like the coupled networks. The integration is very slow, and even the convergence to a known attractor is slow. To give you a figure: the basins of attraction of the coupled Kuramoto oscillators with N=120 nodes took roughly 20 h for a 300x300 grid. So if you divide this time by 20, it's quite interesting.

It is not a high priority since it does not bring anything new to the method. I will try to get the proof-of-concept code working.

@Datseris
Member

Yeah, but with the latest version you don't even need to write much new code. You can use pmap instead: use pmap to call basins_fractions several times, each time with a pre-determined set of initial conditions (sampled randomly). Then you merge them all together by counting all the same attractor labels that basins_fractions returns. Right? This doesn't seem like it would require you to alter the source code. If you create many attractor mappers, you create many integrators, and you create everything from scratch anyway. So you might as well use Julia's parallelization tools like pmap and just call basins_fractions many times in parallel.
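Something along these lines (just a sketch: it assumes the `basins_fractions(mapper, sampler; N)` call as in the docs, that `ds`, `grid` and `sampler` are already defined, and it ignores the attractor-matching problem):

    using Distributed, DynamicalSystems
    addprocs(4)
    @everywhere using DynamicalSystems

    # Each worker builds its own mapper and estimates the fractions from its own
    # random sample of initial conditions (`sampler` is a function returning one i.c.).
    fracs = pmap(1:nworkers()) do _
        mapper = AttractorsViaRecurrences(ds, grid)
        basins_fractions(mapper, sampler; N = 1000)
    end

    # Merge by averaging the fractions per attractor label over the batches
    # (this assumes labels of different workers refer to the same attractors).
    merged = Dict{Int, Float64}()
    for fs in fracs, (label, fraction) in fs
        merged[label] = get(merged, label, 0.0) + fraction / length(fracs)
    end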

@awage
Contributor Author

awage commented Sep 30, 2022

Exactly!! I have a basic benchmark (without the attractor matching part):
[benchmark figure: basics_stats — computation time with 1 thread vs 4 threads]

Blue points are with 1 thread and the orange points with 4 threads. There is a gain of roughly 2. It means that there is a memory bottleneck somewhere.

The code is also very crude:

    nth = Threads.nthreads()
    # one independent copy of the mapper per thread
    mapper_array = [deepcopy(mapper) for _ in 1:nth]

    # preallocate the results to avoid a data race on push!
    ics = Vector{Tuple{Vector{Float64}, Int}}(undef, Nt)
    Threads.@threads for k in 1:Nt
        # random initial condition: N values in [0, π) and N values in (-6, 6)
        u = vec([pi .* rand(N) (rand(N) .- 0.5) .* 12])
        l = mapper_array[Threads.threadid()](u)   # label of the attractor u converges to
        ics[k] = (u, l)
    end

@Datseris
Member

It means that there is a memory bottleneck somewhere.

No, it's normal to get x2 in case you were expecting x4. The speedup isn't equal to the number of threads; I've never seen such a speedup.

@Datseris
Member

Notice that this method introduces yet another ambiguity: how to match attractors? It could be that a thread finds an attractor that wasn't found in other threads. How to merge them?

@awage
Contributor Author

awage commented Sep 30, 2022

how to match attractors?

That's the big problem.

@awage
Contributor Author

awage commented Sep 30, 2022

Sorry if I keep hammering on the same subject. The continuation method can be parallelized "easily", since there is no data race condition between two mappers running at different parameters. It is something to explore.

@Datseris
Member

The continuation relies on seeding from the previous attractors for speedup. The parallel version would not use the seeding and would just start processes at each parameter, and then merge all parameters together with a final iterative call to match_attractor_ids!. The good thing here is that the exact same matching would happen, given the same instructions for mapping. I think this is the best place to look for parallelization: making a 2nd, parallel version of the continuation.
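Schematically, something like this (pseudocode: `ds`, `grid`, `sampler`, the parameter index `pidx` and the parameter values `ps` are assumed given, and the standard `set_parameter!` / `basins_fractions` calls are assumed):

    # Parallel continuation sketch: one independent mapper per parameter, no seeding.
    results = pmap(ps) do p
        set_parameter!(ds, pidx, p)              # each worker works on its own copy of ds
        mapper = AttractorsViaRecurrences(ds, grid)
        fs = basins_fractions(mapper, sampler)
        (p, fs, mapper)   # keep the mapper so its attractors can be extracted later
    end
    # afterwards: iterate over consecutive parameter values and relabel the attractors
    # of each step to match the previous one (this is where match_attractor_ids! comes in)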

@Datseris Datseris transferred this issue from JuliaDynamics/ChaosTools.jl Oct 19, 2022
@Datseris Datseris added the performance (numeric performance and optimizing it) and hard labels Oct 20, 2022