Better parallelization algorithm for gridlink

Gridlink currently uses a simple parallel scheme that does two passes over all particles: one to compute cell indices, and another to write the particles into cells.  Each thread is actually still reading all particles in the second pass and is just treating those in its domain to avoid race conditions.

We can do better!  With a fast, parallel sort, each thread will only have to read particles in its cell domain, and the writes will be contiguous, rather than semi-random, like they are now.

The current method is still pretty fast (see timings in #239), so the challenge is writing a fast enough sort.  A parallel radix sort might be the trick.  We likely want this to be an argsort, since the particles can be "heavy" (12 or 24 bytes) and the indices are relatively light (8 or maybe 4 bytes if we economize).  Worse, the movement of the weights requires a loop over a number of weights only known at runtime, which could bog down any swaps.

We'll want to port any such algorithm to the mock grindlink(s); the current parallel scheme is only for the theory gridlink.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Better parallelization algorithm for gridlink #240

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Better parallelization algorithm for gridlink #240

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions