Skip to content

Better parallelization algorithm for gridlink #240

Open
@lgarrison

Description

@lgarrison

Gridlink currently uses a simple parallel scheme that does two passes over all particles: one to compute cell indices, and another to write the particles into cells. Each thread is actually still reading all particles in the second pass and is just treating those in its domain to avoid race conditions.

We can do better! With a fast, parallel sort, each thread will only have to read particles in its cell domain, and the writes will be contiguous, rather than semi-random, like they are now.

The current method is still pretty fast (see timings in #239), so the challenge is writing a fast enough sort. A parallel radix sort might be the trick. We likely want this to be an argsort, since the particles can be "heavy" (12 or 24 bytes) and the indices are relatively light (8 or maybe 4 bytes if we economize). Worse, the movement of the weights requires a loop over a number of weights only known at runtime, which could bog down any swaps.

We'll want to port any such algorithm to the mock grindlink(s); the current parallel scheme is only for the theory gridlink.

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions