Reimplemented particles to use AoS with block communication #72

Open
alecjohnson opened this issue Jun 10, 2014 · 0 comments

I have restructured the particles class. Changes include:
  • AoS is now the canonical way to represent particles. Particle arrays are always allocated in 64-byte-aligned memory so that every particle (8 doubles, i.e. 64 bytes) occupies exactly one 64-byte cache line.
  • There is no longer any imposed limit on the number of particles that can be communicated. Particles are stored in Larray buffers (i.e. std::vector) and are added with calls like pcl_buffer.push_back(pcl), which automatically reallocates the buffer as necessary.
  • I use processor-independent application of boundary conditions to ensure that particle communication completes within 2*(XLEN+YLEN+ZLEN) iterations of communication.
  • I enforce boundary conditions by calling virtual methods such as apply_Xrght_BC(), which takes a list of particles that need the boundary conditions for the right edge of the domain to be applied. It is ultimately intended that the user will inherit from the particle solver and override this method when appropriate in order to implement user-defined boundary conditions.
  • I implemented BCs via MPI self-communication. This is a coding shortcut that could be eliminated if this turns out to be a problem. Note that for the GEM problem, to avoid lots of communication in the periodic z direction and accelerate convergence of the field solver, you should make Lz large (e.g. the same as Lx and Ly).
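The AoS layout and the overridable boundary hook described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names, the `ParticleSolver` and `ReflectingSolver` classes, the signature of `apply_Xrght_BC`, and the do-nothing default are all hypothetical and are not taken from the iPic3D source. It also uses a plain `std::vector` and relies on C++17 aligned allocation for over-aligned types rather than the code's own buffer class.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical particle layout: 8 doubles = 64 bytes, i.e. one cache line.
struct alignas(64) Particle {
  double x, y, z;  // position
  double u, v, w;  // velocity
  double q;        // charge
  double id;       // particle ID stored as a double
};
static_assert(sizeof(Particle) == 64, "a particle should fill one cache line");

// Since C++17, std::allocator honours alignas(64), so the buffer's storage
// stays 64-byte aligned even when push_back reallocates it.
using ParticleBuffer = std::vector<Particle>;

inline bool is_cache_line_aligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}

// Hypothetical solver base class with an overridable boundary hook.
class ParticleSolver {
 public:
  virtual ~ParticleSolver() = default;
  // Receives the particles that crossed the right x boundary.
  // Assumed default for this sketch: leave them untouched.
  virtual void apply_Xrght_BC(ParticleBuffer& /*pcls*/) {}
};

// Example of a user-defined boundary condition: a reflecting wall at x = Lx.
class ReflectingSolver : public ParticleSolver {
 public:
  explicit ReflectingSolver(double Lx) : Lx_(Lx) {}
  void apply_Xrght_BC(ParticleBuffer& pcls) override {
    for (Particle& p : pcls) {
      p.x = 2.0 * Lx_ - p.x;  // mirror the position back into the domain
      p.u = -p.u;             // reverse the normal velocity component
    }
  }

 private:
  double Lx_;
};
```

A user-defined solver would then be passed around through a `ParticleSolver&`, and the communication code would hand it the list of particles that left through the right edge.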

In the process, I also did the following:

  • I created a pclIDgenerator class for particle IDs.
  • I used double precision rather than long long to represent particle IDs.
  • I implemented support for nxc/XLEN to be noninteger. (This has not yet been tested.)
  • I implemented a fast 8x8 transpose for the MIC and used it to convert particles between AoS and SoA. This could be extended to Xeon by implementing the same method with AVX 256-bit intrinsics instead of MIC 512-bit intrinsics.
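To make the AoS/SoA conversion concrete: a block of 8 particles with 8 doubles each forms an 8x8 matrix whose rows are particles (AoS); transposing it yields rows that each hold one component for all 8 particles (SoA). The sketch below is a plain scalar version that shows only the data movement. It is not the intrinsics-based implementation described above, and the names `Block8x8` and `transpose_8x8` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// 8 rows of 8 doubles. In AoS form each row is one particle;
// after the transpose each row is one component of all 8 particles.
using Block8x8 = std::array<std::array<double, 8>, 8>;

// In-place transpose: swap every element above the diagonal
// with its mirror image below the diagonal.
inline void transpose_8x8(Block8x8& m) {
  for (std::size_t i = 0; i < 8; ++i) {
    for (std::size_t j = i + 1; j < 8; ++j) {
      std::swap(m[i][j], m[j][i]);
    }
  }
}
```

Because the transpose is its own inverse, the same routine converts in both directions.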

Internal changes to the code include:

  • I eliminated the distinction between the processor topologies of fields and particles. This distinction was never properly made. If we want to reintroduce it, we should first separate the particle and field solvers.
  • I consolidated random sampling code so that there is a single point in the code (ipicmath.h) that samples from a Maxwellian distribution or unit interval.
  • Particles3D::particle_repopulator() is now much more efficient. Instead of traversing the list of particles six times, deleting and repopulating particles with each pass, the list is now traversed once to delete particles, and the repopulated particles are then created and appended to the end of the list.
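The single-pass delete-then-append pattern from the last item can be sketched as below. This is one possible way to do it, not the actual body of Particles3D::particle_repopulator(): the `Pcl` type, the helper names, and the choice to fill each hole with the last element (which does not preserve particle order) are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a particle; only the x position is needed here.
struct Pcl {
  double x;
};

// Traverse the list once, removing every particle for which should_delete
// returns true. Each hole is filled with the last element, so a removal
// costs O(1) and particle order is not preserved.
template <class P, class Pred>
std::size_t delete_in_one_pass(std::vector<P>& pcls, Pred should_delete) {
  std::size_t removed = 0;
  std::size_t i = 0;
  while (i < pcls.size()) {
    if (should_delete(pcls[i])) {
      pcls[i] = pcls.back();  // overwrite with the last particle
      pcls.pop_back();        // then shrink; slot i is re-examined next
      ++removed;
    } else {
      ++i;
    }
  }
  return removed;
}

// Append the freshly created boundary particles to the end of the list.
template <class P>
void append_repopulated(std::vector<P>& pcls, const std::vector<P>& fresh) {
  pcls.insert(pcls.end(), fresh.begin(), fresh.end());
}
```

A caller would first delete everything inside the repopulation layers with a single predicate covering all six faces, then generate the new particles and append them in one go.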
alecjohnson added a commit to alecjohnson/iPic3D that referenced this issue Jun 10, 2014