eliminate calls to getColumn() and putColumn() in GMRES solver #73

Open
alecjohnson opened this issue Aug 18, 2014 · 1 comment
@alecjohnson

In GMRES(), a matrix V is defined with dimensions xkrylovlen by (m+1), where m is the number of iterations per restart (set to 20 in the code). The columns of this matrix are accessed by calls to getColumn() and putColumn(), which gather the data from V into a vector and scatter it back. This is likely to be expensive: each element of a column lies in a different row of V, and since the Krylov vectors are large, these accesses will usually be cache misses. Such scatter/gather can also be a computational bottleneck when a vector unit is available. Transposing V, so that it has (m+1) rows of length xkrylovlen and each Krylov vector is one row, would eliminate the need for any copying of data. We can then make v and w simple pointers:

  // don't need this any more:
  //double *v = new double[xkrylovlen];
  //double *w = new double[xkrylovlen];
  double *v,*w;

and change the getColumn() calls to set the pointer:

      // don't need this any more:
      //getColumn(v, V, k, xkrylovlen);
      v = V[k]; // use this instead

and simply delete the putColumn() calls.
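
To make the transposed layout concrete, here is a minimal sketch of one way to store the basis so that `V[k]` yields a pointer to a contiguous Krylov vector. The class name `KrylovBasis` and its members are hypothetical and are not part of iPic3D; how the real code allocates V is not shown in this issue, so treat the single flat buffer as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from iPic3D): holds nvecs basis vectors, each of
// length len, back to back in one flat buffer. Row k is basis vector k.
class KrylovBasis {
public:
  KrylovBasis(std::size_t nvecs, std::size_t len)
    : len_(len), data_(nvecs * len, 0.0) {}

  // Return a pointer to the first element of basis vector k.
  // No data is copied; writes through the pointer modify the basis in place.
  double* operator[](std::size_t k) {
    assert(k < count());
    return data_.data() + k * len_;
  }
  const double* operator[](std::size_t k) const {
    assert(k < count());
    return data_.data() + k * len_;
  }

  std::size_t length() const { return len_; }
  std::size_t count() const { return len_ ? data_.size() / len_ : 0; }

private:
  std::size_t len_;           // length of each Krylov vector
  std::vector<double> data_;  // nvecs * len doubles, row-major
};
```

With a layout like this, `double* v = V[k];` plays the role of the old getColumn() call, and because `v` aliases the row itself there is nothing left for putColumn() to write back.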

This change also allows us to vectorize the use of V to modify the Krylov vector at the end of the loop:

    // the new code
    //
    for (int j = 0; j < k; j++)
    {
      const double yj = y[j];
      double* Vj = V[j];
      // this will vectorize nicely
      for (int i = 0; i < xkrylovlen; i++)
        xkrylov[i] += yj * Vj[i];
    }

    // this was the old code.
    //
    //for (int jj = 0; jj < xkrylovlen; jj++) {
    //  tmp = 0.0;
    //  for (register int l = 0; l < k; l++)
    //    tmp += y[l] * V[l][jj];
    //  xkrylov[jj] += tmp;
    //}
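
The two loop orders above can be compared side by side in a standalone form. The sketch below wraps each in a function; the names `update_by_rows` and `update_by_elements` and the `const double* const*` parameter type are illustrative assumptions, not iPic3D code. Both compute the same sum mathematically, though in floating point the results can differ slightly because the additions happen in a different order.

```cpp
#include <cassert>

// New loop order: for each basis vector j, add y[j] * V[j] to x.
// The inner loop reads V[j] and updates x with unit stride, which is the
// pattern compilers typically auto-vectorize.
void update_by_rows(double* x, const double* const* V, const double* y,
                    int k, int len) {
  assert(k >= 0 && len >= 0);
  for (int j = 0; j < k; j++) {
    const double yj = y[j];
    const double* Vj = V[j];
    for (int i = 0; i < len; i++)
      x[i] += yj * Vj[i];
  }
}

// Old loop order: for each element of x, accumulate over all basis vectors.
// The inner loop jumps from row to row of V, one element per row.
void update_by_elements(double* x, const double* const* V, const double* y,
                        int k, int len) {
  assert(k >= 0 && len >= 0);
  for (int jj = 0; jj < len; jj++) {
    double tmp = 0.0;
    for (int l = 0; l < k; l++)
      tmp += y[l] * V[l][jj];
    x[jj] += tmp;
  }
}
```

Timing the two functions on vectors of realistic length is one way to check whether the speedup carries over to a given machine and compiler.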
alecjohnson added a commit to alecjohnson/iPic3D that referenced this issue Aug 18, 2014
@alecjohnson

On my laptop this accelerated the computation part of the field solve by about 14 percent.
