Review of "cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications" #11

Open
mandli opened this issue Jun 15, 2014 · 0 comments


mandli commented Jun 15, 2014

Review of "cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications"

Reviewer: Kyle Mandli
Department: Institute for Computational Engineering and Science
Institution: University of Texas at Austin
Field: Applied and Computational Mathematics
Country: USA
Article Reviewed: cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

General Evaluation

below: doesn't meet standards for academic publication
meets: meets or exceeds the standards for academic publication
n/a: not applicable

  • Quality of the approach:

    Meets with caveats (below).

  • Quality of the writing:

    Meets

  • Quality of the figures/tables:

    Meets

Specific Evaluation

  • Is the code made publicly available and does the article sufficiently describe how to access it?

    No, although I think it was at some point. Googling the code led to a set of page links that did not seem to point to anything.

  • Does the article present the problem in an appropriate context? Specifically, does it:

    • explain why the problem is important,

      Yes

    • describe in which situations it arises,

      Yes

    • outline relevant previous work,

      Yes and no. Given the length of time that has passed between the original submission and this review, I think there is work more relevant today, but including it would require a large rewrite of this part of the paper.

    • provide background information for non-experts

      Somewhat. There is terminology that is assumed to be known, but it is not egregious.

  • Is the content of the paper accessible to a computational scientist
    with no specific knowledge in the given field?

    Somewhat, it does assume a working knowledge of the problem being addressed and low-level memory management.

  • Does the paper describe a well-formulated scientific or technical
    achievement?

    Yes

  • Are the technical and scientific decisions well-motivated and
    clearly explained?

    Yes

  • Are the code examples (if any) sound, clear, and well-written?

    Yes.

  • Is the paper factually correct?

    To my knowledge yes.

  • Is the language and grammar of sufficient quality?

    A few corrections have been suggested in the marked up PDF.

  • Are the conclusions justified?

    Somewhat. The performance seems encouraging but there are a number of issues (detailed below). I think the most egregious of these is the claim that this approach will work on clusters and supercomputers, which is definitely not clear to me. Issues such as communication and latency are not addressed at all and would be critical for these setups.

  • Is prior work properly and fully cited?

    Yes

  • Should any part of the article be shortened or expanded? Please explain.

    The performance study is the crux of the article and should be expanded upon with additional testing and explanations. Some of the design explanations could be condensed perhaps to make room for this.

  • In your view, is the paper fit for publication in the conference proceedings?
    Please suggest specific improvements and indicate whether you think the
    article needs a significant rewrite (rather than a minor revision).

    I think my largest qualm with the article as is involves the Performance Study section. Some specific comments:

    • The vector engine setups are never explained (although I think they can be inferred).
    • I think that most computational scientists would be pretty hard pressed to call this a strong-scaling experiment (going from 1 to 2 cores).
    • The code for the benchmarks has not been provided
    • The experiments should probably run longer, to expose issues arising from normal system operation that even ensembles of 3 will not catch without longer-term operation.

    Besides this, the scope of the work seems to be very limited (only a single-node machine). As mentioned above, claiming this works on a supercomputer seems completely unsupported given the work in the article.
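
    To make the strong-scaling comment above concrete: a strong-scaling study holds the problem size fixed and reports speedup and parallel efficiency over a range of core counts, which is why two data points say little. The sketch below is only illustrative. The function name `strong_scaling` and the timings are hypothetical and are not taken from the article.

```python
def strong_scaling(timings):
    """Compute speedup and parallel efficiency for a fixed problem size.

    timings: dict mapping core count -> wall-clock time in seconds.
    Returns a list of (cores, speedup, efficiency) tuples, sorted by
    core count and measured relative to the smallest core count given.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Efficiency is speedup divided by the relative increase in cores.
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical timings, for illustration only (not from the article).
example_timings = {1: 100.0, 2: 55.0, 4: 30.0, 8: 20.0}
for cores, speedup, efficiency in strong_scaling(example_timings):
    print(f"{cores:>3} cores: speedup {speedup:5.2f}, efficiency {efficiency:5.2f}")
```

    A table like this over many core counts, ideally spanning more than one node, is what most readers would expect under the label "strong scaling".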

Other Comments/Questions

  • The Related Work section could be a lot better with less of a laundry-list approach and more discussion of how previous work shaped the design of cphVB (for instance, where certain design decisions were made because of previous work).
  • The memory overhead may not be large (as was shown) for copying between CPU cores but what about discrete accelerators? This seems to be a much more difficult question and one that is not compellingly answered or mentioned.
  • Is using and catching segfaults a wise design decision? Addressing this would lead to a much more compelling article. As I read that part, a number of questions came up: how fragile is this, does it work on all kernels, what about code that calls other libraries, etc.