Review of "cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications" #11

mandli · 2014-06-15T21:27:34Z

Review of "cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications"

Reviewer: Kyle Mandli
Department: Institute for Computational Engineering and Science
Institution: University of Texas at Austin
Field: Applied and Computational Mathematics
Country: USA
Article Reviewed: cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

General Evaluation

below: doesn't meet standards for academic publication
meets: meets or exceeds the standards for academic publication
n/a: not applicable

Quality of the approach:

Meets with caveats (below).
Quality of the writing:

Meets
Quality of the figures/tables:

Meets

Specific Evaluation

Is the code made publicly available and does the article sufficiently describe how to access it?

No, although I think it was at some point. Googling the code led to a set of page links that did not seem to point to anything.
Does the article present the problem in an appropriate context? Specifically, does it:
- explain why the problem is important,
  
  Yes
- describe in which situations it arises,
  
  Yes
- outline relevant previous work,
  
  Yes and no. Given the length of time that has passed between the original submission and this review, I think there is work that is more relevant today, but including it would require a large rewrite of this part of the paper.
- provide background information for non-experts
  
  Somewhat. Some terminology is assumed to be known, but it is not egregious.
Is the content of the paper accessible to a computational scientist
with no specific knowledge in the given field?

Somewhat, it does assume a working knowledge of the problem being addressed and low-level memory management.
Does the paper describe a well-formulated scientific or technical
achievement?

Yes
Are the technical and scientific decisions well-motivated and
clearly explained?

Yes
Are the code examples (if any) sound, clear, and well-written?

Yes.
Is the paper factually correct?

To my knowledge yes.
Is the language and grammar of sufficient quality?

A few corrections have been suggested in the marked up PDF.
Are the conclusions justified?

Somewhat. The performance seems encouraging, but there are a number of issues (detailed below). The most egregious of these is the claim that this approach will work on clusters and supercomputers, which is not at all clear to me. Issues such as communication and latency are not addressed at all and would be critical for these setups.
Is prior work properly and fully cited?

Yes
Should any part of the article be shortened or expanded? Please explain.

The performance study is the crux of the article and should be expanded upon with additional testing and explanations. Some of the design explanations could be condensed perhaps to make room for this.
In your view, is the paper fit for publication in the conference proceedings?
Please suggest specific improvements and indicate whether you think the
article needs a significant rewrite (rather than a minor revision).
I think my largest qualm with the article as is involves the Performance Study section. Some specific comments:
- The vector engine setups are never explained (although I think they can be inferred).
- I think most computational scientists would be hard-pressed to call this a strong-scaling experiment (going from 1 to 2 cores).
- The code for the benchmarks has not been provided
- The experiments should probably run longer, to expose issues from normal system operation that even ensembles of 3 runs will not catch without longer-term operation.

Besides this, the scope of the work seems to be very limited (only a single-node machine). As mentioned above, the claim that this works on a supercomputer seems completely unsupported by the work in the article.
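
To make the strong-scaling comment above concrete: a strong-scaling study holds the problem size fixed and reports speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p over a range of core counts, not just 1 and 2. The sketch below shows the kind of table I would expect to see. The function name and the timings are hypothetical placeholders of my own and are not taken from the paper.

```python
def strong_scaling(timings):
    """Return (cores, speedup, efficiency) rows for a fixed problem size.

    `timings` maps core count -> wall-clock seconds and must include 1 core.
    """
    t1 = timings[1]
    rows = []
    for cores in sorted(timings):
        speedup = t1 / timings[cores]      # S(p) = T(1) / T(p)
        efficiency = speedup / cores       # E(p) = S(p) / p
        rows.append((cores, speedup, efficiency))
    return rows


# Placeholder timings in seconds, invented for illustration only.
example = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}

for cores, speedup, efficiency in strong_scaling(example):
    print(f"{cores:>2} cores: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Reporting efficiency alongside speedup across several core counts would show where the approach stops scaling, which two data points cannot.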

Other Comments/Questions

The Related Work section could be a lot better with less of a laundry-list approach and more discussion of how previous work influenced the design of cphVB (for instance, where certain design decisions were made because of it).
The memory overhead may not be large (as was shown) for copying between CPU cores but what about discrete accelerators? This seems to be a much more difficult question and one that is not compellingly answered or mentioned.
Is using and catching segfaults a wise design decision? Addressing this would lead to a much more compelling article. As I read that part, a number of questions came up: how fragile is this, does it work on all kernels, what about code that calls other libraries, etc.

ahmadia added the simon_lund label Jun 15, 2014
