0003 "standev_phoronix_test_suite" #4

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

0003 "standev_phoronix_test_suite" #4

wants to merge 2 commits into from

Conversation

@ghost commented Apr 15, 2019

Generally unreviewed; there may be typos.

@ghost changed the title from oliver: 0003 "standev_phoronix_test_suite" to 0003 "standev_phoronix_test_suite" on Apr 15, 2019
@seantalts (Member)

Hey, thanks for posting! Forgot about this, sorry. So is the proposal here to write a test suite that integrates with Phoronix? And is the motivation that it will then be hosted on their website, so people can choose to run the benchmarks on their own personal hardware, which will automatically upload the results?

I think anyone should feel free to do this if they like! This seems like the kind of thing where someone can just build it and if it's useful it will be used, you know? Or was this proposal to replace or change some existing system or design?

@ghost (Author) commented May 5, 2019

No problem. What happens (on Linux, anyway) is that I can define a simple shell script with the commands, and the test suite runs repeated trials over the calls for whatever metric we want to capture. I had a rudimentary version of this working on a couple of CmdStan tests in about 30 minutes. It was trivial to link my account and send the formatted information to an aggregated online repository.
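Roughly, the kind of wrapper I mean looks like this (a sketch only; the `CMDSTAN_DIR` variable is a placeholder for a built CmdStan checkout, and the example paths assume the stock bernoulli example):

```sh
#!/bin/sh
# Minimal sketch of a benchmark wrapper of the sort Phoronix Test Suite
# repeats and aggregates. Assumes CmdStan's stock example is already built:
#   make examples/bernoulli/bernoulli   (from the CmdStan directory)
# CMDSTAN_DIR is a placeholder for wherever that checkout lives.

cd "$CMDSTAN_DIR" || exit 1

# /usr/bin/time -p prints "real <seconds>" on stderr, which a results
# parser can pick out as the metric for each trial.
/usr/bin/time -p ./examples/bernoulli/bernoulli \
    sample num_samples=1000 num_warmup=1000 \
    data file=examples/bernoulli/bernoulli.data.R \
    output file=/dev/null
```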

The idea arose from a Discourse post about raising awareness of the timing differences across compiler and operating-system setups. I see that kind of information posted all the time using this software: https://www.phoronix.com/scan.php?page=article&item=windows10-linux-browsers&num=2

I think the architecture definitely supports quick build-and-test scenarios. If someone wanted reliable and consistent testing across different builds (something mentioned on Discourse), this would be the way to go. In most scenarios I don't think Phoronix is robust enough to handle all of the routine developer testing for a project like CmdStan, but it could be helpful in certain situations: prior to a release, for example, or when a small batch of tests produces inconclusive results and more information would improve decision making.

@seantalts (Member)

How does it provide the "reliable and consistent testing across different builds"? That does sound good.

@ghost (Author) commented May 5, 2019

The test is downloaded, built, and run from a developer-specified source. Test-suite developers can define which versions of the build tools are to be used, or alternatively set up tests that compare different build tools.

Once the test is defined, all test-suite users run the same test regardless of other system variables. Major system specifications that do differ are tracked and can be used to demarcate the results. A sketch of how a profile pins all of this down is below.
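As a sketch of the pinning, a local test profile's `install.sh` can fix both the source release and the compiler (the layout here is from memory of PTS local profiles; the version and `CXX` choice are illustrative):

```sh
#!/bin/sh
# Sketch of a local Phoronix test profile's install.sh. The key point:
# the profile fixes the source tarball and build flags, so every user
# builds and runs an identical test. Version and CXX are illustrative.

# Pin the source to a fixed release, not a moving branch.
wget https://github.com/stan-dev/cmdstan/releases/download/v2.19.1/cmdstan-2.19.1.tar.gz
tar -xzf cmdstan-2.19.1.tar.gz

# Pin the toolchain via the make invocation.
cd cmdstan-2.19.1
make CXX=g++-8 examples/bernoulli/bernoulli
cd ..

# PTS convention (as I recall): install.sh leaves behind an executable
# named after the test, which the runner invokes for each trial.
# $LOG_FILE is set by the PTS runner.
cat > cmdstan-bench <<'EOF'
#!/bin/sh
cd cmdstan-2.19.1
/usr/bin/time -p ./examples/bernoulli/bernoulli \
    sample data file=examples/bernoulli/bernoulli.data.R \
    output file=/dev/null > "$LOG_FILE" 2>&1
EOF
chmod +x cmdstan-bench
```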

@seantalts (Member)

Who causes the test to be downloaded? Is Phoronix basically a repo of benchmarks that users can choose to run on their hardware? And so getting results there would depend on a bunch of users reliably running these, right?

@ghost (Author) commented May 6, 2019

Basically, yes: there is a source repo for the test code. The user has to make sure they use the same test name as everyone else and select the option to upload their system information.
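Concretely, the user side is just a couple of PTS commands (the test name here is illustrative):

```sh
# Install and run the test; PTS prompts to save the result and, at the
# end, offers to upload it to openbenchmarking.org along with the
# detected system information.
phoronix-test-suite benchmark local/cmdstan-bench

# A previously saved result can also be uploaded later by name.
phoronix-test-suite upload-result cmdstan-bench-results
```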

@syclik (Member) commented May 21, 2019

I just read through it. I don't really see this as requiring a design document at this level, unless I'm missing something. It looks like this was meant to augment our existing testing. I think it'd be good to discuss the specifics of what testing would be appropriate for Phoronix and how reliable it is in providing those results.

I guess I'm thinking it should go the other way: we should define what sort of testing we want or require (maybe as a design doc) and then pick the things that would help accomplish that. Does that make sense?

To be concrete, we could discuss how this would work for benchmarking Math. If it makes sense, we should be able to include it, since it would only affect Math; and in a sense, it doesn't actually affect design... this is additional instrumentation around the code base. (To rephrase: I'm all for it! Conditioned on it being low maintenance and reliable.)

@bob-carpenter (Collaborator) commented May 21, 2019 via email

@ghost (Author) commented May 21, 2019

If we know when to expect a performance regression, there are more efficient developer-oriented tools than Phoronix for isolating where it is being introduced.

When the developer and review process overlooks some unknowable and a regression is introduced, what is the current detection procedure, and how might it be improved?

@ghost (Author) commented May 21, 2019

A blunt-hammer outline might be to keep a baseline benchmark against the current versioned release and make the developer responsible for replicating the relevant test cases for their PR as part of the review process (a rough sketch of the comparison is below). We would know we aren't thoroughly testing everything within the scope of a change, but we would be making sure some bundle of representative models still performs equivalently, or at least reasonably well, compared to the current release for the end user.
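Something like the following, where the binary paths, the trial count, and the 5% threshold are all placeholders:

```sh
#!/bin/sh
# Rough sketch of a baseline comparison. Binary paths, the trial count,
# and the 5% threshold are all placeholders.

run_secs() {
    # Mean "real" time over 5 runs of the given CmdStan binary.
    total=0
    for i in 1 2 3 4 5; do
        t=$( { /usr/bin/time -p "$1" sample \
                 data file=bernoulli.data.R \
                 output file=/dev/null >/dev/null ; } 2>&1 \
             | awk '/^real/ {print $2}')
        total=$(echo "$total + $t" | bc)
    done
    echo "scale=3; $total / 5" | bc
}

base=$(run_secs ./release/bernoulli)
pr=$(run_secs ./pr/bernoulli)
echo "baseline: ${base}s  PR: ${pr}s"

# Flag the PR if it is more than 5% slower than the release baseline.
if [ "$(echo "$pr > $base * 1.05" | bc)" -eq 1 ]; then
    echo "possible performance regression" >&2
    exit 1
fi
```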

As Bob mentions, there is a lot of detail involved; the accuracy requirements, and the expanding administrative scope of what may need to be tested, cannot easily be defined in advance. Certainly not by me. On the other hand, the Phoronix online forum is a presently untapped resource, as are the software's developers and maintainers themselves. I'm not comfortable reaching out on behalf of the project, though an anonymous post on the forum could help outline some of this.
