0003 "standev_phoronix_test_suite" #4

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

0003 "standev_phoronix_test_suite" #4

wants to merge 2 commits into from

Conversation

@ghost commented Apr 15, 2019

Generally unreviewed; there may be typos.

@ghost changed the title from oliver: 0003 "standev_phoronix_test_suite" to 0003 "standev_phoronix_test_suite" on Apr 15, 2019
@seantalts (Member)

Hey, thanks for posting! Forgot about this, sorry. So is the proposal here to write a test suite that integrates with Phoronix? And is the motivation that it will then be hosted on their website, so people can choose to run the benchmarks on their own personal hardware, which will automatically upload the results?

I think anyone should feel free to do this if they like! This seems like the kind of thing where someone can just build it and if it's useful it will be used, you know? Or was this proposal to replace or change some existing system or design?

@ghost (Author) commented May 5, 2019

No problem. What happens (on Linux, anyway) is that I can define a simple shell script with the commands, and the test suite runs repeated trials over the calls for whatever metric we want to capture. I had a rudimentary version of this working on a couple of CmdStan tests in about 30 minutes. It was trivial to link my account and send the formatted information to an aggregated online repository.
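Roughly, the kind of wrapper I mean looks like this (a sketch only; the `CMDSTAN_DIR` variable is a placeholder for a built CmdStan checkout, and the example paths assume the stock bernoulli example):

```sh
#!/bin/sh
# Minimal sketch of a benchmark wrapper of the sort Phoronix Test Suite
# repeats and aggregates. Assumes CmdStan's stock example is already built:
#   make examples/bernoulli/bernoulli   (from the CmdStan directory)
# CMDSTAN_DIR is a placeholder for wherever that checkout lives.

cd "$CMDSTAN_DIR" || exit 1

# /usr/bin/time -p prints "real <seconds>" on stderr, which a results
# parser can pick out as the metric for each trial.
/usr/bin/time -p ./examples/bernoulli/bernoulli \
    sample num_samples=1000 num_warmup=1000 \
    data file=examples/bernoulli/bernoulli.data.R \
    output file=/dev/null
```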

The idea arose from a Discourse post about raising awareness of the timing differences across compiler and operating-system setups. I see that kind of information posted all the time using this software: https://www.phoronix.com/scan.php?page=article&item=windows10-linux-browsers&num=2

I think the architecture definitely supports quick build-and-test scenarios. If someone wanted reliable and consistent testing across different builds (something mentioned on Discourse), this would be the way to go. In most scenarios I don't think Phoronix is robust enough to handle all of the routine developer testing for a project like CmdStan, but it could be helpful in certain situations: prior to a release, for example, or when a small batch of tests produces inconclusive results and more information would improve decision making.

@seantalts (Member)

How does it provide the "reliable and consistent testing across different builds"? That does sound good.

@ghost (Author) commented May 5, 2019

The test is downloaded, built, and run from a developer-specified source. Test-suite developers can define which versions of the build tools are to be used, or alternatively set up tests that compare different build tools.

Once the test is defined, all test-suite users run the same test regardless of other system variables. Major system specifications that do differ are tracked and can be used to demarcate the results. A sketch of how a profile pins all of this down is below.
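As a sketch of the pinning, a local test profile's `install.sh` can fix both the source release and the compiler (the layout here is from memory of PTS local profiles; the version and `CXX` choice are illustrative):

```sh
#!/bin/sh
# Sketch of a local Phoronix test profile's install.sh. The key point:
# the profile fixes the source tarball and build flags, so every user
# builds and runs an identical test. Version and CXX are illustrative.

# Pin the source to a fixed release, not a moving branch.
wget https://github.com/stan-dev/cmdstan/releases/download/v2.19.1/cmdstan-2.19.1.tar.gz
tar -xzf cmdstan-2.19.1.tar.gz

# Pin the toolchain via the make invocation.
cd cmdstan-2.19.1
make CXX=g++-8 examples/bernoulli/bernoulli
cd ..

# PTS convention (as I recall): install.sh leaves behind an executable
# named after the test, which the runner invokes for each trial.
# $LOG_FILE is set by the PTS runner.
cat > cmdstan-bench <<'EOF'
#!/bin/sh
cd cmdstan-2.19.1
/usr/bin/time -p ./examples/bernoulli/bernoulli \
    sample data file=examples/bernoulli/bernoulli.data.R \
    output file=/dev/null > "$LOG_FILE" 2>&1
EOF
chmod +x cmdstan-bench
```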

@seantalts (Member)

Who causes the test to be downloaded? Is Phoronix basically a repo of benchmarks that users can choose to run on their hardware? And so getting results there would depend on a bunch of users reliably running these, right?

@ghost (Author) commented May 6, 2019

Basically, yes: there is a source repo for the test code. The user has to make sure they use the same test name as everyone else and select the option to upload their system information.
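Concretely, the user side is just a couple of PTS commands (the test name here is illustrative):

```sh
# Install and run the test; PTS prompts to save the result and, at the
# end, offers to upload it to openbenchmarking.org along with the
# detected system information.
phoronix-test-suite benchmark local/cmdstan-bench

# A previously saved result can also be uploaded later by name.
phoronix-test-suite upload-result cmdstan-bench-results
```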

@syclik (Member) commented May 21, 2019

I just read through it. I don't really see this as requiring a design document at this level, unless I'm missing something. It looks like this was meant to augment our existing testing. I think it'd be good to discuss the specifics of what testing would be appropriate for Phoronix and how reliable it is in providing those results.

I guess I'm thinking it should go the other way: we should define what sort of testing we want or require (maybe as a design doc) and then pick the things that would help accomplish that. Does that make sense?

To be concrete, we could discuss how this would work for benchmarking Math. If it makes sense, we should be able to include it, since it would only affect Math; and in a sense, it doesn't actually affect design... this is additional instrumentation around the code base. (To rephrase: I'm all for it! Conditioned on it being low maintenance and reliable.)

@bob-carpenter (Collaborator) commented May 21, 2019 via email

@ghost (Author) commented May 21, 2019

If we know when to expect a performance regression, there are more efficient developer-oriented tools than Phoronix for isolating where it is being introduced.

When the developer and review process overlooks some unknowable and a regression is introduced, what is the current detection procedure, and how might it be improved?

@ghost (Author) commented May 21, 2019

A blunt-hammer outline might be to keep a baseline benchmark against the current versioned release and make the developer responsible for replicating the relevant test cases for their PR as part of the review process (a rough sketch of the comparison is below). We would know we aren't thoroughly testing everything within the scope of a change, but we would be making sure some bundle of representative models still performs equivalently, or at least reasonably well, compared to the current release for the end user.
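Something like the following, where the binary paths, the trial count, and the 5% threshold are all placeholders:

```sh
#!/bin/sh
# Rough sketch of a baseline comparison. Binary paths, the trial count,
# and the 5% threshold are all placeholders.

run_secs() {
    # Mean "real" time over 5 runs of the given CmdStan binary.
    total=0
    for i in 1 2 3 4 5; do
        t=$( { /usr/bin/time -p "$1" sample \
                 data file=bernoulli.data.R \
                 output file=/dev/null >/dev/null ; } 2>&1 \
             | awk '/^real/ {print $2}')
        total=$(echo "$total + $t" | bc)
    done
    echo "scale=3; $total / 5" | bc
}

base=$(run_secs ./release/bernoulli)
pr=$(run_secs ./pr/bernoulli)
echo "baseline: ${base}s  PR: ${pr}s"

# Flag the PR if it is more than 5% slower than the release baseline.
if [ "$(echo "$pr > $base * 1.05" | bc)" -eq 1 ]; then
    echo "possible performance regression" >&2
    exit 1
fi
```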

As Bob mentions, there is a lot of detail involved; the accuracy requirements, and the expanding administrative scope of what may need to be tested, cannot easily be defined in advance. Certainly not by me. On the other hand, the Phoronix online forum is a presently untapped resource, as are the software's developers and maintainers themselves. I'm not comfortable reaching out on behalf of the project, though an anonymous post on the forum could help outline some of this.
