Auto-generate tests using mutation testing #4299
Comments
@dneto0 and @alan-baker for info.
Thanks for the offer to contribute!
This suggests to me it's appropriate to land such tests somewhere in the tree, but in a segregated bucket, possibly organized according to which backends were mutated to find these separating tests. I'd advise that each test come with (machine-readable) labeling to indicate which implementation (and version?) was mutated to find the separating bug, and where that mutation occurred. E.g. in this case it might be something like:
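(The concrete labeling example from this comment is not preserved in this extraction. Purely as a hypothetical illustration, a machine-readable label embedded in each generated test file could look like the following; every field name and value here is a made-up placeholder, not an agreed format.)

```ts
// Hypothetical provenance metadata for one generated test. All field names
// and values below are illustrative placeholders, not an agreed-upon schema.
export const mutationProvenance = {
  mutatedImplementation: 'mesa/lavapipe',            // implementation that was mutated
  implementationVersion: '<version under mutation>', // placeholder
  mutationLocation: '<source file and line>',        // placeholder for where the mutation was applied
  mutationDescription: 'altered the covered statement so the shader output changes',
} as const;
```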
Also, if a test was found by mutating Mesa, then put it under a 'mesa' tree, e.g.
I don't know if it's preferable to have a subdirectory for mutation tests. Tagging @kainino0x for further advice.
Great! I attach a trio of examples (including the one above) so that you can see the degree of similarity/difference between a couple of tests. These examples do not feature input buffers, but other tests may. On your point about organisation according to the mutation subject: I agree that keeping track of this information is useful and that maintainers are more likely to be interested in tests created in response to their own implementation. But there is likely also value in encouraging the running of these tests across all implementations (perhaps only occasionally, due to the time budget issue). I understand from @afd that the GraphicsFuzz tests created based on driver-specific coverage turned out to be effective in exposing problems in a range of drivers. So I wonder if there is a way to strike a balance between harnessing this potential value and keeping CTS running time in check?
Hello! @afd and I have been experimenting with a technique to auto-generate tests for the CTS using mutation testing.
In short: we deliberately mutate (i.e. mess with) some part of a WebGPU implementation or downstream driver and run the CTS on the mutated version. If no tests fail,¹ then this indicates a gap in the ability of the CTS to fully exercise the implementation. We use a WGSL fuzzer to create a test that does fail when run on the mutated code. Our idea is that adding such tests to the CTS will detect future bugs that creep into that part of the code.
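As a rough, purely illustrative sketch of that loop (every name below is a hypothetical stand-in for external tooling such as a source mutator, a CTS runner, and a WGSL fuzzer; none of it is an actual CTS API):

```ts
// Hypothetical orchestration of the mutation-testing loop described above.
// Each callback stands in for an external tool; none of these are real CTS APIs.
interface Mutation {
  location: string;    // where in the implementation the mutation was applied
  description: string; // what was changed
}

async function findSeparatingTest(
  applyMutation: (m: Mutation) => Promise<void>,
  revertMutation: (m: Mutation) => Promise<void>,
  runCTS: () => Promise<{ newFailures: number }>,
  fuzzForFailingShader: () => Promise<string | undefined>,
  mutation: Mutation,
): Promise<string | undefined> {
  await applyMutation(mutation);
  try {
    const { newFailures } = await runCTS();
    if (newFailures > 0) {
      // The CTS already distinguishes the mutant from the original:
      // no new test is needed for this mutation.
      return undefined;
    }
    // Gap in the CTS: search for a WGSL shader whose observed output differs
    // under the mutation, and turn it into a candidate test.
    return await fuzzForFailingShader();
  } finally {
    await revertMutation(mutation);
  }
}
```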
Below is an example to make the idea more concrete. Execution of the shader covers a part of Mesa's Lavapipe driver code, relating to this statement, that the CTS does not currently cover. The test passes when run using Dawn and the unmutated Lavapipe, but fails when the statement of interest is altered. The shaders we generate are all equipped with an expected output buffer value; any deviation from this value is a failure. In this case, the shader should output `1i`. When the mutation is in place, the shader outputs `-400i` and the test fails.²
We have a couple of other initial examples, and a workflow set up to generate a large number (hundreds/thousands) of similar tests that exercise code that is not currently exercised by the CTS. I say 'exercise' rather than 'cover' because in some cases the CTS may cover code but not actually detect a problem when that code is altered.
What do you think? It would be great to get any general thoughts; we also have a couple of specific questions.
Example
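(The attached example itself is not reproduced in this extraction. As a purely illustrative sketch of the shape such a generated test takes, the following uses the raw WebGPU API rather than the CTS harness; the shader body and expected value are made up and are not the attached test.)

```ts
// Illustrative sketch only: run a WGSL compute shader, read back one i32,
// and fail if it differs from the expected value recorded with the test.
const shaderSource = /* wgsl */ `
  @group(0) @binding(0) var<storage, read_write> result : i32;

  @compute @workgroup_size(1)
  fn main() {
    // Some computation whose correct result is 1; a mutated driver
    // might instead produce e.g. -400.
    result = 2 - 1;
  }
`;

async function runGeneratedTest(): Promise<void> {
  const adapter = await navigator.gpu.requestAdapter();
  if (adapter === null) throw new Error('no WebGPU adapter available');
  const device = await adapter.requestDevice();

  // Storage buffer written by the shader, plus a readback buffer for mapping.
  const storageBuffer = device.createBuffer({
    size: 4,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  });
  const readbackBuffer = device.createBuffer({
    size: 4,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });

  const pipeline = device.createComputePipeline({
    layout: 'auto',
    compute: {
      module: device.createShaderModule({ code: shaderSource }),
      entryPoint: 'main',
    },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer: storageBuffer } }],
  });

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(1);
  pass.end();
  encoder.copyBufferToBuffer(storageBuffer, 0, readbackBuffer, 0, 4);
  device.queue.submit([encoder.finish()]);

  await readbackBuffer.mapAsync(GPUMapMode.READ);
  const actual = new Int32Array(readbackBuffer.getMappedRange())[0];
  readbackBuffer.unmap();

  const expected = 1; // expected output buffer value shipped with the test
  if (actual !== expected) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}
```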
Footnotes
1. By 'no tests fail' I actually mean 'no previously-passing tests fail', since some tests fail on the current implementations, which I believe is a known issue.
2. We have a couple of detailed questions about the best way to handle different number representations in our expected output buffer, but we can get into that later :)