-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: thread-safe test suites #353
Conversation
this replaces the global singleton NodeCounter with a new instance per DSA execution. this allows multiple DSA runs to run in parallel. the NodeCounter is constructed by the DataStructureAnalysis class, then passed through an implicit "using" parameter to the classes which require it.
LockManager now uses a single lock around the map and this is shared for reads and writes. this will increase lock contention. however, the locked duration will be extremely small since it only wraps a single hashmap operation. this should have no discernible performance impact.
Clearly something is wrong with our test system if we're having to wrap test cases in synchronisation code. This is good though at lest to allow the tests to run a bit faster, and also shake out some of the thread-unsafety within Basil. Obviously the 'correct' approach here is to have actual sandboxing like dune's test runner, but we can look at later. The only real data contention in the test suites should be the boogie output file which is the same for each test case in every systemtest suite. Ideally we generate a temporary file unique to that test run, and then promote the temporary if the test passes (but you would still want a distinct expected file for each suite which we don't currently have). The test summary should be inherently synchronous within a single suite, and the 'summarytest' can be a test teardown function rather than another test. |
I agree that this is symptomatic of bigger problems. Re more robust test sandboxing, that would be nice to have but I think it can be lower urgency. At the moment, the remaining bottleneck for test runtime is Z3 which is itself multi-threaded. I found that increasing the scalatest parallelism beyond 2 actually l increases total runtime. |
Yeah, I guess that's a fundamental constraint. Hopefully we can eventually reduce the number of combinations of flags we are testing with the SystemTests suite, and rely more on unit tests for the individual components. For example the I wonder whether its worth adding a 'slow' tag for larger aggregate tests or those less important combinations of flags that we may want to just run locally? Also I gather the nonsense with mill |
More tags could definitely be helpful. The tags in #350 partition the test suites, but there's no need that they do that. We can and should add more tags for any interesting features. The ./mill nonsense is definitely needed for the |
It doesn't look like I consistently get two instances of boogie running, so it might help to shuffle the test case order to reduce contention. diff --git a/src/test/scala/SystemTests.scala b/src/test/scala/SystemTests.scala
index 74c7a3698..2a1d962c0 100644
--- a/src/test/scala/SystemTests.scala
+++ b/src/test/scala/SystemTests.scala
@@ -52,7 +52,7 @@ trait SystemTests extends AnyFunSuite, BASILTest {
def runTests(folder: String, conf: TestConfig): Unit = {
val path = testPath + folder
- val programs = getSubdirectories(path)
+ val programs = scala.util.Random.shuffle(getSubdirectories(path))
// get all variations of each program
val testSuffix = if conf.useBAPFrontend then ":BAP" else ":GTIRB" |
I agree that a more straightforward and efficient approach to solve this problem would be to just set up the tests so that they are using different files for each test suite, which would avoid the need for locks entirely. |
honestly, i think this might be annoying, especially if you try to run the tests locally to reproduce an issue and the order keeps changing.
this should avoid clashes where test suites try to write to the same output file. SystemTestsBAP and SystemTestsGTIRB are overriden to have no suffix, to maintain compatibility with the update-expected sbt/mill jobs.
This reverts commit f4b3a53.
I've made the system tests use different output file names per suite 8ed2980. Double-checking this commit would be appreciated, especially the behaviour with updateExpected. |
It seems like it's working correctly with updateExpected, in that it only creates .expected files for the default SystemTests, not any of the other variations with extra flags, which is the only thing we were using the .expected files for. This should mean that the locking isn't needed - no tests write to the .expected files, only the default SystemTests read from them, so now that the different suites write to their own output files there shouldn't be any conflicts. |
Thanks! In the same commit, I'd also removed the locking code from SystemTests - this is a little buried in the other changes, sorry! |
Conflicts: src/test/scala/SystemTests.scala
-T is used to re-sort the test cases into the original order before printing them in batches for each suite to avoid interleaving individual cases from multiple suites running in parallel.
7e948ad
to
2f19513
Compare
This makes steps towards making test suites thread-safe. That is, multiple suites can be executed in parallel, but test cases within the same suite are still run sequentially.
This is done with two changes:
Test cases within SystemTests are changed to acquire a lock based on the source file name. This prevents interference where multiple tests attempt to read or write from the same expected files.SystemTests are changed so their Boogie and text file outputs have different names per suite, allowing suites to run in parallel.
Testing
These changes are somewhat difficult to test.
The first can be tested by modifying the end of LockManager's withLock function to be:Then, run all test cases which reference a single source file (in parallel with-P
):./mill test.runMain org.scalatest.tools.Runner -R out/test/compile.dest/classes/ -oI -P -z 'correct/basic_function_call_caller/clang_O2:GTIRB'
You should see sequences ofin
out
in
out
lines, demonstrating mutual exclusion. If you remove thesynchronized
call, these lines should now be interleaved likein
in
out
out
. Maybe this can be made into a unit test.The second (DataStructureAnalysis) change is exercised by running all the DSA tests in parallel and observing that they pass.
todo: