tests: thread-safe test suites #353

katrinafyi · 2025-03-08T04:45:02Z

This makes steps towards making test suites thread-safe. That is, multiple suites can be executed in parallel, but test cases within the same suite are still run sequentially.

This is done with two changes:

~~Test cases within SystemTests are changed to acquire a lock based on the source file name. This prevents interference where multiple tests attempt to read or write from the same expected files.~~
SystemTests are changed so their Boogie and text file outputs have different names per suite, allowing suites to run in parallel.
In DataStructureAnalysis, the global NodeCounter is now scoped to a particular instance of a DataStructureAnalysis class.

Testing

These changes are somewhat difficult to test.

~~The first can be tested by modifying the end of LockManager's withLock function to be:~~

lock.synchronized {
  println("in " + key)                                                                                                                                                           
  body
  println("out " + key)
}

~~Then, run all test cases which reference a single source file (in parallel with -P):~~

./mill  test.runMain org.scalatest.tools.Runner -R out/test/compile.dest/classes/ -oI  -P -z 'correct/basic_function_call_caller/clang_O2:GTIRB'

You should see sequences of in out in out lines, demonstrating mutual exclusion. If you remove the synchronized call, these lines should now be interleaved like in in out out. Maybe this can be made into a unit test.

The second (DataStructureAnalysis) change is exercised by running all the DSA tests in parallel and observing that they pass.

todo:

-PS2 -T1000000 for sorting

this replaces the global singleton NodeCounter with a new instance per DSA execution. this allows multiple DSA runs to run in parallel. the NodeCounter is constructed by the DataStructureAnalysis class, then passed through an implicit "using" parameter to the classes which require it.

LockManager now uses a single lock around the map and this is shared for reads and writes. this will increase lock contention. however, the locked duration will be extremely small since it only wraps a single hashmap operation. this should have no discernible performance impact.

ailrst · 2025-03-10T01:32:26Z

Clearly something is wrong with our test system if we're having to wrap test cases in synchronisation code. This is good though at lest to allow the tests to run a bit faster, and also shake out some of the thread-unsafety within Basil.

Obviously the 'correct' approach here is to have actual sandboxing like dune's test runner, but we can look at later. The only real data contention in the test suites should be the boogie output file which is the same for each test case in every systemtest suite. Ideally we generate a temporary file unique to that test run, and then promote the temporary if the test passes (but you would still want a distinct expected file for each suite which we don't currently have). The test summary should be inherently synchronous within a single suite, and the 'summarytest' can be a test teardown function rather than another test.

katrinafyi · 2025-03-10T02:04:04Z

I agree that this is symptomatic of bigger problems. Re more robust test sandboxing, that would be nice to have but I think it can be lower urgency.

At the moment, the remaining bottleneck for test runtime is Z3 which is itself multi-threaded. I found that increasing the scalatest parallelism beyond 2 actually l increases total runtime.

ailrst · 2025-03-10T02:23:11Z

Yeah, I guess that's a fundamental constraint. Hopefully we can eventually reduce the number of combinations of flags we are testing with the SystemTests suite, and rely more on unit tests for the individual components. For example the AnalysisTests could be ditched as the DSA memory tests also require the analysis pass to run, but it potentially makes it harder to identify where errors are occurring.

I wonder whether its worth adding a 'slow' tag for larger aggregate tests or those less important combinations of flags that we may want to just run locally?

Also I gather the nonsense with mill ./mill test.runMain org.scalatest.tools.Runner -R out/test/compile.dest/classes/ is also required for -P to work?

katrinafyi · 2025-03-10T02:29:51Z

More tags could definitely be helpful. The tags in #350 partition the test suites, but there's no need that they do that. We can and should add more tags for any interesting features.

The ./mill nonsense is definitely needed for the -n argument to choose tags. I don't remember if i tried -P with the default mill testOnly.

ailrst · 2025-03-10T03:47:41Z

At the moment, the remaining bottleneck for test runtime is Z3 which is itself multi-threaded. I found that increasing the scalatest parallelism beyond 2 actually l increases total runtime.

It doesn't look like I consistently get two instances of boogie running, so it might help to shuffle the test case order to reduce contention.

diff --git a/src/test/scala/SystemTests.scala b/src/test/scala/SystemTests.scala
index 74c7a3698..2a1d962c0 100644
--- a/src/test/scala/SystemTests.scala
+++ b/src/test/scala/SystemTests.scala
@@ -52,7 +52,7 @@ trait SystemTests extends AnyFunSuite, BASILTest {
 
   def runTests(folder: String, conf: TestConfig): Unit = {
     val path = testPath + folder
-    val programs = getSubdirectories(path)
+    val programs = scala.util.Random.shuffle(getSubdirectories(path))
 
     // get all variations of each program
     val testSuffix = if conf.useBAPFrontend then ":BAP" else ":GTIRB"

l-kent · 2025-03-11T05:08:16Z

I agree that a more straightforward and efficient approach to solve this problem would be to just set up the tests so that they are using different files for each test suite, which would avoid the need for locks entirely.

honestly, i think this might be annoying, especially if you try to run the tests locally to reproduce an issue and the order keeps changing.

this should avoid clashes where test suites try to write to the same output file. SystemTestsBAP and SystemTestsGTIRB are overriden to have no suffix, to maintain compatibility with the update-expected sbt/mill jobs.

This reverts commit f4b3a53.

katrinafyi · 2025-03-12T08:08:45Z

I've made the system tests use different output file names per suite 8ed2980. Double-checking this commit would be appreciated, especially the behaviour with updateExpected.

l-kent · 2025-03-13T04:11:04Z

It seems like it's working correctly with updateExpected, in that it only creates .expected files for the default SystemTests, not any of the other variations with extra flags, which is the only thing we were using the .expected files for.

This should mean that the locking isn't needed - no tests write to the .expected files, only the default SystemTests read from them, so now that the different suites write to their own output files there shouldn't be any conflicts.

katrinafyi · 2025-03-13T04:16:27Z

Thanks! In the same commit, I'd also removed the locking code from SystemTests - this is a little buried in the other changes, sorry!

Conflicts: src/test/scala/SystemTests.scala

-T is used to re-sort the test cases into the original order before printing them in batches for each suite to avoid interleaving individual cases from multiple suites running in parallel.

katrinafyi · 2025-03-13T06:09:49Z

We have some numbers now. This is a pleasant improvement over the baseline.

Before both test tagging and this PR:

After test tagging which enabled many more tests but still with one thread:

By running tests in parallel (this PR, do not mind the failure):

Quickly merge this PR before tests start failing again :")

katrinafyi and others added 8 commits March 8, 2025 13:54

add locks around systemtests

ef805cb

remove reset method from NodeCounter

2200d02

refactor to LockManager class

6ae016e

lock around testResults

0860913

scalafmt

812f640

Apply suggestions from code review

8285cfc

katrinafyi added 3 commits March 12, 2025 17:40

SystemTests: shuffle tests

f4b3a53

honestly, i think this might be annoying, especially if you try to run the tests locally to reproduce an issue and the order keeps changing.

SystemTests: insert suite name into output file names

8ed2980

this should avoid clashes where test suites try to write to the same output file. SystemTestsBAP and SystemTestsGTIRB are overriden to have no suffix, to maintain compatibility with the update-expected sbt/mill jobs.

Revert "SystemTests: shuffle tests"

30b0f1c

This reverts commit f4b3a53.

katrinafyi added 3 commits March 13, 2025 14:35

Merge remote-tracking branch 'origin/main' into scalatest-parallel

62698d1

Conflicts: src/test/scala/SystemTests.scala

ci: run 3 suites in parallel

29ad652

-T is used to re-sort the test cases into the original order before printing them in batches for each suite to avoid interleaving individual cases from multiple suites running in parallel.

RegionTimer: use atomic long instead of instance vars

2f19513

katrinafyi force-pushed the scalatest-parallel branch from 7e948ad to 2f19513 Compare March 13, 2025 05:34

ExtraSpecTests: retry on failure

3a7114a

ailrst merged commit 266a28b into main Mar 13, 2025
6 checks passed

katrinafyi deleted the scalatest-parallel branch March 13, 2025 06:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests: thread-safe test suites #353

tests: thread-safe test suites #353

katrinafyi commented Mar 8, 2025 •

edited

Loading

ailrst commented Mar 10, 2025

katrinafyi commented Mar 10, 2025

ailrst commented Mar 10, 2025

katrinafyi commented Mar 10, 2025 •

edited

Loading

ailrst commented Mar 10, 2025 •

edited

Loading

l-kent commented Mar 11, 2025

katrinafyi commented Mar 12, 2025

l-kent commented Mar 13, 2025 •

edited

Loading

katrinafyi commented Mar 13, 2025

katrinafyi commented Mar 13, 2025

tests: thread-safe test suites #353

tests: thread-safe test suites #353

Conversation

katrinafyi commented Mar 8, 2025 • edited Loading

Testing

ailrst commented Mar 10, 2025

katrinafyi commented Mar 10, 2025

ailrst commented Mar 10, 2025

katrinafyi commented Mar 10, 2025 • edited Loading

ailrst commented Mar 10, 2025 • edited Loading

l-kent commented Mar 11, 2025

katrinafyi commented Mar 12, 2025

l-kent commented Mar 13, 2025 • edited Loading

katrinafyi commented Mar 13, 2025

katrinafyi commented Mar 13, 2025

katrinafyi commented Mar 8, 2025 •

edited

Loading

katrinafyi commented Mar 10, 2025 •

edited

Loading

ailrst commented Mar 10, 2025 •

edited

Loading

l-kent commented Mar 13, 2025 •

edited

Loading