Skip to content

Commit b45b98f

Browse files
authored
Tagging of test suites (#350)
In the Github actions, test cases are grouped into a number of coarse categories (unit tests, standard system tests, and analysis system tests). This PR aims to make these groups more concrete by defining them within the test files, instead of in the Github actions workflow. This will also allow you to run a particular group on your local machine while excluding the slower-running tests. These tags are defined as annotations within the test_util.tags package, and an annotation can be attached to any test suite class: @test_util.tags.UnitTest class IRTest extends AnyFunSuite { ... } When running tests, one can include or exclude a particular tag by using the -n or -l flags of the scalatest Runner. With SBT, these arguments can be affixed to the testOnly command: sbt "testOnly * -- -n test_util.tags.UnitTest" With mill, it is more complicated. Adding -n or -l to ./mill test.testOnly will not work :( (all test suites will still be run). I think there is a problem with how Mill invokes the scalatest runner. The workaround is to manually invoke the runner: ./mill test.runMain org.scalatest.tools.Runner -o -R out/test/compile.dest/classes -n test_util.tags.UnitTest After the -R .../classes, the usual test.testOnly arguments can be added. Note that the -o flag is important - without it, scalatest will try (and likely fail) to open a GUI. The -R argument should point to the output directory containing compiled class files. The Github CI jobs are changed to use the ./mill test.runMain command. Also, there is a new CI job which reports the test suites which do not have any tag. It is my hope that every test case will eventually have at least one tag - I will need some help with this. This will make sure that every test written is being exercised by the CI. Currently, it is too easy for tests to fall through the craks if they are not run regularly. This PR is part of an effort to improve the testing in Basil and force us to be more honest about the tests. This means that we should have a clear understanding of what each test is testing, whether it currently passes and, if not, why it is currently not working. I will point out these scalatest features to help this goal: pendingUntilFixed should be used to mark a block of code (typically containing an assertion) as one which is currently expected to fail. This should be used to record test cases which fail due to not-yet-implemented features or known bugs. If a change is made and the code no longer fails, this will cause the test to fail until the pendingUntilFixed is removed. It should be used within a test() { ... } block like so: test("1 == 2 sometime soon?") { assert(1 == 1, "obviously") pendingUntilFixed { assert(1 == 2, "todo") } } Tests can be marked as ignored by replacing test() with ignore(). An entire suite can be marked as ignored with the @org.scalatest.Ignore annotation. We also add a trait trait TestCustomisation to help with customising particular dynamically-generated tests based on their test case name (which includes file path and compiler/lifter options). Individual test cases can be marked as retry, disabled, or expect failure. This modified the behaviour of the test case and prints helpful output when running the test. For example: - correct/malloc_with_local3/clang:BAP (pending) + NOTE: Test case is customised with: "ExpectFailure(previous failure was: Expected verification success, but got failure. Failing assertion is: assert (load37_1 == R30_in))" + Failing assertion ./src/test/correct/malloc_with_local3/clang/malloc_with_local3_bap.bpl:264 + 261 | load36_1, Gamma_load36_1 := memory_load64_le(mem_9, bvadd64(R31_in, 18446744073709551600bv64)), (gamma_load64(Gamma_mem_9, bvadd64(R31_in, 18446744073709551600bv64)) || L(bvadd64(R31_in, 18446744073709551600bv64))); 262 | call rely(); 263 | load37_1, Gamma_load37_1 := memory_load64_le(mem_10, bvadd64(R31_in, 18446744073709551608bv64)), (gamma_load64(Gamma_mem_10, bvadd64(R31_in, 18446744073709551608bv64)) || L(bvadd64(R31_in, 18446744073709551608bv64))); > 264 | assert (load37_1 == R30_in); //is returning to caller-set R30 265 | goto printCharValue_2260_basil_return; 266 | printCharValue_2260_basil_return: 267 | assume {:captureState "printCharValue_2260_basil_return"} true; - analysis_differential:malloc_with_local/gcc_O2:GTIRB (pending) + NOTE: Test case is customised with: "ExpectFailure(needs printf_chk)" + STDOUT: "" Note: The tag annotations must be defined in Java (apparently) to make them visible to the test runner at runtime. * tagging up to unit tests * add standardsystemtests and comment for annotatetestcase * ci: check for test tagging * ci: use tags to run tests * scalafmt * ci: update message * use cancel() instead of assume(false) * DifferentialTest marking. added TestCustomisation mixin * tag remaining tests * add script. add some missing tags * ci: add durations to test runner output * SATTest: remove smt.timeout option (it is set by command line) * disable stack_pointer cases in memory region tests * extra spec tests * indirect call tests * tweak customised test messages, fix SimplifyMemorySystemTests * fix testcustomisation case oopsie, use self-type instead of AnyFunSuite * print current outcome of xfail tests * review: split ExpectFailure into TempFailure and NotImplemented * move TestCustomisation into its own file * tests: update indirect call test modes after merge * scalafmt * add script to execute scalatest runner this saves having to use the ./mill test.runMain ... ... command * IndirectCallTests: add remarks from review * add Slow tag, make check-test-tagging stricter * tag IntervalDSATest * IndirectCallTests: move to Analysis and un-mark some not implemented * add some helpful commands to scalatest.sh * docs: add testing.md to describe some testing infrastructure * IntervalDSATest: mark as broken * IntervalDSATest: unbreak one test * IntervalDSATest: unbreak
1 parent 5e01997 commit b45b98f

36 files changed

+493
-53
lines changed
+33
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
#!/bin/bash -eu
2+
3+
set -o pipefail
4+
5+
test_dir=src/test
6+
7+
echo '::group::All test suites:'
8+
tests="$(./mill test.testOnly -- -t '' -oW | tr -d ':' | sort)"
9+
echo "$tests"
10+
echo '::endgroup::'
11+
echo
12+
echo '::group::Disabled test suites:'
13+
grep --color=always '@test_util\.tags\.DisabledTest' --context=1 -R $test_dir
14+
echo '::endgroup::'
15+
echo
16+
17+
ok=true
18+
echo '::group::Test suites with no tag annotations:'
19+
for t in $tests; do
20+
defn="$(grep 'class\s\+'"$t"'[ (]' --before-context=2 -R $test_dir)"
21+
# search for lines which are entirely "@test_util.tags.*Test"
22+
# leading - is produced by grep to mark prefixes
23+
if ! grep -q -- '-@test_util\.tags\..\+Test$' <<< "$defn"; then
24+
echo 'test suite has no `@test_util.tags.*Test` annotation:' >&2
25+
grep --color=always 'class\s\+'"$t"'[ (]' --before-context=2 -R $test_dir
26+
echo
27+
ok=false
28+
fi
29+
done
30+
echo '::endgroup::'
31+
32+
$ok
33+

.github/workflows/run-examples.yml

+20-48
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,18 @@ on:
88
- main
99
workflow_dispatch:
1010
jobs:
11+
CheckTestTagging:
12+
runs-on: ubuntu-latest
13+
steps:
14+
- uses: actions/checkout@v4
15+
- uses: actions/setup-java@v4
16+
with:
17+
distribution: 'temurin'
18+
java-version: '21'
19+
- run: './mill test.compile'
20+
- name: Check all test suites have at least one tag
21+
run: .github/workflows/check-test-tagging.sh
22+
1123
Scalafmt:
1224
runs-on: ubuntu-latest
1325
steps:
@@ -52,8 +64,9 @@ jobs:
5264
- run: sudo apt-get install -y z3='4.8.12-*'
5365
- run: dotnet tool install --global boogie --version '3.4.3'
5466

55-
- name: System Tests
56-
run: ./mill test.testOnly 'SystemTests*'
67+
- run: ./mill compile
68+
- run: ./scripts/scalatest.sh -oID -n test_util.tags.StandardSystemTest
69+
5770
- uses: actions/upload-artifact@v4
5871
with:
5972
name: testresult-${{ github.run_number }}
@@ -103,44 +116,11 @@ jobs:
103116
- run: dotnet tool install --global boogie --version '3.4.3'
104117

105118
- run: ./mill test.compile
119+
- run: ./scripts/scalatest.sh -oID -n test_util.tags.UnitTest
106120

107121
# every test with package prefix:
108122
# sbt "show test:definedTests"
109123

110-
# note: tests prefixed with '!' are expected to fail.
111-
# if they are fixed, the '!' should be removed.
112-
113-
- run: ./mill test.testOnly IrreducibleLoop
114-
if: ${{ ! cancelled() }}
115-
- run: ./mill test.testOnly BitVectorAnalysisTests
116-
if: ${{ ! cancelled() }}
117-
- run: ./mill test.testOnly '*IntrusiveListPublicInterfaceTest'
118-
if: ${{ ! cancelled() }}
119-
- run: ./mill test.testOnly util.intrusive_list.IntrusiveListTest
120-
if: ${{ ! cancelled() }}
121-
- run: ./mill test.testOnly DataStructureAnalysisTest
122-
if: ${{ ! cancelled() }}
123-
- run: ./mill test.testOnly ParamAnalysisTests
124-
if: ${{ ! cancelled() }}
125-
- run: ./mill test.testOnly LiveVarsAnalysisTests
126-
if: ${{ ! cancelled() }}
127-
- run: ./mill test.testOnly ir.ToScalaTest
128-
if: ${{ ! cancelled() }}
129-
- run: ./mill test.testOnly ir.IRTest
130-
if: ${{ ! cancelled() }}
131-
- run: ./mill test.testOnly ir.IRToDSLTest
132-
if: ${{ ! cancelled() }}
133-
- run: ./mill test.testOnly ir.CILVisitorTest
134-
if: ${{ ! cancelled() }}
135-
- run: ./mill test.testOnly ir.InterpreterTests
136-
if: ${{ ! cancelled() }}
137-
- run: ./mill test.testOnly ir.InvariantTest
138-
if: ${{ ! cancelled() }}
139-
- run: ./mill test.testOnly ProcedureSummaryTests || true
140-
if: ${{ ! cancelled() }}
141-
- run: ./mill test.testOnly TaintAnalysisTests
142-
if: ${{ ! cancelled() }}
143-
144124
AnalysisSystemTests:
145125
runs-on: ubuntu-latest
146126

@@ -161,15 +141,7 @@ jobs:
161141
- run: sudo apt-get install -y z3='4.8.12-*'
162142
- run: dotnet tool install --global boogie --version '3.4.3'
163143

164-
- run: echo "All systemtest suites:" & ./mill test.testOnly '*SystemTests*' -- -z 'xxxx'
165-
166-
- run: ./mill test.testOnly DSAMemoryRegionSystemTestsBAP
167-
if: ${{ ! cancelled() }}
168-
- run: ./mill test.testOnly DSAMemoryRegionSystemTestsGTIRB
169-
if: ${{ ! cancelled() }}
170-
- run: ./mill test.testOnly AnalysisSystemTestsBAP
171-
if: ${{ ! cancelled() }}
172-
- run: ./mill test.testOnly AnalysisSystemTestsGTIRB
173-
if: ${{ ! cancelled() }}
174-
- run: ./mill test.testOnly SimplifySystemTests
175-
if: ${{ ! cancelled() }}
144+
- run: echo "All systemtest suites:" && ./mill test.testOnly '*SystemTests*' -- -z 'xxxx'
145+
146+
- run: ./mill compile
147+
- run: ./scripts/scalatest.sh -oID -n test_util.tags.AnalysisSystemTest

docs/development/testing.md

+148
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,148 @@
1+
Testing
2+
=======
3+
4+
Testing is crucial for validating new features and preventing regressions.
5+
It is important that tests are written for any new features, as well as any
6+
identified bugs.
7+
It is important that we should have a clear understanding of what each test
8+
is testing, whether it currently passes and, if not, why it is currently
9+
not working.
10+
This page will describe some tools in the codebase which help with developing
11+
and maintaining test cases.
12+
13+
Writing test cases
14+
------------------
15+
Test cases are written in Scalatest using its
16+
[`AnyFunSuite`](https://www.scalatest.org/scaladoc/3.1.2/org/scalatest/funsuite/AnyFunSuite.html) style.
17+
See the AnyFunSuite documentation or existing test cases for syntax and examples.
18+
19+
### Exporting IR structures into test cases
20+
21+
Often, you might have found a particular Basil IR program which demonstrates some bug in the code.
22+
It is good practice to extract this into a test case, both to validate the fix and ensure the bug doesn't reoccur.
23+
To help with this, a Basil IR program can be converted to
24+
a Scala literal by using the [ToScala](https://github.com/UQ-PAC/BASIL/blob/main/src/main/scala/ir/dsl/ToScala.scala)
25+
trait.
26+
27+
To do this, first `import ir.dsl.given`, then you can use the `.toScala` extension method on programs, procedures, or blocks.
28+
This gives you a string which is valid Scala code.
29+
This can be copied and pasted into a unit test.
30+
When executed, the Scala code to will re-construct that Basil IR structure using the DSL.
31+
32+
### Tagging test suites
33+
34+
[Tags](https://www.scalatest.org/scaladoc/3.2.1/org/scalatest/Tag.html)
35+
are used to categorise test classes based on, roughly, the kind of test (e.g., unit tests or system (end-to-end) tests).
36+
Each test suite should be tagged with one of the
37+
[`@test_util.tags.*Test`](https://github.com/UQ-PAC/BASIL/tree/main/src/test/scala/test_util/tags) tags,
38+
placed on the line before the AnyFunSuite class declaration.
39+
A test suite may, additionally, be tagged with one or more of the supplementary tags (those not ending in Test).
40+
41+
To run only tests from a specific tag, you can use
42+
```bash
43+
./scripts/scalatest.sh -o -n test_utils.tags.TagName
44+
```
45+
Note that the tag name must be fully-qualified (i.e., including the package name).
46+
See [the Scalatest runner docs](https://www.scalatest.org/user_guide/using_the_runner) or the `scalatest.sh` file
47+
for more options.
48+
49+
### Dynamic tests
50+
51+
Note that the `test("test name")` method can be written anywhere within a AnyFunSuite body, including
52+
within loops or conditionals.
53+
This allows you to dynamically generate test cases, as in
54+
Basil's [`SystemTests`](/src/test/scala/SystemTests.scala).
55+
This should be used sparingly.
56+
57+
58+
Maintaining test cases
59+
----------------------
60+
Over time, test cases might break due to code changes or refactoring.
61+
Of course, failing tests should be fixed as soon as possible.
62+
However, this is not always possible - maybe a test case relies on features not yet implemented.
63+
It might be reasonable to allow a test to fail for a period of time until the fixes are ready.
64+
In these cases, it is important that tests which are known/expected to fail are clearly marked
65+
and the reason for their failure should be recorded in the code.
66+
67+
Generally, the strategy is that failing tests should still be executed.
68+
They should be annotated so that they are allowed to fail, but if they start passing,
69+
that should raise an error until the annotation is removed.
70+
This allows the test code to be an accurate record of the expected outcome of each test.
71+
72+
73+
### pendingUntilFixed (for expected failures)
74+
75+
[`pendingUntilFixed`](https://www.scalatest.org/scaladoc/3.2.3/org/scalatest/Assertions.html#pendingUntilFixed(f:=%3EUnit)(implicitpos:org.scalactic.source.Position):org.scalatest.Assertionwithorg.scalatest.PendingStatement) should be used to mark a block of code (typically containing an assertion) as one which is currently expected to fail. This should be used to record test cases which fail due to not-yet-implemented features or known bugs. If a change is made and the code no longer fails, this will cause the test to fail until the pendingUntilFixed is removed.
76+
77+
It should be used within a `test() { ... }` block like so, with a comment documenting the cause of failure and expected future resolution.
78+
```scala
79+
test("1 == 2 sometime soon?") {
80+
assert(1 == 1, "obviously")
81+
82+
// broken until we fix maths
83+
pendingUntilFixed {
84+
assert(1 == 2, "todo")
85+
}
86+
}
87+
```
88+
89+
- Tests can be marked as ignored by replacing `test()` with `ignore()`. An entire suite can be marked as ignored with the `@org.scalatest.Ignore` annotation.
90+
91+
### TestCustomisation (for dynamically-generated tests)
92+
93+
For some tests, particularly those which are dynamically-generated by a loop, it is not practical to add a `pendingUntilFixed` block
94+
into the test body.
95+
For these cases, there is a `trait TestCustomisation` to help with customising dynamically-generated tests
96+
based on their test case name (for system tests, this includes the file path and compiler/lifter options).
97+
98+
To use this, the test suite class should be made to extend [TestCustomisation](/src/test/scala/test_util/TestCustomisation.scala).
99+
This defines an abstract method customiseTestsByName which controls the mode of each test case.
100+
```scala
101+
@test_util.tags.UnitTest
102+
class ProcedureSummaryTests extends AnyFunSuite, TestCustomisation {
103+
104+
override def customiseTestsByName(name: String) = {
105+
name match {
106+
case "test a" => Mode.NotImplemented("doesn't seem to work yet")
107+
case _ => Mode.Normal
108+
}
109+
}
110+
111+
test("test a") {
112+
assert(false);
113+
}
114+
115+
test("test b") {
116+
assert(true);
117+
}
118+
}
119+
```
120+
Test cases can be marked as retry, disabled, not implemented, or temporary failure
121+
(see TestCustomisation source file for more details).
122+
This modifies the behaviour of the test case and prints helpful output when running the test. For example:
123+
```c
124+
- correct/malloc_with_local3/clang:BAP (pending)
125+
+ NOTE: Test case is customised with: "ExpectFailure(previous failure was: Expected verification success, but got failure. Failing assertion is: assert (load37_1 == R30_in))"
126+
127+
+ Failing assertion ./src/test/correct/malloc_with_local3/clang/malloc_with_local3_bap.bpl:264
128+
+ 261 | load36_1, Gamma_load36_1 := memory_load64_le(mem_9, bvadd64(R31_in, 18446744073709551600bv64)), (gamma_load64(Gamma_mem_9, bvadd64(R31_in, 18446744073709551600bv64)) || L(bvadd64(R31_in, 18446744073709551600bv64)));
129+
262 | call rely();
130+
263 | load37_1, Gamma_load37_1 := memory_load64_le(mem_10, bvadd64(R31_in, 18446744073709551608bv64)), (gamma_load64(Gamma_mem_10, bvadd64(R31_in, 18446744073709551608bv64)) || L(bvadd64(R31_in, 18446744073709551608bv64)));
131+
> 264 | assert (load37_1 == R30_in); //is returning to caller-set R30
132+
265 | goto printCharValue_2260_basil_return;
133+
266 | printCharValue_2260_basil_return:
134+
267 | assume {:captureState "printCharValue_2260_basil_return"} true;
135+
```
136+
137+
```c
138+
- analysis_differential:malloc_with_local/gcc_O2:GTIRB (pending)
139+
+ NOTE: Test case is customised with: "ExpectFailure(needs printf_chk)"
140+
141+
+ STDOUT: ""
142+
```
143+
144+
Modes which expect failure (temp failure and not implemented) will show as "pending" when
145+
executing scalatest.
146+
The disabled mode will show as "cancelled".
147+
Both of these will be output in yellow text if your console is using colour.
148+

scripts/scalatest.sh

+33
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
#!/bin/bash
2+
3+
# Executes the full Scalatest runner (https://www.scalatest.org/user_guide/using_the_runner).
4+
#
5+
# This is needed to use some more advanced arguments of the runner, since `./mill test.run`
6+
# silently ignores some arguments (for example, -n and -l).
7+
8+
# NOTE: executing the runner through this script may try to start a GUI.
9+
# to avoid this and use the console, pass -o.
10+
11+
# useful commands
12+
# ---------------
13+
#
14+
# - to include only a specific tag, use -n.
15+
# - to exclude a specific tag, use -l.
16+
# - these can be combined. for example, to select all StandardSystemTest except Slow ones:
17+
#
18+
# ./scripts/scalatest.sh -o -n test_util.tags.StandardSystemTest -l test_util.tags.Slow
19+
#
20+
# - when passing tags through the command line, the fully-qualified name must be used.
21+
# - to print individual test durations, pass D to -o. for example, -oD.
22+
# - to print a summary of failing tests at the end of the output, pass I to -o.
23+
#
24+
25+
classes="$(./mill show test.compile | grep '"classes"' | cut -d'"' -f4 | cut -d: -f4)"
26+
27+
if ! [[ -d "$classes" ]]; then
28+
echo "unable to determine mill class output directory: $classes" >&2
29+
exit 1
30+
fi
31+
32+
exec ./mill test.runMain org.scalatest.tools.Runner -R "$classes" "$@"
33+

src/test/scala/BitVectorAnalysisTests.scala

+1
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@ import util.Logger
55

66
import scala.runtime.stdLibPatches.Predef.assert
77

8+
@test_util.tags.UnitTest
89
class BitVectorAnalysisTests extends AnyFunSuite {
910

1011
test("BitVector to Natural - should convert BitVector to natural number") {

src/test/scala/DataStructureAnalysisTest.scala

+1
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@ import translating.PrettyPrinter.*
2727
* BASILRESULT.analysis.get.bu is the set of graphs from the end of the bottom-up phase BASILRESULT.analysis.get.td is
2828
* the set of graphs from the end of the top-down phase
2929
*/
30+
@test_util.tags.UnitTest
3031
class DataStructureAnalysisTest extends AnyFunSuite {
3132

3233
def runAnalysis(program: Program): StaticAnalysisContext = {

src/test/scala/DifferentialAnalysisTest.scala

+24-1
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,28 @@ import util.RunUtils.loadAndTranslate
3030

3131
import scala.collection.mutable
3232

33-
class DifferentialTest extends AnyFunSuite {
33+
abstract class DifferentialTest extends AnyFunSuite, TestCustomisation {
34+
35+
override def customiseTestsByName(name: String) = name match {
36+
case "analysis_differential:floatingpoint/clang:GTIRB" | "analysis_differential:floatingpoint/gcc:GTIRB" =>
37+
Mode.NotImplemented("needs FP_Mul")
38+
39+
case "analysis_differential:function1/gcc_O2:BAP" | "analysis_differential:function1/gcc_O2:GTIRB" |
40+
"analysis_differential:malloc_with_local/gcc_O2:BAP" | "analysis_differential:malloc_with_local/gcc_O2:GTIRB" |
41+
"analysis_differential:malloc_with_local3/gcc_O2:BAP" |
42+
"analysis_differential:malloc_with_local3/gcc_O2:GTIRB" =>
43+
Mode.NotImplemented("needs printf_chk")
44+
45+
case "analysis_differential:syscall/clang:BAP" | "analysis_differential:syscall/clang:GTIRB" |
46+
"analysis_differential:syscall/clang_O2:GTIRB" | "analysis_differential:syscall/gcc:BAP" |
47+
"analysis_differential:syscall/gcc:GTIRB" =>
48+
Mode.NotImplemented("needs fork")
49+
50+
case "analysis_differential:syscall/gcc_O2:BAP" => Mode.TempFailure("traceInit empty")
51+
case "analysis_differential:syscall/gcc_O2:GTIRB" => Mode.NotImplemented("needs fork")
52+
53+
case _ => Mode.Normal
54+
}
3455

3556
Logger.setLevel(LogLevel.WARN)
3657

@@ -114,6 +135,7 @@ class DifferentialTest extends AnyFunSuite {
114135
}
115136
}
116137

138+
@test_util.tags.AnalysisSystemTest
117139
class DifferentialAnalysisTest extends DifferentialTest {
118140

119141
def runSystemTests(): Unit = {
@@ -147,6 +169,7 @@ class DifferentialAnalysisTest extends DifferentialTest {
147169
runSystemTests()
148170
}
149171

172+
@test_util.tags.AnalysisSystemTest
150173
class DifferentialAnalysisTestSimplification extends DifferentialTest {
151174

152175
def runSystemTests(): Unit = {

0 commit comments

Comments
 (0)