Shard mutants (#197)

Allows running on multiple VMs to speed up CI

- [x] User manual content
- [x] News
- [x] Test
- [x] Add to cargo-mutants own tests
- [x] Test sharding is applied before shuffling
sourcefrog · Dec 17, 2023 · 7171cd8
2 parents 4518b13 + a196b43
commit 7171cd8

Showing 11 changed files with 319 additions and 23 deletions.
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -65,7 +65,7 @@ jobs:
       - run: cargo install --path .
       - name: Mutants
         run: |
-          cargo mutants --no-shuffle --exclude console.rs -j 2 -vV --in-diff git.diff
+          cargo mutants --no-shuffle -vV --in-diff git.diff
       - name: Archive mutants.out
         uses: actions/upload-artifact@v3
         if: always()
@@ -75,7 +75,10 @@ jobs:
 
   cargo-mutants:
     runs-on: ubuntu-latest
-    # needs: [build, incremental-mutants]
+    needs: [build, pr-mutants]
+    strategy:
+      matrix:
+        shard: [0, 1, 2, 3, 4, 5, 6, 7]
     steps:
       - uses: actions/checkout@v3
       - uses: dtolnay/rust-toolchain@master
@@ -85,7 +88,7 @@ jobs:
       - run: cargo install --path .
       - name: Mutants
         run: |
-          cargo mutants --no-shuffle --exclude console.rs -j 2 -vV
+          cargo mutants --no-shuffle -vV --shard ${{ matrix.shard }}/8
       - name: Archive mutants.out
         uses: actions/upload-artifact@v3
         if: always()

diff --git a/NEWS.md b/NEWS.md
@@ -1,5 +1,11 @@
 # cargo-mutants changelog
 
+## Unreleased
+
+- New: A `--shard k/n` option allows you to split the work across `n` independent parallel `cargo mutants` invocations running on separate machines, to get a faster overall result on large suites. You, or your CI system, are responsible for launching all the shards and checking whether any of them failed.
+
+- Improved: Better documentation about `-j`, with stronger recommendations not to set it too high.
+
 ## 23.12.1
 
 - Improved progress bars and console output, including putting the outcome of each mutant on the left, and the overall progress bar at the bottom. Improved display of estimated remaining time, and other times.

diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
@@ -20,7 +20,8 @@
   - [Error values](error-values.md)
 - [Improving performance](performance.md)
   - [Parallelism](parallelism.md)
-  - [Incremental tests of modified code](in-diff.md)
+  - [Sharding](shards.md)
+  - [Testing code changed in a diff](in-diff.md)
 - [Integrations](integrations.md)
 - [Continuous integration](ci.md)
   - [Incremental tests of pull requests](pr-diff.md)

diff --git a/book/src/in-diff.md b/book/src/in-diff.md
@@ -1,4 +1,4 @@
-# Incremental tests of modified code
+# Testing code changed in a diff
 
 If you're working on a large project or one with a long test suite, you may not want to test the entire codebase every time you make a change. You can use `cargo-mutants --in-diff` to test only mutants generated from recently changed code.
 
@@ -19,6 +19,8 @@ Changes to non-Rust files, or files from which no mutants are produced, are igno
 
 `--in-diff` makes tests faster by covering the mutants that are most likely to be missed in the changed code. However, it's certainly possible that edits in one region cause code in a different region or a different file to no longer be well tested. Incremental tests are helpful for giving faster feedback, but they're not a substitute for a full test run.
 
+The diff is only matched against the code under test, not the test code. So, a diff that only deletes or changes test code won't cause any mutants to run, even though it may have a very material effect on test coverage.
+
 ## Example
 
 In this diff, we've added a new function `two` to `src/lib.rs`, and the existing code is unaltered. With `--in-diff`, `cargo-mutants` will only test mutants that affect the function `two`.

diff --git a/book/src/parallelism.md b/book/src/parallelism.md
@@ -1,30 +1,47 @@
 # Parallelism
 
-After the initial test of the unmutated tree, cargo-mutants can test multiple
-mutants in parallel. This can give significant performance improvements,
-depending on the tree under test and the hardware resources available.
+After the initial test of the unmutated tree, cargo-mutants can run multiple
+builds and tests of the tree in parallel on a single machine. Separately, you can
+[shard](shards.md) work across multiple machines.
+
+**Caution:** `cargo build` and `cargo test` internally spawn many threads and processes and can be very resource hungry. Don't set `--jobs` too high, or your machine may thrash, run out of memory, or overheat.
+
+## Background
 
 Even though cargo builds, rustc, and Rust's test framework launch multiple
-processes or threads, they typically can't use all available CPU cores all the
-time, and many `cargo test` runs will end up using only one core waiting for the
-last task to complete. Running multiple jobs in parallel makes use of resources
-that would otherwise be idle.
+processes or threads, they typically spend some time waiting for straggler tasks, during which time some CPU cores are idle. For example, a cargo build commonly ends up waiting for a single-threaded linker for several seconds.
+
+Running multiple build or test tasks in parallel can use up this otherwise wasted capacity.
+This can give significant performance improvements, depending on the tree under test and the hardware resources available.
+
+## Timeouts
+
+Because tests may be slower with high parallelism, or may exhibit more variability in execution time, you may see some spurious timeouts, and you may need to set `--timeout` manually to allow enough safety margin. (User feedback on this is welcome.)
+
+## Non-hermetic tests
 
-By default, only one job is run at a time.
+If your test suite is non-hermetic -- for example, if it talks to an external database -- then running multiple jobs in parallel may cause test flakes. `cargo-mutants` is just running multiple copies of `cargo test` simultaneously: if that doesn't work in your tree, then you can't use this option.
 
-To run more, use the `--jobs` or `-j` option, or set the `CARGO_MUTANTS_JOBS`
-environment variable.
+## Choosing a job count
 
-Setting this higher than the number of CPU cores is unlikely to be helpful.
+You should set the number of jobs very conservatively, starting at `-j2` or `-j3`.
+
+Higher settings are only likely to be helpful on very large machines, perhaps with >100 cores and >256GB RAM.
+
+Unlike with `make`, setting `-j` proportionally to the number of cores is unlikely to work out well, because the Rust build and test tools already parallelize very aggressively.
 
 The best setting will depend on many factors including the behavior of your
 program's test suite, the amount of memory on your system, and your system's
-behavior under high thermal load.
+behavior under high load. Ultimately you'll need to experiment to find the best setting.
+
+To tune the number of jobs, you can watch `htop` or some similar program while the tests are running, to see whether cores are fully utilized or whether the system is running out of memory. On laptop or desktop machines you might also want to watch the temperature of the CPU.
+
+As well as using more CPU and RAM, higher `-j` settings will also use more disk space in your temporary directory: Rust `target` directories can commonly be 2GB or more, and there will be one per parallel job, plus whatever temp files your test suite might create.
+
+## Interaction with `--test-threads`
+
+The Rust test framework exposes a `--test-threads` option controlling how many threads run inside a test binary. cargo-mutants doesn't set this, but you can set it from the command line, along with other parameters to the test binary. You might need to set this if your test suite is non-hermetic with regard to global process state.
 
-`-j 4` may be a good starting point. Start there and watch memory and CPU usage,
-and tune towards a setting where all cores are fully utilized without apparent
-thrashing, memory exhaustion, or thermal issues.
+Limiting the number of threads inside a single test binary would tend to make that binary less resource-hungry, and so _might_ allow you to set a higher `-j` option.
 
-Because tests may be slower with high parallelism, you may see some spurious
-timeouts, and you may need to set `--timeout` manually to allow enough safety
-margin.
+Reducing the number of test threads to increase `-j` seems unlikely to help performance in most trees.
diff --git a/book/src/shards.md b/book/src/shards.md
@@ -0,0 +1,89 @@
+# Sharding
+
+In addition to [running multiple jobs locally](parallelism.md), cargo-mutants can also run jobs on multiple machines, to finish the overall run faster.
+
+Each job tests a subset of mutants, selected by a shard. Shards are described as `k/n`, where `n` is the number of shards and `k` is the index of the shard, from 0 to `n-1`.
+
+There is no runtime coordination between shards: they each independently discover the available mutants and then select a subset based on the `--shard` option.
+
+If any shard fails then that would indicate that some mutants were missed, or there was some other problem.
+
+## Consistency across shards
+
+**CAUTION:**
+All shards must be run with the same arguments, and the same number of shards `n`, or the results will be meaningless, as they won't agree on how to divide the work.
+
+Sharding can be combined with filters or shuffling, as long as the filters are set consistently in all shards. Sharding can also combine with `--in-diff`, again as long as all shards see the same diff.
+
+## Setting up sharding
+
+Your CI system or other tooling is responsible for launching multiple shards, and for collecting the results. You're responsible for choosing the number of shards (see below).
+
+For example, in GitHub Actions, you could use a matrix job to run multiple shards:
+
+```yaml
+  cargo-mutants:
+    runs-on: ubuntu-latest
+    # needs: [build, incremental-mutants]
+    strategy:
+      matrix:
+        shard: [0, 1, 2, 3, 4, 5, 6, 7]
+    steps:
+      - uses: actions/checkout@v3
+      - uses: dtolnay/rust-toolchain@master
+        with:
+          toolchain: beta
+      - uses: Swatinem/rust-cache@v2
+      - run: cargo install cargo-mutants
+      - name: Mutants
+        run: |
+          cargo mutants --no-shuffle -vV --shard ${{ matrix.shard }}/8
+      - name: Archive mutants.out
+        uses: actions/upload-artifact@v3
+        if: always()
+        with:
+          name: mutants.out
+          path: mutants.out
+```
+
+Note that the number of entries in the `shard` matrix matches the `/8` in the `--shard` argument.
+
+## Performance of sharding
+
+Each shard does some constant upfront work:
+
+* Any CI setup including starting the machine, getting a checkout, installing a Rust toolchain, and installing cargo-mutants
+* An initial clean build of the code under test
+* A baseline run of the unmutated code
+
+Then, for each mutant in its shard, it does an incremental build and runs all the tests.
+
+Each shard runs the same number of mutants, +/-1. Typically this will mean they each take roughly the same amount of time, although it's possible that some shards are unlucky in drawing mutants that happen to take longer to test.
+
+A rough model for the overall execution time for all of the shards, allowing for this work occurring in parallel, is
+
+```raw
+SHARD_STARTUP + (CLEAN_BUILD + TEST) + (N_MUTANTS/N_SHARDS) * (INCREMENTAL_BUILD + TEST)
+```
+
+The total cost in CPU seconds can be modelled as:
+
+```raw
+N_SHARDS * (SHARD_STARTUP + CLEAN_BUILD + TEST) + N_MUTANTS * (INCREMENTAL_BUILD + TEST)
+```
+
+As a result, at a very large number of shards `n` the cost of the initial setup work will dominate, but overall time to solution will be minimized.
+
+## Choosing a number of shards
+
+Because there's some constant overhead for every shard, there will be diminishing returns and increasing inefficiency if you use too many shards. (In the extreme case where there are more shards than mutants, some of them will do the setup work, then find they have nothing to do and immediately exit.)
+
+As a rule of thumb, you should probably choose `n` such that each worker runs at least 10 mutants, and possibly many more. 8 to 32 shards might be a good place to start.
+
+The optimal setting probably depends on how long your tree takes to build from zero and incrementally, how long the tests take to run, and the performance of your CI system.
+
+If your CI system offers a choice of VM sizes you might experiment with using smaller or larger VMs and more or fewer shards: the optimal setting probably also depends on your tree's ability to exploit larger machines.
+
+You should also think about cost and capacity constraints in your CI system, and the risk of starving out other users.
+
+cargo-mutants has no internal scaling constraints to prevent you from setting `n` very large, if cost, efficiency and CI capacity are not a concern.
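
Reviewer note on `book/src/shards.md`: the rough timing model in "Performance of sharding" can be made concrete with a small calculator. This is only a sketch and not part of the commit. The `Timings` struct, the function names, and the use of a rounded-up mutant count for the slowest shard are assumptions for illustration; the terms otherwise follow the two formulas in the page.

```rust
/// Hypothetical timing inputs, all in seconds. The field names mirror the
/// terms of the model in shards.md; they are not part of cargo-mutants.
#[derive(Debug, Clone, Copy)]
struct Timings {
    shard_startup: f64,
    clean_build: f64,
    incremental_build: f64,
    test: f64,
}

/// Estimated wall-clock time when all shards run fully in parallel.
/// Assumes `n_shards > 0`.
fn wall_time(t: Timings, n_mutants: usize, n_shards: usize) -> f64 {
    // Assumption: the slowest shard gets the mutant count rounded up.
    let per_shard = (n_mutants + n_shards - 1) / n_shards;
    t.shard_startup + t.clean_build + t.test
        + per_shard as f64 * (t.incremental_build + t.test)
}

/// Estimated total cost in machine-seconds, summed over all shards.
fn total_cost(t: Timings, n_mutants: usize, n_shards: usize) -> f64 {
    // Every shard pays the setup cost once; every mutant is tested once.
    n_shards as f64 * (t.shard_startup + t.clean_build + t.test)
        + n_mutants as f64 * (t.incremental_build + t.test)
}
```

With made-up inputs of 60 s startup, 120 s clean build, 10 s incremental build, 30 s of tests, and 800 mutants, going from 8 to 32 shards drops the estimated wall time from 4210 s to 1210 s while the estimated total cost rises from 33680 s to 38720 s, which illustrates the diminishing returns described above.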
diff --git a/src/main.rs b/src/main.rs
@@ -24,6 +24,7 @@ mod path;
 mod pretty;
 mod process;
 mod scenario;
+mod shard;
 mod source;
 mod span;
 mod tail_file;
@@ -53,6 +54,7 @@ use crate::mutate::{Genre, Mutant};
 use crate::options::Options;
 use crate::outcome::{Phase, ScenarioOutcome};
 use crate::scenario::Scenario;
+use crate::shard::Shard;
 use crate::workspace::{PackageFilter, Workspace};
 
 const VERSION: &str = env!("CARGO_PKG_VERSION");
@@ -200,6 +202,10 @@ struct Args {
     #[arg(long)]
     no_shuffle: bool,
 
+    /// run only one shard of all generated mutants: specify as e.g. 1/4.
+    #[arg(long)]
+    shard: Option<Shard>,
+
     /// maximum run time for all cargo commands, in seconds.
     #[arg(long, short = 't')]
     timeout: Option<f64>,
@@ -291,6 +297,9 @@ fn main() -> Result<()> {
             &read_to_string(in_diff).context("Failed to read filter diff")?,
         )?;
     }
+    if let Some(shard) = &args.shard {
+        mutants = shard.select(mutants);
+    }
     if args.list {
         list_mutants(FmtToIoWrite::new(io::stdout()), &mutants, &options)?;
     } else {

diff --git a/src/shard.rs b/src/shard.rs
@@ -0,0 +1,85 @@
+// Copyright 2023 Martin Pool
+
+//! Sharding parameters.
+
+use std::str::FromStr;
+
+use anyhow::{anyhow, ensure, Context, Error};
+
+/// Select mutants for a particular shard of the total list.
+#[derive(Debug, Clone, Copy, Eq, PartialEq)]
+pub struct Shard {
+    /// Index modulo n.
+    pub k: usize,
+    /// Modulus of sharding.
+    pub n: usize,
+}
+
+impl Shard {
+    /// Select the mutants that should be run for this shard.
+    pub fn select<M, I: IntoIterator<Item = M>>(&self, mutants: I) -> Vec<M> {
+        mutants
+            .into_iter()
+            .enumerate()
+            .filter_map(|(i, m)| if i % self.n == self.k { Some(m) } else { None })
+            .collect()
+    }
+}
+
+impl FromStr for Shard {
+    type Err = Error;
+
+    fn from_str(s: &str) -> Result<Self, Self::Err> {
+        let parts = s.split_once('/').ok_or(anyhow!("shard must be k/n"))?;
+        let k = parts.0.parse().context("shard k")?;
+        let n = parts.1.parse().context("shard n")?;
+        ensure!(k < n, "shard k must be less than n"); // implies n>0
+        Ok(Shard { k, n })
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use std::str::FromStr;
+
+    use super::*;
+
+    #[test]
+    fn shard_from_str_valid_input() {
+        let shard = Shard::from_str("2/5").unwrap();
+        assert_eq!(shard.k, 2);
+        assert_eq!(shard.n, 5);
+        assert_eq!(shard, Shard { k: 2, n: 5 });
+    }
+
+    #[test]
+    fn shard_from_str_invalid_input() {
+        assert_eq!(
+            Shard::from_str("").unwrap_err().to_string(),
+            "shard must be k/n"
+        );
+
+        assert_eq!(
+            Shard::from_str("2").unwrap_err().to_string(),
+            "shard must be k/n"
+        );
+
+        assert_eq!(
+            Shard::from_str("2/0").unwrap_err().to_string(),
+            "shard k must be less than n"
+        );
+
+        assert_eq!(
+            Shard::from_str("5/2").unwrap_err().to_string(),
+            "shard k must be less than n"
+        );
+    }
+
+    #[test]
+    fn shard_select() {
+        assert_eq!(
+            Shard::from_str("1/4").unwrap().select(0..10).as_slice(),
+            &[1, 5, 9]
+        );
+    }
+}
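
Reviewer note on `src/shard.rs`: the property the book relies on is that, for a fixed `n`, shards `0..n` together cover every mutant exactly once, with sizes differing by at most one. The sketch below is not part of the commit. It copies the `i % n == k` rule from `Shard::select` into a free function so that it compiles on its own; `select` as a free function and `all_shards` are hypothetical helper names.

```rust
/// Standalone copy of the rule used by `Shard::select` in this diff:
/// keep the items whose position `i` satisfies `i % n == k`.
fn select<T>(k: usize, n: usize, items: impl IntoIterator<Item = T>) -> Vec<T> {
    items
        .into_iter()
        .enumerate()
        .filter_map(|(i, item)| if i % n == k { Some(item) } else { None })
        .collect()
}

/// Run every shard `0..n` over the same list of `n_items` positions and
/// return the selections in shard order.
fn all_shards(n: usize, n_items: usize) -> Vec<Vec<usize>> {
    (0..n).map(|k| select(k, n, 0..n_items)).collect()
}
```

For example, `all_shards(4, 10)` yields `[0, 4, 8]`, `[1, 5, 9]`, `[2, 6]` and `[3, 7]`: each index appears once and the shard sizes are 3 or 2. With more shards than items, as in `all_shards(4, 2)`, the trailing shards come back empty, matching the "nothing to do" case mentioned in the book.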
diff --git a/src/snapshots/cargo_mutants__visit__test__expected_mutants_for_own_source_tree.snap b/src/snapshots/cargo_mutants__visit__test__expected_mutants_for_own_source_tree.snap
@@ -456,6 +456,11 @@ src/scenario.rs: replace Scenario::is_mutant -> bool with true
 src/scenario.rs: replace Scenario::is_mutant -> bool with false
 src/scenario.rs: replace Scenario::log_file_name_base -> String with String::new()
 src/scenario.rs: replace Scenario::log_file_name_base -> String with "xyzzy".into()
+src/shard.rs: replace Shard::select -> Vec<M> with vec![]
+src/shard.rs: replace Shard::select -> Vec<M> with vec![Default::default()]
+src/shard.rs: replace == with != in Shard::select
+src/shard.rs: replace <impl FromStr for Shard>::from_str -> Result<Self, Self::Err> with Ok(Default::default())
+src/shard.rs: replace <impl FromStr for Shard>::from_str -> Result<Self, Self::Err> with Err(::anyhow::anyhow!("mutated!"))
 src/source.rs: replace SourceFile::tree_relative_slashes -> String with String::new()
 src/source.rs: replace SourceFile::tree_relative_slashes -> String with "xyzzy".into()
 src/source.rs: replace SourceFile::path -> &Utf8Path with &Default::default()

diff --git a/tests/cli/main.rs b/tests/cli/main.rs
@@ -26,6 +26,7 @@ mod config;
 mod error_value;
 mod in_diff;
 mod jobs;
+mod shard;
 mod trace;
 #[cfg(windows)]
 mod windows;