Shard mutants (#197)
Allows running on multiple VMs to speed up CI

- [x] User manual content
- [x] News
- [x] Test
- [x] Add to cargo-mutants own tests
- [x] Test sharding is applied before shuffling
sourcefrog authored Dec 17, 2023
2 parents 4518b13 + a196b43 commit 7171cd8
Showing 11 changed files with 319 additions and 23 deletions.
9 changes: 6 additions & 3 deletions .github/workflows/tests.yml
@@ -65,7 +65,7 @@ jobs:
- run: cargo install --path .
- name: Mutants
run: |
cargo mutants --no-shuffle --exclude console.rs -j 2 -vV --in-diff git.diff
cargo mutants --no-shuffle -vV --in-diff git.diff
- name: Archive mutants.out
uses: actions/upload-artifact@v3
if: always()
@@ -75,7 +75,10 @@ jobs:

cargo-mutants:
runs-on: ubuntu-latest
# needs: [build, incremental-mutants]
needs: [build, pr-mutants]
strategy:
matrix:
shard: [0, 1, 2, 3, 4, 5, 6, 7]
steps:
- uses: actions/checkout@v3
- uses: dtolnay/rust-toolchain@master
@@ -85,7 +88,7 @@ jobs:
- run: cargo install --path .
- name: Mutants
run: |
cargo mutants --no-shuffle --exclude console.rs -j 2 -vV
cargo mutants --no-shuffle -vV --shard ${{ matrix.shard }}/8
- name: Archive mutants.out
uses: actions/upload-artifact@v3
if: always()
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,5 +1,11 @@
# cargo-mutants changelog

## Unreleased

- New: A `--shard k/n` option allows you to split the work across `n` independent parallel `cargo mutants` invocations running on separate machines, to get a faster overall result on large suites. You, or your CI system, are responsible for launching all the shards and checking whether any of them failed.

- Improved: Better documentation about `-j`, with stronger recommendations not to set it too high.

## 23.12.1

- Improved progress bars and console output, including putting the outcome of each mutant on the left, and the overall progress bar at the bottom. Improved display of estimated remaining time, and other times.
3 changes: 2 additions & 1 deletion book/src/SUMMARY.md
@@ -20,7 +20,8 @@
- [Error values](error-values.md)
- [Improving performance](performance.md)
- [Parallelism](parallelism.md)
- [Incremental tests of modified code](in-diff.md)
- [Sharding](shards.md)
- [Testing code changed in a diff](in-diff.md)
- [Integrations](integrations.md)
- [Continuous integration](ci.md)
- [Incremental tests of pull requests](pr-diff.md)
4 changes: 3 additions & 1 deletion book/src/in-diff.md
@@ -1,4 +1,4 @@
# Incremental tests of modified code
# Testing code changed in a diff

If you're working on a large project or one with a long test suite, you may not want to test the entire codebase every time you make a change. You can use `cargo-mutants --in-diff` to test only mutants generated from recently changed code.

@@ -19,6 +19,8 @@ Changes to non-Rust files, or files from which no mutants are produced, are igno

`--in-diff` makes tests faster by covering the mutants that are most likely to be missed in the changed code. However, it's certainly possible that edits in one region cause code in a different region or a different file to no longer be well tested. Incremental tests are helpful for giving faster feedback, but they're not a substitute for a full test run.

The diff is only matched against the code under test, not the test code. So, a diff that only deletes or changes test code won't cause any mutants to run, even though it may have a very material effect on test coverage.

## Example

In this diff, we've added a new function `two` to `src/lib.rs`, and the existing code is unaltered. With `--in-diff`, `cargo-mutants` will only test mutants that affect the function `two`.
53 changes: 35 additions & 18 deletions book/src/parallelism.md
@@ -1,30 +1,47 @@
# Parallelism

After the initial test of the unmutated tree, cargo-mutants can test multiple
mutants in parallel. This can give significant performance improvements,
depending on the tree under test and the hardware resources available.
After the initial test of the unmutated tree, cargo-mutants can run multiple
builds and tests of the tree in parallel on a single machine. Separately, you can
[shard](shards.md) work across multiple machines.

**Caution:** `cargo build` and `cargo test` internally spawn many threads and processes and can be very resource hungry. Don't set `--jobs` too high, or your machine may thrash, run out of memory, or overheat.

## Background

Even though cargo builds, rustc, and Rust's test framework launch multiple
processes or threads, they typically can't use all available CPU cores all the
time, and many `cargo test` runs will end up using only one core waiting for the
last task to complete. Running multiple jobs in parallel makes use of resources
that would otherwise be idle.
processes or threads, they typically spend some time waiting for straggler tasks, during which time some CPU cores are idle. For example, a cargo build commonly ends up waiting for a single-threaded linker for several seconds.

Running one or more build or test tasks in parallel can use up this otherwise wasted capacity.
This can give significant performance improvements, depending on the tree under test and the hardware resources available.

## Timeouts

Because tests may be slower with high parallelism, or may exhibit more variability in execution time, you may see some spurious timeouts, and you may need to set `--timeout` manually to allow enough safety margin. (User feedback on this is welcome.)

## Non-hermetic tests

By default, only one job is run at a time.
If your test suite is non-hermetic -- for example, if it talks to an external database -- then running multiple jobs in parallel may cause test flakes. `cargo-mutants` is just running multiple copies of `cargo test` simultaneously: if that doesn't work in your tree, then you can't use this option.

To run more, use the `--jobs` or `-j` option, or set the `CARGO_MUTANTS_JOBS`
environment variable.
## Choosing a job count

Setting this higher than the number of CPU cores is unlikely to be helpful.
You should set the number of jobs very conservatively, starting at `-j2` or `-j3`.

Higher settings are only likely to be helpful on very large machines, perhaps with >100 cores and >256GB RAM.

Unlike with `make`, setting `-j` proportionally to the number of cores is unlikely to work out well, because the Rust build and test tools already parallelize very aggressively.

The best setting will depend on many factors including the behavior of your
program's test suite, the amount of memory on your system, and your system's
behavior under high thermal load.
behavior under high load. Ultimately you'll need to experiment to find the best setting.

To tune the number of jobs, you can watch `htop` or some similar program while the tests are running, to see whether cores are fully utilized or whether the system is running out of memory. On laptop or desktop machines you might also want to watch the temperature of the CPU.

As well as using more CPU and RAM, higher `-j` settings will also use more disk space in your temporary directory: Rust `target` directories can commonly be 2GB or more, and there will be one per parallel job, plus whatever temp files your test suite might create.

## Interaction with `--test-threads`

The Rust test framework exposes a `--test-threads` option controlling how many threads run inside a test binary. cargo-mutants doesn't set this, but you can set it from the command line, along with other parameters to the test binary. You might need to set this if your test suite is non-hermetic with regard to global process state.

`-j 4` may be a good starting point. Start there and watch memory and CPU usage,
and tune towards a setting where all cores are fully utilized without apparent
thrashing, memory exhaustion, or thermal issues.
Limiting the number of threads inside a single test binary would tend to make that binary less resource-hungry, and so _might_ allow you to set a higher `-j` option.

Because tests may be slower with high parallelism, you may see some spurious
timeouts, and you may need to set `--timeout` manually to allow enough safety
margin.
Reducing the number of test threads to increase `-j` seems unlikely to help performance in most trees.
89 changes: 89 additions & 0 deletions book/src/shards.md
@@ -0,0 +1,89 @@
# Sharding

In addition to [running multiple jobs locally](parallelism.md), cargo-mutants can also run jobs on multiple machines, to finish the overall job faster.

Each job tests a subset of mutants, selected by a shard. Shards are described as `k/n`, where `n` is the number of shards and `k` is the index of the shard, from 0 to `n-1`.

There is no runtime coordination between shards: they each independently discover the available mutants and then select a subset based on the `--shard` option.

If any shard fails then that would indicate that some mutants were missed, or there was some other problem.
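
As a minimal sketch of how independent shards can agree on a division of work without coordinating at runtime, the following round-robin selection assigns the item at position `i` to shard `i % n`. It follows the same idea as the `Shard::select` function added in this change; the free function `select_shard` and the integers standing in for mutants are illustrative assumptions, not part of cargo-mutants.

```rust
/// Round-robin selection: shard `k` of `n` keeps the items whose position `i`
/// in the list satisfies `i % n == k`. (Illustrative helper, not the real API.)
fn select_shard<T>(k: usize, n: usize, items: impl IntoIterator<Item = T>) -> Vec<T> {
    assert!(k < n, "shard k must be less than n");
    items
        .into_iter()
        .enumerate()
        .filter(|(i, _)| i % n == k)
        .map(|(_, item)| item)
        .collect()
}

fn main() {
    let n = 4;
    // Hypothetical stand-in for the mutant list, which every shard
    // discovers independently and in the same order.
    let mutants: Vec<u32> = (0..10).collect();

    let mut seen = Vec::new();
    for k in 0..n {
        let shard = select_shard(k, n, mutants.iter().copied());
        println!("shard {k}/{n}: {shard:?}");
        seen.extend(shard);
    }

    // Taken together, the shards cover every mutant exactly once.
    seen.sort();
    assert_eq!(seen, mutants);
}
```

Because each shard derives its subset purely from the list order and its own `k/n`, the scheme only works if every shard sees the same list.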

## Consistency across shards

**CAUTION:**
All shards must be run with the same arguments and the same shard count `n`, differing only in the shard index `k`; otherwise the results will be meaningless, as the shards won't agree on how to divide the work.

Sharding can be combined with filters or shuffling, as long as the filters are set consistently in all shards. Sharding can also combine with `--in-diff`, again as long as all shards see the same diff.

## Setting up sharding

Your CI system or other tooling is responsible for launching multiple shards, and for collecting the results. You're responsible for choosing the number of shards (see below).

For example, in GitHub Actions, you could use a matrix job to run multiple shards:

```yaml
  cargo-mutants:
    runs-on: ubuntu-latest
    # needs: [build, incremental-mutants]
    strategy:
      matrix:
        shard: [0, 1, 2, 3, 4, 5, 6, 7]
    steps:
      - uses: actions/checkout@v3
      - uses: dtolnay/rust-toolchain@master
        with:
          toolchain: beta
      - uses: Swatinem/rust-cache@v2
      - run: cargo install cargo-mutants
      - name: Mutants
        run: |
          cargo mutants --no-shuffle -vV --shard ${{ matrix.shard }}/8
      - name: Archive mutants.out
        uses: actions/upload-artifact@v3
        if: always()
        with:
          name: mutants.out
          path: mutants.out
```

Note that the number of entries in `matrix.shard` is set to match the `/8` in the `--shard` argument.

## Performance of sharding

Each shard does some constant upfront work:

* Any CI setup including starting the machine, getting a checkout, installing a Rust toolchain, and installing cargo-mutants
* An initial clean build of the code under test
* A baseline run of the unmutated code

Then, for each mutant in its shard, it does an incremental build and runs all the tests.

Each shard runs the same number of mutants, +/-1. Typically this will mean they each take roughly the same amount of time, although it's possible that some shards are unlucky in drawing mutants that happen to take longer to test.

A rough model for the overall execution time for all of the shards, allowing for this work occurring in parallel, is

```raw
SHARD_STARTUP + (CLEAN_BUILD + TEST) + (N_MUTANTS/N_SHARDS) * (INCREMENTAL_BUILD + TEST)
```

The total cost in CPU seconds can be modelled as:

```raw
N_SHARDS * (SHARD_STARTUP + CLEAN_BUILD + TEST) + N_MUTANTS * (INCREMENTAL_BUILD + TEST)
```

As a result, with a very large number of shards `n` the cost of the initial setup work will dominate, but overall time to solution will be minimized.
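
To get a feel for this trade-off, the two formulas above can be evaluated for a few shard counts. This is only a sketch of the rough model: the `Costs` struct, the function names, and all of the timings are made up for illustration and are not measurements of any real tree.

```rust
/// Inputs to the rough cost model, all in seconds.
/// Field names and example values are illustrative assumptions.
#[derive(Debug, Clone, Copy)]
struct Costs {
    shard_startup: f64,
    clean_build: f64,
    test: f64,
    incremental_build: f64,
}

/// Estimated wall-clock time when all shards run in parallel.
fn wall_clock_secs(c: Costs, n_mutants: u32, n_shards: u32) -> f64 {
    let mutants_per_shard = f64::from(n_mutants) / f64::from(n_shards);
    c.shard_startup + c.clean_build + c.test + mutants_per_shard * (c.incremental_build + c.test)
}

/// Estimated total CPU cost, summed over all shards.
fn total_cpu_secs(c: Costs, n_mutants: u32, n_shards: u32) -> f64 {
    f64::from(n_shards) * (c.shard_startup + c.clean_build + c.test)
        + f64::from(n_mutants) * (c.incremental_build + c.test)
}

fn main() {
    // Hypothetical timings for an imaginary project.
    let c = Costs {
        shard_startup: 60.0,
        clean_build: 120.0,
        test: 30.0,
        incremental_build: 10.0,
    };
    let n_mutants: u32 = 800;

    for n_shards in [1u32, 8, 32, 128] {
        println!(
            "{n_shards:>4} shards: wall clock ~{:.0}s, total CPU ~{:.0}s",
            wall_clock_secs(c, n_mutants, n_shards),
            total_cpu_secs(c, n_mutants, n_shards),
        );
    }
}
```

With these made-up numbers, going from 8 to 128 shards cuts the estimated wall-clock time from 4210 s to 460 s, while the estimated total CPU cost rises from 33680 s to 58880 s.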

## Choosing a number of shards

Because there's some constant overhead for every shard, there will be diminishing returns and increasing inefficiency if you use too many shards. (In the extreme case where there are more shards than mutants, some of them will do the setup work, then find they have nothing to do and immediately exit.)

As a rule of thumb, you should probably choose `n` such that each worker runs at least 10 mutants, and possibly many more. 8 to 32 shards might be a good place to start.
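
The rule of thumb can be turned into a small calculation. The helper below is a hypothetical sketch, not something cargo-mutants provides: it picks the largest shard count that still leaves every shard with at least `min_per_shard` mutants, capped at a maximum you choose.

```rust
/// Largest shard count for which every shard still gets at least
/// `min_per_shard` mutants, capped at `max_shards` and never below 1.
/// (Hypothetical helper for illustration.)
fn suggest_shard_count(n_mutants: usize, min_per_shard: usize, max_shards: usize) -> usize {
    assert!(min_per_shard > 0 && max_shards > 0);
    // With round-robin assignment the smallest shard gets
    // floor(n_mutants / n) mutants, so n may be at most
    // floor(n_mutants / min_per_shard).
    (n_mutants / min_per_shard).clamp(1, max_shards)
}

fn main() {
    // Example mutant counts are made up.
    for n_mutants in [5, 120, 500] {
        let n = suggest_shard_count(n_mutants, 10, 32);
        println!("{n_mutants} mutants -> {n} shard(s)");
    }
}
```

This only applies the 10-mutants-per-shard floor; it does not account for build times, CI cost, or capacity, which still need judgment.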

The optimal setting probably depends on how long your tree takes to build from zero and incrementally, how long the tests take to run, and the performance of your CI system.

If your CI system offers a choice of VM sizes you might experiment with using smaller or larger VMs and more or fewer shards: the optimal setting probably also depends on your tree's ability to exploit larger machines.

You should also think about cost and capacity constraints in your CI system, and the risk of starving out other users.

cargo-mutants has no internal scaling constraints to prevent you from setting `n` very large, if cost, efficiency and CI capacity are not a concern.
9 changes: 9 additions & 0 deletions src/main.rs
@@ -24,6 +24,7 @@ mod path;
mod pretty;
mod process;
mod scenario;
mod shard;
mod source;
mod span;
mod tail_file;
@@ -53,6 +54,7 @@ use crate::mutate::{Genre, Mutant};
use crate::options::Options;
use crate::outcome::{Phase, ScenarioOutcome};
use crate::scenario::Scenario;
use crate::shard::Shard;
use crate::workspace::{PackageFilter, Workspace};

const VERSION: &str = env!("CARGO_PKG_VERSION");
@@ -200,6 +202,10 @@ struct Args {
#[arg(long)]
no_shuffle: bool,

/// run only one shard of all generated mutants: specify as e.g. 1/4.
#[arg(long)]
shard: Option<Shard>,

/// maximum run time for all cargo commands, in seconds.
#[arg(long, short = 't')]
timeout: Option<f64>,
@@ -291,6 +297,9 @@ fn main() -> Result<()> {
&read_to_string(in_diff).context("Failed to read filter diff")?,
)?;
}
if let Some(shard) = &args.shard {
mutants = shard.select(mutants);
}
if args.list {
list_mutants(FmtToIoWrite::new(io::stdout()), &mutants, &options)?;
} else {
85 changes: 85 additions & 0 deletions src/shard.rs
@@ -0,0 +1,85 @@
// Copyright 2023 Martin Pool

//! Sharding parameters.
use std::str::FromStr;

use anyhow::{anyhow, ensure, Context, Error};

/// Select mutants for a particular shard of the total list.
#[derive(Debug, Clone, Copy, Eq, PartialEq)]
pub struct Shard {
    /// Index modulo n.
    pub k: usize,
    /// Modulus of sharding.
    pub n: usize,
}

impl Shard {
    /// Select the mutants that should be run for this shard.
    pub fn select<M, I: IntoIterator<Item = M>>(&self, mutants: I) -> Vec<M> {
        mutants
            .into_iter()
            .enumerate()
            .filter_map(|(i, m)| if i % self.n == self.k { Some(m) } else { None })
            .collect()
    }
}

impl FromStr for Shard {
    type Err = Error;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let parts = s.split_once('/').ok_or(anyhow!("shard must be k/n"))?;
        let k = parts.0.parse().context("shard k")?;
        let n = parts.1.parse().context("shard n")?;
        ensure!(k < n, "shard k must be less than n"); // implies n>0
        Ok(Shard { k, n })
    }
}

#[cfg(test)]
mod tests {
    use std::str::FromStr;

    use super::*;

    #[test]
    fn shard_from_str_valid_input() {
        let shard = Shard::from_str("2/5").unwrap();
        assert_eq!(shard.k, 2);
        assert_eq!(shard.n, 5);
        assert_eq!(shard, Shard { k: 2, n: 5 });
    }

    #[test]
    fn shard_from_str_invalid_input() {
        assert_eq!(
            Shard::from_str("").unwrap_err().to_string(),
            "shard must be k/n"
        );

        assert_eq!(
            Shard::from_str("2").unwrap_err().to_string(),
            "shard must be k/n"
        );

        assert_eq!(
            Shard::from_str("2/0").unwrap_err().to_string(),
            "shard k must be less than n"
        );

        assert_eq!(
            Shard::from_str("5/2").unwrap_err().to_string(),
            "shard k must be less than n"
        );
    }

    #[test]
    fn shard_select() {
        assert_eq!(
            Shard::from_str("1/4").unwrap().select(0..10).as_slice(),
            &[1, 5, 9]
        );
    }
}
@@ -456,6 +456,11 @@ src/scenario.rs: replace Scenario::is_mutant -> bool with true
src/scenario.rs: replace Scenario::is_mutant -> bool with false
src/scenario.rs: replace Scenario::log_file_name_base -> String with String::new()
src/scenario.rs: replace Scenario::log_file_name_base -> String with "xyzzy".into()
src/shard.rs: replace Shard::select -> Vec<M> with vec![]
src/shard.rs: replace Shard::select -> Vec<M> with vec![Default::default()]
src/shard.rs: replace == with != in Shard::select
src/shard.rs: replace <impl FromStr for Shard>::from_str -> Result<Self, Self::Err> with Ok(Default::default())
src/shard.rs: replace <impl FromStr for Shard>::from_str -> Result<Self, Self::Err> with Err(::anyhow::anyhow!("mutated!"))
src/source.rs: replace SourceFile::tree_relative_slashes -> String with String::new()
src/source.rs: replace SourceFile::tree_relative_slashes -> String with "xyzzy".into()
src/source.rs: replace SourceFile::path -> &Utf8Path with &Default::default()
1 change: 1 addition & 0 deletions tests/cli/main.rs
@@ -26,6 +26,7 @@ mod config;
mod error_value;
mod in_diff;
mod jobs;
mod shard;
mod trace;
#[cfg(windows)]
mod windows;