Commit 923b960: More docs

sourcefrog committed Dec 4, 2022
1 parent 9b1821a commit 923b960

Showing 10 changed files with 115 additions and 89 deletions.
2 changes: 2 additions & 0 deletions book/src/SUMMARY.md
@@ -21,5 +21,7 @@
- [Continuous integration](ci.md)
- [How it works](how-it-works.md)
- [Goals](goals.md)
- [Mutations vs coverage](vs-coverage.md)
- [Differences from fuzzing](vs-fuzzing.md)
- [Limitations](limitations.md)
- [How to help](how-to-help.md)
12 changes: 8 additions & 4 deletions book/src/getting-started.md
@@ -1,18 +1,22 @@
# Getting started

Just run `cargo mutants` in a Rust source directory, and it will point out
functions that may be inadequately tested.

## Example

```none
; cargo mutants
Found 14 mutants to test
Copy source to scratch directory ... 0 MB in 0.0s
Unmutated baseline ... ok in 1.6s build + 0.3s test
Auto-set test timeout to 20.0s
src/lib.rs:386: replace <impl Error for Error>::source -> Option<&(dyn std::error::Error + 'static)>
with Default::default() ... NOT CAUGHT in 0.6s build + 0.3s test
src/lib.rs:485: replace copy_symlink -> Result<()> with Ok(Default::default()) ...
NOT CAUGHT in 0.5s build + 0.3s test
14 mutants tested in 0:08: 2 missed, 9 caught, 3 unviable
```

In v0.5.1 of the `cp_r` crate, the `copy_symlink` function was reached by a test
but not adequately tested, and the `Error::source` function was not tested at all.
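To make the report concrete, here is a sketch of what a mutant like the `copy_symlink` one above looks like. This is hypothetical stand-in code, not the actual `cp_r` source:

```rust
use std::io;

// Hypothetical stand-in for the real copy_symlink in the cp_r crate;
// the actual implementation would create a symlink.
fn copy_symlink(src: &str) -> io::Result<()> {
    println!("would copy symlink {}", src);
    Ok(())
}

// What cargo-mutants tests: the same signature with the body replaced
// by `Ok(Default::default())`. A test that only checks for Ok cannot
// distinguish this from the original.
fn copy_symlink_mutant(_src: &str) -> io::Result<()> {
    Ok(Default::default())
}
```

If every test passes against the mutant body, cargo-mutants reports it as NOT CAUGHT.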
5 changes: 3 additions & 2 deletions book/src/how-it-works.md
@@ -26,10 +26,11 @@ The basic approach is:

- For each mutation:
- Apply the mutation to the scratch tree by patching the affected file.
- Run `cargo build`: if this fails, the mutant is unviable, and that's ok.
- Run `cargo test` in the tree, saving output to a log file.
- If the tests fail, that's good: the mutation was somehow
caught.
- If the tests succeed, that might mean test coverage was
inadequate, or it might mean we accidentally generated a no-op mutation.
- Revert the mutation to return the tree to its clean state.
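The per-mutant classification above can be sketched as a small function. This is a hypothetical simplification, not the actual cargo-mutants types:

```rust
// How each mutant's outcome could be classified from the build and
// test results (a sketch of the rules described above).
#[derive(Debug, PartialEq)]
enum Outcome {
    Unviable, // the mutated tree failed to build; that's ok
    Caught,   // the tests failed: the mutation was detected
    Missed,   // build and tests succeeded: a possible test gap
}

fn classify(build_succeeded: bool, tests_passed: bool) -> Outcome {
    if !build_succeeded {
        Outcome::Unviable
    } else if !tests_passed {
        Outcome::Caught
    } else {
        Outcome::Missed
    }
}
```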

37 changes: 33 additions & 4 deletions book/src/limitations.md
@@ -1,15 +1,44 @@
# Limitations, caveats, known bugs, and future enhancements

## Cases where cargo-mutants _can't_ help

cargo-mutants can only help if the test suite is hermetic: if the tests are
flaky or non-deterministic, or depend on external state, it will draw the wrong
conclusions about whether the tests caught a bug.

If you rely on testing the program's behavior by manual testing, or by an
integration test not run by `cargo test`, then cargo-mutants can't know this,
and will only tell you about gaps in the in-tree tests. It may still be helpful
to run mutation tests on only some selected modules that do have in-tree tests.

Running cargo-mutants on your code won't, by itself, make your code better. It
only helps suggest places you might want to improve your tests, and that might
indirectly find bugs, or prevent future bugs. Sometimes the results will point
out real current bugs. But it's on you to follow up. (However, it's really easy
to run, so you might as well look!)

cargo-mutants typically can't do much to help with crates that primarily
generate code using macros or build scripts, because it can't "see" the code
that's generated. (You can still run it, but it may generate very few
mutants.)

## Stability

cargo-mutants' behavior, command-line syntax, output formats, JSON output,
etc., may change from one release to the next.

## Limitations and known bugs

cargo-mutants currently only supports mutation testing of Rust code that builds
using `cargo` and where the tests are run using `cargo test`. Support for other tools such as Bazel or Nextest could in principle be added.

cargo-mutants sees the AST of the tree but doesn't fully "understand" the types, so sometimes generates unviable mutants or misses some opportunities to generate interesting mutants.

cargo-mutants reads the `CARGO_ENCODED_RUSTFLAGS` and `RUSTFLAGS` environment variables, and sets `CARGO_ENCODED_RUSTFLAGS`. It does not read `.cargo/config.toml` files, so any Rust flags set there will be ignored.

cargo-mutants does not yet understand conditional compilation, such as
`#[cfg(target_os = "linux")]`. It will report functions for other platforms as
missed, when it should know to skip them.

## Caution on side effects

9 changes: 7 additions & 2 deletions book/src/parallelism.md
@@ -4,12 +4,17 @@ The `--jobs` or `-j` option allows to test multiple mutants in parallel, by spaw

It's common that for some periods of its execution, a single Cargo build or test job can't use all the available CPU cores. Running multiple jobs in parallel makes use of resources that would otherwise be idle.

However, running many jobs simultaneously may also put high demands on the
system's RAM (by running more compile/link/test tasks simultaneously), IO
bandwidth, and cooling (by fully using all cores).

The best setting will depend on many factors including the behavior of your program's test suite, the amount of memory on your system, and your system's behavior under high thermal load.

The default is currently to run only one job at a time. Setting this higher than the number of CPU cores is unlikely to be helpful.

`-j 4` may be a good starting point, even if you have many more CPU cores. Start
there and watch memory and CPU usage, and tune towards a setting where all cores
are always utilized without memory usage going too high, and without thermal
issues.

Because tests may be slower with high parallelism, you may see some spurious timeouts, and you may need to set `--timeout` manually to allow enough safety margin.
18 changes: 8 additions & 10 deletions book/src/timeouts.md
@@ -16,18 +16,16 @@ file](filter_mutants.md).

## Timeouts

To avoid hangs, cargo-mutants will kill the test suite after a timeout and
continue to the next mutant.

By default, the timeout is set automatically. cargo-mutants measures the time to
run the test suite in the unmodified tree, and then sets a timeout for mutated
tests at 5x the time to run tests with no mutations, and a minimum of 20
seconds.

The minimum of 20 seconds can be overridden by the
`CARGO_MUTANTS_MINIMUM_TEST_TIMEOUT` environment variable, measured in seconds.
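As a worked sketch of this rule, assuming it is exactly max(5 x baseline, minimum) as described above (a hypothetical simplification, not the actual cargo-mutants source):

```rust
// Default minimum, overridable via CARGO_MUTANTS_MINIMUM_TEST_TIMEOUT.
const MINIMUM_TEST_TIMEOUT_SECS: f64 = 20.0;

// Automatic timeout: 5x the baseline test time, but at least the minimum.
fn auto_timeout(baseline_test_secs: f64) -> f64 {
    (5.0 * baseline_test_secs).max(MINIMUM_TEST_TIMEOUT_SECS)
}
```

With the 0.3s baseline test time from the getting-started example, this gives the 20.0s auto-set timeout shown there; a 10s baseline would give 50s.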

You can also set an explicit timeout with the `--timeout` option, also measured
in seconds. If this option is specified then the timeout is also applied to the
14 changes: 11 additions & 3 deletions book/src/using-results.md
@@ -35,7 +35,8 @@ to do about them is up to you, bearing in mind your goals and priorities for
your project, but here are some suggestions:

First, look at the overall list of missed mutants: there might be patterns such
as a cluster of related functions all having missed mutants. Probably some will
stand out as potentially more important to the correct function of your program.

You should first look for any mutations where it's very _surprising_ that they
were not caught by any tests, given what you know about the codebase. For
@@ -56,7 +57,8 @@ break if the private function was buggy, and then add a test for that.

Try to avoid writing tests that are too tightly targeted to the mutant, which is
really just an _example_ of something that could be wrong, and instead write
tests that assert the _correct_ behavior at the right level of abstraction,
preferably through a public interface.

If it's not clear why the tests aren't already failing, it may help to manually
inject the same mutation into your working tree and then run the tests under a
@@ -66,13 +68,19 @@ made.)

You may notice some messages about missed mutants in functions that you feel are
not very important to test, such as `Debug` implementations. You can use the
`--exclude-re` option to filter out these mutants, or mark them as
skipped with `#[mutants::skip]`. (Or, you might decide that you do want to add
unit tests for the `Debug` representation, but perhaps as a lower priority than
investigating mutants in more important code.)

In some cases cargo-mutants will generate a mutant that is effectively the same as the original code, and so not really incorrect. cargo-mutants tries to avoid doing this, but if it does happen then you can mark the function as skipped.

## Iterating on mutant coverage

After you've changed your program to address some of the missed mutants, you can
run `cargo mutants` again with the [`--file` option](skip_files.md) to re-test
only functions from the changed files.

## Hard-to-test cases

Some functions don't cause a test suite failure if emptied, but also cannot be
27 changes: 27 additions & 0 deletions book/src/vs-coverage.md
@@ -0,0 +1,27 @@
# How is mutation testing different to coverage measurement?

Coverage measurements tell you which lines of code (or other units) are reached
while running a test. They don't tell you whether the test really _checks_
anything about the behavior of the code.

For example, a function that writes a file and returns a `Result` might be
covered by a test that checks the return value, but not by a test that checks
that the file was actually written. cargo-mutants will try mutating the function
to simply return `Ok(())` and report that this was not caught by any tests.
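A minimal sketch of such a function (hypothetical code, with an invented name and path, not from any particular crate):

```rust
use std::fs;
use std::io;
use std::path::Path;

// The interesting behavior is the side effect (the file being
// written), not the returned Result.
fn write_greeting(path: &Path) -> io::Result<()> {
    fs::write(path, "hello")
}
```

A test that only asserts `write_greeting(path).is_ok()` still passes when cargo-mutants replaces the body with `Ok(())`; a test that reads the file back and checks its contents catches the mutant.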

Historically, Rust coverage measurements have required manual setup of several
OS- and toolchain-dependent tools, although this is improving. Because
`cargo-mutants` just runs `cargo`, it has no OS-specific or tight toolchain
integrations, and so is simple to install and run on any Rust source tree.
cargo-mutants also needs no special tools to view or interpret the results.

Coverage tools also in some cases produce output that is hard to interpret, with
lines sometimes shown as covered or not due to toolchain quirks that aren't easy
to map to direct changes to the test suite. cargo-mutants produces a direct list
of changes that are not caught by the test suite, which can be quickly reviewed
and prioritized.

One drawback of mutation testing is that it runs the whole test suite once per
generated mutant, so it can be slow on large trees with slow test suites. There
are [some techniques to speed up cargo-mutants](performance.md), including
[running multiple tests in parallel](parallelism.md).
13 changes: 13 additions & 0 deletions book/src/vs-fuzzing.md
@@ -0,0 +1,13 @@
# How is mutation testing different to fuzzing?

Fuzzing is a technique for finding bugs by feeding pseudo-random inputs to a
program, and is particularly useful on programs that parse complex or untrusted
inputs such as binary file formats or network protocols.

Mutation testing makes algorithmically-generated changes to a copy of the
program source, and measures whether the test suite catches the change.

The two techniques are complementary. Although some bugs might be found by
either technique, fuzzing will tend to find bugs that are triggered by complex
or unusual inputs, whereas mutation testing will tend to point out logic that
might be correct but that's not tested.
67 changes: 3 additions & 64 deletions book/src/welcome.md
@@ -9,68 +9,7 @@ code coverage by your tests, where a bug might be lurking.
to tell you something _interesting_ about areas where bugs might be lurking or
the tests might be insufficient.** ([More about these goals](goals.md).)

To get started:

1. [Install cargo-mutants](install.md).
2. [Run `cargo mutants`](getting-started.md) in your Rust source tree.
