Combining SymCC with a fuzzer
Programs instrumented with SymCC generate new test inputs on every run. This is
the core building block for program testing, but a full analysis requires
additional components: new test cases need to be checked for whether they
trigger vulnerabilities in the target program, and they have to be sorted by
relevance and fed back to symbolic execution. These tasks are essentially the
same as in fuzzing, except that we use a smarter (yet more expensive) strategy
to generate new inputs. Here we show how to reuse an existing fuzzer for these
management tasks while additionally generating new inputs with SymCC.
Setup
We use AFL, a popular gray-box fuzzer, in its parallel mode. See AFL's
documentation on parallel fuzzing for details on this mode - the basic idea is
that SymCC and the fuzzer periodically exchange new inputs. SymCC comes with a
helper that coordinates the collaboration with the fuzzer. It is written in
Rust, so rustc and cargo (the Rust package manager) have to be installed. On
Debian-based distributions, for example, a simple "apt install rustc cargo" is
all you need. Build the tool by executing the following command in the root of
SymCC's source repository:
$ cargo install --path util/symcc_fuzzing_helper
Afterwards, you should have a self-contained binary
~/.cargo/bin/symcc_fuzzing_helper. If you are interested in the tool's
internals, you can render documentation as follows:
$ cargo doc --manifest-path util/symcc_fuzzing_helper/Cargo.toml \
--document-private-items --open
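If you want to quickly verify the installation, you can ask the helper for its
usage information (a simple sanity check; we assume the standard --help flag
provided by Rust's common argument parsers):
$ ~/.cargo/bin/symcc_fuzzing_helper --help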
This is all on the SymCC side. Now just make sure that AFL is installed - we've
tested with version 2.56b.
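If AFL is not packaged for your distribution, one way to obtain a matching
version is to build it from source. The sketch below uses the upstream
repository; the exact tag name is an assumption, so check the available tags:
$ git clone https://github.com/google/AFL.git
$ cd AFL
$ git checkout v2.56b
$ make
$ cd ..
Running "make" produces afl-fuzz, afl-showmap and the compiler wrappers such as
afl-clang; either add the AFL directory to your PATH or call the tools with
absolute paths (see also the notes at the end of this document).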
Testing an example program
Suppose we wanted to search for memory-related vulnerabilities in tcpdump's
link-layer parsers. The program can be instructed to read from a pcap file and
print the relevant headers like so:
$ tcpdump -e -r <pcap_file>
Compile tcpdump and libpcap, the library it uses for pcap reading, once with
SymCC and once with one of AFL's compiler wrappers (e.g., afl-clang). In order
to detect memory corruptions, enable address sanitizer in the AFL-instrumented
version by exporting AFL_USE_ASAN=1 before compiling:
$ git clone https://github.com/the-tcpdump-group/libpcap.git
$ git clone https://github.com/the-tcpdump-group/tcpdump.git
$ mkdir symcc_build; cd symcc_build
$ cp -r ../{libpcap,tcpdump} .
$ cd libpcap
$ CC=/path/to/symcc ./configure
$ make
$ cd ../tcpdump
$ CC=/path/to/symcc ./configure
$ make
$ cd ..
$ mkdir afl_build; cd afl_build
$ export AFL_USE_ASAN=1
$ cp -r ../{libpcap,tcpdump} .
$ cd libpcap
$ CC=/path/to/afl-clang ./configure
$ make
$ cd ../tcpdump
$ CC=/path/to/afl-clang ./configure
$ make
$ cd ..
Note that we need two copies of the source code because the projects build
in-tree. Also, it is important to place the source code directories next to each
other, so that tcpdump's build system can find and statically link the
previously built libpcap.
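At this point you can optionally run the SymCC-instrumented tcpdump once by
hand to check that the instrumentation is active. The SYMCC_INPUT_FILE and
SYMCC_OUTPUT_DIR settings are explained in SymCC's configuration documentation;
the file and directory names below are just examples:
$ mkdir /tmp/symcc_out
$ echo A > /tmp/sample
$ env SYMCC_OUTPUT_DIR=/tmp/symcc_out SYMCC_INPUT_FILE=/tmp/sample \
    symcc_build/tcpdump/tcpdump -e -r /tmp/sample
$ ls /tmp/symcc_out
Expect tcpdump to complain about the invalid capture file; depending on the
input, SymCC may already have written a few newly generated test cases to the
output directory.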
Create a corpus of seed files somewhere (say, in a directory called "corpus");
for tcpdump, we simply start with a minimal corpus containing only a dummy file
for AFL:
$ mkdir corpus
$ echo A > corpus/dummy
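Optionally, you can give the fuzzers a head start by adding a minimal but
well-formed capture file to the corpus. The 24 bytes below encode an empty
little-endian pcap global header with the Ethernet link type; this seed is
merely a suggestion, the dummy file alone works too:
$ printf '\xd4\xc3\xb2\xa1\x02\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\x00\x00\x01\x00\x00\x00' > corpus/empty.pcap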
Then launch one AFL master and one AFL secondary instance, both writing their
outputs to the arbitrarily named directory "afl_out":
$ afl-fuzz -M afl-master -i corpus -o afl_out -m none -- afl_build/tcpdump/tcpdump -e -r @@
$ afl-fuzz -S afl-secondary -i corpus -o afl_out -m none -- afl_build/tcpdump/tcpdump -e -r @@
For simplicity, we disable memory limits (with "-m none"); be sure to read AFL's
notes on address sanitizer to learn about the implications. Alternatively, you
can compile the target program without address sanitizer, in which case you
don't need to disable the memory limit.
Finally, we can run SymCC using the helper:
$ ~/.cargo/bin/symcc_fuzzing_helper -o afl_out -a afl-secondary -n symcc -- symcc_build/tcpdump/tcpdump -e -r @@
It will run SymCC on the most promising inputs generated by the secondary AFL
instance and feed any interesting results back to AFL. In AFL's status screen,
you should see the counter "imported" in the "path geometry" section increase
after a short time - this means that the fuzzer instances and SymCC are
exchanging inputs. Crashes will be stored in afl_out/*/crashes as usual.
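If you prefer a single overview instead of watching the individual status
screens, AFL's afl-whatsup utility summarizes all instances in the shared
output directory (purely a convenience, not required for the setup):
$ afl-whatsup afl_out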
It is possible to run SymCC with only an AFL master or only a secondary AFL
instance; see the AFL docs for the implications. Moreover, the number of fuzzer
and SymCC instances can be increased - just make sure that each has a unique
name.
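For example, a second secondary fuzzer and a second SymCC instance could be
added alongside the ones above like this (the instance names are arbitrary as
long as they are unique):
$ afl-fuzz -S afl-secondary2 -i corpus -o afl_out -m none -- afl_build/tcpdump/tcpdump -e -r @@
$ ~/.cargo/bin/symcc_fuzzing_helper -o afl_out -a afl-secondary2 -n symcc2 -- symcc_build/tcpdump/tcpdump -e -r @@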
Note that there are currently a few gotchas with the fuzzing helper:
1. It expects afl-showmap to be in the same directory as afl-fuzz (which is
usually the case), and it finds that directory via your afl-fuzz command. If
afl-fuzz is on your PATH (as we assumed in the example above), all is good
and you can ignore this point. Otherwise, you need to either call afl-fuzz
with an absolute path (e.g., /afl/afl-fuzz in the Docker image, as shown in
the example after this list) or, if you use a relative path, start afl-fuzz
from the same working directory as the fuzzing helper.
2. The helper needs to know how to call the AFL-instrumented version of the
target, and it finds that information by scanning your afl-fuzz command. To
this end, it _requires_ the double dash that we used in the example above to
separate afl-fuzz options from the target command; if you omit it, you'll
likely get errors from the helper when it tries to run afl-showmap.
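For example, in the SymCC Docker image, where AFL lives under /afl and is not
necessarily on your PATH, the master instance from above would be started like
this, keeping the double dash in place:
$ /afl/afl-fuzz -M afl-master -i corpus -o afl_out -m none -- afl_build/tcpdump/tcpdump -e -r @@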