Frontend Tutorial

This page shows how to write frontends that let a fuzzer execute DeepState test harnesses, either on its own or as part of an ensemble.

Introduction

DeepState supports writing frontends: standalone executors that wrap a fuzzer to provision and run fuzzing tests. A frontend saves you from manually setting up an environment, compiling and instrumenting the harness (if necessary), and decoding or extracting results after the run. Frontends are also integral to DeepState's ensemble mode, which fuzzes a single test with a diverse ensemble of fuzzers for maximum performance.

Once written, a frontend can be used like this:

$ deepstate-my-fuzzer --compile_test my_harness.cpp --out_test_name my_deep_test
$ deepstate-my-fuzzer --input_seeds in --output_test_dir out ./my_deep_test

Writing a frontend for your favorite fuzzer is easy: you simply extend the base frontend.DeepStateFrontend API. In fact, each frontend currently available for DeepState is roughly 200 lines of code.

Writing an Example Frontend

Let's implement an example frontend for Google's honggfuzz fuzzer. honggfuzz is a powerful fuzzer that uses a feedback-driven strategy, starting from an initial corpus, to maximize code coverage (see more here). Our final frontend can be found here.

To start, let's create and open a file called honggfuzz.py in the frontend directory:

$ touch bin/deepstate/frontend/honggfuzz.py
$ vim bin/deepstate/frontend/honggfuzz.py

Let's start by writing the entry point, main():

def main():

  # Instantiates our fuzzer object, which inherits from `DeepStateFrontend`.
  fuzzer = Honggfuzz()

  # Parses user arguments. DeepState provides default support for a set of arguments for
  # the CLI, but the developer can extend that by defining their own `parse_args` call.
  fuzzer.parse_args()

  # Performs the sanity checks defined in pre_exec, then spawns and manages the fuzzer process.
  fuzzer.run()
  return 0

if __name__ == "__main__":
  exit(main())

Yes, it's that simple! We still need to define the Honggfuzz subclass we instantiated, but nothing more is required in the entry point for a basic level of functionality.

Let's take a look at the definitions we need to make the executor work in practice:

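#!/usr/bin/env python3
import argparse
import logging
import os

# FrontendError (raised in pre_exec) is assumed to be importable alongside DeepStateFrontend.
from deepstate.frontend import DeepStateFrontend, FrontendError

L = logging.getLogger(__name__)


class Honggfuzz(DeepStateFrontend):

  FUZZER = "honggfuzz"
  COMPILER = "hfuzz-clang++"

  @classmethod
  def parse_args(cls):
    pass  # TODO

  def compile(self):
    pass  # TODO

  def pre_exec(self):
    pass  # TODO

  @property
  def cmd(self):
    pass  # TODO

  @property
  def stats(self):
    pass  # TODO

  def post_exec(self):
    pass  # TODO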

Out of the box, DeepStateFrontend already spawns the fuzzer process and maintains seed synchronization, all through run(). Everything specific to a given fuzzer is up to the developer to implement.
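To make this division of labor concrete, here is a toy model of the hook pattern. It is a sketch, not DeepState's actual DeepStateFrontend: the class name ToyFrontend and its spawn() method are hypothetical, seed synchronization is left out, and we assume run() calls post_exec once the fuzzer exits.

```python
class ToyFrontend:
  """Toy model of the run() lifecycle; NOT DeepState's actual DeepStateFrontend."""

  def pre_exec(self):
    """Hook: validate arguments and the environment before fuzzing."""

  @property
  def cmd(self):
    """Hook: map parsed arguments to fuzzer flags."""
    raise NotImplementedError

  def spawn(self, cmd):
    """Hypothetical stand-in for launching and supervising the fuzzer."""
    raise NotImplementedError

  def post_exec(self):
    """Hook: decode, de-duplicate, or otherwise post-process results."""

  def run(self):
    # Template method: the base class fixes the order; subclasses fill in hooks.
    self.pre_exec()
    status = self.spawn(self.cmd)
    self.post_exec()
    return status
```

A subclass such as Honggfuzz overrides only the hooks, so the ordering of the workflow lives in one place.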

1. Initializing our Fuzzer Subclass

When initializing a new frontend, you must specify the executable name of the fuzzer, and of the compiler if one is needed to instrument binaries. When a fuzzer object is instantiated, the base frontend automatically locates these executables on $PATH (or through your own custom environment variable).

class Honggfuzz(DeepStateFrontend):

  FUZZER = "honggfuzz"
  COMPILER = "hfuzz-clang++"
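The lookup itself is handled by the base class. As a rough sketch of the idea (not DeepState's implementation), assuming a hypothetical override variable such as HONGGFUZZ_HOME that names a directory to search before $PATH, it can be built on the standard library's shutil.which:

```python
import os
import shutil


def find_executable(name, env_var=None):
  """Locate `name` in the directory named by `env_var`, else on $PATH (sketch)."""
  override = os.environ.get(env_var) if env_var else None
  if override:
    # A custom environment variable (hypothetical) is searched before $PATH.
    found = shutil.which(name, path=override)
    if found:
      return found
  # Fall back to the regular $PATH search; returns None if nothing is found.
  return shutil.which(name)
```

For example, find_executable(Honggfuzz.FUZZER, "HONGGFUZZ_HOME") returns the full path of the honggfuzz binary, or None if it cannot be found in either place.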

We also define a parse_args class method. DeepStateFrontend already defines several flags that most fuzzers use (e.g., -i for the input seeds directory), and our frontend subclass can readily extend them.

For Honggfuzz, let's introduce some fuzzer-specific flags:

  @classmethod
  def parse_args(cls):
    parser = argparse.ArgumentParser(description="Use Honggfuzz as a backend for DeepState")

    # Execution options
    parser.add_argument("--dictionary", type=str, help="Optional fuzzer dictionary for honggfuzz.")
    parser.add_argument("--iterations", type=int, help="Number of iterations to fuzz for.")
    parser.add_argument("--keep_output", action="store_true", help="Output fuzzing feedback during execution.")
    parser.add_argument("--clear_env", action="store_true", help="Clear envvars before execution.")
    parser.add_argument("--save_all", action="store_true", help="Save all test-cases prepended with timestamps.")
    parser.add_argument("--sanitizers", action="store_true", help="Enable sanitizers when fuzzing.")

    # Instrumentation options
    parser.add_argument("--no_inst", action="store_true", help="Black-box fuzzing with honggfuzz without compile-time instrumentation.")
    parser.add_argument("--persistent", action="store_true", help="Set persistent mode when fuzzing.")

    # Hardware-related features for branch counting/coverage, etc.
    parser.add_argument("--keep_aslr", action="store_true", help="Don't disable ASLR randomization during execution.")
    parser.add_argument("--perf_instr", action="store_true", help="Allow PERF_COUNT_HW_INSTRUCTIONS.")
    parser.add_argument("--perf_branch", action="store_true", help="Allow PERF_COUNT_HW_BRANCH_INSTRUCTIONS.")

    # Misc. options
    parser.add_argument("--post_stats", action="store_true", help="Output post-fuzzing stats.")

    cls.parser = parser
    return super(Honggfuzz, cls).parse_args()

Once parsed, arguments are stored in self._ARGS.
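How DeepStateFrontend merges its own flags with cls.parser is internal to the base class. One common way to layer fuzzer-specific flags over a shared set is argparse's parents mechanism; the sketch below assumes that approach, and its base flags (including the 5-second timeout default) are hypothetical stand-ins for DeepState's real ones.

```python
import argparse


def make_parser():
  # Hypothetical shared flags; DeepState's base parser defines its own set.
  base = argparse.ArgumentParser(add_help=False)
  base.add_argument("-i", "--input_seeds", type=str, help="Input seeds directory.")
  base.add_argument("--output_test_dir", type=str, help="Output directory.")
  base.add_argument("--timeout", type=int, default=5, help="Per-run timeout (assumed default).")

  # Fuzzer-specific flags inherit the shared ones via `parents`.
  parser = argparse.ArgumentParser(parents=[base], description="Use Honggfuzz as a backend for DeepState")
  parser.add_argument("--iterations", type=int, help="Number of iterations to fuzz for.")
  parser.add_argument("--persistent", action="store_true", help="Set persistent mode when fuzzing.")
  parser.add_argument("--no_inst", action="store_true", help="Black-box fuzzing without instrumentation.")
  return parser


args = make_parser().parse_args(["-i", "in", "--iterations", "1000", "--persistent"])
print(args.input_seeds, args.iterations, args.persistent, args.no_inst)
```

Each flag becomes an attribute on the resulting namespace (args.input_seeds, args.no_inst, and so on), which is the shape the methods below read from self._ARGS.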

2. Setting up the Environment

We can now implement pre_exec, which validates the parsed arguments and performs any environment checks needed before the fuzzer actually executes.

  def pre_exec(self):

    # base class performs internal checks
    super().pre_exec()
    args = self._ARGS

    if not args.no_inst:
      if not args.input_seeds:
        raise FrontendError("No -i/--input_seeds provided.")

      if not os.path.exists(args.input_seeds):
        os.mkdir(args.input_seeds)
        raise FrontendError("Seed path doesn't exist. Creating empty seed directory and exiting.")

      if not os.listdir(args.input_seeds):
        raise FrontendError(f"No seeds present in directory {args.input_seeds}.")

For fuzzers that rely on OS-level features (e.g., perf counters or core dump patterns), pre_exec is also the place for those sanity checks.
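As an example of such a check, fuzzers like AFL refuse to start when Linux pipes crash notifications to an external handler (for instance apport), because the handler delays how quickly the fuzzer learns about a crash. Below is a minimal sketch of that check, which a frontend could call from pre_exec; check_core_pattern is a hypothetical helper, and FrontendError is redefined locally only so the sketch runs on its own.

```python
import os


class FrontendError(Exception):
  """Local stand-in for DeepState's FrontendError so the sketch runs alone."""


def check_core_pattern(path="/proc/sys/kernel/core_pattern"):
  """Raise FrontendError if core dumps are piped to an external handler."""
  if not os.path.isfile(path):
    return  # not Linux, or /proc unavailable: nothing to check
  with open(path) as f:
    pattern = f.read().strip()
  # A leading '|' means the kernel pipes core dumps to the named program.
  if pattern.startswith("|"):
    raise FrontendError(
      f"core_pattern pipes crashes to '{pattern[1:]}'; "
      "set it to 'core' before fuzzing (as root: echo core > /proc/sys/kernel/core_pattern)."
    )
```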

3. Providing an Interface for Compilation/Instrumentation

For fuzzers that need compile-time instrumentation rather than working with black-box binaries, we can implement an interface for compiling tests.

  def compile(self):
    args = self._ARGS

    lib_path = "/usr/local/lib/libdeepstate_hfuzz.a"
    L.debug(f"Static library path: {lib_path}")

    if not os.path.isfile(lib_path):
      flags = ["-ldeepstate"]
    else:
      flags = ["-ldeepstate_hfuzz"]

    if args.compiler_args:
      flags += args.compiler_args.split()

    compiler_args = ["-std=c++11", args.compile_test] + flags + \
                    ["-o", args.out_test_name + ".hfuzz"]
    super().compile(compiler_args)

We construct the compiler arguments and pass them to the base DeepStateFrontend class, which runs the compiler and generates an instrumented binary.

With this method in place, we can now compile tests like so:

$ deepstate-my-fuzzer --compile_test MySimpleTest.cpp
$ deepstate-my-fuzzer --compile_test MyComplexTest.cpp --compiler_args="-lmylib -lsomelib"
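Because compile() only assembles a list before delegating to the base class, the assembly step can be pulled out into a pure function and unit-tested without a compiler. The sketch below does that with a hypothetical build_compiler_args helper; it swaps plain whitespace splitting for shlex.split, so quoted arguments in --compiler_args stay together.

```python
import os
import shlex


def build_compiler_args(compile_test, out_test_name, compiler_args=None,
                        lib_path="/usr/local/lib/libdeepstate_hfuzz.a"):
  """Hypothetical helper mirroring the argument assembly in compile()."""
  # Link the honggfuzz-specific DeepState library if installed, else the generic one.
  flags = ["-ldeepstate_hfuzz"] if os.path.isfile(lib_path) else ["-ldeepstate"]
  if compiler_args:
    # shlex.split ignores repeated spaces and keeps quoted arguments together.
    flags += shlex.split(compiler_args)
  return ["-std=c++11", compile_test] + flags + ["-o", out_test_name + ".hfuzz"]
```

For the second command above, build_compiler_args("MyComplexTest.cpp", "my_deep_test", "-lmylib -lsomelib") yields the source file, the DeepState library flag, -lmylib and -lsomelib, and finally -o my_deep_test.hfuzz.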

4. Initializing the Fuzzer Command

After parsing and checking our arguments, we need to map the parsed values to the fuzzer's actual command-line flags. We do this in the cmd property, which returns a dictionary that the runner uses to build the command that spawns the fuzzer.

  @property
  def cmd(self):
    args = self._ARGS
    cmd_dict = {
      "--input": args.input_seeds,
      "--workspace": args.output_test_dir,
      "--timeout": str(args.timeout),
    }

    if args.dictionary:
      cmd_dict["--dict"] = args.dictionary
    if args.iterations:
      cmd_dict["--iterations"] = str(args.iterations)

    if args.persistent:
      cmd_dict["--persistent"] = None
    if args.no_inst:
      cmd_dict["--noinst"] = None
    if args.keep_output:
      cmd_dict["--keep_output"] = None
    if args.sanitizers:
      cmd_dict["--sanitizers"] = None
    if args.clear_env:
      cmd_dict["--clear_env"] = None
    if args.save_all:
      cmd_dict["--save_all"] = None
    if args.keep_aslr:
      cmd_dict["--linux_keep_aslr"] = None

    # TODO: autodetect hardware features
    if args.perf_instr:
      cmd_dict["--linux_perf_instr"] = None
    if args.perf_branch:
      cmd_dict["--linux_perf_branch"] = None

    cmd_dict.update({
      "--": args.binary,
      "--input_test_file": "___FILE___",
      "--abort_on_fail": None,
      "--no_fork": None
    })

    if args.which_test:
      cmd_dict["--input_which_test"] = args.which_test

    return cmd_dict

During each fuzzer run, we always pass a fixed set of DeepState arguments to the instrumented harness binary: ./bin --input_test_file ___FILE___ --abort_on_fail --no_fork. Here ___FILE___ is the placeholder that honggfuzz replaces with the path of the current test case, so the harness reads its input from a file. The placeholder differs between fuzzers; many, such as AFL, use @@.
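The base class turns this dictionary into a command line for you. The exact logic is internal to DeepStateFrontend, but a plausible flattening, assumed here for illustration (build_argv is a hypothetical name), emits each key followed by its value unless the value is None:

```python
def build_argv(fuzzer, cmd_dict):
  """Flatten a cmd-style dict into an argv list (hypothetical helper)."""
  argv = [fuzzer]
  # Dicts preserve insertion order (guaranteed since Python 3.7), so flags
  # come out in the order cmd added them, with "--" and the harness last.
  for flag, value in cmd_dict.items():
    argv.append(flag)
    if value is not None:  # None marks a switch that takes no value
      argv.append(value)
  return argv


argv = build_argv("honggfuzz", {
  "--input": "in",
  "--workspace": "out",
  "--timeout": "5",
  "--persistent": None,
  "--": "./my_deep_test.hfuzz",
  "--input_test_file": "___FILE___",
  "--abort_on_fail": None,
  "--no_fork": None,
})
print(" ".join(argv))
```

One consequence of using a dict is that each flag can appear only once, which suits the honggfuzz flags used here.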

5. Miscellaneous Functionality

With the above methods defined, we now have a fuzzer that works with minimal functionality. However, our API lets us go a step further and implement other fuzzing workflow functionality, including seed synchronization, parsing and reporting fuzzer stats, and post-processing!

stats - property method defining structure for fuzzer-produced runtime statistics

This property returns fuzzer-related stats as a dict. Often, this is done by parsing a stats file produced by the fuzzer:

  @property
  def stats(self):
    """
    Retrieves and parses the stats file produced by Honggfuzz
    """
    args = self._ARGS
    out_dir = os.path.abspath(args.output_test_dir)
    report_f = "HONGGFUZZ.REPORT.TXT"

    stat_file = os.path.join(out_dir, report_f)
    with open(stat_file, "r") as sf:
      lines = sf.readlines()

    stats = {
      "mutationsPerRun": None,
      "externalCmd": None,
      "fuzzStdin": None,
      "timeout": None,
      "ignoreAddr": None,
      "ASLimit": None,
      "RSSLimit": None,
      "DATALimit": None,
      "wordlistFile": None,
      "fuzzTarget": None,
      "ORIG_FNAME": None,
      "FUZZ_FNAME": None,
      "PID": None,
      "SIGNAL": None,
      "FAULT ADDRESS": None,
      "INSTRUCTION": None,
      "STACK HASH": None,
    }

    # strip the first 4 and last 5 lines, leaving only the "key: value" lines
    lines = lines[4:-5]

    for line in lines:
      for k in stats:
        if k in line:
          # split on the first ':' only, so values containing ':' stay intact
          stats[k] = line.split(":", 1)[1].strip()

    # add crash metrics
    crashes = len([name for name in os.listdir(out_dir) if name != report_f])
    stats.update({
      "CRASHES": crashes
    })

    return stats
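The substring test above (k in line) can misfire: a key such as timeout also matches any line that merely mentions the word. A slightly stricter sketch matches the key only at the start of a line. parse_report_fields is a hypothetical helper, and the sample text in its usage is illustrative rather than real honggfuzz output.

```python
import re


def parse_report_fields(text, keys):
  """Extract 'KEY : value' fields from a honggfuzz-style report (sketch)."""
  stats = dict.fromkeys(keys)
  for line in text.splitlines():
    # Key = leading run of word characters/spaces, then ':', then the value.
    m = re.match(r"\s*([\w ]+?)\s*:\s*(.*)$", line)
    if m and m.group(1) in stats:
      stats[m.group(1)] = m.group(2).strip()
  return stats


sample = (
  "PID: 1234\n"
  "SIGNAL: SIGSEGV (11)\n"
  "fuzzTarget   : ./my_deep_test.hfuzz --input_test_file ___FILE___\n"
  "a line that mentions timeout: ignored\n"
)
print(parse_report_fields(sample, ["PID", "SIGNAL", "fuzzTarget", "timeout"]))
```

The last sample line no longer sets timeout, because its key ("a line that mentions timeout") is not an exact match.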

post_exec - implements any post-processing functionality (e.g., test case decoding or de-duplication)

Since honggfuzz already implements a good deal of post-processing itself, including crash de-duplication, minimization, and decoding, we instead demonstrate post_exec with a fuzzer like Eclipser, which requires manual test case decoding:

  def post_exec(self):
    """
    Decode testcases and crashes after fuzzing.
    """
    out = self._ARGS.output_test_dir

    L.info("Performing post-processing decoding on testcases and crashes")
    subprocess.call(["dotnet", self.fuzzer, "decode", "-i", out + "/testcase", "-o", out + "/decoded"])
    subprocess.call(["dotnet", self.fuzzer, "decode", "-i", out + "/crash", "-o", out + "/decoded"])
    for f in glob.glob(out + "/decoded/decoded_files/*"):
      shutil.copy(f, out)
    shutil.rmtree(out + "/decoded")
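Honggfuzz de-duplicates crashes itself, but a frontend for a fuzzer that does not could do some de-duplication in post_exec. The sketch below uses the simplest technique, deleting byte-identical files by SHA-256 digest; dedupe_testcases is a hypothetical helper, and unlike stack-hash de-duplication it will not merge distinct inputs that trigger the same bug.

```python
import hashlib
import os


def dedupe_testcases(directory):
  """Delete byte-identical test cases in `directory`; return kept file names."""
  seen = {}
  for name in sorted(os.listdir(directory)):
    path = os.path.join(directory, name)
    if not os.path.isfile(path):
      continue  # skip subdirectories and other non-files
    with open(path, "rb") as f:
      digest = hashlib.sha256(f.read()).hexdigest()
    if digest in seen:
      os.remove(path)  # duplicate content: keep only the first file seen
    else:
      seen[digest] = name
  return sorted(seen.values())
```

Sorting the directory listing makes the choice of which duplicate survives deterministic across runs.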
