The overall workflow of fuzz-introspector can be visualised as follows:
The two main parts to fuzz-introspector is the compilation-based static analysis and post-processing. Fuzz-introspector has capabilities to integrate data from runtime coverage collection, and in order to use this features fuzz-introspector relies on external infrastructure to extract this coverage. Focus is particularly on using OSS-Fuzz for this, although fuzz-introspector can be integrated into other workflows.
The code for this is located in frontends/llvm
The compiler-based static analysis is responsible for collecting data about the code under analysis. The analysis is done by way of an LLVM link-time optimisations (LTO) pass, which is a convenient tool for making program-wide analysis. As such, in order to use fuzz-introspector your code must be able to compile by way of LTO.
The LLVM pass is set to run only when fuzzer executables are linked. However, at that stage the code in the executables will be compiled with LTO, which enables the pass to operate on the code of the full program.
To use the fuzz-introspector pass simply compile a given project and the fuzzers
using lto and the gold linker (-flto
and -fuse-ld=gold
) clang flags, and then
also set the FUZZ_INTROSPECTOR
environment variable during the compilation and
linking process. However, we currently rely on adding the LTO pass into the clang
build pass pipeline by patching clang, and consequentially you need to use a custom Clang
for using fuzz-introspector, see Custom clang.
Fuzz-introspector relies on a custom clang in order to run. The reason for this is that LTO passes loaded from dynamic analysis does not work properly in LLVM (see this LLVM issue), and thus we need to patch clang in order to instantiate our pass. As such, it's a minor patch only involving a few lines of code. We have an issue open here for discussions on this.
The important part of the patch is that we change these LLVM lines from
ModulePassManager
PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
ModuleSummaryIndex *ExportSummary) {
ModulePassManager MPM;
// Convert @llvm.global.annotations to !annotation metadata.
MPM.addPass(Annotation2MetadataPass());
for (auto &C : FullLinkTimeOptimizationEarlyEPCallbacks)
C(MPM, Level);
// Create a function that performs CFI checks for cross-DSO calls with targets
// in the current module.
MPM.addPass(CrossDSOCFIPass());
to:
ModulePassManager
PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
ModuleSummaryIndex *ExportSummary) {
ModulePassManager MPM;
// Convert @llvm.global.annotations to !annotation metadata.
MPM.addPass(Annotation2MetadataPass());
for (auto &C : FullLinkTimeOptimizationEarlyEPCallbacks)
C(MPM, Level);
// Create a function that performs CFI checks for cross-DSO calls with targets
// in the current module.
MPM.addPass(CrossDSOCFIPass());
MPM.addPass(FuzzIntrospectorPass());
i.e. we only add MPM.addPass(FuzzIntrospectorPass());
to the LTO pass builder pipeline. All
of the LLVM patches are given in patch-llvm.sh
The output of the LLVM pass is composed of two files for each fuzzer executable analysed:
- fuzzerLogFile-X-YYYYY.data
- fuzzerLogFile-X-YYYYY.data.yaml
where X
is a counter and YYYYY
is a uid.
fuzzerLogFile-X-YYYYY.data
Holds the calltree of a given fuzzer and the content of it looks as follows:
Call tree
LLVMFuzzerTestOneInput /src/htslib/test/fuzz/hts_open_fuzzer.c linenumber=-1
abort / linenumber=123
hopen /src/htslib/hfile.c linenumber=127
find_scheme_handler /src/htslib/hfile.c linenumber=1259
isalnum_c /src/htslib/./textutils_internal.h linenumber=1125
__ctype_b_loc / linenumber=161
tolower_c /src/htslib/./textutils_internal.h linenumber=1126
__ctype_tolower_loc / linenumber=171
pthread_mutex_lock / linenumber=1134
load_hfile_plugins /src/htslib/hfile.c linenumber=1135
kh_init_scheme_string /src/htslib/hfile.c linenumber=1057
calloc / linenumber=915
hfile_add_scheme_handler /src/htslib/hfile.c linenumber=1061
...
...
Each row holds three items that are data about a given call site:
- The name of the destination function.
- The file in which the destination function is placed.
- The line in the source code in which the call site is.
Following the above example corresponds to the code here
in file hts_open_fuzzer.c
:
119 int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
120 hFILE *memfile;
121 uint8_t *copy = malloc(size);
122 if (copy == NULL) {
123 abort();
124 }
125 memcpy(copy, data, size);
126 // hopen does not take ownership of `copy`, but hts_hopen does.
127 memfile = hopen("mem:", "rb:", copy, size);
128 if (memfile == NULL) {
129 free(copy);
130 return 0;
131 }
The LLVMFuzzerTestOneInput /src/htslib/test/fuzz/hts_open_fuzzer.c linenumber=-1
is the
entrypoint of the fuzzer. The three data items are:
- destination function is
LLVMFuzzerTestOneInput
- file in which the destination function resides is
/src/htslib/test/fuzz/hts_open_fuzzer.c
because it corresponds to the location on disk - linenumber is -1 because there is no call site to LLVMFuzzerTestOneInput as this is called by the fuzz engine, e.g. libFuzzer.
fuzzerLogFile-X-YYYYY.data.yaml
Holds information about each function in the module that was compiled by the LTO pass. The following is example contents extracted from the above example:
---
Fuzzer filename: '/src/htslib/test/fuzz/hts_open_fuzzer.c'
All functions:
Function list name: All functions
Elements:
- functionName: hts_close_or_abort
functionSourceFile: '/src/htslib/test/fuzz/hts_open_fuzzer.c'
linkageType: InternalLinkage
functionLinenumber: 40
functionDepth: 18
returnType: 'N/A'
argCount: 1
argTypes:
- 'struct.htsFile *'
constantsTouched: []
argNames:
- ''
BBCount: 5
ICount: 23
EdgeCount: 5
CyclomaticComplexity: 2
functionsReached:
- BZ2_bzBuffToBuffCompress
- BZ2_bzBuffToBuffDecompress
- RC_Decode
...
...
- zlib_mem_deflate
- zlib_mem_inflate
functionUses: 4
- functionName: abort
functionSourceFile: '/'
linkageType: externalLinkage
functionLinenumber: -1
functionDepth: 0
returnType: 'N/A'
This is a very verbose file because it enables the post-processing logic to implement the analysis parts. In this sense the high-level goal for the compilation-based static analysis is to extract large volumes of data and then use the post-processing for analysis.
The code for this is located in src/fuzz_introspector
The post-processing logic is responsible for digesting and analysing data from the compilation-based analysis and also runtime coverage data. The architectecture goal of the post-processing is to be modular to make analysis plugin writing easy. For this reason the code is also written in Python to enable rapid prototyping.
The post-processing takes as input the files output from the compilation-based analysis
(.data
and .data.yaml
) as well as coverage reports generated by llvm-cov.
It then digests and performs analysis on all of this, including analysis on a fuzzer-specific
level as well as merging all data into a project-wide analysis.
The post-processing part has some core analysis in the src/fuzz_introspector and
also plugin-like analyses in src/fuzz_introspector/analyses.
The primary output of the post-processing logic is an HTML report that can be interpreted by humans. However, there is currently development taking place in extracting data that is useful by fuzzers to improve the fuzzing, e.g. the analysis plugin engine_input.py
The code is in development mode and things can change somewhat rapidly. We try to keep the documentation up to date, but may miss certain areas. If there are questions about current status quo please feel free to submit a Github issue with the question.