haybale is a general-purpose symbolic execution engine written in Rust.
It operates on LLVM IR, which allows it to analyze programs written in C/C++,
Rust, Swift, or any other language which compiles to LLVM IR.
In this way, it may be compared to KLEE, as it has similar goals, except
that haybale is written in Rust and makes some different design decisions.
That said, haybale makes no claim of being at feature parity with KLEE.
A symbolic execution engine is a way of reasoning - rigorously and
mathematically - about the behavior of a function or program.
It can reason about all possible inputs to a function without literally
brute-forcing every single one.
For instance, a symbolic execution engine like haybale can answer questions
like:
- Are there any inputs to (some function) that cause it to return 0? What are they?
- Is it possible for this loop to execute exactly 17 times?
- Can this pointer ever be NULL?
Symbolic execution engines answer these questions by converting each variable in the program or function into a mathematical expression which depends on the function or program inputs. Then they use an SMT solver to answer questions about these expressions, such as the questions listed above.
haybale is on crates.io, so you can simply
add it as a dependency in your Cargo.toml, selecting the feature corresponding
to the LLVM version you want:
[dependencies]
haybale = { version = "0.7.2", features = ["llvm-14"] }Currently, the supported LLVM versions are llvm-9, llvm-10, llvm-11,
llvm-12, llvm-13, and llvm-14.
haybale depends (indirectly) on the LLVM and Boolector libraries.
- LLVM must be available on your system, in the version which matches the
selected feature. (For instance, if you select the
llvm-14feature, LLVM 14 must be available on your system.) For more details and instructions on installing LLVM and making sure Cargo can find it, see thellvm-sysREADME. - For Boolector you have two options:
- You can compile and install Boolector 3.2.1 on your system as a shared library. (Make sure you configure it as a shared library, e.g., using
./configure.sh --shared, and install it, usingmake install.) - Or, you can enable the
haybalefeaturevendor-boolector. With this option, Cargo will automatically download and build Boolector and statically link to it. E.g.,This option probably only works on Linux and macOS, and requires standard build tools to be available on your system -- e.g., for Debian-based distributions,[dependencies] haybale = { version = "0.7.2", features = ["llvm-14", "vendor-boolector"] }
build-essential,cmake,curl, andgit.
- You can compile and install Boolector 3.2.1 on your system as a shared library. (Make sure you configure it as a shared library, e.g., using
Since haybale operates on LLVM bitcode, you'll need some bitcode to get started.
If the program or function you want to analyze is written in C, you can generate
LLVM bitcode (*.bc files) with clang's -c and -emit-llvm flags:
clang -c -emit-llvm source.c -o source.bcFor debugging purposes, you may also want LLVM text-format (*.ll) files, which
you can generate with clang's -S and -emit-llvm flags:
clang -S -emit-llvm source.c -o source.llIf the program or function you want to analyze is written in Rust, you can likewise
use rustc's --emit=llvm-bc and --emit=llvm-ir flags.
Note that in order for haybale to print source-location information (e.g.,
source filename and line number) in error messages and backtraces, the LLVM
bitcode will need to include debuginfo.
You can ensure debuginfo is included by passing the -g flag to clang,
clang++, or rustc when generating bitcode.
A haybale Project contains all of the code currently being analyzed, which
may be one or more LLVM modules.
To get started, simply create a Project from a single bitcode file:
let project = Project::from_bc_path("/path/to/file.bc")?;For more ways to create Projects, including analyzing entire libraries, see
the Project documentation.
haybale currently includes two simple built-in analyses:
get_possible_return_values_of_func(), which describes all the possible
values a function could return for any input, and find_zero_of_func(),
which finds a set of inputs to a function such that it returns 0.
These analyses are provided both because they may be of some use themselves,
but also because they illustrate how to use haybale.
For an introductory example, let's suppose foo is the following C function:
int foo(int a, int b) {
if (a > b) {
return (a-1) * (b-1);
} else {
return (a + b) % 3 + 10;
}
}We can use find_zero_of_func() to find inputs such that foo will return 0:
match find_zero_of_func("foo", &project, Config::default(), None) {
Ok(None) => println!("foo can never return 0"),
Ok(Some(inputs)) => println!("Inputs for which foo returns 0: {:?}", inputs),
Err(e) => panic!("{}", e), // use the pretty Display impl for errors
}haybale can do much more than just describe possible function return values
and find function zeroes.
In this section, we'll walk through how we could find a zero of the function
foo above without using the built-in find_zero_of_func().
This will illustrate how to write a custom analysis using haybale.
All analyses will use an ExecutionManager to control the progress of the
symbolic execution.
In the code snippet below, we call symex_function() to create an
ExecutionManager which will analyze the function foo - it will start at
the top of the function, and end when the function returns. In between, it
will also analyze any functions called by foo, as necessary and depending
on the Config settings.
let mut em = symex_function("foo", &project, Config::<DefaultBackend>::default(), None);Here it was necessary to not only specify the default haybale
configuration, as we did when calling find_zero_of_func(), but also what
"backend" we want to use.
The DefaultBackend should be fine for most purposes.
The ExecutionManager acts like an Iterator over paths through the function foo.
Each path is one possible sequence of control-flow decisions (e.g., which direction
do we take at each if statement) leading to the function returning some value.
The function foo in this example has two paths, one following the "true" branch and
one following the "false" branch of the if.
Let's examine the first path through the function:
let result = em.next().expect("Expected at least one path");In the common case, result contains the function return value on this path,
as a Boolector BV (bitvector) wrapped in the ReturnValue enum.
Since we know that foo isn't a void-typed function (and won't throw an
exception or abort), we can simply unwrap the ReturnValue to get the BV:
let retval = match result {
Ok(ReturnValue::Return(r)) => r,
Ok(ReturnValue::ReturnVoid) => panic!("Function shouldn't return void"),
Ok(ReturnValue::Throw(_)) => panic!("Function shouldn't throw an exception"),
Ok(ReturnValue::Abort) => panic!("Function shouldn't panic or exit()"),
...result could also be an Err describing an Error which was encountered
while processing the path. In this case, we could just ignore the error and
keep calling next() to try to find paths which didn't have errors. Or we
could get information about the error like this:
...
Err(e) => panic!("{}", em.state().full_error_message_with_context(e)),
};This gets information about the error from the program State, which we'll
discuss next. But for the rest of this tutorial, we'll assume that we got the
Ok result, and at this point retval is a BV representing the function
return value on the first path.
For each path, the ExecutionManager provides not only the final result of
the path (either aReturnValue or an Error), but also the final program
State at the end of that path.
We can get immutable access to the State with state(), or mutable access
with mut_state().
let state = em.mut_state(); // the final program state along this pathTo test whether retval can be equal to 0 in this State, we can use
state.bvs_can_be_equal():
let zero = state.zero(32); // The 32-bit constant 0
if state.bvs_can_be_equal(&retval, &zero)? {
println!("retval can be 0!");
}If retval can be 0, let's find what values of the function parameters
would cause that.
First, we'll add a constraint to the State requiring that the return value
must be 0:
retval._eq(&zero).assert();and then we'll ask for solutions for each of the parameters, given this constraint:
// Get a possible solution for the first parameter.
// In this case, from looking at the text-format LLVM IR, we know the variable
// we're looking for is variable #0 in the function "foo".
let a = state.get_a_solution_for_irname(&String::from("foo"), Name::from(0))?
.expect("Expected there to be a solution")
.as_u64()
.expect("Expected solution to fit in 64 bits");
// Likewise the second parameter, which is variable #1 in "foo"
let b = state.get_a_solution_for_irname(&String::from("foo"), Name::from(1))?
.expect("Expected there to be a solution")
.as_u64()
.expect("Expected solution to fit in 64 bits");
println!("Parameter values for which foo returns 0: a = {}, b = {}", a, b);Alternately, we could also have gotten the parameter BVs from the ExecutionManager
like this:
let a_bv = em.param_bvs()[0].clone();
let b_bv = em.param_bvs()[1].clone();
let a = em.state().get_a_solution_for_bv(&a_bv)?
.expect("Expected there to be a solution")
.as_u64()
.expect("Expected solution to fit in 64 bits");
let b = em.state().get_a_solution_for_bv(&b_bv)?
.expect("Expected there to be a solution")
.as_u64()
.expect("Expected solution to fit in 64 bits");
println!("Parameter values for which foo returns 0: a = {}, b = {}", a, b);Full documentation for haybale can be found on docs.rs,
or of course you can generate local documentation with cargo doc --open.
Currently, the official crates.io releases of haybale (0.7.0 and later)
depend on Boolector 3.2.1 and LLVM 9, 10, 11, 12, 13, or 14, selected via feature
flags llvm-9 through llvm-14.
As of this writing, choosing an LLVM version has essentially no effect
on haybale's features or interface; the only difference is the ability to
analyze bitcode generated with newer LLVMs. (And the LLVM 10+ versions
can process AtomicRMW instructions; see
#12.)
For LLVM 8, you can try the llvm-8 branch of this repo. This branch is
unmaintained, and is approximately at feature parity with haybale 0.2.1.
It may work for your purposes; or you can update to LLVM 9 or later and the
latest haybale.
LLVM 7 and earlier are not supported.
haybale works on stable Rust, and requires Rust 1.45 or later.
haybale is built using the Rust llvm-ir crate and the Boolector SMT
solver (via the Rust boolector crate).
- Support for LLVM 14 via the
llvm-14feature - Fixed/improved support for a couple of intrinsics
- Support for LLVM 13 via the
llvm-13feature haybalenow requires Rust 1.45+ (previously 1.43 or 1.44)
- Support for LLVM 12 via the
llvm-12feature - New Cargo feature to vendor Boolector: automatically download, build, and
statically link Boolector as part of the
haybalebuild process. See the "Install" section of the README above. symex_function()now takes an additional argumentparams. You can use this argument to specify constraints for the function parameters, or even specify specific hardcoded values. Or, you can just passNoneand get the previoushaybalebehavior, treating all parameters as completely unconstrained.find_zero_of_func()andget_possible_return_values_of_func()likewise now take aparamsargument to specify constraints on function parameters.Statehas a new public fieldprojproviding access to theProject.- Function hooks no longer take a
Projectparameter explicitly. Instead, you can access theProjectthrough theprojfield of theStateobject. ExecutionManagerhas a new public method.func()which provides access to the toplevelFunction.Statehas a new public methodget_path_length(), also available as the toplevel functionget_path_length().- Updated
llvm-irdependency to 0.8.0, which results in minor breaking changes to parts ofhaybale's API, wherellvm-irtypes are exposed.
- Fix the build with Rust 1.51+ (#16).
(Minimum Rust version for
haybaleremains unchanged: 1.43+ for LLVM 9 or 10 users, or 1.44+ for LLVM 11 users.)
- Support for LLVM 11 via the
llvm-11feature get_possible_return_values_of_func()now handles void functions properly (#10)- Support LLVM
atomicrmwinstructions (only for LLVM 10+) (#12) - Support LLVM
freezeinstructions (which only exist in LLVM 10+) - Built-in support for a few more Rust standard-library functions related to panic handling
Statehas a new public methodget_bv_by_irname()- LLVM 11 users need Rust 1.44+, due to requirements of
llvm-ir. LLVM 9 or 10 users still need only Rust 1.43+.
- Both
StateandProjectnow have a methodsize_in_bits()which gets the size of anyTypein bits, accounting for theProject's pointer size and struct definitions. This is intended to replacestate.size()andstate.size_opaque_aware(), both of which are now deprecated and will be removed inhaybale0.7.0. Likewise,state.fp_size()was deprecated and renamed tostate.fp_size_in_bits().- Note: these deprecated methods were actually removed in 0.7.1.
haybalenow supports both LLVM 9 and LLVM 10, using the same branch and same crates.io releases. When usinghaybale, you must choose either thellvm-9or thellvm-10feature.- Updated
llvm-irdependency to 0.7.1 (from 0.6.0), which includes runtime and memory-usage performance improvements, particularly for large bitcode files. This also involves a few breaking changes to parts ofhaybale's API. haybalenow requires Rust 1.43+ (previously 1.40+) due to requirements ofllvm-ir0.7.1.
- Fix for issue #9 regarding zero-element arrays (which particularly may appear when analyzing Rust code)
- Built-in support for the
llvm.ctlzandllvm.cttzintrinsics
Compatibility:
haybalenow depends on LLVM 10 by default (up from LLVM 9). LLVM 9 is still supported on a separate branch; see "Compatibility" above.- Updated
boolectordependency to crate version 0.4.0, which requires Boolector version 3.2.1 (up from 3.1.0).
Renames which affect the public API:
- Rename
SimpleMemoryBackendtoDefaultBackendand make it default. RenameBtorBackendtoCellMemoryBackend, and thememorymodule tocell_memory. - Remove the
layoutmodule. Its functions are now available as methods onState. Also, many of these functions now returnu32instead ofusize.
32-bit targets and related changes:
- With
DefaultBackend,haybalenow supports LLVM bitcode which was compiled for 32-bit targets (previously only supported 64-bit targets). - The
new_uninitialized()andnew_zero_initialized()methods on thebackend::Memorytrait,simple_memory::Memory, andcell_memory::Memorynow take an additional parameter indicating the pointer size. Projecthas a new public methodpointer_size_bits().
Other:
- Built-in support for the
llvm.expectintrinsic, and built-in support for thellvm.bswapintrinsic with vector operands (previously only supported scalar operands) solver_utils::PossibleSolutionshas new constructorsempty(),exactly_one(), andexactly_two()(useful for testing), and also implementsFromIterator, allowing you to.collect()an iterator into it- Bugfix for the
{min,max}_possible_solution_for_bv_as_binary_str()functions in thesolver_utilsmodule
New features:
- Support LLVM
cmpxchginstructions - Support for instruction callbacks - see
Config.callbacks. This allows you to take arbitrary actions based on the instruction about to be processed.
Config:
Config.null_detectionhas been renamed toConfig.null_pointer_checking, and its type has been changed to allow for additional options.Config::new()now takes no parameters. It is now the same asConfig::default()except that it comes with no function hooks.
Other utility functions/methods:
- The
hook_utilsmodule now includes two new functionsmemset_bvandmemcpy_bv. layout::size_opaque_awarenow returns anOptionrather than panicking.- The
to_string_*methods onLocationare now public, rather than internal to the crate, allowing users more control over theStringrepresentation of aLocation.
Error handling:
Errorhas three new variantsUnreachableInstruction,FailedToResolveFunctionPointer, andHookReturnValueMismatch. All of these were previously reported asError::OtherError, but now have dedicated variants.Error::LoopBoundExceedednow also includes the value of the loop bound which was exceeded.
Other notes:
haybaleno longer selects features of thelogcrate. This allows downstream users to select these features or not, and in particular, allows users to enable debug logging in release builds.
- New option
Config.max_callstack_depthallows you to limit the callstack depth for an analysis - automatically ignoring calls of LLVM functions which would exceed that callstack depth. The default for this setting is no limit, matching the previous behavior ofhaybale. - New option
Config.max_memcpy_lengthallows you to limit the maximum size ofmemcpy,memset, andmemmoveoperations. The default for this setting is no limit, matching the previous behavior ofhaybale. - New method
FunctionHooks::add_default_hook()allows you to supply a "default hook" which will be used when no other definition or hook is found for a function call. If no default hook is provided, this will result in aFunctionNotFounderror, just as it did previously. - Performance improvements for analyzing calls of function pointers.
- Improved a few error messages.
- Fix some broken links in the README and docs. No functional changes.
Solver timeouts:
- New setting
Config.solver_query_timeoutcontrols the maximum amount of timehaybalewill spend on a single solver query before returningError::SolverError. This setting defaults to 300 seconds (5 minutes). The setting can also be disabled entirely, which results in the same behavior as previous versions ofhaybale(no time limit on solver queries).
Error handling:
- The errors returned by
ExecutionManager.next()are nowhaybale::Errors instead ofStrings, allowing callers to more easily handle different kinds of errors different ways. To get a string representation of theError,.to_string()gives the short description, whileState.full_error_message_with_context()gives the full description which previously was returned byExecutionManager.next(). The usage example in the README has been updated accordingly. - The toplevel function
find_zero_of_func()now returns aResult, with the error type beingString. - New setting
Config.squash_unsatscontrols whetherError::Unsats are silently squashed (the default behavior, and the behavior of previous versions ofhaybale), or returned to the user. For more details, see the docs on that setting.
Logging, error messages, backtraces, etc:
haybalenow prints source-location information (e.g., source filename and line number) in error messages and backtraces when it is available. Similarly, theHAYBALE_DUMP_PATHenvironment variable now has the optionsLLVM,SRC, andBOTH. For more details on all of this, seeConfig.print_source_info.- You can also now disable printing the LLVM module name along with LLVM
location info in error messages, backtraces, path dumps, and log messages.
For more details, see
Config.print_module_name. haybalewill now by default autodetect when C++ or Rust demangling is appropriate for theProject, unless a different setting is chosen inConfig.demangling.- Numeric constants representing
BVvalues in log messages,HAYBALE_DUMP_VARSdumps, etc are now all printed in hexadecimal (previously binary, or an inconsistent mix of binary and hexadecimal).
Function hooks and intrinsics:
- Built-in support for LLVM arithmetic-with-overflow intrinsics.
- Built-in support for LLVM saturating-arithmetic intrinsics.
- Built-in support for the
llvm.assumeintrinsic, with an associated settingConfig.trust_llvm_assumes. - Built-in support for the
llvm.bswapintrinsic with argument sizes 48 or 64 bits (previously only supported 16 or 32 bits). - Default hooks for a number of Rust standard-library functions which
always panic, such as
core::result::unwrap_failed(). - New module
hook_utilscontains the implementations ofmemsetandmemcpyused by the corresponding built-in hooks. These are now publically available for use in custom hooks for other functions.
Changes to data structures and traits:
- The
LocationandPathEntrystructs have been refactored to include source-location information when it is available, to be capable of indicating basic block terminators in addition to normal instructions, and to support some internal refactoring. - The
backend::BVtrait has a new required method,get_solver(), which returns aSolverRefof the appropriate type. (This is similar to the same method on thebackend::Memorytrait.) - Saturating-arithmetic methods (signed and unsigned addition and subtraction)
are now available on
backend::BV, with default implementations in terms of the other trait methods. That means that these come "for free" once the required trait methods are implemented. zero_extend_to_bits()andsign_extend_to_bits()are also now available as trait methods onbackend::BV, with default implementations in terms of the other trait methods. Previously they were private utility functions inhaybale.- Many other structures have had minor changes and improvements, including some small breaking changes.
Compatibility:
- Updated
boolectordependency to crate version 0.3.0, which requires Boolector version 3.1.0 (up from 3.0.0). - This version of
haybalenow requires Rust 1.40+, up from 1.36+ for previous versions ofhaybale.
- New
HAYBALE_DUMP_PATHandHAYBALE_DUMP_VARSenvironment-variable optionsHAYBALE_DUMP_PATH: if set to1, then on error,haybalewill print a description of the path to the error: every LLVM basic block touched from the top of the function until the error location, in order.HAYBALE_DUMP_VARS: if set to1, then on error,haybalewill print the latest value assigned to each variable in the function containing the error.
- New setting
Config.demanglingallows you to apply C++ or Rust demangling to function names in error messages and backtraces - Support hooking calls to inline assembly, with some limitations inherited
from
llvm-ir(see comments onFunctionHooks::add_inline_asm_hook()) - Built-in support for (the most common cases of) the
llvm.bswapintrinsic - Other tiny tweaks - e.g., downgrade one panic to a warning
- Support LLVM
extractvalueandinsertvalueinstructions - Support LLVM
invoke,resume, andlandingpadinstructions, and thus C++throw/catch. Also provide built-in hooks for some related C++ ABI functions such as__cxa_throw(). This support isn't perfect, particularly surrounding the matching of catch blocks to exceptions:haybalemay explore some additional paths which aren't actually valid. But all actually valid paths should be found and explored correctly. - Since functions can be called not only with the LLVM
callinstruction but also with the LLVMinvokeinstruction, function hooks now receive a&dyn IsCallobject which may represent either acallorinvokeinstruction. haybalenow uses LLVM 9 rather than LLVM 8. See the "Compatibility" section in the README.- Improvements for
Projects containing C++ and/or Rust code:- For the function-name arguments to
symex_function(),get_possible_return_values_of_func(),find_zero_of_func(), andProject::get_func_by_name(), you may now pass either the (mangled) function name as it appears in LLVM (as was supported previously), or the demangled function name. That is, you can pass in"foo::bar"rather than"_ZN3foo3barE". - Likewise, you may add function hooks based on the demangled name of
the hooked function. See
FunctionHooks::add_cpp_demangled()andFunctionHooks::add_rust_demangled(). - Also,
llvm-irversions 0.3.3 and later contain an important bugfix for parsing LLVM bitcode generated byrustc.haybale0.2.0 usesllvm-ir0.4.1.
- For the function-name arguments to
- The
ReturnValueenum now has additional optionsThrow, indicating an uncaught exception, andAbort, indicating a program abort (e.g. Rust panic, or call to Cexit()). - Relatedly,
haybalenow has built-in hooks for the Cexit()function and for Rust panics (and for a few more LLVM intrinsics). haybalealso now contains a built-ingeneric_stub_hookandabort_hookwhich you can supply as hooks for any functions which you want to ignore the implementation of, or which always abort, respectively. See docs on thefunction_hooksmodule.Config.initial_mem_watchpointsis now aHashMapinstead of aHashSetof pairs.
- Memory watchpoints: specify a range of memory addresses, and get
a log message for any memory operation which reads or writes any data in
that range. See
State::add_mem_watchpoint(). - Convenience methods on
Statefor constructing constant-valuedBVs (rather than having to use the corresponding methods onBVand passstate.solver):bv_from_i32(),bv_from_u32(),bv_from_i64(),bv_from_u64(),bv_from_bool(),zero(),one(), andones(). - Some internal code refactoring to prepare for 0.2.0 features
- New method
Project::get_inner_struct_type_from_named()which handles opaque struct types by searching the entireProjectfor a definition of the given struct - Support memory reads of size 1-7 bits (in particular, reads of LLVM
i1) - Performance optimization: during
Stateinitialization, global variables are now only allocated, and not initialized until first use (lazy initialization). This gives the SMT solver fewer memory writes to think about, and helps especially for largeProjects which may contain many global variables that won't actually be used in a given analysis. - Minor bugfixes and improved error messages
Changes to README text only; no functional changes.
Initial release!