-
Notifications
You must be signed in to change notification settings - Fork 142
Writing a Data Flow Analysis
Why can't I write a data-flow analysis on-the-fly? Writing a data-flow analysis is a challenging task and can be tough. Therefore, you should be familiar with the underlying theory in order to be able to develop a proper data-flow analysis. We compiled a (incomplete) list of literature for you which you may consult before writing an analysis. Please have a look at Useful Literature.
PhASAR provides several sophisticated mechanisms that allow you to write your own data-flow analysis:
- IFDS
- IDE
- Inter-procedural Monotone Framework (with Call Strings)
- Intra-procedural Monotone Framework
Which analysis framework to use depends heavily on your analysis problem:
- your analysis is a plain reachability problem, that is, a data-flow fact can either hold, or not.
- your analysis problem is distributive, that is, within a flow function the reachability of a successor fact may depend on at most one incoming fact.
- your analysis problem is a may-analysis, since IFDS always uses set-union as merge operator
All bit-vector problems fall into this category, including all classical gen-kill problems. Examples are taint analysis, uninitialized-variables analysis, constness analysis, etc.
- your analysis computes environments that associate the holding data-flow facts with an additional computed value
- OR the set of holding data-flow facts is structured, i.e. some facts subsume other facts
- OR your analysis is a must-analysis, since in contrast to IFDS the merge operator is customizable
- your analysis problem is distributive, that is, within a flow function the reachability a of a successor fact as well as its associated value may depend on at most one incoming fact/value.
Examples are linear constant-propagation, typestate analysis, type analysis, feature-taint analysis, etc.
- your analysis problem is non-distributive and you cannot modify it to make it distributive.
- Literally, there is no other reason to use it.
- Use IFDS/IDE instead
An example is full-constant propagation.
- your analysis is very simple and does not need to take function-calls into account.
Examples are live-variables analysis, reaching-definitions analysis, or other simple analyses you may find inside compilers
Note: We strongly recommend to use IFDS or IDE as they are actively maintained and improved.
PhASAR is designed in a way that it provides an analysis-problem interface for each of the above mentioned algorithmic frameworks. In order to implement an analysis within a specific framework, just provide an implementation of the corresponding interface.
Analysis solving in PhASAR consists of several steps:
- Load the LLVM IR of the target program to analyze and compute the helper analyses.
- In general use the
HelperAnalyses
class for this - Alternatively, you may also want to instantiate
LLVMProjectIRDB
, someTypeHierarchy
, someAliasInfo
and someICFG
manually
- In general use the
- Create an instance of your analysis problem
- Create an instance of the corresponding analysis solver, e.g.
IFDSSolver
,IDESolver
- Use the solver to solve the problem in a fully automated manner
- Process the analysis' results
In general, all data-flow analyses are performed on a control-flow graph. When writing an analysis, you have to choose a control-flow graph for your analysis. For that reason, all of our solvers work either on the CFG (intra-procedural control-flow graph) or on the ICFG (inter-procedural control-flow graph) interface.
For instance, when writing a simple intra-procedural data-flow analysis using the monotone framework, use one of CFG
's concrete implementations or provide your own implementation for that interface.
The pre-implemented LLVMBasedCFG should usually do the job and can be used out-of-the-box.
The inter-procedural call-strings approach and the IFDS/IDE frameworks solve a concrete analysis based on an inter-procedural control-flow graph. Depending on the analysis's needs, a forward- or backward ICFG graph may be used. For instance, for a backward analysis, use the LLVMBasedBackwardICFG. However, most of the time the LLVMBasedICFG should work just fine.
In the following section some useful coding shortcuts are presented which may come in handy when writing a new analysis within PhASAR.
PhASAR tightly integrates with LLVM. Therefore PhASAR internally makes heavy use of LLVM's Support library that provides a lot of useful utilities, most of them performing significantly better than the ones from the STL.
Hence, as a user of PhASAR you automatically have access to the LLVMSupport library as well.
So, you definitely want to take a look at the LLVM headers at llvm/ADT/
and llvm/Support
.
For more information, check out the LLVM Programmer's Manual.
Still for most detailed information, you may want to consult the LLVM source code.
One very useful file to look at is STLExtras.h.
In addition, when writing an analysis within PhASAR, you will very likely have to inspect LLVM IR based on LLVM's API. It may be useful to check whether a utility that you need is already implemented in ValueTracking.h.
When defining flow functions in IFDS/IDE, oftentimes a certain flow function type occurs more than once.
For instance, the Kill
flow function that kills a specific data-flow fact is often needed many times.
An analysis writer, you can find several useful pre-defined flow functions and edge functions that can be directly used.
A very useful example of such a recurring flow function are the identity
, generate
and kill
flow functions.
Some of the flow functions are also defined as a singleton in order to keep their overhead as small as possible.
This is always possible if the flow function has no state.
We also provide some LLVM specific flow functions that you can find here.
Here is a small example that makes use of the pre-defined flow functions:
auto getNormalFlowFunction(...) -> FlowFunctionPtrType {
// check the type of the instruction
if(...) {
// do some work
}
else if(...) {
// unconditionally generate some new data-flow fact
return generateFromZero(/*some fact to be generated*/);
}
else if (...) {
// kill some data-flow fact
return killFlow(/* some fact to be killed */);
}
// just treat everything else as Identity
return identityFlow();
}
When overriding implementing an interface in order to describe a problem within the monotone framework, you oftentimes require set operations like set union, set intersect or set difference.
Writing these functions yourself is tedious.
Therefore it makes much sense to use the existing set operation functions which are defined in the <algorithm>
header file such as:
...
std::includes /* subset */
std::set_difference
std::set_intersection
std::set_union
...
The neat thing about these functions is that they are completely generic as they operate on iterators and provide several useful overloads. Thus, they work on all container types that follow the concept's required by these functions. You may use them as follows:
std::set<int> a = {1, 2, 3};
std::set<int> b = {6, 4, 3, 2, 8, 1};
std::set<int> c(a);
c.insert(b.begin(), b.end());
bool isSubSet = std::includes(b.begin(), b.end(), a.begin(), a.end());
Note: When using the monotone framework, your also may want to checkout our BitVectorSet class that features several utilities that we have found useful for monotone-framework analyses.
- Home
- Reference Material
- Getting Started:
- Building PhASAR
- Using PhASAR with Docker
- A few uses of PhASAR
- Coding Conventions
- Contributing to PhASAR
- Errors and bug reporting
- Update to Newer LLVM Versions
- OS Support