Writing a Data Flow Analysis

Before Writing a Static Analysis

Why can't I write a data-flow analysis on-the-fly? Writing a data-flow analysis is a challenging task and can be tough. Therefore, you should be familiar with the underlying theory in order to be able to develop a proper data-flow analysis. We compiled a (incomplete) list of literature for you which you may consult before writing an analysis. Please have a look at Useful Literature.

Choosing an Analysis Framework

PhASAR provides several sophisticated mechanisms that allow you to write your own data-flow analysis:

IFDS
IDE
Inter-procedural Monotone Framework (with Call Strings)
Intra-procedural Monotone Framework

Which analysis framework to use depends heavily on your analysis problem:

Use IFDS, if

your analysis is a plain reachability problem, that is, a data-flow fact can either hold, or not.
your analysis problem is distributive, that is, within a flow function the reachability of a successor fact may depend on at most one incoming fact.
your analysis problem is a may-analysis, since IFDS always uses set-union as merge operator

All bit-vector problems fall into this category, including all classical gen-kill problems. Examples are taint analysis, uninitialized-variables analysis, constness analysis, etc.

Use IDE, if

your analysis computes environments that associate the holding data-flow facts with an additional computed value
OR the set of holding data-flow facts is structured, i.e. some facts subsume other facts
OR your analysis is a must-analysis, since in contrast to IFDS the merge operator is customizable
your analysis problem is distributive, that is, within a flow function the reachability a of a successor fact as well as its associated value may depend on at most one incoming fact/value.

Examples are linear constant-propagation, typestate analysis, type analysis, feature-taint analysis, etc.

Use Inter-Mono, if

your analysis problem is non-distributive and you cannot modify it to make it distributive.
Literally, there is no other reason to use it.
Use IFDS/IDE instead

An example is full-constant propagation.

Use Intra-Mono, if

your analysis is very simple and does not need to take function-calls into account.

Examples are live-variables analysis, reaching-definitions analysis, or other simple analyses you may find inside compilers

Note: We strongly recommend to use IFDS or IDE as they are actively maintained and improved.

Writing a Static Analysis

PhASAR is designed in a way that it provides an analysis-problem interface for each of the above mentioned algorithmic frameworks. In order to implement an analysis within a specific framework, just provide an implementation of the corresponding interface.

Analysis solving in PhASAR consists of several steps:

Load the LLVM IR of the target program to analyze and compute the helper analyses.
- In general use the HelperAnalyses class for this
- Alternatively, you may also want to instantiate LLVMProjectIRDB, some TypeHierarchy, some AliasInfo and some ICFG manually
Create an instance of your analysis problem
Create an instance of the corresponding analysis solver, e.g. IFDSSolver, IDESolver
Use the solver to solve the problem in a fully automated manner
Process the analysis' results

Choosing a Control-Flow Graph

In general, all data-flow analyses are performed on a control-flow graph. When writing an analysis, you have to choose a control-flow graph for your analysis. For that reason, all of our solvers work either on the CFG (intra-procedural control-flow graph) or on the ICFG (inter-procedural control-flow graph) interface.

For instance, when writing a simple intra-procedural data-flow analysis using the monotone framework, use one of CFG's concrete implementations or provide your own implementation for that interface. The pre-implemented LLVMBasedCFG should usually do the job and can be used out-of-the-box.

The inter-procedural call-strings approach and the IFDS/IDE frameworks solve a concrete analysis based on an inter-procedural control-flow graph. Depending on the analysis's needs, a forward- or backward ICFG graph may be used. For instance, for a backward analysis, use the LLVMBasedBackwardICFG. However, most of the time the LLVMBasedICFG should work just fine.

Useful shortcuts

In the following section some useful coding shortcuts are presented which may come in handy when writing a new analysis within PhASAR.

LLVM Utilities

PhASAR tightly integrates with LLVM. Therefore PhASAR internally makes heavy use of LLVM's Support library that provides a lot of useful utilities, most of them performing significantly better than the ones from the STL.

Hence, as a user of PhASAR you automatically have access to the LLVMSupport library as well. So, you definitely want to take a look at the LLVM headers at llvm/ADT/ and llvm/Support. For more information, check out the LLVM Programmer's Manual. Still for most detailed information, you may want to consult the LLVM source code. One very useful file to look at is STLExtras.h.

In addition, when writing an analysis within PhASAR, you will very likely have to inspect LLVM IR based on LLVM's API. It may be useful to check whether a utility that you need is already implemented in ValueTracking.h.

Pre-defined Flow Function Templates

When defining flow functions in IFDS/IDE, oftentimes a certain flow function type occurs more than once. For instance, the Kill flow function that kills a specific data-flow fact is often needed many times. An analysis writer, you can find several useful pre-defined flow functions and edge functions that can be directly used. A very useful example of such a recurring flow function are the identity, generate and kill flow functions. Some of the flow functions are also defined as a singleton in order to keep their overhead as small as possible. This is always possible if the flow function has no state. We also provide some LLVM specific flow functions that you can find here.

Here is a small example that makes use of the pre-defined flow functions:

auto getNormalFlowFunction(...) -> FlowFunctionPtrType {
    // check the type of the instruction
    if(...) {
        // do some work
    } 
    else if(...) {
        // unconditionally generate some new data-flow fact
        return generateFromZero(/*some fact to be generated*/);
    }
    else if (...) {
        // kill some data-flow fact
        return killFlow(/* some fact to be killed */);
    }

    // just treat everything else as Identity
    return identityFlow();

}

The `<algorithm>` Header

When overriding implementing an interface in order to describe a problem within the monotone framework, you oftentimes require set operations like set union, set intersect or set difference. Writing these functions yourself is tedious. Therefore it makes much sense to use the existing set operation functions which are defined in the <algorithm> header file such as:

    ...
    std::includes /* subset */
    std::set_difference
    std::set_intersection
    std::set_union
    ...

The neat thing about these functions is that they are completely generic as they operate on iterators and provide several useful overloads. Thus, they work on all container types that follow the concept's required by these functions. You may use them as follows:

std::set<int> a = {1, 2, 3};
std::set<int> b = {6, 4, 3, 2, 8, 1};
std::set<int> c(a);
c.insert(b.begin(), b.end());
bool isSubSet = std::includes(b.begin(), b.end(), a.begin(), a.end());

Note: When using the monotone framework, your also may want to checkout our BitVectorSet class that features several utilities that we have found useful for monotone-framework analyses.

Home
Reference Material
Getting Started:
Building PhASAR
- Raw Build
- Building with Distcc
Using PhASAR with Docker
A few uses of PhASAR
Coding Conventions
Contributing to PhASAR
Errors and bug reporting
Update to Newer LLVM Versions
OS Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Writing a Data Flow Analysis

Before Writing a Static Analysis

Choosing an Analysis Framework

Use IFDS, if

Use IDE, if

Use Inter-Mono, if

Use Intra-Mono, if

Writing a Static Analysis

Choosing a Control-Flow Graph

Useful shortcuts

LLVM Utilities

Pre-defined Flow Function Templates

The `<algorithm>` Header

Clone this wiki locally

Writing a Data Flow Analysis

Before Writing a Static Analysis

Choosing an Analysis Framework

Use IFDS, if

Use IDE, if

Use Inter-Mono, if

Use Intra-Mono, if

Writing a Static Analysis

Choosing a Control-Flow Graph

Useful shortcuts

LLVM Utilities

Pre-defined Flow Function Templates

The <algorithm> Header

Clone this wiki locally

The `<algorithm>` Header