TemporalInvariantMining

Invariants

Each event belongs to exactly one event class. Invariants capture temporal properties among event classes. The following three types of invariants are considered:

x AFby y An event of class x is always followed by an event of class y.
x NFby y An event of class x is never followed by an event of class y.
x AP y An event of class x always precedes an event of class y, i.e., whenever an event of class y occured, an event of class x has occured previously.

Mining

General Idea

The general idea is to enumerate possible invariants and test whether they hold. This is the Daikon approach applied to temporal invariants.

Over-approximation

Instead of generating all possible invariants, we use a heuristic to over-approximate the set of invariants. Over-approximation here means that invariants that hold must be found, but not all found invariants must hold. The over-approximation will still be much smaller than the naive set of all possible invariants. We benchmarked the speedup and found that on average 75% less invariants were generated compared to naive invariant template instantiation.

The over-approximation algorithm

Input: a graph representing the traces.

Output: an over approximation of the set of invariants that hold over the events in the graph.

Generate the transitive closure TC of the input, without doing any summarization or merging of nodes (if the input has several relations, generate transitive closures for each relation separately)
To discover invariants for event classes x,y of type AFby, consider all events of class x and check whether each event has a direct successor event of class y in TC.
To discover invariants for event classes x,y of type NFby consider all events of class x and check that none has a direct successor of class y in TC.
To discover invariants for event classes x,y of type AP consider all events of type y and check that each event has a direct predecessor of class x in TC.

The TemporalInvariantSet class implements this algorithm.

Proof that the over-approximation algorithm produces an over approximation (by contradiction):

Suppose the algorithm fails to produce the invariant i = x AFby y, but x is always followed by y in the input graph. For the algorithm to have failed to output i, the TC must not have had an edge from x to y, which, in turn, means there must not have been a path from x to y in the original graph input. However, that is a contradiction because we assumed that x is always followed by y in the input graph.
Suppose the algorithm fails to produce the invariant i = x NFby y, but x is never followed by y in the input graph. For the algorithm to have failed to output i, the TC must have had at least one edge from x to y, which, in turn, means there must have been at least one path from x to y in the original graph input. However, that is a contradiction because we assumed that x is never followed by y in the input graph.
Suppose the algorithm fails to produce the invariant i = x AP y, but x always precedes y in the input graph. For the algorithm to have failed to output i, the TC must have contained a least one y which is not preceded by an x, which, in turn, means there must be a y in the original graph input to which, for all x, no path exists from that x. However, that is a contradiction because we assumed that x always precedes y in the input graph.

Q.E.D.

Note that the over-approximation algorithm may, in fact, return an over approximation by including invariants that do not hold. For example:

AFby: Suppose the input graph has three nodes: a, b, and c, such that a transitions to b and to c and no other transitions are legal. Then the TC will look the same as the input graph and the algorithm will report a AFby b, even though a is sometims followed by c, not b.
NFby: (YB: did we come up with an example of a spurious NFby invariant or are these not possible?)

(IB: I think these are not possible. Proof: Assume (1) x NFby y is false for G, and (2) there is no edge x -> y in TC of G. Then (2) implies that there exists no path from x -> y in G. But (1) implies that there is at least one path from some x to some y in G. Contradiction.)

AP: Suppose the input graph has three nodes: a, b, and c, such that a transitions to b, b transitions to c, c transitions to a, and no other transitions are legal. Then the TC will have the a-b-c cycle and loops on each node. Then the algorithm will report a AP b even though the original graph may encode the two traces of events b c a and a b c.

(MDE: I don't understand this example. What are the edges in the input graph? Does it already have a cycle? Or, if there are two distinct a events, then there won't be a cycle in the approximated graph.)

(MDE: It's important that the algorithm treat each distinct "chunk" of the input, such as a file, separately, or else spurious AFby and AP events can occur. I assume this is properly handled, since the model checker must do this also.)

Rejecting spurious invariants

Each invariant discovered in the previous step is then checked using a model checker. If the invariant does not hold, it is rejected, otherwise it is retained. The set of retained invariants is the result of the mining process.

Pseudo Code

Note that node reachability is provided by the transitive closure.

for each pair of labels as (label1,label2)
  set neverFollowed = false
  set alwaysFollowedBy = true
  set alwaysPrecededBy = true
  for each node with label1 as node1
    set followerFound = false
    set predecessorFound = false
    for each node with label2 as node1
      if node1 reaches node2 then
        neverFollowedBy = false
        followerFound = true
      end
      if node2 reaches node1 then
        predecessorFound = true
      end
    end for
    if not followerFound then
      alwaysFollowedBy = false
    end
    if not predecessorFound then
      alwaysPreceded = false
    end
  end for
  if neverFollowed then 
    issue label1 NFBy label2
  end
  if alwaysFollowedBy then
    issue label1 AFBy label2
  end
  if alwaysPreceded then
    issue label1 APBy label2
  end 
end for

TODOs

Prove that model checking is sound, in particular with regard to the transformation we make to the graph before applying the model checking.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly