-
Notifications
You must be signed in to change notification settings - Fork 25
TemporalInvariantMining
Each event belongs to exactly one event class. Invariants capture temporal properties among event classes. The following three types of invariants are considered:
- x AFby y An event of class x is always followed by an event of class y.
- x NFby y An event of class x is never followed by an event of class y.
- x AP y An event of class x always precedes an event of class y, i.e., whenever an event of class y occured, an event of class x has occured previously.
The general idea is to enumerate possible invariants and test whether they hold. This is the Daikon approach applied to temporal invariants.
Instead of generating all possible invariants, we use a heuristic to over-approximate the set of invariants. Over-approximation here means that invariants that hold must be found, but not all found invariants must hold. The over-approximation will still be much smaller than the naive set of all possible invariants. We benchmarked the speedup and found that on average 75% less invariants were generated compared to naive invariant template instantiation.
Input: a graph representing the traces.
Output: an over approximation of the set of invariants that hold over the events in the graph.
- Generate the transitive closure TC of the input, without doing any summarization or merging of nodes (if the input has several relations, generate transitive closures for each relation separately)
- To discover invariants for event classes x,y of type AFby, consider all events of class x and check whether each event has a direct successor event of class y in TC.
- To discover invariants for event classes x,y of type NFby consider all events of class x and check that none has a direct successor of class y in TC.
- To discover invariants for event classes x,y of type AP consider all events of type y and check that each event has a direct predecessor of class x in TC.
The TemporalInvariantSet
class implements this algorithm.
Proof that the over-approximation algorithm produces an over approximation (by contradiction):
- Suppose the algorithm fails to produce the invariant
i
= x AFby y, but x is always followed by y in the input graph. For the algorithm to have failed to outputi
, the TC must not have had an edge from x to y, which, in turn, means there must not have been a path from x to y in the original graph input. However, that is a contradiction because we assumed that x is always followed by y in the input graph. - Suppose the algorithm fails to produce the invariant
i
= x NFby y, but x is never followed by y in the input graph. For the algorithm to have failed to outputi
, the TC must have had at least one edge from x to y, which, in turn, means there must have been at least one path from x to y in the original graph input. However, that is a contradiction because we assumed that x is never followed by y in the input graph. - Suppose the algorithm fails to produce the invariant
i
= x AP y, but x always precedes y in the input graph. For the algorithm to have failed to outputi
, the TC must have contained a least one y which is not preceded by an x, which, in turn, means there must be a y in the original graph input to which, for all x, no path exists from that x. However, that is a contradiction because we assumed that x always precedes y in the input graph.
Q.E.D.
Note that the over-approximation algorithm may, in fact, return an over approximation by including invariants that do not hold. For example:
-
AFby: Suppose the input graph has three nodes: a, b, and c, such that a transitions to b and to c and no other transitions are legal. Then the TC will look the same as the input graph and the algorithm will report a AFby b, even though a is sometims followed by c, not b.
-
NFby: (YB: did we come up with an example of a spurious NFby invariant or are these not possible?)
(IB: I think these are not possible. Proof: Assume (1) x NFby y is false for G, and (2) there is no edge x -> y in TC of G. Then (2) implies that there exists no path from x -> y in G. But (1) implies that there is at least one path from some x to some y in G. Contradiction.)
- AP: Suppose the input graph has three nodes: a, b, and c, such that a transitions to b, b transitions to c, c transitions to a, and no other transitions are legal. Then the TC will have the a-b-c cycle and loops on each node. Then the algorithm will report a AP b even though the original graph may encode the two traces of events b c a and a b c.
(MDE: I don't understand this example. What are the edges in the input graph? Does it already have a cycle? Or, if there are two distinct a events, then there won't be a cycle in the approximated graph.)
(MDE: It's important that the algorithm treat each distinct "chunk" of the input, such as a file, separately, or else spurious AFby and AP events can occur. I assume this is properly handled, since the model checker must do this also.)
Each invariant discovered in the previous step is then checked using a model checker. If the invariant does not hold, it is rejected, otherwise it is retained. The set of retained invariants is the result of the mining process.
Note that node reachability is provided by the transitive closure.
for each pair of labels as (label1,label2)
set neverFollowed = false
set alwaysFollowedBy = true
set alwaysPrecededBy = true
for each node with label1 as node1
set followerFound = false
set predecessorFound = false
for each node with label2 as node1
if node1 reaches node2 then
neverFollowedBy = false
followerFound = true
end
if node2 reaches node1 then
predecessorFound = true
end
end for
if not followerFound then
alwaysFollowedBy = false
end
if not predecessorFound then
alwaysPreceded = false
end
end for
if neverFollowed then
issue label1 NFBy label2
end
if alwaysFollowedBy then
issue label1 AFBy label2
end
if alwaysPreceded then
issue label1 APBy label2
end
end for
- Prove that model checking is sound, in particular with regard to the transformation we make to the graph before applying the model checking.