
Commit 7ad7a20

Merge pull request #141 from UQ-PAC/il-cfg-iterator
Implement a simple way of getting the parent, successor and predecessor of any program, block, and command in the IL. Adds a TIP-style `Dependencies` trait that uses this instead of the CFG; this works well for intraprocedural analyses, but interprocedural iteration is not correct.
2 parents 03e54a8 + 2b088f3 commit 7ad7a20


373 files changed, +3011 -348 lines changed

Some content is hidden


docs/il-cfg.md

+146
@@ -0,0 +1,146 @@
CFG Iterator Implementation
===========================

This file explains the in-place CFG representation on top of the IL.

Motivations
-----------

We want a unified IL and CFG representation to avoid the problem of keeping two data structures in sync,
and to avoid having to define a correspondence between the static analysis state domain and
the IL in order to apply a transformation to the IL using the CFG results.

It also reduces the number of places refactors need to be applied, and (hopefully) reduces memory overhead for static
analyses.

Interpreting the CFG from the IL
--------------------------------

The IL has two structural interpretations:

1. Its syntax tree; expressions have sub-expressions, and so on.
    - This can be traversed using Visitors.
    - It can also be traversed downward by accessing class fields, and upward using the Parent trait.
    - The traversal order is defined by the order of terms in the language, with a depth-first traversal of sub-terms.
2. Its control flow graph; this is part of the language's semantics, and is inferred from the Jump and Call statements.
    - This is traversed using the control-flow iterator, or by constructing the separate TIP-style CFG and traversing
      that. From here on we describe the control-flow iterator.
    - The traversal order is defined by the `Dependencies` structure and `Worklist` solvers, and by the
      predecessor/successor relation between pairs of nodes.

We need to derive the predecessor/successor relation on CFG nodes from the IL.

CFG positions are defined as:
- The entry to a procedure
- The single return point from a procedure
- The block and jump statement that return from the procedure
- The beginning of a block within a procedure
- A statement command within a block
- A jump or call command within a block

As an example, we define the language as Horn clauses (`A :- B` means A holds if B holds, with `,` indicating
conjunction and `;` indicating disjunction).

First, we have basic blocks belonging to a procedure:

    Procedure(id)
    Block(id, procedure)
    EntryBlock(block_id, procedure)
    ReturnBlock(block_id, procedure)
    Block(id, procedure) :- EntryBlock(id, procedure); ReturnBlock(id, procedure)

A list of sequential statements belonging to a block:

    Statement(id, block, index)

A list of jumps (either Calls or GoTos) belonging to a block, which occur after the statements. GoTos form the
intra-procedural edges, and Calls form the inter-procedural edges.

    GoTo(id, block, destinationBlock) // a GoTo may have multiple destinations
    Call(id, block, destinationProcedure, returnBlock), count {Call(_, block, _, _)} == 1
    Jump(id, block) :- GoTo(id, block, _) ; Call(id, block, _, _)

Statements and Jumps are both considered commands. Commands, blocks, and procedures are all IL terms, and every IL
term has a unique identifier.

    Command(id) :- Statement(id, _, _) ; Jump(id, _)
    ILTerm(id) :- Procedure(id); Block(id, _); Command(id)

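
To make the relations above concrete, here is a minimal Scala sketch that encodes them as a sealed class hierarchy, so each `:-` with `;` becomes a family of subclasses. The class names follow the relations, but the classes themselves and the `describe` helper are hypothetical illustrations, not the IL's actual types; entry and return blocks are treated as roles of an ordinary `Block` rather than separate classes.

```scala
// Hypothetical ADT mirroring the Horn-clause relations; not the IL's real classes.
sealed trait ILTerm { def id: Int }

final case class Procedure(id: Int) extends ILTerm
// EntryBlock / ReturnBlock are roles a Block plays, so they are not modelled separately.
final case class Block(id: Int, procedure: Procedure) extends ILTerm

// Command(id) :- Statement(id, _, _) ; Jump(id, _)
sealed trait Command extends ILTerm { def block: Block }
final case class Statement(id: Int, block: Block, index: Int) extends Command

// Jump(id, block) :- GoTo(id, block, _) ; Call(id, block, _, _)
sealed trait Jump extends Command
final case class GoTo(id: Int, block: Block, destinations: Set[Block]) extends Jump
final case class Call(id: Int, block: Block, destinationProcedure: Procedure, returnBlock: Block) extends Jump

// ILTerm(id) :- Procedure(id); Block(id, _); Command(id)
// The hierarchy is sealed, so the compiler checks this match for exhaustiveness.
def describe(t: ILTerm): String = t match {
  case p: Procedure => s"procedure ${p.id}"
  case b: Block     => s"block ${b.id} in procedure ${b.procedure.id}"
  case s: Statement => s"statement ${s.id} at index ${s.index}"
  case g: GoTo      => s"goto ${g.id} with ${g.destinations.size} destination(s)"
  case c: Call      => s"call ${c.id} to procedure ${c.destinationProcedure.id}"
}
```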

The predecessor/successor relation relates ILTerms to ILTerms, and is defined directly in terms of the nodes:

    pred(i, j) :- succ(j, i)

    succ(block, statement) :- Statement(statement, block, 0)
    succ(statement1, statement2) :- Statement(statement1, block, i), Statement(statement2, block, i + 1)
    succ(statement, jump) :- Statement(statement, block, _last), Jump(jump, block), _last = max i forall Statement(_, block, i)

    succ(goto, targetBlock) :- GoTo(goto, _, targetBlock)

    succ(call, return_block) :- Call(call, block, dest_procedure, return_block)

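
As a sanity check, the intraprocedural rules can be transcribed almost literally into Scala. This is a hypothetical sketch over a fact table (`IntraFacts` and the string node ids are assumptions, not part of the IL). It derives `pred` naively by calling `succ` on every node, which is exactly the cost the Implementation section sets out to avoid. The rules as written do not say what follows an empty block; the sketch assumes it flows directly to the block's jumps.

```scala
// Hypothetical fact tables mirroring the relations above; not the IL's real classes.
final case class Statement(id: String, block: String, index: Int)
// One GoTo fact may carry several destination blocks.
final case class GoTo(id: String, block: String, destinations: Set[String])
final case class Call(id: String, block: String, destinationProcedure: String, returnBlock: String)

final case class IntraFacts(
    blocks: Set[String],
    statements: Seq[Statement],
    gotos: Seq[GoTo],
    calls: Seq[Call]
) {
  // Jump(id, block) :- GoTo(id, block, _) ; Call(id, block, _, _)
  private def jumpsOf(block: String): Set[String] =
    gotos.filter(_.block == block).map(_.id).toSet ++ calls.filter(_.block == block).map(_.id)

  private def statementsOf(block: String): Seq[Statement] = statements.filter(_.block == block)

  // Every CFG node id this fact base knows about.
  def nodes: Set[String] = blocks ++ statements.map(_.id) ++ gotos.map(_.id) ++ calls.map(_.id)

  def succ(n: String): Set[String] = {
    // succ(block, statement) :- Statement(statement, block, 0)
    val fromBlock: Set[String] =
      if (!blocks.contains(n)) Set.empty
      else {
        val stmts = statementsOf(n)
        // Assumption: an empty block flows straight to its jumps (not covered by the rules).
        if (stmts.isEmpty) jumpsOf(n) else stmts.filter(_.index == 0).map(_.id).toSet
      }

    val fromStatement: Set[String] = statements.find(_.id == n) match {
      case Some(s) =>
        statementsOf(s.block).find(_.index == s.index + 1) match {
          case Some(next) => Set(next.id)       // succ(statement1, statement2)
          case None       => jumpsOf(s.block)   // last statement -> every jump of the block
        }
      case None => Set.empty
    }

    // succ(goto, targetBlock) :- GoTo(goto, _, targetBlock)
    val fromGoTo: Set[String] = gotos.filter(_.id == n).flatMap(_.destinations).toSet
    // succ(call, return_block) :- Call(call, _, _, return_block)
    val fromCall: Set[String] = calls.filter(_.id == n).map(_.returnBlock).toSet

    fromBlock ++ fromStatement ++ fromGoTo ++ fromCall
  }

  // pred(i, j) :- succ(j, i), computed by evaluating succ on every node.
  def pred(n: String): Set[String] = nodes.filter(m => succ(m).contains(n))
}
```

For example, with a block `b0` holding statements `s1`, `s2` and a goto `g0` to `b1` and `b2`, `succ("s2")` is `Set("g0")` and `pred("s1")` is `Set("b0")`.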

For an inter-procedural CFG we also have:

    succ(procedure_return, return_block) :- Call(_, _, procedure, return_block), ReturnBlock(procedure_return, procedure)
    succ(call, targetProcedure) :- Call(call, _, targetProcedure, _)

An inter-procedural solver is expected to keep track of the call sites that return statements jump back to.
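
One common way to track those call sites is an explicit stack of return blocks, pushed when a call enters its callee and popped at the callee's return point. The sketch below assumes that approach; `CallSite`, `Ctx` and `InterSucc` are hypothetical names, and the intraprocedural successor function is passed in rather than derived from a real IL.

```scala
// Hypothetical sketch: node ids are strings; intraprocedural succ is supplied by the caller.
final case class CallSite(id: String, targetProcedure: String, returnBlock: String)

// A CFG node paired with the stack of return blocks still waiting to be resumed.
final case class Ctx(node: String, stack: List[String])

final class InterSucc(
    intraSucc: String => Set[String],   // e.g. procedure -> entry block, statement -> next statement
    calls: Map[String, CallSite],       // call id -> call site
    returnNodes: Set[String]            // the single return point of each procedure
) {
  def succ(c: Ctx): Set[Ctx] = calls.get(c.node) match {
    // succ(call, targetProcedure): enter the callee and remember where to resume.
    case Some(call) => Set(Ctx(call.targetProcedure, call.returnBlock :: c.stack))
    // At a procedure's return point, go back to the most recent call site.
    case None if returnNodes.contains(c.node) =>
      c.stack match {
        case ret :: rest => Set(Ctx(ret, rest))
        case Nil         => Set.empty // returning from the analysis root
      }
    // Everything else follows the intraprocedural relation with the stack unchanged.
    case None => intraSucc(c.node).map(Ctx(_, c.stack))
  }
}
```

Because the stack is unbounded, a fixpoint solver using this directly would not terminate on recursive programs; practical solvers bound the context, for example with k-limited call strings.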

So a sequential application of `succ` might look like:

    ProcedureA -> {Block0} -> {Statement1} -> {Statement2} -> {Jump0, Jump1} -> {Block1} | {Block2} -> ...

Implementation
--------------

We want to be able to compute `succ(term, _)` and `pred(term, _)` for any given term in the IL in `O(1)`.
Successors are easily derived, but predecessors are not stored alongside their successors. Furthermore, `ProcedureExit`
and `CallReturn` nodes are not inherently present in the IL.

In code, the Calls and GoTos present in the IL define the edges from themselves to their targets.

Then every vertex in the CFG (that is, every Command, Block, and Procedure in the IL) stores references to its
incoming and outgoing edges. In a sense, the `id`s in the formulation above become JVM object identities.

For Blocks and Procedures this means a `Set` of incoming jumps (GoTos for Blocks, Calls for Procedures). For Commands
it means they are stored in their block in an intrusive linked list.

Specifically, we store:

Command:
- reference to its parent block
- a method to find the next or previous statement in the block
- the `IntrusiveListElement` trait, which adds `next()` and `previous()` methods forming the linked list

Block:
- reference to its parent procedure
- list of incoming GoTos
- list of Jumps, including
    - outgoing Calls
    - outgoing GoTos

Procedure:
- list of incoming Calls
- a subroutine to compute the set of all outgoing calls in all contained blocks
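
A minimal sketch of this layout, assuming simplified hypothetical classes rather than the real IL: commands carry `previous`/`next` links and a parent block, and jumps register themselves in their targets' incoming sets, so predecessors are read directly from stored links instead of by scanning the program. Fields are public here for brevity, and `predOfCommand`/`predOfBlock` are illustrative helper names.

```scala
import scala.collection.mutable

// Hypothetical, simplified classes; not the IL's actual implementation.
class Procedure(val name: String) {
  val incomingCalls: mutable.Set[Call] = mutable.Set()   // backward graph edges
}

class Block(val label: String, val parent: Procedure) {
  val incomingGoTos: mutable.Set[GoTo] = mutable.Set()   // backward graph edges
  var first: Option[Command] = None                      // head of the intrusive list
  var last: Option[Command] = None                       // tail of the intrusive list

  /** Append a command in O(1), setting its parent and its intrusive links. */
  def append(c: Command): Unit = {
    c.parent = this
    c.previous = last
    c.next = None
    last.foreach(prev => prev.next = Some(c))
    if (first.isEmpty) first = Some(c)
    last = Some(c)
  }
}

abstract class Command(val label: String) {
  var parent: Block = null                 // backward syntax-tree edge, set by Block.append
  var next: Option[Command] = None         // intrusive linked-list links
  var previous: Option[Command] = None
}

class Statement(label: String) extends Command(label)

class GoTo(label: String, val targets: Set[Block]) extends Command(label) {
  targets.foreach(b => b.incomingGoTos += this)  // record the backward edge on each target
}

class Call(label: String, val target: Procedure) extends Command(label) {
  target.incomingCalls += this
}

// Predecessors come straight from stored links; no scan over the program is needed.
def predOfCommand(c: Command): Set[Block | Command] = c.previous match {
  case Some(p) => Set(p)
  case None    => Set(c.parent) // the first command is preceded by its block
}

def predOfBlock(b: Block): Set[Block | Command] = b.incomingGoTos.toSet
```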

This means the IL contains:
- Forward graph edges, in the form of calls and gotos
- Forward syntax tree edges, in the form of classes containing their children as fields
- Backward graph edges, in the form of lists of incoming jumps and calls
    - Procedure has a list of incoming calls
    - Block has a list of incoming gotos
- Backward syntax tree edges, in the form of a parent field
    - Implementation of the `HasParent` trait

To maintain the backward edges, the actual data structures must be private, and modification is only allowed through
interfaces that keep the graph and tree consistent.

Jumps:
- Must implement an interface for adding and removing edge references (references to themselves) to and from their
  targets

Blocks and Procedures:
- Implement an interface for adding and removing edge references

Furthermore:
- Reparenting Blocks and Commands in the IL must keep the parent field up to date; this is not fully implemented yet
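
The sketch below illustrates the Jump side of this interface under stated assumptions: `ILGraph`, `addTarget`, `removeTarget` and `clearTargets` are hypothetical names. The block's incoming set is private and can only be changed from inside `ILGraph`, so every change to a forward edge updates the matching backward edge in the same call.

```scala
import scala.collection.mutable

// Hypothetical sketch; names are illustrative, not the project's API.
object ILGraph {
  final class Block(val label: String) {
    // Backward edges are private; only code inside ILGraph (i.e. GoTo) may change them.
    private val incoming = mutable.Set[GoTo]()
    def incomingGoTos: Set[GoTo] = incoming.toSet
    private[ILGraph] def addIncoming(g: GoTo): Unit = incoming += g
    private[ILGraph] def removeIncoming(g: GoTo): Unit = incoming -= g
  }

  final class GoTo(val label: String) {
    private val targets = mutable.LinkedHashSet[Block]()
    def targetBlocks: Set[Block] = targets.toSet

    /** Add a forward edge together with the matching backward edge. */
    def addTarget(b: Block): Unit =
      if (targets.add(b)) b.addIncoming(this)

    /** Remove both directions of the edge, keeping the graph consistent. */
    def removeTarget(b: Block): Unit =
      if (targets.remove(b)) b.removeIncoming(this)

    /** Detach this jump entirely, e.g. before deleting it from the IL. */
    def clearTargets(): Unit = targets.toList.foreach(removeTarget)
  }
}
```

Iterating over `targets.toList` in `clearTargets` works on a copy, so removing edges during the loop is safe.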

src/main/scala/analysis/Analysis.scala

+1-1
@@ -297,4 +297,4 @@ class MemoryRegionAnalysisSolver(
       case _ => super.funsub(n, x)
     }
   }
-}
+}
@@ -0,0 +1,76 @@
package analysis
import ir.*
import analysis.solvers.*

trait ILValueAnalysisMisc:
  val valuelattice: ConstantPropagationLattice = ConstantPropagationLattice()
  val statelattice: MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice] = MapLattice(valuelattice)

  def eval(exp: Expr, env: Map[Variable, FlatElement[BitVecLiteral]]): FlatElement[BitVecLiteral] =
    import valuelattice._
    exp match
      case id: Variable => env(id)
      case n: BitVecLiteral => bv(n)
      case ze: ZeroExtend => zero_extend(ze.extension, eval(ze.body, env))
      case se: SignExtend => sign_extend(se.extension, eval(se.body, env))
      case e: Extract => extract(e.end, e.start, eval(e.body, env))
      case bin: BinaryExpr =>
        val left = eval(bin.arg1, env)
        val right = eval(bin.arg2, env)
        bin.op match
          case BVADD => bvadd(left, right)
          case BVSUB => bvsub(left, right)
          case BVMUL => bvmul(left, right)
          case BVUDIV => bvudiv(left, right)
          case BVSDIV => bvsdiv(left, right)
          case BVSREM => bvsrem(left, right)
          case BVUREM => bvurem(left, right)
          case BVSMOD => bvsmod(left, right)
          case BVAND => bvand(left, right)
          case BVOR => bvor(left, right)
          case BVXOR => bvxor(left, right)
          case BVNAND => bvnand(left, right)
          case BVNOR => bvnor(left, right)
          case BVXNOR => bvxnor(left, right)
          case BVSHL => bvshl(left, right)
          case BVLSHR => bvlshr(left, right)
          case BVASHR => bvashr(left, right)
          case BVCOMP => bvcomp(left, right)
          case BVCONCAT => concat(left, right)

      case un: UnaryExpr =>
        val arg = eval(un.arg, env)

        un.op match
          case BVNOT => bvnot(arg)
          case BVNEG => bvneg(arg)

      case _ => valuelattice.top

  private final val callerPreservedRegisters = Set("R0", "R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8", "R9", "R10",
    "R11", "R12", "R13", "R14", "R15", "R16", "R17", "R18", "R30")

  /** Transfer function for state lattice elements.
   */
  def localTransfer(n: CFGPosition, s: statelattice.Element): statelattice.Element =
    n match
      case la: LocalAssign =>
        s + (la.lhs -> eval(la.rhs, s))
      case c: Call => s ++ callerPreservedRegisters.filter(reg => s.keys.exists(_.name == reg)).map(n => Register(n, BitVecType(64)) -> statelattice.sublattice.top).toMap
      case _ => s



object IRSimpleValueAnalysis:

  class Solver(prog: Program) extends ILValueAnalysisMisc
    with IRIntraproceduralForwardDependencies
    with Analysis[Map[CFGPosition, Map[Variable, FlatElement[BitVecLiteral]]]]
    with SimplePushDownWorklistFixpointSolver[CFGPosition, Map[Variable, FlatElement[BitVecLiteral]], MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice]]
    :
    /* Worklist initial set */
    //override val lattice: MapLattice[CFGPosition, statelattice.type] = MapLattice(statelattice)
    override val lattice: MapLattice[CFGPosition, Map[Variable, FlatElement[BitVecLiteral]], MapLattice[Variable, FlatElement[BitVecLiteral], ConstantPropagationLattice]] = MapLattice(statelattice)

    override val domain : Set[CFGPosition] = computeDomain(IntraProcIRCursor, prog.procedures).toSet
    def transfer(n: CFGPosition, s: statelattice.Element): statelattice.Element = localTransfer(n, s)
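
For readers unfamiliar with the lattice used above, here is a hypothetical, self-contained sketch of the same idea: a flat lattice over 64-bit constants, a bit-vector operator lifted into it, and a transfer that sets clobbered registers to top at a call, in the spirit of `localTransfer`. `Flat`, `FlatOps`, `assign` and `clobberAtCall` are illustrative names rather than the project's lattice API, and `Long` stands in for `BitVecLiteral`.

```scala
// Hypothetical flat lattice for 64-bit constants; not the project's ConstantPropagationLattice.
enum Flat {
  case Bot
  case Const(value: Long)
  case Top
}

object FlatOps {
  import Flat.*

  // Join in the flat lattice: Bot < Const(v) < Top; distinct constants join to Top.
  def lub(a: Flat, b: Flat): Flat = (a, b) match {
    case (Bot, x)                       => x
    case (x, Bot)                       => x
    case (Const(x), Const(y)) if x == y => a
    case _                              => Top
  }

  // Lift a binary operator: Bot is strict, Top absorbs, constants are folded.
  def lift2(a: Flat, b: Flat)(f: (Long, Long) => Long): Flat = (a, b) match {
    case (Bot, _) | (_, Bot)  => Bot
    case (Const(x), Const(y)) => Const(f(x, y))
    case _                    => Top
  }

  // Long addition wraps modulo 2^64, matching 64-bit bvadd.
  def bvadd(a: Flat, b: Flat): Flat = lift2(a, b)(_ + _)
}

type State = Map[String, Flat]

// Assignment: the left-hand side takes the evaluated right-hand side.
def assign(s: State, lhs: String, value: Flat): State = s + (lhs -> value)

// At a call, every tracked register in `clobbered` becomes Top (unknown).
def clobberAtCall(s: State, clobbered: Set[String]): State =
  s ++ s.keys.filter(clobbered).map(r => r -> Flat.Top)
```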

src/main/scala/analysis/Cfg.scala

+2-8
@@ -428,7 +428,7 @@ class ProgramCfgFactory:
         cfg.addEdge(funcEntryNode, funcExitNode)
       } else {
         // Recurse through blocks
-        visitBlock(proc.blocks.head, funcEntryNode)
+        visitBlock(proc.entryBlock.get, funcEntryNode)
       }
 
       /** Add a block to the CFG. A block in this case is a basic block, so it contains a list of consecutive statements
@@ -472,12 +472,10 @@ class ProgramCfgFactory:
          *   Statements in this block
          * @param prevNode
          *   Preceding block's end node (jump)
-         * @param cond
-         *   Condition on the jump from `prevNode` to the first statement of this block
          * @return
          *   The last statement's CFG node
          */
-        def visitStmts(stmts: ArrayBuffer[Statement], prevNode: CfgNode): CfgCommandNode = {
+        def visitStmts(stmts: Iterable[Statement], prevNode: CfgNode): CfgCommandNode = {
 
           val firstNode = CfgStatementNode(stmts.head, block, funcEntryNode)
           cfg.addEdge(prevNode, firstNode)
@@ -506,9 +504,6 @@ class ProgramCfgFactory:
          * @param prevNode
          *   Either the previous statement in the block, or the previous block's end node (in the case that this block
          *   contains no statements)
-         * @param cond
-         *   Jump from `prevNode` to this. `TrueLiteral` if `prevNode` is a statement, and any `Expr` if `prevNode` is a
-         *   jump.
          * @param solitary
          *   `True` if this block contains no statements, `False` otherwise
          */
@@ -616,7 +611,6 @@ class ProgramCfgFactory:
               cfg.addEdge(jmpNode, noReturn)
               cfg.addEdge(noReturn, funcExitNode)
             }
-          case _ => assert(false, s"unexpected jump encountered, jump: $jmp")
         } // `jmps.head` match
       } // `visitJumps` function
     } // `visitBlocks` function

src/main/scala/analysis/Dependencies.scala

+27-5
@@ -1,4 +1,5 @@
 package analysis
+import ir.{IRWalk, IntraProcIRCursor, InterProcIRCursor, CFGPosition}
 
 /** Dependency methods for worklist-based analyses.
   */
@@ -21,11 +22,32 @@ trait Dependencies[N]:
   def indep(n: N): Set[N]
 
 trait InterproceduralForwardDependencies extends Dependencies[CfgNode] {
-  def outdep(n: CfgNode): Set[CfgNode] = n.succInter.toSet
-  def indep(n: CfgNode): Set[CfgNode] = n.predInter.toSet
+  override def outdep(n: CfgNode): Set[CfgNode] = n.succInter.toSet
+  override def indep(n: CfgNode): Set[CfgNode] = n.predInter.toSet
 }
 
 trait IntraproceduralForwardDependencies extends Dependencies[CfgNode] {
-  def outdep(n: CfgNode): Set[CfgNode] = n.succIntra.toSet
-  def indep(n: CfgNode): Set[CfgNode] = n.predIntra.toSet
-}
+  override def outdep(n: CfgNode): Set[CfgNode] = n.succIntra.toSet
+  override def indep(n: CfgNode): Set[CfgNode] = n.predIntra.toSet
+}
+
+
+trait IRInterproceduralForwardDependencies extends Dependencies[CFGPosition] {
+  override def outdep(n: CFGPosition): Set[CFGPosition] = InterProcIRCursor.succ(n)
+  override def indep(n: CFGPosition): Set[CFGPosition] = InterProcIRCursor.pred(n)
+}
+
+trait IRIntraproceduralForwardDependencies extends Dependencies[CFGPosition] {
+  override def outdep(n: CFGPosition): Set[CFGPosition] = IntraProcIRCursor.succ(n)
+  override def indep(n: CFGPosition): Set[CFGPosition] = IntraProcIRCursor.pred(n)
+}
+
+trait IRInterproceduralBackwardDependencies extends IRInterproceduralForwardDependencies {
+  override def outdep(n: CFGPosition): Set[CFGPosition] = super.indep(n)
+  override def indep(n: CFGPosition): Set[CFGPosition] = super.outdep(n)
+}
+
+trait IRIntraproceduralBackwardDependencies extends IRIntraproceduralForwardDependencies {
+  override def outdep(n: CFGPosition): Set[CFGPosition] = super.indep(n)
+  override def indep(n: CFGPosition): Set[CFGPosition] = super.outdep(n)
+}
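
The backward traits above swap the two directions by calling `super`, which resolves to the forward trait through trait linearization. A self-contained sketch of the same pattern, over a hypothetical string-keyed edge map standing in for `IntraProcIRCursor`, with a small reachability worklist to show the effect of the flip (`MapForwardDependencies`, `MapBackwardDependencies` and `reachable` are illustrative names):

```scala
// Same shape as the Dependencies trait above.
trait Dependencies[N] {
  def outdep(n: N): Set[N]
  def indep(n: N): Set[N]
}

// Forward dependencies over a fixed edge map (hypothetical stand-in for the IR cursors).
trait MapForwardDependencies extends Dependencies[String] {
  def edges: Map[String, Set[String]]
  override def outdep(n: String): Set[String] = edges.getOrElse(n, Set.empty)
  override def indep(n: String): Set[String] = edges.collect { case (k, vs) if vs.contains(n) => k }.toSet
}

// Backward analysis: swap the directions via super, as the IR*BackwardDependencies traits do.
trait MapBackwardDependencies extends MapForwardDependencies {
  override def outdep(n: String): Set[String] = super.indep(n)
  override def indep(n: String): Set[String] = super.outdep(n)
}

// Minimal worklist that propagates reachability along outdep.
def reachable(deps: Dependencies[String], start: String): Set[String] = {
  var seen = Set(start)
  var worklist = List(start)
  while (worklist.nonEmpty) {
    val n = worklist.head
    worklist = worklist.tail
    for (m <- deps.outdep(n) if !seen(m)) {
      seen += m
      worklist = m :: worklist
    }
  }
  seen
}
```

The same solver code runs forwards or backwards depending only on which trait is mixed in.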

src/main/scala/analysis/UtilMethods.scala

+4-1
@@ -26,7 +26,10 @@ def evaluateExpression(exp: Expr, constantPropResult: Map[Variable, FlatElement[
           case BVASHR => Some(BitVectorEval.smt_bvashr(l, r))
           case BVCOMP => Some(BitVectorEval.smt_bvcomp(l, r))
           case BVCONCAT => Some(BitVectorEval.smt_concat(l, r))
-          case _ => throw new RuntimeException("Binary operation support not implemented: " + binOp.op)
+          case x => {
+            Logger.error("Binary operation support not implemented: " + binOp.op)
+            None
+          }
         }
       case _ => None
     }

src/main/scala/analysis/solvers/FixPointSolver.scala

+1-1
@@ -112,7 +112,7 @@ trait ListSetWorklist[N] extends Worklist[N]:
   def add(n: N): Unit =
     worklist += n
 
-  def add(ns: Set[N]): Unit = worklist ++= ns
+  def add(ns: Iterable[N]): Unit = worklist ++= ns
 
   def run(first: Set[N]): Unit =
     worklist = new ListSet[N] ++ first
