|
| 1 | +CFG Iterator Implementation |
| 2 | +=========================== |
| 3 | + |
| 4 | +This file explains the in-place CFG representation on top of the IL. |
| 5 | + |
| 6 | +Motivations |
| 7 | +----------- |
| 8 | + |
| 9 | +We want a unified IL and CFG representation so that we do not have to keep two data structures in sync, |
| 10 | +and so that we do not have to define a correspondence between the static analysis state domain and the IL |
| 11 | +before analysis results computed on the CFG can be used to transform the IL. |
| 12 | + |
| 13 | +It also reduces the number of places where refactors need to be applied, and should reduce the memory overhead of |
| 14 | +static analyses. |
| 15 | + |
| 16 | + |
| 17 | +Interpreting the CFG from the IL |
| 18 | +-------------------------------- |
| 19 | + |
| 20 | +The IL has two structural interpretations: |
| 21 | + |
| 22 | +1. Its syntax tree; expressions have sub-expressions and so on. |
| 23 | + - This can be traversed using Visitors |
| 24 | + - It can also be traversed down by accessing class fields, and upward using the Parent trait |
| 25 | + - The traversal order is defined by the order of terms in the language with a depth-first traversal of sub-terms. |
| 26 | +2. Its control flow graph; this is part of the language's semantics, and is inferred from the Jump and Call statements. |
| 27 | + - This is traversed using the control flow iterator, or by constructing the separate Tip-style CFG and traversing that. |
| 28 | + From here on we describe the 'control-flow iterator'. |
| 29 | + - The traversal order is defined by the `Dependency` structure and `Worklist` solvers and the predecessor/successor |
| 30 | + relation between pairs of nodes |
| 31 | + |
| 32 | +We need to derive the predecessor/successor relation on CFG nodes from the IL. |
| 33 | + |
| 34 | +1. CFG positions are defined as |
| 35 | + - The entry to a procedure |
| 36 | + - The single return point from a procedure |
| 37 | + - The block and jump statement that return from the procedure |
| 38 | + - The beginning of a block within a procedure |
| 39 | + - A statement command within a block |
| 40 | + - A jump or call command within a block |
| 41 | + |
| 42 | +For example, we can describe the language as a set of Horn clauses (`A :- B` means B produces A, with `,` indicating |
| 43 | +conjunction and `;` indicating disjunction). |
| 44 | + |
| 45 | +First we have basic blocks belonging to a procedure. |
| 46 | + |
| 47 | + Procedure(id) |
| 48 | + Block(id, procedure) |
| 49 | + EntryBlock(block_id, procedure) |
| 50 | + ReturnBlock(block_id, procedure) |
| 51 | + Block(id, procedure) :- EntryBlock(id, procedure); ReturnBlock(id, procedure) |
| 52 | + |
| 53 | +A list of sequential statements belonging to a block |
| 54 | + |
| 55 | + Statement(id, block, index) |
| 56 | + |
| 57 | +A list of jumps (either Calls or GoTos) belonging to a block, which occur after the statements. GoTos form the |
| 58 | +intra-procedural edges, and Calls form the inter-procedural edges. |
| 59 | + |
| 60 | + GoTo(id, block, destinationBlock) // multiple destinations |
| 61 | + Call(id, block, destinationProcedure, returnBlock), count {Call(id, block, _, _)} == 1 |
| 62 | + Jump(id, block) :- GoTo(id, block, _) ; Call(id, block, _, _) |
| 63 | + |
| 64 | +Statements and Jumps are both considered commands. Commands, blocks, and procedures are all considered IL terms, |
| 65 | +and every IL term has a unique identifier. |
| 66 | + |
| 67 | + Command(id) :- Statement(id, _, _) ; Jump(id, _) |
| 68 | + ILTerm(id) :- Procedure(id); Block(id, _); Command(id) |
| 69 | + |
| 70 | +The predecessor/successor relation relates ILTerms to ILTerms, and is defined in terms of the nodes: |
| 71 | + |
| 72 | + pred(i, j) :- succ(j, i) |
| 73 | + |
| 74 | + succ(block, statement) :- Statement(statement, block, 0) |
| 75 | + succ(statement1, statement2) :- Statement(statement1, block, i), Statement(statement2, block, i + 1) |
| 76 | + succ(statement, jump) :- Statement(statement, block, last), Jump(jump, block), last = max i forall Statement(_, block, i) |
| 77 | + |
| 78 | + succ(goto, targetBlock) :- GoTo(goto, _, targetBlock) |
| 79 | + |
| 80 | + succ(call, return_block) :- Call(call, block, dest_procedure, return_block) |
| 81 | + |
| 82 | +For an inter-procedural CFG we also have: |
| 83 | + |
| 84 | + succ(call, return_block) :- ReturnBlock(return_block, call), Procedure(call) |
| 85 | + succ(call, targetProcedure) :- Call(call, _, targetProcedure, _) |
| 86 | + |
| 87 | +An inter-procedural solver is expected to keep track of call sites which return statements jump back to. |
| 88 | + |
| 89 | +So a sequential application of `succ` might look like |
| 90 | + |
| 91 | + ProcedureA -> {Block0} -> {Statement1} -> {Statement2} -> {Jump0, Jump1} -> {Block1} | {Block2} -> ... |
| 92 | + |
| 93 | +Implementation |
| 94 | +-------------- |
| 95 | + |
| 96 | +We want to be able to compute `succ(term, _)` and `pred(term, _)` for any given term in the IL in `O(1)` time. |
| 97 | +Successors are easily derived, but predecessors are not stored with their successors. Furthermore, `ProcedureExit` |
| 98 | +and `CallReturn` are not inherently present in the IL. |
| 99 | + |
| 100 | +In code we have a set of Calls and GoTos present in the IL; these define the edges from themselves to their targets. |
| 101 | + |
| 102 | +Then all vertices in the CFG (that is, all Commands, Blocks, and Procedures in the IL) store references to |
| 103 | +their incoming and outgoing edges. In a sense the 'id's in the formulation above become the JVM object IDs. |
| 104 | + |
| 105 | +For Blocks and Procedures this means a `Set` of incoming jumps (GoTos for Blocks, Calls for Procedures). For Commands |
| 106 | +this means they are stored in their block in an intrusive linked list. |
| 107 | + |
| 108 | +Specifically this means we store |
| 109 | + |
| 110 | + Command: |
| 111 | + - reference to parent block |
| 112 | + - method to find the next or previous statement in the block |
| 113 | + - IntrusiveListElement trait inserts a next() and previous() method forming the linked list |
| 114 | + |
| 115 | + Block: |
| 116 | + - reference to parent procedure |
| 117 | + - list of incoming GoTos |
| 118 | + - list of Jumps including |
| 119 | + - Outgoing Calls |
| 120 | + - Outgoing GoTos |
| 121 | + |
| 122 | + Procedure: |
| 123 | + - list of incoming Calls |
| 124 | + - subroutine to compute the set of all outgoing calls in all contained blocks |
| 125 | + |
| 126 | +This means the IL contains: |
| 127 | + - Forward graph edges in the forms of calls and gotos |
| 128 | + - Forward syntax tree edges in the form of classes containing their children as fields |
| 129 | + - Backwards graph edges in the form of lists of incoming jumps and calls |
| 130 | + - Procedure has list of incoming calls |
| 131 | + - Block has list of incoming gotos |
| 132 | + - Backwards syntax tree edges in the form of a parent field |
| 133 | + - Implementation of the `HasParent` trait. |
| 134 | + |
| 135 | +To maintain the backwards edges it is necessary to make the actual data structures private, and only allow |
| 136 | +modification through interfaces which maintain the graph/tree. |
| 137 | + |
| 138 | +Jumps: |
| 139 | +- Must implement an interface to allow adding or removing edge references (references to themselves) to and from their |
| 140 | + target |
| 141 | + |
| 142 | +Blocks and Procedures: |
| 143 | +- Implement an interface for adding and removing edge references |
| 144 | + |
| 145 | +Furthermore: |
| 146 | +- Reparenting Blocks and Commands in the IL must preserve the parent field; this is not really implemented yet. |
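The edge-maintenance rule described under Implementation (backward edges kept private and modified only through an interface on the jump) can be sketched roughly as follows. This is a minimal illustration, not the project's actual classes: `CfgSketch`, `addTarget`, `removeTarget`, `addIncoming`, `removeIncoming`, and `incomingJumps` are hypothetical names, and only GoTo edges between blocks are modelled. Identity is plain JVM reference equality, matching the remark that the 'id's become JVM object IDs.

```scala
import scala.collection.mutable

// Hypothetical sketch: all names here are assumptions, not the real IL API.
object CfgSketch {
  final class Block(val label: String) {
    // Backward edges: private, so they can only change via GoTo below.
    private val incoming = mutable.Set.empty[GoTo]

    // Read-only snapshot of the predecessors of this block.
    def incomingJumps: Set[GoTo] = incoming.toSet

    // Visible only inside CfgSketch, so external code cannot desync the graph.
    private[CfgSketch] def addIncoming(g: GoTo): Unit = incoming += g
    private[CfgSketch] def removeIncoming(g: GoTo): Unit = incoming -= g
  }

  final class GoTo(val parent: Block) {
    // Forward edges: a GoTo may have multiple destinations.
    private val dests = mutable.Set.empty[Block]

    def targets: Set[Block] = dests.toSet

    // Adding a forward edge also registers the matching backward edge.
    def addTarget(b: Block): Unit =
      if (dests.add(b)) b.addIncoming(this)

    // Removing a forward edge also removes the matching backward edge.
    def removeTarget(b: Block): Unit =
      if (dests.remove(b)) b.removeIncoming(this)
  }

  // succ and pred restricted to GoTo edges, both answered from stored sets.
  def succ(g: GoTo): Set[Block] = g.targets
  def pred(b: Block): Set[GoTo] = b.incomingJumps
}
```

With this shape, calling `g.addTarget(b1)` on a `GoTo` makes `pred(b1)` contain `g` without any separate bookkeeping step, and `g.removeTarget(b1)` undoes both directions. Calls and Procedures would presumably follow the same pattern, with the incoming set living on the Procedure.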