-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Il cfg iterator #141
Il cfg iterator #141
Conversation
It'll be easier to give feedback for this once you've merged in the changes from #140 |
src/main/scala/ir/Block.scala
Outdated
removeJump(g) | ||
addJump(f) | ||
} | ||
case (_, _) => throw Exception("Programmer error: can not replace jump with call or vice versa") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not allowed? It is probably necessary for resolving jump tables, where an indirect call has to be resolved to gotos, not direct calls.
The nondeterministic GoTo allows a having no targets, what is the semantics of that in boogie / what does it get translated to? I assume if it is translated naively boogie would fall through to the next block, but cant find right now what it actually gets translated to. |
src/main/scala/ir/Program.scala
Outdated
def removeCaller(c: Call): Unit = { | ||
_callers.add(c) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't doing what it says it does
src/main/scala/ir/Program.scala
Outdated
/* returnBlock: Single return point for all returns from the procedure to its caller; always defined but only included | ||
* in `blocks` if the procedure contains any real blocks.*/ | ||
val returnBlock: Block = new Block(name + "_return", None, List(), new IndirectCall(Register("R30", BitVecType(64)), None, Some(name))) | ||
returnBlock.setParent(this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be defined for every procedure? Not all procedures will have any returns.
src/main/scala/ir/Program.scala
Outdated
/** | ||
* Horrible, compensating for not storing the blocks in-order and storing initialBlock and returnBlock separately. | ||
* @return | ||
*/ | ||
def blocks: Seq[Block] = | ||
(entryBlock match | ||
case Some(b) if _blocks.nonEmpty => Seq(b) ++ _blocks.filter(x => x ne b).toSeq | ||
case _ => _blocks.toSeq) | ||
++ (if _blocks.nonEmpty then Seq(returnBlock) else Seq()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does returnBlock need to be stored separately? This also would seem to be non-deterministic, since _blocks
is a Set, which is very undesirable.
Having this be reconstructed every time Procedure.blocks
is called seems terribly inefficient. It should instead just provide an iterator over _blocks
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a set for O(1) insertion removal if we modify the IL. My understanding is the block order does not have any meaning beyond the first block, Set's order is not defined but is deterministic it seems. I don't think its neccessary for return block to be stored separately, a few things can probably be cleaned up by just inserting it when the transformation is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The block order doesn't have any meaning beyond the first block, but using a Set does mean the output order is fragile which is very undesirable for being able to track changes to the output. It also makes the output harder to follow if we further lose correspondence to the original BAP input.
We don't even really care about arbitrary removal right now - the only time blocks are removed at present is when external methods are stubbed out (removing all blocks at once), and anything else is likely to involve iterating over all blocks, so we could really just use a ListBuffer to get O(1) insertion/removal.
src/main/scala/ir/Program.scala
Outdated
var name: String, | ||
var address: Option[Int], | ||
var entryBlock: Option[Block], | ||
private val _blocks: mutable.HashSet[Block], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of making this a set? This seems like it will only cause problems for us, since we want to output blocks in a consistent order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is the block order does not have any meaning beyond the first block. The set removes the notion of an index at all and gives removal using the object identity instead, while also being faster. But I guess that only matters under heavy modification. A hashset should still have an undefined but deterministic order, although we could use a linkedhashset instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LinkedHashSet would be another good option yeah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although that still requires storing the entry block separately which still makes creating the iterator awkward, so I'd really just lean towards ListBuffer. I guess most cases where we iterate over all blocks don't care whether the entry block is first, except the final output, so taking that into account would allow for other possibilities.
Broadly, this approach is definitely a big improvement over what existed before by making it so there isn't the difficulty in trying to relate the analysis results back to the IR, as the IR elements are now directly used in the CFG. However, it still needs to calculate the predecessor and successor sets for everything, even simple traversal of the statements within a block which seems to result in things still being rather convoluted in some ways. I want to try modifying the analysis solvers, which should allow for things to be significantly simplified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I's really like to replace 'CFGPosition' with a sealed trait but that would mean moving procedure, block, and command into one file.
I think this should be OK to merge now, hopefully it doesn't break anything.
After merging we need to sort out the interprocedural situation.
I think it would be useful for have a deep-copy function for the mutable part of the IL for e.g. temporary analyses and inlining.
Wrt. the return blocks; a return block is always added to the procedure but its not always reachable.
@@ -116,52 +120,265 @@ class Program(var procedures: ArrayBuffer[Procedure], var mainProcedure: Procedu | |||
initialMemory = initialMemoryNew | |||
} | |||
|
|||
class ILUnorderedIterator(private val begin: Program) extends Iterator[CFGPosition] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@l-kent Do you want to keep this in? it's just a convenient way to .map() and .filter() etc. the program
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should make it clear what order it traverses the blocks within a procedure in, for instance (otherwise it could be prone to misuse). Would the idea to just be use .map() with the results of an analysis? I'm not entirely sure how useful that is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just commented it, its a very special case visitor I guess, probably only useful in very specific cases to map over the whole program, e.g. collect indirect calls and see if they were resolved. E.g. a simpler way to get the analysis domain.
src/main/scala/util/RunUtils.scala
Outdated
def newSolverTest(): Unit = { | ||
val ilcpsolver = IRSimpleValueAnalysis.Solver(IRProgram) | ||
val newCPResult = ilcpsolver.analyze() | ||
|
||
val newCommandDomain = ilcpsolver.domain.filter(_.isInstanceOf[Command]) | ||
|
||
val newRes = newCPResult.flatMap((x, y) => y.flatMap { | ||
case (_, el) if el == FlatLattice[BitVecLiteral].top || el == FlatLattice[BitVecLiteral].bottom => None | ||
case z => Some(x -> z) | ||
}) | ||
val oldRes = constPropResult.filter((x,y) => x.isInstanceOf[CfgNodeWithData[CFGPosition]]).flatMap((x, y) => y.flatMap { | ||
case (_, el) if el == FlatLattice[BitVecLiteral].top || el == FlatLattice[BitVecLiteral].bottom => None | ||
case z => Some(x.asInstanceOf[CfgNodeWithData[Any]].data -> z) | ||
}) | ||
val both = newRes.toSet.intersect(oldRes.toSet) | ||
val notnew = (newRes.toSet).filter(x => !both.contains(x)).toList.sorted((a, b) => a._2._1.name.compare(b._2._1.name)) | ||
val notOld = (oldRes.toSet).filter(x => !both.contains(x)).toList.sorted((a,b) => a._2._1.name.compare(b._2._1.name)) | ||
// newRes and oldRes should have value equality | ||
|
||
//config.analysisResultsPath.foreach(s => writeToFile(printAnalysisResults(IRProgram, newCPResult), s"${s}_newconstprop$iteration.txt")) | ||
config.analysisResultsPath.foreach(s => writeToFile(toDot(IRProgram), s"program.dot")) | ||
config.analysisResultsPath.foreach(s => writeToFile(toDot(IRProgram, newCPResult.map((k,v) => (k, v.toString))), s"program-constprop.dot")) | ||
|
||
config.analysisResultsPath.foreach(s => writeToFile(printAnalysisResults(IRProgram, newCPResult), s"${s}_new_cpres$iteration.txt")) | ||
config.analysisResultsPath.foreach(s => writeToFile(printAnalysisResults(IRProgram, cfg, constPropResult), s"${s}_old_cpres$iteration.txt")) | ||
|
||
} | ||
newSolverTest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be in a test suite or something instead of being a core part of the analysis? Change that and then we can merge everything in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just removed this and made it run the analysis and output the results like the other constprop analysis. The results won't be the same anyway because this handles calls more carefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. What's the difference with handling calls, does it do some sort of havoc instead of just skipping over them in the intraprocedural analysis?
It should be good to merge now, just make sure that the expected files are updated properly and haven't gotten muddled up in the merges. |
Implement a simple way of getting the parent, successor and predecessor of any program, block, and command in the IL. Adds a TIP-style `Dependencies` trait that uses this instead of the CFG, this works well for intraprocedural analyses but interprocedural iteration is not correct.
Creating a draft PR for feedback in case you have feedback on the direction at this stage. Still to do are:
The goal of this interface is to remove the need for a separate CFG structure, and separate 'id's for IL nodes that have to be resolved somehow. Broadly this change adds:
intra
flag in the existing CFGThis means the CFG part of the analysis domain is just a reference to an IL procedure, statement or command, and we have a pure function that returns the adjacent set of positions in any given direction; e.g. forwards or backwards in control flow.
See docs/il-cfg.md for more details.