Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Il cfg iterator #141

Merged
merged 46 commits into from
Jan 24, 2024
Merged

Il cfg iterator #141

merged 46 commits into from
Jan 24, 2024

Conversation

ailrst
Copy link
Contributor

@ailrst ailrst commented Nov 14, 2023

Creating a draft PR for feedback in case you have feedback on the direction at this stage. Still to do are:

  • Merge Boogie-Style IR Control Flow #140
  • Merge analysis reworked types in main
  • More testing for the constprop analysis and the block-level CFG
  • Clean up how it handles block entry/exit notes and procedure entry/exit nodes
  • Finish and test the interprocedural version
  • Finish the Dot and text format analysis result output
  • Maybe also want to be able to return the position at the beginning or end of a given block or procedure?

The goal of this interface is to remove the need for a separate CFG structure, and separate 'id's for IL nodes that have to be resolved somehow. Broadly this change adds:

  • Parent node to Blocks and Commands
  • Adds reverse block-level CFG to Block
  • Adds reverse procedure-level CFG to Procedure
  • Encapsulates modification to jump lists in methods that maintain the CFG
  • Adds methods to return the successor and predecessor of any IL node
  • Adds a simple implementation of constant-prop and Direction to using this as the CFG (without changing the underlying solver)
  • Cleans up the intra flag in the existing CFG
  • Moves blocks' statement list to an IntrusiveList which allows getting the successor or predecessor of any list element in O(1)

This means the CFG part of the analysis domain is just a reference to an IL procedure, statement or command, and we have a pure function that returns the adjacent set of positions in any given direction; e.g. forwards or backwards in control flow.

See docs/il-cfg.md for more details.

@l-kent
Copy link
Contributor

l-kent commented Nov 16, 2023

It'll be easier to give feedback for this once you've merged in the changes from #140

removeJump(g)
addJump(f)
}
case (_, _) => throw Exception("Programmer error: can not replace jump with call or vice versa")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not allowed? It is probably necessary for resolving jump tables, where an indirect call has to be resolved to gotos, not direct calls.

@ailrst
Copy link
Contributor Author

ailrst commented Nov 28, 2023

The nondeterministic GoTo allows a having no targets, what is the semantics of that in boogie / what does it get translated to? I assume if it is translated naively boogie would fall through to the next block, but cant find right now what it actually gets translated to.

Comment on lines 209 to 211
def removeCaller(c: Call): Unit = {
_callers.add(c)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't doing what it says it does

Comment on lines 182 to 185
/* returnBlock: Single return point for all returns from the procedure to its caller; always defined but only included
* in `blocks` if the procedure contains any real blocks.*/
val returnBlock: Block = new Block(name + "_return", None, List(), new IndirectCall(Register("R30", BitVecType(64)), None, Some(name)))
returnBlock.setParent(this)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be defined for every procedure? Not all procedures will have any returns.

Comment on lines 199 to 207
/**
* Horrible, compensating for not storing the blocks in-order and storing initialBlock and returnBlock separately.
* @return
*/
def blocks: Seq[Block] =
(entryBlock match
case Some(b) if _blocks.nonEmpty => Seq(b) ++ _blocks.filter(x => x ne b).toSeq
case _ => _blocks.toSeq)
++ (if _blocks.nonEmpty then Seq(returnBlock) else Seq())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does returnBlock need to be stored separately? This also would seem to be non-deterministic, since _blocks is a Set, which is very undesirable.

Having this be reconstructed every time Procedure.blocks is called seems terribly inefficient. It should instead just provide an iterator over _blocks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a set for O(1) insertion removal if we modify the IL. My understanding is the block order does not have any meaning beyond the first block, Set's order is not defined but is deterministic it seems. I don't think its neccessary for return block to be stored separately, a few things can probably be cleaned up by just inserting it when the transformation is done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block order doesn't have any meaning beyond the first block, but using a Set does mean the output order is fragile which is very undesirable for being able to track changes to the output. It also makes the output harder to follow if we further lose correspondence to the original BAP input.

We don't even really care about arbitrary removal right now - the only time blocks are removed at present is when external methods are stubbed out (removing all blocks at once), and anything else is likely to involve iterating over all blocks, so we could really just use a ListBuffer to get O(1) insertion/removal.

var name: String,
var address: Option[Int],
var entryBlock: Option[Block],
private val _blocks: mutable.HashSet[Block],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of making this a set? This seems like it will only cause problems for us, since we want to output blocks in a consistent order.

Copy link
Contributor Author

@ailrst ailrst Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is the block order does not have any meaning beyond the first block. The set removes the notion of an index at all and gives removal using the object identity instead, while also being faster. But I guess that only matters under heavy modification. A hashset should still have an undefined but deterministic order, although we could use a linkedhashset instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LinkedHashSet would be another good option yeah

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although that still requires storing the entry block separately which still makes creating the iterator awkward, so I'd really just lean towards ListBuffer. I guess most cases where we iterate over all blocks don't care whether the entry block is first, except the final output, so taking that into account would allow for other possibilities.

@l-kent
Copy link
Contributor

l-kent commented Dec 1, 2023

Broadly, this approach is definitely a big improvement over what existed before by making it so there isn't the difficulty in trying to relate the analysis results back to the IR, as the IR elements are now directly used in the CFG. However, it still needs to calculate the predecessor and successor sets for everything, even simple traversal of the statements within a block which seems to result in things still being rather convoluted in some ways.

I want to try modifying the analysis solvers, which should allow for things to be significantly simplified.

Copy link
Contributor Author

@ailrst ailrst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I's really like to replace 'CFGPosition' with a sealed trait but that would mean moving procedure, block, and command into one file.

I think this should be OK to merge now, hopefully it doesn't break anything.

After merging we need to sort out the interprocedural situation.

I think it would be useful for have a deep-copy function for the mutable part of the IL for e.g. temporary analyses and inlining.

Wrt. the return blocks; a return block is always added to the procedure but its not always reachable.

@@ -116,52 +120,265 @@ class Program(var procedures: ArrayBuffer[Procedure], var mainProcedure: Procedu
initialMemory = initialMemoryNew
}

class ILUnorderedIterator(private val begin: Program) extends Iterator[CFGPosition] {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@l-kent Do you want to keep this in? it's just a convenient way to .map() and .filter() etc. the program

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should make it clear what order it traverses the blocks within a procedure in, for instance (otherwise it could be prone to misuse). Would the idea to just be use .map() with the results of an analysis? I'm not entirely sure how useful that is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just commented it, its a very special case visitor I guess, probably only useful in very specific cases to map over the whole program, e.g. collect indirect calls and see if they were resolved. E.g. a simpler way to get the analysis domain.

@ailrst ailrst marked this pull request as ready for review January 23, 2024 05:32
Comment on lines 169 to 196
def newSolverTest(): Unit = {
val ilcpsolver = IRSimpleValueAnalysis.Solver(IRProgram)
val newCPResult = ilcpsolver.analyze()

val newCommandDomain = ilcpsolver.domain.filter(_.isInstanceOf[Command])

val newRes = newCPResult.flatMap((x, y) => y.flatMap {
case (_, el) if el == FlatLattice[BitVecLiteral].top || el == FlatLattice[BitVecLiteral].bottom => None
case z => Some(x -> z)
})
val oldRes = constPropResult.filter((x,y) => x.isInstanceOf[CfgNodeWithData[CFGPosition]]).flatMap((x, y) => y.flatMap {
case (_, el) if el == FlatLattice[BitVecLiteral].top || el == FlatLattice[BitVecLiteral].bottom => None
case z => Some(x.asInstanceOf[CfgNodeWithData[Any]].data -> z)
})
val both = newRes.toSet.intersect(oldRes.toSet)
val notnew = (newRes.toSet).filter(x => !both.contains(x)).toList.sorted((a, b) => a._2._1.name.compare(b._2._1.name))
val notOld = (oldRes.toSet).filter(x => !both.contains(x)).toList.sorted((a,b) => a._2._1.name.compare(b._2._1.name))
// newRes and oldRes should have value equality

//config.analysisResultsPath.foreach(s => writeToFile(printAnalysisResults(IRProgram, newCPResult), s"${s}_newconstprop$iteration.txt"))
config.analysisResultsPath.foreach(s => writeToFile(toDot(IRProgram), s"program.dot"))
config.analysisResultsPath.foreach(s => writeToFile(toDot(IRProgram, newCPResult.map((k,v) => (k, v.toString))), s"program-constprop.dot"))

config.analysisResultsPath.foreach(s => writeToFile(printAnalysisResults(IRProgram, newCPResult), s"${s}_new_cpres$iteration.txt"))
config.analysisResultsPath.foreach(s => writeToFile(printAnalysisResults(IRProgram, cfg, constPropResult), s"${s}_old_cpres$iteration.txt"))

}
newSolverTest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be in a test suite or something instead of being a core part of the analysis? Change that and then we can merge everything in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just removed this and made it run the analysis and output the results like the other constprop analysis. The results won't be the same anyway because this handles calls more carefully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. What's the difference with handling calls, does it do some sort of havoc instead of just skipping over them in the intraprocedural analysis?

@l-kent
Copy link
Contributor

l-kent commented Jan 24, 2024

It should be good to merge now, just make sure that the expected files are updated properly and haven't gotten muddled up in the merges.

@ailrst ailrst merged commit 7ad7a20 into main Jan 24, 2024
1 check passed
l-kent pushed a commit that referenced this pull request Feb 21, 2024
Implement a simple way of getting the parent, successor and predecessor of any program, block, and command in the IL. 
Adds a TIP-style `Dependencies` trait that uses this instead of the CFG, this works well for intraprocedural analyses but interprocedural iteration is not correct.
@ailrst ailrst deleted the il-cfg-iterator branch July 3, 2024 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants