Il cfg iterator #141

ailrst · 2023-11-14T01:32:37Z

Creating a draft PR to get feedback on the direction at this stage. Still to do:

Merge Boogie-Style IR Control Flow #140
Merge analysis reworked types in main
More testing for the constprop analysis and the block-level CFG
Clean up how it handles block entry/exit nodes and procedure entry/exit nodes
Finish and test the interprocedural version
Finish the Dot and text format analysis result output
Maybe also want to be able to return the position at the beginning or end of a given block or procedure?

The goal of this interface is to remove the need for a separate CFG structure, and separate 'id's for IL nodes that have to be resolved somehow. Broadly this change adds:

Parent node to Blocks and Commands
Adds reverse block-level CFG to Block
Adds reverse procedure-level CFG to Procedure
Encapsulates modification to jump lists in methods that maintain the CFG
Adds methods to return the successor and predecessor of any IL node
Adds a simple implementation of constant-prop and Direction using this as the CFG (without changing the underlying solver)
Cleans up the intra flag in the existing CFG
Moves blocks' statement list to an IntrusiveList which allows getting the successor or predecessor of any list element in O(1)
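For illustration, the intrusive-list idea in the last item can be sketched as below. This is a minimal sketch, not the PR's `IntrusiveList`: the names `IntrusiveListElement`, `Stmt`, `append` and `remove` are assumptions made for this example. The point is that each element carries its own `next`/`prev` links, so neighbours are found without searching the list.

```scala
// Hypothetical sketch: each element stores its own neighbour links, so the
// next or previous statement is found without scanning the container.
trait IntrusiveListElement[T <: IntrusiveListElement[T]] {
  var next: Option[T] = None
  var prev: Option[T] = None
}

final class IntrusiveList[T <: IntrusiveListElement[T]] {
  private var firstElem: Option[T] = None
  private var lastElem: Option[T] = None

  /** Append in O(1) by linking the new element after the current last one. */
  def append(e: T): Unit = {
    e.prev = lastElem
    e.next = None
    lastElem.foreach(l => l.next = Some(e))
    if (firstElem.isEmpty) firstElem = Some(e)
    lastElem = Some(e)
  }

  /** Unlink in O(1): only the element's immediate neighbours are touched. */
  def remove(e: T): Unit = {
    e.prev match {
      case Some(p) => p.next = e.next
      case None    => firstElem = e.next
    }
    e.next match {
      case Some(n) => n.prev = e.prev
      case None    => lastElem = e.prev
    }
    e.prev = None
    e.next = None
  }

  /** Walk the links from the first element to the last. */
  def iterator: Iterator[T] =
    Iterator.unfold(firstElem)(cur => cur.map(e => (e, e.next)))
}

// Stand-in for an IL statement; the real statement classes are richer.
final class Stmt(val text: String) extends IntrusiveListElement[Stmt]
```

The trade-off is that an element can only live in one such list at a time, which seems acceptable here since a statement belongs to exactly one block.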

This means the CFG part of the analysis domain is just a reference to an IL procedure, statement or command, and we have a pure function that returns the adjacent set of positions in any given direction; e.g. forwards or backwards in control flow.
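As a rough illustration of the "pure function over IL positions" idea, here is a heavily simplified sketch. The names `CFGPosition`, `Block` and `Command` echo the discussion, but the fields, the `succ`/`pred` functions and the neighbour rules are assumptions for this example only; procedures and calls are left out entirely.

```scala
import scala.collection.mutable

// Hypothetical, heavily simplified IL. The class names echo the discussion,
// but every field and the exact neighbour rules are assumptions.
sealed trait CFGPosition

final class Command(val text: String, val parent: Block) extends CFGPosition

final class Block(val label: String) extends CFGPosition {
  val commands = mutable.ArrayBuffer[Command]()
  val targets  = mutable.LinkedHashSet[Block]() // forward edges
  val incoming = mutable.LinkedHashSet[Block]() // reverse edges, kept in sync

  def add(text: String): Command = {
    val c = new Command(text, this)
    commands += c
    c
  }

  def addJump(t: Block): Unit = {
    targets += t
    t.incoming += this
  }
}

/** Positions reached next in control flow. */
def succ(pos: CFGPosition): Set[CFGPosition] = pos match {
  case b: Block =>
    if (b.commands.nonEmpty) Set[CFGPosition](b.commands.head)
    else b.targets.toSet[CFGPosition]
  case c: Command =>
    // indexOf is linear in this sketch; an intrusive list avoids the scan.
    val cs = c.parent.commands
    val i = cs.indexOf(c)
    if (i + 1 < cs.size) Set[CFGPosition](cs(i + 1))
    else c.parent.targets.toSet[CFGPosition]
}

/** Positions reached by walking control flow backwards. */
def pred(pos: CFGPosition): Set[CFGPosition] = pos match {
  case b: Block =>
    b.incoming.iterator
      .map(p => p.commands.lastOption.getOrElse[CFGPosition](p))
      .toSet
  case c: Command =>
    val cs = c.parent.commands
    val i = cs.indexOf(c)
    if (i > 0) Set[CFGPosition](cs(i - 1)) else Set[CFGPosition](c.parent)
}
```

With this shape an analysis only needs a reference to a position and a direction; there is no separate graph structure or node id to resolve.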

See docs/il-cfg.md for more details.

l-kent · 2023-11-16T01:34:17Z

It'll be easier to give feedback for this once you've merged in the changes from #140

l-kent · 2023-11-16T01:46:14Z

src/main/scala/ir/Block.scala

+         removeJump(g)
+         addJump(f)
+       }
+       case (_, _) => throw Exception("Programmer error: can not replace jump with call or vice versa")


Why is this not allowed? It is probably necessary for resolving jump tables, where an indirect call has to be resolved to gotos, not direct calls.
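For what it's worth, a replacement that accepts any combination of old and new jump could look roughly like the sketch below. It is an assumption-laden illustration, not the code in `Block.scala`: `Jump`, `GoTo`, `IndirectCall`, `incoming` and `replaceJump` are simplified stand-ins, and the only invariant maintained is the set of incoming edges on each target.

```scala
import scala.collection.mutable

// Hypothetical, simplified jump hierarchy for illustration only.
sealed trait Jump
final case class GoTo(targets: List[Block]) extends Jump
final case class IndirectCall(register: String) extends Jump

final class Block(val label: String) {
  val incoming = mutable.Set[Block]() // blocks whose jump targets this block
  private var _jump: Jump = GoTo(Nil)
  def jump: Jump = _jump

  // Only gotos contribute block-level edges in this sketch.
  private def targetsOf(j: Jump): List[Block] = j match {
    case GoTo(ts)        => ts
    case _: IndirectCall => Nil
  }

  /** Replace the terminator, whatever kind the old and new jumps are. */
  def replaceJump(j: Jump): Unit = {
    targetsOf(_jump).foreach(t => t.incoming -= this) // unlink old edges
    _jump = j
    targetsOf(j).foreach(t => t.incoming += this)     // link new edges
  }
}
```

Under these assumptions, resolving a jump table is just `replaceJump(GoTo(resolvedTargets))` on a block that currently ends in an indirect call.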

ailrst · 2023-11-28T09:08:06Z

The nondeterministic GoTo allows having no targets. What are the semantics of that in Boogie, and what does it get translated to? I assume that if it is translated naively Boogie would fall through to the next block, but I can't find right now what it actually gets translated to.

l-kent · 2023-11-29T05:12:55Z

src/main/scala/ir/Program.scala

+  def removeCaller(c: Call): Unit = {
+    _callers.add(c)
+  }


This isn't doing what it says it does
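Presumably the intended behaviour is the mirror image of `addCaller`. A minimal sketch of that pairing, assuming a simplified `Procedure` and `Call` (both stand-ins, not the classes in `Program.scala`):

```scala
import scala.collection.mutable

// Stand-in for a call site; the real Call class carries more information.
final class Call(val site: String)

// Hypothetical, simplified procedure that only tracks its callers.
final class Procedure(val name: String) {
  private val _callers = mutable.LinkedHashSet[Call]()

  /** Snapshot of the current callers, in insertion order. */
  def callers: List[Call] = _callers.toList

  def addCaller(c: Call): Unit = {
    _callers.add(c)
  }

  def removeCaller(c: Call): Unit = {
    _callers.remove(c) // remove, not add
  }
}
```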

l-kent · 2023-11-29T05:18:26Z

src/main/scala/ir/Program.scala

+  /* returnBlock: Single return point for all returns from the procedure to its caller; always defined but only included
+   * in `blocks` if the procedure contains any real blocks.*/
+  val returnBlock: Block = new Block(name + "_return", None, List(), new IndirectCall(Register("R30", BitVecType(64)), None, Some(name)))
+  returnBlock.setParent(this)


Why does this need to be defined for every procedure? Not all procedures will have any returns.

l-kent · 2023-11-29T05:22:49Z

src/main/scala/ir/Program.scala

+  /**
+   * Horrible, compensating for not storing the blocks in-order and storing initialBlock and returnBlock separately.
+   * @return
+   */
+  def blocks: Seq[Block] =
+    (entryBlock match
+      case Some(b) if _blocks.nonEmpty => Seq(b) ++ _blocks.filter(x => x ne b).toSeq
+      case _ => _blocks.toSeq)
+     ++ (if _blocks.nonEmpty then Seq(returnBlock) else Seq())


Why does returnBlock need to be stored separately? This also would seem to be non-deterministic, since _blocks is a Set, which is very undesirable.

Having this be reconstructed every time Procedure.blocks is called seems terribly inefficient. It should instead just provide an iterator over _blocks?

It's a set for O(1) insertion and removal if we modify the IL. My understanding is that the block order does not have any meaning beyond the first block; a Set's order is not defined, but it seems to be deterministic. I don't think it's necessary for the return block to be stored separately; a few things can probably be cleaned up by just inserting it when the transformation is done.

The block order doesn't have any meaning beyond the first block, but using a Set does mean the output order is fragile which is very undesirable for being able to track changes to the output. It also makes the output harder to follow if we further lose correspondence to the original BAP input.

We don't even really care about arbitrary removal right now - the only time blocks are removed at present is when external methods are stubbed out (removing all blocks at once), and anything else is likely to involve iterating over all blocks, so we could really just use a ListBuffer to get O(1) insertion/removal.

l-kent · 2023-11-29T05:27:15Z

src/main/scala/ir/Program.scala

+                  var name: String,
+                  var address: Option[Int],
+                  var entryBlock: Option[Block],
+                  private val _blocks: mutable.HashSet[Block],


What's the point of making this a set? This seems like it will only cause problems for us, since we want to output blocks in a consistent order.

My understanding is that the block order does not have any meaning beyond the first block. The set removes the notion of an index entirely and gives removal by object identity instead, while also being faster, but I guess that only matters under heavy modification. A HashSet should still have an undefined but deterministic order, although we could use a LinkedHashSet instead.

LinkedHashSet would be another good option yeah

Although that still requires storing the entry block separately which still makes creating the iterator awkward, so I'd really just lean towards ListBuffer. I guess most cases where we iterate over all blocks don't care whether the entry block is first, except the final output, so taking that into account would allow for other possibilities.
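To make the LinkedHashSet option concrete, here is a small sketch under assumed names (`ProcedureBlocks`, `add`, `remove`; `Block` is a stand-in). It keeps insertion order, removes by object identity, and puts the entry block first without rebuilding a `Seq` on each call. The filtering step also shows the awkwardness mentioned above.

```scala
import scala.collection.mutable

// Stand-in block with reference equality, as for an ordinary class.
final class Block(val label: String)

// Hypothetical container for a procedure's blocks.
final class ProcedureBlocks {
  private val _blocks = mutable.LinkedHashSet[Block]() // insertion-ordered
  var entryBlock: Option[Block] = None

  def add(b: Block): Unit = {
    _blocks += b
  }

  def remove(b: Block): Unit = {
    _blocks -= b
  }

  /** Entry block first (if present), then the rest in insertion order. */
  def blocks: Iterator[Block] =
    entryBlock.iterator.filter(_blocks.contains) ++
      _blocks.iterator.filter(b => !entryBlock.contains(b))
}
```

The output order then depends only on the order in which blocks were added, so it stays stable across runs and across unrelated changes to hashing.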

l-kent · 2023-12-01T02:12:21Z

Broadly, this approach is a big improvement over what existed before: the IR elements are now used directly in the CFG, so there is no longer any difficulty in relating analysis results back to the IR. However, it still needs to calculate the predecessor and successor sets for everything, even for simple traversal of the statements within a block, which seems to leave things rather convoluted in some ways.

I want to try modifying the analysis solvers, which should allow for things to be significantly simplified.

ailrst

I'd really like to replace 'CFGPosition' with a sealed trait, but that would mean moving procedure, block, and command into one file.

I think this should be OK to merge now, hopefully it doesn't break anything.

After merging we need to sort out the interprocedural situation.

I think it would be useful to have a deep-copy function for the mutable part of the IL, e.g. for temporary analyses and inlining.

Wrt. the return blocks: a return block is always added to the procedure, but it's not always reachable.

ailrst · 2024-01-23T05:23:21Z

src/main/scala/ir/Program.scala

@@ -116,52 +120,265 @@ class Program(var procedures: ArrayBuffer[Procedure], var mainProcedure: Procedu
    initialMemory = initialMemoryNew
  }

+  class ILUnorderedIterator(private val begin: Program) extends Iterator[CFGPosition] {


@l-kent Do you want to keep this in? It's just a convenient way to .map() and .filter() etc. the program.

I think you should make it clear what order it traverses the blocks within a procedure in, for instance (otherwise it could be prone to misuse). Would the idea be to just use .map() with the results of an analysis? I'm not entirely sure how useful that is.

I just commented it. It's a very special-case visitor I guess, probably only useful in very specific cases to map over the whole program, e.g. to collect indirect calls and see if they were resolved, or as a simpler way to get the analysis domain.
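A sketch of that kind of use, assuming simplified immutable stand-ins for the IL (`Program`, `Procedure`, `Block`, `Command` and the `indirectCall` flag are all hypothetical here, as is the `positions` helper). The traversal order shown is an artefact of this sketch and should not be relied on.

```scala
// Hypothetical, simplified IL used only to illustrate whole-program mapping.
sealed trait CFGPosition
final case class Command(text: String, indirectCall: Boolean = false) extends CFGPosition
final case class Block(label: String, commands: List[Command]) extends CFGPosition
final case class Procedure(name: String, blocks: List[Block]) extends CFGPosition
final case class Program(procedures: List[Procedure])

/** Visits every procedure, block and command exactly once. */
def positions(p: Program): Iterator[CFGPosition] =
  p.procedures.iterator.flatMap { proc =>
    Iterator.single[CFGPosition](proc) ++ proc.blocks.iterator.flatMap { b =>
      Iterator.single[CFGPosition](b) ++ b.commands.iterator
    }
  }

// Example program: one procedure, one block, two commands.
val example = Program(List(
  Procedure("main", List(
    Block("b0", List(Command("R0 := 1"), Command("call R17", indirectCall = true)))
  ))
))

// Collect the indirect calls, e.g. to check later whether they were resolved.
val indirectCalls: List[String] =
  positions(example).collect { case c: Command if c.indirectCall => c.text }.toList
```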

l-kent · 2024-01-24T04:15:40Z

src/main/scala/util/RunUtils.scala

+    def newSolverTest(): Unit = {
+      val ilcpsolver = IRSimpleValueAnalysis.Solver(IRProgram)
+      val newCPResult = ilcpsolver.analyze()
+
+      val newCommandDomain = ilcpsolver.domain.filter(_.isInstanceOf[Command])
+
+      val newRes = newCPResult.flatMap((x, y) => y.flatMap {
+        case (_, el) if el == FlatLattice[BitVecLiteral].top || el == FlatLattice[BitVecLiteral].bottom => None
+        case z => Some(x -> z)
+      })
+      val oldRes = constPropResult.filter((x,y) => x.isInstanceOf[CfgNodeWithData[CFGPosition]]).flatMap((x, y) => y.flatMap {
+        case (_, el) if el == FlatLattice[BitVecLiteral].top || el == FlatLattice[BitVecLiteral].bottom => None
+        case z => Some(x.asInstanceOf[CfgNodeWithData[Any]].data -> z)
+      })
+      val both = newRes.toSet.intersect(oldRes.toSet)
+      val notnew = (newRes.toSet).filter(x => !both.contains(x)).toList.sorted((a, b) => a._2._1.name.compare(b._2._1.name))
+      val notOld = (oldRes.toSet).filter(x => !both.contains(x)).toList.sorted((a,b) => a._2._1.name.compare(b._2._1.name))
+      // newRes and oldRes should have value equality
+
+      //config.analysisResultsPath.foreach(s => writeToFile(printAnalysisResults(IRProgram, newCPResult), s"${s}_newconstprop$iteration.txt"))
+      config.analysisResultsPath.foreach(s => writeToFile(toDot(IRProgram), s"program.dot"))
+      config.analysisResultsPath.foreach(s => writeToFile(toDot(IRProgram, newCPResult.map((k,v) => (k, v.toString))), s"program-constprop.dot"))
+
+      config.analysisResultsPath.foreach(s => writeToFile(printAnalysisResults(IRProgram, newCPResult), s"${s}_new_cpres$iteration.txt"))
+      config.analysisResultsPath.foreach(s => writeToFile(printAnalysisResults(IRProgram, cfg, constPropResult), s"${s}_old_cpres$iteration.txt"))
+
+    }
+    newSolverTest()


This should probably be in a test suite or something instead of being a core part of the analysis? Change that and then we can merge everything in.

I've just removed this and made it run the analysis and output the results like the other constprop analysis. The results won't be the same anyway because this handles calls more carefully.

That makes sense. What's the difference with handling calls, does it do some sort of havoc instead of just skipping over them in the intraprocedural analysis?

l-kent · 2024-01-24T06:55:13Z

It should be good to merge now, just make sure that the expected files are updated properly and haven't gotten muddled up in the merges.

Implement a simple way of getting the parent, successor and predecessor of any program, block, and command in the IL. Adds a TIP-style `Dependencies` trait that uses this instead of the CFG; this works well for intraprocedural analyses, but interprocedural iteration is not correct.
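For readers unfamiliar with the TIP style, the shape of such a trait and a worklist solver built on it can be sketched as follows. This is a generic illustration under assumed signatures (`outdep`, `indep`, `solve` and the toy graph are all made up for this example) and says nothing about how the solver in this repository is written.

```scala
import scala.collection.mutable

/** Hypothetical TIP-style dependency interface over nodes of type N. */
trait Dependencies[N] {
  def outdep(n: N): Set[N] // nodes whose value depends on n
  def indep(n: N): Set[N]  // nodes that n's value depends on
}

/** Generic worklist fixpoint: join the inputs, apply transfer, repeat on change. */
def solve[N, F](nodes: Iterable[N], deps: Dependencies[N], bottom: F,
                join: (F, F) => F, transfer: (N, F) => F): Map[N, F] = {
  val state = mutable.Map[N, F]().withDefaultValue(bottom)
  val work = mutable.Queue.from(nodes)
  while (work.nonEmpty) {
    val n = work.dequeue()
    val in = deps.indep(n).foldLeft(bottom)((acc, m) => join(acc, state(m)))
    val out = transfer(n, in)
    if (out != state(n)) {
      state(n) = out
      work ++= deps.outdep(n) // re-examine everything that reads n
    }
  }
  nodes.map(n => n -> state(n)).toMap
}

// Toy graph 1 -> 2 -> 3 -> 2; the fact at a node is the set of nodes
// that can reach it (including itself).
val edges: Map[Int, Set[Int]] = Map(1 -> Set(2), 2 -> Set(3), 3 -> Set(2))

object GraphDeps extends Dependencies[Int] {
  def outdep(n: Int): Set[Int] = edges.getOrElse(n, Set.empty[Int])
  def indep(n: Int): Set[Int] =
    edges.collect { case (m, ts) if ts.contains(n) => m }.toSet
}

val result: Map[Int, Set[Int]] =
  solve[Int, Set[Int]](List(1, 2, 3), GraphDeps, Set.empty[Int],
    (a, b) => a ++ b, (n, in) => in + n)
```

In the PR's setting the node type would be the IL position and `outdep`/`indep` would be answered from the IL's own successor and predecessor links rather than from a separate CFG.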

ailrst added 16 commits November 7, 2023 11:42

move il classes to separate files

6e68244

move il classes to separate files

0acf9b7

add parent references to il

c3f63b6

initial intrusive list work

ff30667

refactor il to use intrusive list and fix intrusive list

aeba762

initial work to add backwards intraproc links to IL

c270d15

add assertions

118ecff

fix jump order issue

2e4f9e8

add procedure called-by links

d6dafeb

move intra to trait param on dependencies

3a2d6bd

implement prototype IL constprop

a3b2c88

cleanup

cd355dd

explanation

51ed576

minor edit

1ee3c3f

edit

b7d9a47

output il cfg

a6af61e

l-kent reviewed Nov 16, 2023

l-kent mentioned this pull request Nov 16, 2023

Analysis Cleanup and Explicit Types #144

Merged

ailrst added 10 commits November 21, 2023 17:25

undo split up block and procedure

0dee8b9

merge boogie style control flow

688c2d6

update expected

c57b73a

cleanup

b2eb839

add procedure return block and distinct entry block

c5c6ed4

handle call in constprop

ddb0269

Merge branch 'main' into il-cfg-iterator

465a49e

fix broken tests & cleanup

c04a1f1

udpate expected

678e3d3

cleanup

0fe3041

l-kent reviewed Nov 29, 2023

simple review fixes

8693a29

ailrst added 14 commits December 1, 2023 13:09

cleanup entry/exit blocks somewhat

a816ecb

fix

f628b00

proc and block graphs

72f3bc8

stuff

ba65e4d

simpler analysis result printer

a94fd52

fix result printer and block

353e1ba

merge main

c9cd919

simplify Dependencies

02aeae6

update expected

b031e9c

cleanup intrusivelist typecasts

68362ae

format

9ccb025

cleanup

45688e6

fix explanation

4158d49

fix clearblocks

1f78b30

ailrst commented Jan 23, 2024

ailrst marked this pull request as ready for review January 23, 2024 05:32

Merge branch 'main' into il-cfg-iterator

2851650

l-kent reviewed Jan 24, 2024

cleanup comments

fa65d14

update expected

2b088f3

ailrst merged commit 7ad7a20 into main Jan 24, 2024
1 check passed

ailrst deleted the il-cfg-iterator branch July 3, 2024 01:23
