Add alias analysis #194

LukasKlenner · 2024-05-02T18:07:26Z

Note: This pull request is based on #178 and should only be merged after that pull request.

This pull request adds two alias analyses for the alias properties added in #171: One intraprocedural alias analysis and one based on the existing points-to analysis.

errt · 2024-06-09T09:13:50Z

...rc/main/scala/org/opalj/tac/fpcf/analyses/alias/AllocationSiteAndTacBasedAliasAnalysis.scala

+                stmt match {
+                    case goto: Goto =>
+                        val targetPC = tac.stmts(goto.targetStmt).pc
+                        if (targetPC <= pc && goto.pc >= pc) return false // jumping from behind the allocation site in front of it -> might break aliasing because allocation site is executed multiple times


This is inefficient and incorrect:

Inefficient:

Checking every statement whether it is a goto, instead of using already available information (CFG, dominator tree)

Matching on the type of the instruction instead of its astID, which allows for an integer check instead of a type check

Extracting pcs instead of just working with stmt ids, which are ordered the same way

Incorrect:

Would interpret some linear control flow as loops (not unsound, but still):

... 5: goto 8 6: allocation site 7: goto 9 8: goto 6 ...

Misses other branching instructions, such as if and switch, so it is unsound as it may miss loops

A better way to do this is to check whether a jump is dominated by its target. That is still not perfect (it misses non-natural loops). OPAL provides facilities to compute dominator trees and check dominance.

If I understand it correct, a better way to do it would be iterate over all basic block with a endPC >= allocationSitePC (don't care what previous bb's do) and check whether they are dominated by a bb in front of the allocation site? I don't see a way to directly check "whether a jump is dominated by its target". That would still require me to iterate over all goto statements to identify them. Or should i keep iterating over the statements and only perform the check, whether the goto is problematic, using the dominatorTree? Regarding the implementation: To check the dominance I only found the DominatorTree.foreachDom(int)(int => U) method. How can I map the given int back to the associated bb to check whether that is a problematic dominance?

You need to do basic blocks before the allocation, too. See the example above how (linear control flow) jumps can reorder code irrespective of program counters.
Jumps are only at the end of basic blocks, so you don't need to look for gotos (and also, you get all kinds of jumps not just gotos).
There is method strictlyDominates on DominatorTree (strictly just means it excludes the BB itself).
I can't come up with the correct procedure from my head just now, but I suppose it will be something like: Is there a dominator (loop header) of the allocation (loop body) which has a predecessor that it dominates (back edge).

Sounds good. So far I have this code:

// check if the allocation site is dominated by a loop header, i.e., is executed multiple times domTree.foreachDom(cfg.bb(tac.properStmtIndexForPC(pc)).nodeId)(dom => { // check if the dominator is a loop header cfg.foreachPredecessor(cfg.allBBs.toSeq(dom).nodeId)(pred => { // if the dominator itself dominates one of its predecessors, it is a loop header if (domTree.strictlyDominates(dom, cfg.bb(pred).nodeId)) { return false } }) })

It checks whether the allocation site is dominated by a loop header. It works so far, but overapproximates cases where the allocation site is after the loop and not inside of it. E.g.:

public static void mustAliasLoopInFront() { for (int i = 0; i < 10; i++) { [...] } Object o2 = new Object(); o2.hashCode(); }

It computes that (o2,o2) is a may alias because the definition site is dominated by a loop header. Do you know whether it is possible to detect such cases, e.g., by checking if the allocation site is behind all predecessors of the loop header? Or would that cause other problems?

I guess also check that the backedge is a postdominator of the allocation site?

I have now found a way to access the postDominatorTree. It currently uses the same way used in the TACAIProvider class. But i am not sure if it is possible to reuse the results used for the tacai instead of calculating them again.

I think that is okay for now. We can check whether that computation is performance-sensitive for the analysis, but I don't expect it to be as long as you don't recompute it multiple times per method.

I have just implemented that they are being cached after the first computation

Note that TACode.cfg.dominatorTree is also recomputed every time and that doesn't seem to be a problem usually.

I think the code you have now will produce different kinds of trees for the dominator and post dominators: the dominator tree from the CFG should be a TAC-based one with Stmts in it, while the post dominator tree you compute is bytecode-based with bytecode Instructions in it, right? That might cause issues.

Good point, that's true (interesting that it still passes all tests* :D). I can of course map the values back to their pcs for the postdominator tree like so: postDomTree.strictlyDominates(cfg.bb(dom).startPC, cfg.bb(allocBB).startPC). But a version of the PostDominatorTree that is based on the tac would probably be better.

* They do fail if I change something else

...n/scala/org/opalj/tac/fpcf/analyses/alias/pointsto/TypePointsToBasedAliasAnalysisState.scala

...in/scala/org/opalj/tac/fpcf/analyses/alias/pointsto/AbstractPointsToBasedAliasAnalysis.scala

OPAL/tac/src/main/scala/org/opalj/tac/fpcf/analyses/alias/package.scala

OPAL/tac/src/main/scala/org/opalj/tac/fpcf/analyses/alias/TypeBasedAliasSet.scala

OPAL/tac/src/main/scala/org/opalj/tac/fpcf/analyses/alias/AliasSetLike.scala

OPAL/tac/src/main/scala/org/opalj/tac/fpcf/analyses/alias/AllocationSiteBasedAliasSet.scala

OPAL/tac/src/main/scala/org/opalj/tac/fpcf/analyses/alias/AbstractAliasAnalysis.scala

OPAL/br/src/main/scala/org/opalj/br/fpcf/properties/alias/AliasEntity.scala

...rc/main/scala/org/opalj/tac/fpcf/analyses/alias/AllocationSiteAndTacBasedAliasAnalysis.scala

…re/alias-analysis

errt · 2024-08-08T08:48:43Z

Please fix the merge conflicts listed below.

…e a loop

…ysis # Conflicts: # DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/ArrayAlias.java # DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/FieldAlias.java # DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/NullAlias.java # DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/ParameterAlias.java # DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/ReturnValueAlias.java # DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/StaticFieldAlias.java # DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/UVarAlias.java # DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/properties/alias/MayAlias.java # DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/properties/alias/MustAlias.java # DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/properties/alias/NoAlias.java # OPAL/br/src/main/scala/org/opalj/br/fpcf/properties/alias/AliasEntity.scala

…d; fix formatting

...rc/main/scala/org/opalj/tac/fpcf/analyses/alias/AllocationSiteAndTacBasedAliasAnalysis.scala

OPAL/tac/src/main/scala/org/opalj/tac/fpcf/analyses/alias/AliasSetLike.scala

errt · 2024-08-11T10:26:11Z

I think I'm mostly happy with this PR now and will approve and merge it soon. Just have a final look into the post-dominator tree issue please. It seems very weird to do dominator and post dominator checks using different underlying representations. It would be much nicer to also use a TAC-based post-dominator tree.
I think your original strategy of borrowing from CFG's dominatorTree method should be the way to go, we'd just need to figure out why it failed in your first attempt. As far as I can tell, the foreachSuccessor method in CFG should never invoke the given function with -1, because the call in line 455 is guarded by the check in 453 and the one in 444 uses the next pc, which is the given pc + 1. It should probably be easy to debug where that -1 came from.
On another note, instead of caching the trees yourself, it might be worthwile to consider converting the dominatorTree (and postDominatorTree, once implemented) method on CFG to a lazy val. That way, no complex ad-hoc caching logic is needed and other analyses, etc., also using the dominator/post-dominator trees would also not need to recompute it.

…omputations

…tate

LukasKlenner · 2024-08-13T13:23:29Z

I have managed to fix my initial approach of adding a postDominatorTree method in the cfg class. The problem was that I directly used the id's of the special exit nodes, which are -1 and -2. I am now using the ids of their predecessors, which seems to work fine: val exitNodes = normalReturnNode.predecessors.map(_.nodeId) ++ abnormalReturnNode.predecessors.map(_.nodeId).

I have also converted the dominatorTree and postDominatorTree methods to lazy val as suggested.

However, the same test which made us use a postDominatorTree is still failing. The problem is, that an exception can be thrown inside the loop which creates an edge to the abnormal return node. This again causes the immediate post dominator of the loop header to be the unique exit node of the postDom tree instead of the bb after the loop, breaking our "behind-loop" check.

errt · 2024-09-06T09:25:16Z

Can you have another look at this? It is good work and I'd like to merge it, but I can't as long as there are still failing tests.

LukasKlenner · 2024-09-12T12:09:38Z

The remaining problem is the handling of abnormal returns inside the loop and I am not sure how to solve it. I would for now change the expected values of the two failing tests to MayAlias. If a more mature loop detection is implemented in the future this could be used here to improve the precision.

add alias analysis

885fe71