Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add alias analysis #194

Open
wants to merge 37 commits into
base: develop
Choose a base branch
from

Conversation

LukasKlenner
Copy link
Contributor

Note: This pull request is based on #178 and should only be merged after that pull request.

This pull request adds two alias analyses for the alias properties added in #171: One intraprocedural alias analysis and one based on the existing points-to analysis.

stmt match {
case goto: Goto =>
val targetPC = tac.stmts(goto.targetStmt).pc
if (targetPC <= pc && goto.pc >= pc) return false // jumping from behind the allocation site in front of it -> might break aliasing because allocation site is executed multiple times
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inefficient and incorrect:

Inefficient:

  • Checking every statement whether it is a goto, instead of using already available information (CFG, dominator tree)
  • Matching on the type of the instruction instead of its astID, which allows for an integer check instead of a type check
  • Extracting pcs instead of just working with stmt ids, which are ordered the same way

Incorrect:

  • Would interpret some linear control flow as loops (not unsound, but still):
...
5: goto 8
6: allocation site
7: goto 9
8: goto 6
...
  • Misses other branching instructions, such as if and switch, so it is unsound as it may miss loops

A better way to do this is to check whether a jump is dominated by its target. That is still not perfect (it misses non-natural loops). OPAL provides facilities to compute dominator trees and check dominance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correct, a better way to do it would be iterate over all basic block with a endPC >= allocationSitePC (don't care what previous bb's do) and check whether they are dominated by a bb in front of the allocation site? I don't see a way to directly check "whether a jump is dominated by its target". That would still require me to iterate over all goto statements to identify them. Or should i keep iterating over the statements and only perform the check, whether the goto is problematic, using the dominatorTree? Regarding the implementation: To check the dominance I only found the DominatorTree.foreachDom(int)(int => U) method. How can I map the given int back to the associated bb to check whether that is a problematic dominance?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to do basic blocks before the allocation, too. See the example above how (linear control flow) jumps can reorder code irrespective of program counters.
Jumps are only at the end of basic blocks, so you don't need to look for gotos (and also, you get all kinds of jumps not just gotos).
There is method strictlyDominates on DominatorTree (strictly just means it excludes the BB itself).
I can't come up with the correct procedure from my head just now, but I suppose it will be something like: Is there a dominator (loop header) of the allocation (loop body) which has a predecessor that it dominates (back edge).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. So far I have this code:

// check if the allocation site is dominated by a loop header, i.e., is executed multiple times
domTree.foreachDom(cfg.bb(tac.properStmtIndexForPC(pc)).nodeId)(dom => {

    // check if the dominator is a loop header
    cfg.foreachPredecessor(cfg.allBBs.toSeq(dom).nodeId)(pred => {

        // if the dominator itself dominates one of its predecessors, it is a loop header
        if (domTree.strictlyDominates(dom, cfg.bb(pred).nodeId)) {
            return false
        }
    })

}) 

It checks whether the allocation site is dominated by a loop header. It works so far, but overapproximates cases where the allocation site is after the loop and not inside of it. E.g.:

public static void mustAliasLoopInFront() {
    for (int i = 0; i < 10; i++) {
        [...]
    }

    Object o2 = new Object();

    o2.hashCode();
}

It computes that (o2,o2) is a may alias because the definition site is dominated by a loop header. Do you know whether it is possible to detect such cases, e.g., by checking if the allocation site is behind all predecessors of the loop header? Or would that cause other problems?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess also check that the backedge is a postdominator of the allocation site?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now found a way to access the postDominatorTree. It currently uses the same way used in the TACAIProvider class. But i am not sure if it is possible to reuse the results used for the tacai instead of calculating them again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is okay for now. We can check whether that computation is performance-sensitive for the analysis, but I don't expect it to be as long as you don't recompute it multiple times per method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just implemented that they are being cached after the first computation

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that TACode.cfg.dominatorTree is also recomputed every time and that doesn't seem to be a problem usually.

I think the code you have now will produce different kinds of trees for the dominator and post dominators: the dominator tree from the CFG should be a TAC-based one with Stmts in it, while the post dominator tree you compute is bytecode-based with bytecode Instructions in it, right? That might cause issues.

Copy link
Contributor Author

@LukasKlenner LukasKlenner Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, that's true (interesting that it still passes all tests* :D). I can of course map the values back to their pcs for the postdominator tree like so: postDomTree.strictlyDominates(cfg.bb(dom).startPC, cfg.bb(allocBB).startPC). But a version of the PostDominatorTree that is based on the tac would probably be better.

* They do fail if I change something else

@errt
Copy link
Collaborator

errt commented Aug 8, 2024

Please fix the merge conflicts listed below.

…ysis

# Conflicts:
#	DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/ArrayAlias.java
#	DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/FieldAlias.java
#	DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/NullAlias.java
#	DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/ParameterAlias.java
#	DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/ReturnValueAlias.java
#	DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/StaticFieldAlias.java
#	DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/fixtures/alias/UVarAlias.java
#	DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/properties/alias/MayAlias.java
#	DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/properties/alias/MustAlias.java
#	DEVELOPING_OPAL/validate/src/test/java/org/opalj/fpcf/properties/alias/NoAlias.java
#	OPAL/br/src/main/scala/org/opalj/br/fpcf/properties/alias/AliasEntity.scala
@errt
Copy link
Collaborator

errt commented Aug 11, 2024

I think I'm mostly happy with this PR now and will approve and merge it soon. Just have a final look into the post-dominator tree issue please. It seems very weird to do dominator and post dominator checks using different underlying representations. It would be much nicer to also use a TAC-based post-dominator tree.
I think your original strategy of borrowing from CFG's dominatorTree method should be the way to go, we'd just need to figure out why it failed in your first attempt. As far as I can tell, the foreachSuccessor method in CFG should never invoke the given function with -1, because the call in line 455 is guarded by the check in 453 and the one in 444 uses the next pc, which is the given pc + 1. It should probably be easy to debug where that -1 came from.
On another note, instead of caching the trees yourself, it might be worthwile to consider converting the dominatorTree (and postDominatorTree, once implemented) method on CFG to a lazy val. That way, no complex ad-hoc caching logic is needed and other analyses, etc., also using the dominator/post-dominator trees would also not need to recompute it.

@LukasKlenner
Copy link
Contributor Author

I have managed to fix my initial approach of adding a postDominatorTree method in the cfg class. The problem was that I directly used the id's of the special exit nodes, which are -1 and -2. I am now using the ids of their predecessors, which seems to work fine: val exitNodes = normalReturnNode.predecessors.map(_.nodeId) ++ abnormalReturnNode.predecessors.map(_.nodeId).

I have also converted the dominatorTree and postDominatorTree methods to lazy val as suggested.

However, the same test which made us use a postDominatorTree is still failing. The problem is, that an exception can be thrown inside the loop which creates an edge to the abnormal return node. This again causes the immediate post dominator of the loop header to be the unique exit node of the postDom tree instead of the bb after the loop, breaking our "behind-loop" check.

@errt
Copy link
Collaborator

errt commented Sep 6, 2024

Can you have another look at this? It is good work and I'd like to merge it, but I can't as long as there are still failing tests.

@LukasKlenner
Copy link
Contributor Author

The remaining problem is the handling of abnormal returns inside the loop and I am not sure how to solve it. I would for now change the expected values of the two failing tests to MayAlias. If a more mature loop detection is implemented in the future this could be used here to improve the precision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants