How to Represent Intermediate Path Nodes in Taint Analysis with CodeQL? #16029
-
I've been using CodeQL for about a month, developing static analysis queries. I'm currently trying to understand how to represent the intermediate path nodes between a PathNode source and a PathNode sink. I am considering using the isBarrier() predicate, since it is checked against every node visited during data flow analysis and so could cover the intermediate nodes I'm interested in. However, I'd like to know whether there is a more direct way to obtain these intermediate PathNodes.

The documentation (https://codeql.github.com/codeql-standard-libraries/cpp/semmle/code/cpp/dataflow/internal/DataFlowImplLocal.qll/type.DataFlowImplLocal$PathNode.html) describes a PathNode as "A Node augmented with a call context (except for sinks), an access path, and a configuration. Only those PathNodes that are reachable from a source, and which can reach a sink, are generated." This suggests that a PathNode can represent any node on a data flow path, not just sources and sinks. Yet in the official examples (https://github.blog/changelog/2023-08-14-new-dataflow-api-for-writing-custom-codeql-queries/), PathNode seems to serve as a substitute for DataFlow::Node (with the added benefit of displaying paths in VS Code) and is only used to represent sources or sinks.

This has left me a bit puzzled, and I believe there should be a direct representation of intermediate nodes in data flow analysis. Could this be achieved through an API, for example MyFlow::hasMidNode(source, sink, mid)?
-
Sorry, I may have missed some documentation, such as (https://codeql.github.com/codeql-standard-libraries/cpp/codeql/dataflow/DataFlow.qll/type.DataFlow$DataFlowMake$MergePathGraph$PathNode.html). However, I still don't know how to get the "middle PathNode" between source and sink.
-
Could you explain what you actually want to achieve? You can get a mid node by calling the PathNode getASuccessor predicate on your source and checking that the result is not equal to your sink.
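As a rough illustration of the getASuccessor suggestion, here is a minimal sketch for C++ using the new data flow API. The function names `source` and `sink` are hypothetical placeholders, as are the module names `MyConfig` and `MyFlow`. The sketch uses the transitive closure `getASuccessor+()` rather than a single step, which is an assumption about what "mid node" should mean here. It reports nodes that lie on some path-graph route between a source and a sink; it does not single out one particular path.

```ql
import cpp
import semmle.code.cpp.dataflow.new.TaintTracking

module MyConfig implements DataFlow::ConfigSig {
  // Hypothetical source: the result of any call to a function named `source`.
  predicate isSource(DataFlow::Node source) {
    source.asExpr().(FunctionCall).getTarget().hasName("source")
  }

  // Hypothetical sink: the first argument of any call to a function named `sink`.
  predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall fc |
      fc.getTarget().hasName("sink") and
      sink.asExpr() = fc.getArgument(0)
    )
  }
}

module MyFlow = TaintTracking::Global<MyConfig>;

from MyFlow::PathNode source, MyFlow::PathNode mid, MyFlow::PathNode sink
where
  MyFlow::flowPath(source, sink) and
  // `mid` is reachable from `source` in one or more path-graph steps ...
  mid = source.getASuccessor+() and
  // ... and `sink` is reachable from `mid` in one or more steps.
  sink = mid.getASuccessor+() and
  mid != sink
select mid.getNode(), "Intermediate node on a path-graph route from a source to a sink."
```

Because the closure is taken over path-graph edges, the result may include nodes from several different routes between the same source and sink, and the number of result tuples can grow quickly on large databases.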
-
There are limitations on what you can do with path nodes. The most obvious thing you can do is require a certain node to be on a path. This is done by using flow states in your flow configuration: give all source nodes state 1 and all sink nodes state 2, and add an additional flow step at the node of your choice that converts state 1 to state 2. You also mentioned making sure a particular node is not on the path by using barriers. But in both cases the restrictions are put in place first and the paths are calculated afterwards. I don't think there's any way to calculate paths and then examine them; you can only put restrictions in place before the paths are determined.
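The flow-state approach described above could look roughly like the following sketch for C++. It assumes the required mid point is a call to a hypothetical function `transform`, with hypothetical `source` and `sink` functions as before; the module names and the choice of `int` for the state type are also illustrative assumptions rather than a prescribed pattern.

```ql
import cpp
import semmle.code.cpp.dataflow.new.TaintTracking

module ViaMidConfig implements DataFlow::StateConfigSig {
  // State 1: the required node has not been passed yet; state 2: it has.
  class FlowState = int;

  // Hypothetical source: the result of a call to `source`, starting in state 1.
  predicate isSource(DataFlow::Node source, FlowState state) {
    state = 1 and
    source.asExpr().(FunctionCall).getTarget().hasName("source")
  }

  // Hypothetical sink: the first argument of a call to `sink`, only in state 2.
  predicate isSink(DataFlow::Node sink, FlowState state) {
    state = 2 and
    exists(FunctionCall fc |
      fc.getTarget().hasName("sink") and
      sink.asExpr() = fc.getArgument(0)
    )
  }

  // The state-changing step: flow from the argument of the hypothetical
  // `transform` call to its result moves from state 1 to state 2.
  predicate isAdditionalFlowStep(
    DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2
  ) {
    state1 = 1 and
    state2 = 2 and
    exists(FunctionCall fc |
      fc.getTarget().hasName("transform") and
      node1.asExpr() = fc.getArgument(0) and
      node2.asExpr() = fc
    )
  }
}

module ViaMidFlow = TaintTracking::GlobalWithState<ViaMidConfig>;

from DataFlow::Node source, DataFlow::Node sink
where ViaMidFlow::flow(source, sink)
select sink, "Flow from a source that passes through a call to transform."
```

Since sinks only accept state 2 and the only way to reach state 2 is the additional step, any flow reported by this configuration must pass through the chosen node. The restriction is part of the configuration, so it is applied before any paths are computed.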
The way that the dataflow library works does not make this feasible. It finds (source, sink) pairs where there is some path between them, but the paths are actually not constructed in CodeQL. They are constructed after the CodeQL has finished running. So even knowing how many paths there are between a given source and sink is not possible in CodeQL. And, for similar reasons, it isn't possible to talk about the nodes on one particular path.