Skip to content

Data Flow Graph + Slicing Basics #521

Open
@Domiii

Description

@Domiii
  • This issue keeps track of (most) aspects of Dbux's big Data Flow feature.

DataFlowView

  • List
  • Allow switching between value and access
  • Allow filtering: Read, Write, Both (DataNode.type)
  • If trace has dataNodes.length > 1, allow selecting individual dataNode
    • NOTE: requires changing buildDataNodes(trace) to buildDataNodes(dataNode)
    • fire traceSelection event with additional param dataNodeId
  • Smarter coherence when selecting different traces:
    • when clicking a trace, automatically choose the dataNode that participated in previous listing (if there is such a node)
  • Button Icons
  • Don't show same trace multiple times
  • (future work) How to best visualize data flow in Call Graph (without a lot of extra coding effort)?

New ValueView

  • Fix ValueCollection.serialize to only record "new data"
  • Add DataNode.type to distinguish between Read, Write and Delete
  • Upon any Write-type DataNode, make sure, we have enough information, to re-construct the final value
  • Do not deserialize anymore in ValueRefCollection
  • Remove getTraceValue function, and replace with constructValueObjectShallow
  • Produce time- and memory- optimized algorithm to re-construct the value of reference types for any trace using those partial recording
    • if not object: get the actual value of a node from first node of same valueId
    • if object: get the actual value by taking the first node's value, and then apply all Write + Delete operations that followed that value
  • Add two inline buttons to each value
    • "Go to write trace" (the latest write of that value before currently selected DataNode)
    • "Go to value creation" (go to first DataNode of same valueId)
  • future-work: Add constructValueObjectFull for rendering/exporting features

Bugs

  • DebugTDNode displays the wrong TraceType name
  • ValueTDNode always displays undefined -> make sure dataProviderUtil uses dataNode to get value and valueId, not trace
  • first DataFlowNode displayed in DataFlowView is incorrect when selecting = in var x = 3 + 5
    • should be 3 + 5 but is 3
  • undefined values don't have a valueId
    • test with calls0.js: during first call of f, investigate its parameters a and b's valueId
      • NOTE: same with g's second parameter
  • When using getCallerTraceOfContext, we cannot be sure if it is the BCE of the current context or not. E.g.: when inside the callback of Array.map, it would return the BCE of Array.map, which is not the actual caller of the callback's context.
    • In RuntimeDataProvider, replace use of getCallerTraceOfContext with a new utility function getOwnCallerTraceOfContext
    • add new getFunctionDefinitionTrace(contextId)
      • Get the callee of the function via bceTrace.data.calleeTid
      • Return the first trace by callee -> valueId which has TraceType.FunctionDe{finition,claration}
    • NOTE: getOwnCallerTraceOfContext calls getCallerTraceOfContext, but then returns null, if the callee (a function object) is not that function (i.e. context):
    • See peekBCECheckCallee
    • Get functionTrace = getFunctionDefinitionTrace(contextId);
    • Return null if functionTrace does not exist or its staticTrace.data.staticContextId does not match the context's staticContextId
    • NOTE2: This is to avoid trying to connect data between caller and function, where the caller is not the actual function (e.g. Array.map(f))

Trace all files, even in node_modules

  • Change default babel include behavior to include all
  • Add package blacklist to @dbux/cli
  • Make sure, @dbux/cli properly instruments all modules, even if they were used by the @dbux/cli itself
  • add some analytical tools to better understand:
    • which modules were loaded and when
    • which modules were require'd/imported vs. which modules were actually instrumented
    • which modules are not instrumented

Data flow involving function calls

When a function is called, we want to add edges to data flow graph:

  • [Parameters and arguments] from CallExpression arguments to Params
    • basics
    • add SpreadElement support [es6]
    • add RestElement support [es6]
  • capture function closure variable declarationTids
    • NOTE: else, local-scope variables inside loops will share a single declarationTid just like a hoisted var, but they actually lead to one copy per iteration
  • for that, we also need to take care of bind, call, apply etc...

We can do that in post-processing using the following procedures.
We assume:

  • callTrace, callStaticTrace and callId to be the BCE's trace, staticTrace and traceId
  • contextId to be that of the called function

NOTE: Built-ins don't have a context and thus will require a different approach (see below)

Return value

When encountering CallExpressionResult:

  • add data flow from ReturnArgument to CallExpressionResult

Alorithm:

  1. returnTraceId = id of first (and only) ReturnArgument trace of contextId.
  2. Set CallExpressionResult's input = [returnTraceId];

Parameters and arguments

See ExecutionContextCollection.setParamInputs.

Data Flow Graph Instrumentation Basics

  • es5
    • Declare + define variables
    • CallExpression + BCE
    • FunctionDeclaration
    • FunctionExpression
    • ArithmeticExpression
    • MemberExpression rvals
    • MemberExpression lvals
    • {Object,Array}Expression
    • UpdateExpression
    • TemplateLiteral
    • IfStatement
    • SwitchStatement
    • SwitchCase
    • ConditionalExpression
    • SequenceExpression
    • ThrowStatement
    • CatchClause
    • WhileStatement
    • DoWhileLoop
    • ForStatement
    • ForInStatement
    • undefined, NaN, Infinity
    • this
    • fix value of AssignmentExpression, if operator !== '='
    • variable redeclaration in same scope (addOwnDeclarationTrace)
    • arguments (Post-processing for arguments #542)
    • global declarationTids
    • handle module + module.exports properly
      • establish data link between: require <-> {module.,}exports
    • delete o[x]
    • typeof
  • es6
    • default parameters
    • ArrowFunctionExpression
    • ObjectMethod
      • kind = get, kind = set
    • SpreadElement (Function, ObjectExpression, ArrayExpression)
    • import <-> export
    • Class{Expression,Declaration}
      • ClassProperty
      • ClassMethod
      • private members (NOTE: private members do not support dynamic access)
      • instance and prop initializers should run in the context of the ctor
      • super
    • ForOfStatement
    • async functions
    • generator functions
    • Decorator
    • Destructuring assignments
      • ObjectPattern
      • ArrayPattern
      • AssignmentPattern
      • RestElement
  • built-ins (Instrumentation of built-ins for data flow analysis #543)

Value identity

Compute two new ids:

  • dataNode.accessId
  • dataNode.valueId (getValueId)

"Value identity" refers to a uid (valueId) that can uniquely identify a value:

  • For object, array, function ("reference types" or "object types")
    • It is dataNode.refId
    • !!refId is always true
    • NOTEs:
      • refId is assigned in runtime.
      • It is the traceId that first recorded that object.
  • For non-"object types", we, similarly, want to determine the traceId of when that value first came into existence.
    • Consider that many traces just access and move existing values, and do not actually create new values.
    • The algorithm is explained in getValueId.
    • !!refId is always false

inputs

  • Fix up inputs: let runtime record nodeId (instead of traceId)
  • Fix up Value Identitiy algorithm to use dataNodeId instead of traceId

ReferencedIdentifier vs. MemberExpression

  • any ReferencedIdentifier refers to a variable name

    • -> dataNode.varAccess contains declarationTid (traceId that declared (or first recorded) the variable)
    • accessId = makeUid(declarationTid)
  • any ME (MemberExpression) refers to accessing some object's (or other value, such as int or string) property

    • example: f(x)[g(y)], where object = f(x) and property = g(y)
    • varAccess consists of:
      • objTid (traceId of object)
      • prop (value of property)
    • accessId = makeUid(${getValueId(objTid)}#${prop})
  • TODO: makeUid should use a Map to convert that string into a number; to more easily maintain AccessIdIndex

getValueId

Goal: Write getValueId function -

function getValueId(dataNodeId) { ... }

We can compute valueId by post-processing in DataNodeCollection.postAddRaw.
Once computed, we want to store it in dataNode.valueId.

If a value is an object (that is, if it has a refId), return the first traceId of same refId.

If a value is not an object, the algorithm is based on each DataNode's Trace's TraceType, as follows:

  • Default
    • Steps
      • -> if staticTrace.dataNode.isNew: valueIdentity = traceId
      • -> else if inputs.length > 0: valueIdentity = getValueId(inputs[0])
      • -> else: lookup last entry before this one in DataNodesByAccessIdIndex by dataNode.accessId
    • affected TraceTypes include (but not limited to):
      • Literal
        • NOTE: always new
      • Declaration
        • NOTE: always implies a "new" undefined value (if not initialized)
      • ExpressionResult, ExpressionValue
        • who? ArithmeticExpression
      • CallExpressionResult
        • Post-processed to get inputs. See Return value section above.
        • Does not have inputs if (i) call did not return a value, or (ii) if we were not able to trace function correctly.
      • Param
        • Post-processed to get inputs. See Parameters and arguments section above.
      • WriteVar
      • WriteME
      • ReturnArgument, ThrowArgument, AwaitArgument
      • Identifier, ME (MemberExpression)
        • Does not have inputs. Always has accessId.
  • BeforeCallExpression -> same as CallExpressionResult (resolve in CallExpressionResult)
    • -> don't assign a valueIdentity.
    • NOTE: special handling in UI - BCE rendering should (for the most part) reflect CallExpressionResult

More Future Work

Trace object keys as data

  • trace always, even if not computed
  • if !computed, isNew = true, when key is added to object the first time
  • if computed, isNew = false, and property is input
    • this DataNode's ValueIdentity has 2 parents, making Data Flow View less intuitive
  • key becomes rval via: for-in, Object.keys, Object.entries, etc.

Metadata

Metadata

Assignees

Labels

a-lot-of-workThis issue requires a lot of work, and is by no means easily taken care ofdata flow

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions