Description
- This issue keeps track of (most) aspects of Dbux's big Data Flow feature.
DataFlowView
- List
- Allow switching between `value` and `access`
- Allow filtering: `Read`, `Write`, `Both` (`DataNode.type`)
- If trace has `dataNodes.length > 1`, allow selecting individual `dataNode`
  - NOTE: requires changing `buildDataNodes(trace)` to `buildDataNodes(dataNode)`
  - fire traceSelection event with additional param `dataNodeId`
- Smarter coherence when selecting different traces:
  - when clicking a trace, automatically choose the `dataNode` that participated in previous listing (if there is such a node)
- Button Icons
- Don't show same trace multiple times
- (future work) How to best visualize data flow in Call Graph (without a lot of extra coding effort)?
New ValueView
- Fix `ValueCollection.serialize` to only record "new data"
- Add `DataNode.type` to distinguish between `Read`, `Write` and `Delete`
- Upon any `Write`-type `DataNode`, make sure we have enough information to reconstruct the final value
- Do not `deserialize` anymore in `ValueRefCollection`
- Remove `getTraceValue` function, and replace with `constructValueObjectShallow`
- Produce a time- and memory-optimized algorithm to reconstruct the value of reference types for any `trace` using those partial recordings
  - if not object: get the actual `value` of a node from the first node of the same `valueId`
  - if object: get the actual value by taking the first node's value, and then apply all `Write` + `Delete` operations that followed that value
- Add two inline buttons to each value
  - "Go to write trace" (the latest write of that value before the currently selected `DataNode`)
  - "Go to value creation" (go to first `DataNode` of same `valueId`)
- future-work: Add `constructValueObjectFull` for rendering/exporting features
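
The reconstruction rule in the list above (first node's value, then replay `Write` + `Delete`) could look roughly like the following. This is only a sketch: the node shape (`nodeId`, `valueId`, `objectValueId`, `prop`, `value`), the `DataNodeType` constants and the function name `reconstructValueShallow` are all assumptions made for illustration, not the actual Dbux data model.

```javascript
// Hypothetical node types; the real `DataNode.type` values may differ.
const DataNodeType = { Read: 1, Write: 2, Delete: 3 };

/**
 * Rebuild the value identified by `valueId` as it looked at `atNodeId`.
 * Assumes `nodes` is sorted by `nodeId` (recording order) and that property
 * writes/deletes carry the `valueId` of their target object in `objectValueId`.
 */
function reconstructValueShallow(nodes, valueId, atNodeId) {
  const first = nodes.find(n => n.valueId === valueId);
  if (!first || first.nodeId > atNodeId) {
    return undefined;
  }
  if (first.value === null || typeof first.value !== 'object') {
    // not an object: the first node of the same valueId holds the value
    return first.value;
  }
  // object: shallow-copy the first recorded value, then replay later writes and deletes
  const result = Array.isArray(first.value) ? [...first.value] : { ...first.value };
  for (const node of nodes) {
    if (node.nodeId <= first.nodeId || node.nodeId > atNodeId || node.objectValueId !== valueId) {
      continue;
    }
    if (node.type === DataNodeType.Write) {
      result[node.prop] = node.value;
    } else if (node.type === DataNodeType.Delete) {
      delete result[node.prop];
    }
  }
  return result;
}

// Made-up recording of: const o = { a: 1, b: 2 }; o.a = 10; delete o.b; const s = 'hi';
const sampleNodes = [
  { nodeId: 1, type: DataNodeType.Write, valueId: 1, value: { a: 1, b: 2 } },
  { nodeId: 2, type: DataNodeType.Write, objectValueId: 1, prop: 'a', value: 10 },
  { nodeId: 3, type: DataNodeType.Delete, objectValueId: 1, prop: 'b' },
  { nodeId: 4, type: DataNodeType.Write, valueId: 4, value: 'hi' }
];
const objectAtNode2 = reconstructValueShallow(sampleNodes, 1, 2);
const objectAtNode3 = reconstructValueShallow(sampleNodes, 1, 3);
const stringAtNode4 = reconstructValueShallow(sampleNodes, 4, 4);
```

The linear scan keeps the sketch short; an index of writes per object would be the obvious next step toward the time- and memory-optimized version the list asks for.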
Bugs
- `DebugTDNode` displays the wrong `TraceType` name
- `ValueTDNode` always displays `undefined` -> make sure `dataProviderUtil` uses `dataNode` to get `value` and `valueId`, not `trace`
- first `DataFlowNode` displayed in `DataFlowView` is incorrect when selecting `=` in `var x = 3 + 5`
  - should be `3 + 5` but is `3`
- `undefined` values don't have a `valueId`
  - test with `calls0.js`: during first call of `f`, investigate its parameters `a` and `b`'s `valueId`
    - NOTE: same with `g`'s second parameter
- When using `getCallerTraceOfContext`, we cannot be sure if it is the BCE of the current context or not. E.g.: when inside the callback of `Array.map`, it would return the BCE of `Array.map`, which is not the actual caller of the callback's context.
  - In `RuntimeDataProvider`, replace use of `getCallerTraceOfContext` with a new utility function `getOwnCallerTraceOfContext`
    - add new `getFunctionDefinitionTrace(contextId)`
      - Get the callee of the function via `bceTrace.data.calleeTid`
      - Return the first trace by `callee -> valueId` which has `TraceType.FunctionDe{finition,claration}`
    - NOTE: `getOwnCallerTraceOfContext` calls `getCallerTraceOfContext`, but then returns `null` if the callee (a function object) is not that function (i.e. context):
      - See peekBCECheckCallee
      - Get `functionTrace = getFunctionDefinitionTrace(contextId);`
      - Return null if `functionTrace` does not exist or its `staticTrace.data.staticContextId` does not match the context's `staticContextId`
      - NOTE2: This is to avoid trying to connect data between caller and function, where the caller is not the actual function (e.g. `Array.map(f)`)
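
To make the `getOwnCallerTraceOfContext` item above more concrete, here is a minimal sketch over plain in-memory arrays. The trace and context shapes, the `callerTraceId` field and the `makeProvider` wrapper are hypothetical stand-ins; only the function names and the `bceTrace.data.calleeTid` / `staticTrace.data.staticContextId` paths are taken from the notes above.

```javascript
// Hypothetical, simplified data shapes; the real Dbux collections are richer.
const TraceType = {
  FunctionDeclaration: 'FunctionDeclaration',
  FunctionDefinition: 'FunctionDefinition',
  Identifier: 'Identifier',
  BeforeCallExpression: 'BeforeCallExpression'
};

function makeProvider(traces, contexts) {
  const traceById = new Map(traces.map(t => [t.traceId, t]));
  const contextById = new Map(contexts.map(c => [c.contextId, c]));

  // Stand-in for the existing lookup: the BCE that was active when the context started.
  function getCallerTraceOfContext(contextId) {
    const context = contextById.get(contextId);
    return (context && traceById.get(context.callerTraceId)) || null;
  }

  // First trace sharing the callee's valueId that defines or declares a function.
  function getFunctionDefinitionTrace(contextId) {
    const bceTrace = getCallerTraceOfContext(contextId);
    const callee = bceTrace && traceById.get(bceTrace.data.calleeTid);
    if (!callee) {
      return null;
    }
    return traces.find(t =>
      t.valueId === callee.valueId &&
      (t.type === TraceType.FunctionDefinition || t.type === TraceType.FunctionDeclaration)
    ) || null;
  }

  // Only returns the BCE if its callee is the function of this very context.
  function getOwnCallerTraceOfContext(contextId) {
    const bceTrace = getCallerTraceOfContext(contextId);
    const functionTrace = getFunctionDefinitionTrace(contextId);
    const context = contextById.get(contextId);
    if (!functionTrace ||
      functionTrace.staticTrace.data.staticContextId !== context.staticContextId) {
      return null;
    }
    return bceTrace;
  }

  return { getCallerTraceOfContext, getFunctionDefinitionTrace, getOwnCallerTraceOfContext };
}

// Made-up recording of: function f(x) {}  [1, 2].map(f);  f(1);
const provider = makeProvider(
  [
    { traceId: 1, type: TraceType.FunctionDeclaration, valueId: 1, staticTrace: { data: { staticContextId: 11 } } },
    { traceId: 2, type: TraceType.Identifier, valueId: 2 }, // Array.prototype.map (built-in, never defined in traced code)
    { traceId: 3, type: TraceType.BeforeCallExpression, data: { calleeTid: 2 } }, // [1, 2].map(f)
    { traceId: 4, type: TraceType.Identifier, valueId: 1 }, // `f` in `f(1)`
    { traceId: 5, type: TraceType.BeforeCallExpression, data: { calleeTid: 4 } } // f(1)
  ],
  [
    { contextId: 1, staticContextId: 11, callerTraceId: 3 }, // f, invoked by Array.map
    { contextId: 2, staticContextId: 11, callerTraceId: 5 } // f, invoked directly
  ]
);
```

In this made-up recording, context 1 resolves to `null` (the BCE belongs to `Array.map`, not to `f`), while context 2 resolves to the BCE of `f(1)`.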
Trace all files, even in node_modules
- Change default babel `include` behavior to include all
- Add package `blacklist` to `@dbux/cli`
- Make sure `@dbux/cli` properly instruments all modules, even if they were used by the `@dbux/cli` itself
  - Might have to consider doing what `NYC` did instead
  - See "@dbux/cli has some issues with `pw` (adding recording packages from `node_modules`)" #513
- add some analytical tools to better understand:
  - which modules were loaded and when
  - which modules were require'd/imported vs. which modules were actually instrumented
  - which modules are not instrumented
Data flow involving function calls
When a function is called, we want to add edges to data flow graph:
- [Parameters and arguments] from `CallExpression` arguments to `Param`s
  - basics
  - add `SpreadElement` support [es6]
  - add `RestElement` support [es6]
- capture function closure variable `declarationTid`s
  - NOTE: else, local-scope variables inside loops will share a single `declarationTid` just like a hoisted `var`, but they actually lead to one copy per iteration
- for that, we also need to take care of `bind`, `call`, `apply` etc...
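
The NOTE about loop-local variables refers to standard JavaScript scoping, which the following plain snippet illustrates (no Dbux API involved; the variable names are made up): a `let` declared in a `for` head gets a fresh binding per iteration, while a hoisted `var` is one binding shared by all iterations. A single `declarationTid` per declaration would therefore conflate the per-iteration copies that closures can observe.

```javascript
// `let`: every iteration gets its own binding, so each closure sees its own copy.
const readersOfLet = [];
for (let i = 0; i < 3; i++) {
  readersOfLet.push(() => i);
}

// `var`: hoisted to function/module scope, so all closures share one binding.
const readersOfVar = [];
for (var j = 0; j < 3; j++) {
  readersOfVar.push(() => j);
}

const seenViaLet = readersOfLet.map(read => read()); // [0, 1, 2]
const seenViaVar = readersOfVar.map(read => read()); // [3, 3, 3]
```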
We can do that in post-processing using the following procedures.
We assume:
- `callTrace`, `callStaticTrace` and `callId` to be the `BCE`'s trace, staticTrace and traceId
- `contextId` to be that of the called function

NOTE: Built-ins don't have a `context` and thus will require a different approach (see below)
Return value
When encountering `CallExpressionResult`:
- add data flow from `ReturnArgument` to `CallExpressionResult`

Algorithm:
- `returnTraceId` = id of first (and only) `ReturnArgument` trace of `contextId`.
- Set `CallExpressionResult`'s `input = [returnTraceId];`
- Consider `ExecutionContextCollection.setParamInputs` for reference.
- test: `CallExpressionResult` should have a `valueId`
  - e.g. in `mix1.js`, `const b = o.f()` should be value-traceable
- test (i) `return X;`, (ii) `return;`, (iii) context has no return #537
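
A minimal sketch of the return-value linking described above, assuming flat trace records with `traceId`, `contextId` and `type` fields and a hypothetical helper name `linkReturnValue`. It also assumes that neither `return;` nor a missing `return` produces a `ReturnArgument` trace, so that `inputs` simply stays unset in test cases (ii) and (iii).

```javascript
// Hypothetical trace type names, mirroring the ones used in the text.
const TraceType = {
  ReturnArgument: 'ReturnArgument',
  CallExpressionResult: 'CallExpressionResult'
};

/**
 * Link a `CallExpressionResult` trace to the `ReturnArgument` of the called context.
 * `traces` is assumed to be in recording order.
 */
function linkReturnValue(traces, callResultTrace, contextId) {
  const returnTrace = traces.find(
    t => t.contextId === contextId && t.type === TraceType.ReturnArgument
  );
  if (returnTrace) {
    callResultTrace.inputs = [returnTrace.traceId];
  }
  return callResultTrace;
}

// Made-up recording: context 1 executes `return X;`, context 2 has no return value.
const recordedTraces = [
  { traceId: 1, contextId: 1, type: TraceType.ReturnArgument },
  { traceId: 2, contextId: 0, type: TraceType.CallExpressionResult }, // result of calling context 1
  { traceId: 3, contextId: 0, type: TraceType.CallExpressionResult } // result of calling context 2
];
linkReturnValue(recordedTraces, recordedTraces[1], 1);
linkReturnValue(recordedTraces, recordedTraces[2], 2);
```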
Parameters and arguments
See ExecutionContextCollection.setParamInputs.
Data Flow Graph Instrumentation Basics
- es5
  - Declare + define variables
  - `CallExpression` + `BCE`
  - `FunctionDeclaration`
  - `FunctionExpression`
  - `ArithmeticExpression`
  - MemberExpression rvals
  - MemberExpression lvals
  - `{Object,Array}Expression`
  - `UpdateExpression`
  - `TemplateLiteral`
  - `IfStatement`
  - `SwitchStatement`
  - `SwitchCase`
  - `ConditionalExpression`
  - `SequenceExpression`
  - `ThrowStatement`
  - `CatchClause`
  - `WhileStatement`
  - `DoWhileLoop`
  - `ForStatement`
  - `ForInStatement`
  - `undefined`, `NaN`, `Infinity`
  - `this`
  - fix value of `AssignmentExpression`, if `operator !== '='`
  - variable redeclaration in same scope (`addOwnDeclarationTrace`)
  - `arguments` (Post-processing for `arguments` #542)
  - global `declarationTid`s
  - handle `module` + `module.exports` properly
    - establish data link between: `require` <-> `{module.,}exports`
  - `delete o[x]`
  - `typeof`
- es6
  - default parameters
  - `ArrowFunctionExpression`
  - `ObjectMethod`
    - `kind = get`, `kind = set`
  - `SpreadElement` (`Function`, `ObjectExpression`, `ArrayExpression`)
  - `import` <-> `export`
  - `Class{Expression,Declaration}`
    - `ClassProperty`
    - `ClassMethod`
    - private members (NOTE: private members do not support dynamic access)
    - instance and prop initializers should run in the context of the ctor
    - `super`
  - `ForOfStatement`
  - `async` functions
  - `generator` functions
  - `Decorator`
  - Destructuring assignments
    - `ObjectPattern`
    - `ArrayPattern`
    - `AssignmentPattern`
    - `RestElement`
- built-ins (Instrumentation of built-ins for data flow analysis #543)
Value identity
Compute two new ids:
- `dataNode.accessId`
- `dataNode.valueId` (`getValueId`)

"Value identity" refers to a uid (`valueId`) that can uniquely identify a value:
- For `object`, `array`, `function` ("reference types" or "object types")
  - It is `dataNode.refId`
  - `!!refId` is always `true`
  - NOTEs: `refId` is assigned in `runtime`.
    - It is the `traceId` that first recorded that object.
- For non-"object types", we, similarly, want to determine the `traceId` of when that value first came into existence.
  - Consider that many traces just access and move existing values, and do not actually create new values.
  - The algorithm is explained in getValueId.
  - `!!refId` is always `false`
inputs
- Fix up `inputs`: let `runtime` record `nodeId` (instead of `traceId`)
- Fix up `Value Identity` algorithm to use `dataNodeId` instead of `traceId`
ReferencedIdentifier vs. MemberExpression
- any `ReferencedIdentifier` refers to a variable name
  - -> `dataNode.varAccess` contains `declarationTid` (`traceId` that declared (or first recorded) the variable)
  - `accessId = makeUid(declarationTid)`
- any `ME` (`MemberExpression`) refers to accessing some object's (or other value, such as `int` or `string`) property
  - example: `f(x)[g(y)]`, where `object` = `f(x)` and `property` = `g(y)`
  - `varAccess` consists of:
    - `objTid` (`traceId` of `object`)
    - `prop` (value of `property`)
  - ``accessId = makeUid(`${getValueId(objTid)}#${prop}`)``
- TODO: `makeUid` should use a `Map` to convert that string into a number; to more easily maintain `AccessIdIndex`
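
The TODO above could be addressed along these lines. This is a sketch under assumptions: `getValueId` is replaced by a trivial stand-in, and the helper name `makeAccessId` and the `varAccess` shape are hypothetical. Only the idea of interning each key string in a `Map` and handing out a small integer is taken from the note.

```javascript
// Interns arbitrary keys (stringified) and maps each distinct key to a stable integer uid.
const uidsByKey = new Map();

function makeUid(key) {
  const k = String(key);
  let uid = uidsByKey.get(k);
  if (uid === undefined) {
    uid = uidsByKey.size + 1; // first uid is 1, so a uid is never falsy
    uidsByKey.set(k, uid);
  }
  return uid;
}

// Hypothetical stand-in: pretends the object's trace is also its value identity.
const getValueId = objTid => objTid;

// Hypothetical helper: variables are keyed by declarationTid, properties by `valueId#prop`.
function makeAccessId(varAccess) {
  const { declarationTid, objTid, prop } = varAccess;
  return declarationTid !== undefined
    ? makeUid(declarationTid)
    : makeUid(`${getValueId(objTid)}#${prop}`);
}
```

Because variable keys are plain numbers and property keys always contain `#`, the two kinds of key cannot collide in the shared `Map`.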
getValueId
Goal: Write `getValueId` function - `function getValueId(dataNodeId) { ... }`
We can compute `valueId` by post-processing in `DataNodeCollection.postAddRaw`.
Once computed, we want to store it in `dataNode.valueId`.
If a `value` is an object (that is, if it has a `refId`), return the first `traceId` of same `refId`.
If a `value` is not an object, the algorithm is based on each `DataNode`'s `Trace`'s `TraceType`, as follows:
- Default
  - Steps
    - -> if `staticTrace.dataNode.isNew`: `valueIdentity = traceId`
    - -> else if `inputs.length > 0`: `valueIdentity = getValueId(inputs[0])`
    - -> else: lookup last entry before this one in `DataNodesByAccessIdIndex` by `dataNode.accessId`
  - affected `TraceType`s include (but not limited to):
    - `Literal`
      - NOTE: always new
    - `Declaration`
      - NOTE: always implies a "new" `undefined` value (if not initialized)
    - `ExpressionResult`, `ExpressionValue`
      - who?
    - `ArithmeticExpression`
      - who?
    - `CallExpressionResult`
      - Post-processed to get `inputs`. See `Return value` section above.
      - Does not have `inputs` if (i) call did not return a value, or (ii) if we were not able to trace function correctly.
    - `Param`
      - Post-processed to get `inputs`. See `Parameters and arguments` section above.
    - `WriteVar`
    - `WriteME`
    - `ReturnArgument`, `ThrowArgument`, `AwaitArgument`
    - `Identifier`, `ME` (`MemberExpression`)
      - Does not have `inputs`. Always has `accessId`.
- `BeforeCallExpression` -> same as `CallExpressionResult` (resolve in `CallExpressionResult`)
  - -> don't assign a `valueIdentity`.
  - NOTE: special handling in UI - `BCE` rendering should (for the most part) reflect `CallExpressionResult`
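
The default steps above, combined with the `refId` rule for objects, can be sketched as a single pass over the recorded nodes. Everything about the node shape here is an assumption for illustration: `isNew` is flattened onto the node (the text reads it from `staticTrace.dataNode.isNew`), `inputs` holds node ids, a plain `Map` stands in for `DataNodesByAccessIdIndex`, and the final fallback (treat an unresolvable node as new) is not specified in the text.

```javascript
/**
 * Assign `valueId` to every node, in recording order.
 * Nodes are assumed to look like: { nodeId, traceId, type?, refId?, isNew?, inputs?, accessId? }.
 */
function computeValueIds(dataNodes) {
  const nodeById = new Map();
  const firstTraceIdByRefId = new Map(); // refId -> traceId that first recorded the object
  const lastValueIdByAccessId = new Map(); // stand-in for DataNodesByAccessIdIndex

  for (const node of dataNodes) {
    nodeById.set(node.nodeId, node);
    if (node.type === 'BeforeCallExpression') {
      continue; // resolved via CallExpressionResult instead; no valueIdentity
    }
    let valueId;
    if (node.refId) {
      // object: first traceId of the same refId
      if (!firstTraceIdByRefId.has(node.refId)) {
        firstTraceIdByRefId.set(node.refId, node.traceId);
      }
      valueId = firstTraceIdByRefId.get(node.refId);
    } else if (node.isNew) {
      valueId = node.traceId;
    } else if (node.inputs && node.inputs.length > 0) {
      // value was moved here from its first input
      valueId = nodeById.get(node.inputs[0])?.valueId ?? node.traceId;
    } else if (lastValueIdByAccessId.has(node.accessId)) {
      // last entry before this one with the same accessId
      valueId = lastValueIdByAccessId.get(node.accessId);
    } else {
      valueId = node.traceId; // assumption: nothing known about the origin, treat as new
    }
    node.valueId = valueId;
    if (node.accessId !== undefined) {
      lastValueIdByAccessId.set(node.accessId, valueId);
    }
  }
  return dataNodes;
}

// Made-up recording of: var x = 3; var y = x; var o = {}; o;
const valueIdExample = computeValueIds([
  { nodeId: 1, traceId: 1, type: 'Literal', isNew: true }, // 3
  { nodeId: 2, traceId: 2, type: 'WriteVar', inputs: [1], accessId: 'x' }, // x = 3
  { nodeId: 3, traceId: 3, type: 'Identifier', accessId: 'x' }, // read x
  { nodeId: 4, traceId: 4, type: 'WriteVar', inputs: [3], accessId: 'y' }, // y = x
  { nodeId: 5, traceId: 5, type: 'WriteVar', refId: 9, accessId: 'o' }, // o = {}
  { nodeId: 6, traceId: 6, type: 'Identifier', refId: 9, accessId: 'o' } // read o
]);
```

In this example the first four nodes all resolve to the literal's trace (1), and both `o` nodes resolve to trace 5.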
More Future Work
Trace object keys as data
- trace always, even if not computed
- if `!computed`, isNew = true, when key is added to object the first time
- if `computed`, isNew = false, and property is input
  - this `DataNode`'s `ValueIdentity` has 2 parents, making `Data Flow View` less intuitive
- key becomes rval via: `for-in`, `Object.keys`, `Object.entries`, etc.