This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Replies: 1 comment
@eric-haibin-lin: Please label: Question, Operator
When too many connections to a single symbol occur, can some of the gradients for `relu_new_1` be skipped because of `MXNET_EXEC_INPLACE_GRAD_SUM_CAP`?
I looked at `graph_executor.cc`, and there is a comment:
```cpp
// Note: For a symbol like the following:
//   data = mx.sym.Variable('data')
//   sym = data + data + data + data + data + data + data
// the node entries v passed in here are of the same node of
// op _identity_with_attr_like_rhs. We should skip adding a node
// to its own control_deps.
```
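To make sure I understand the situation the comment describes: a variable that appears n times in an expression receives n gradient contributions, which the executor must aggregate. Here is a plain-Python sketch of that (no MXNet required; the function names are mine, for illustration only):

```python
# Sketch: a variable used n times in a sum receives n gradient
# contributions, and backprop must sum all of them.
def forward(data, n=7):
    # data + data + ... + data  (n terms), like the sym in the comment
    return sum(data for _ in range(n))

def backward(head_grad, n=7):
    # each occurrence of `data` contributes head_grad to d(out)/d(data)
    contributions = [head_grad] * n
    return sum(contributions)

print(forward(2.0))    # 14.0
print(backward(1.0))   # 7.0 -- all seven contributions are summed
```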
I cannot understand the comment perfectly; however, `relu_new_1` in the following code is duplicated many times, and the duplicates may have dependencies on each other.
Then, can the gradient differ depending on `MXNET_EXEC_INPLACE_GRAD_SUM_CAP`?
(Does a small number of connections accept all duplicate gradients, while a large number of connections chooses only one gradient among them?)
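My working guess (an assumption, not verified against the MXNet source) is that the cap changes the aggregation strategy rather than dropping gradients, something like this pure-Python sketch (the cap value and chunking scheme are my assumptions for illustration):

```python
# Sketch (not MXNet code): a gradient-sum cap that switches strategy
# without dropping any gradient. Every contribution is still counted.
def inplace_sum(grads):
    """Accumulate all gradients into one buffer, one at a time."""
    total = grads[0]
    for g in grads[1:]:
        total = total + g  # the real engine would write in place here
    return total

def capped_sum(grads, cap=8):
    """If there are more gradients than the cap, sum them in chunks first."""
    if len(grads) <= cap:
        return inplace_sum(grads)
    partials = [inplace_sum(grads[i:i + cap])
                for i in range(0, len(grads), cap)]
    return inplace_sum(partials)

grads = [0.5] * 20          # twenty duplicate gradients for one symbol
print(inplace_sum(grads))   # 10.0
print(capped_sum(grads))    # 10.0 -- same total, different grouping
```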
In my case, using an identity symbol to reduce the number of direct connections shows a quite different training loss, even though the seed is fixed.
How is this code executed in MXNet through the graph executor?
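One thing I noticed while investigating the differing loss: even with a fixed seed, changing only the grouping of a gradient sum can change the result, because floating-point addition is not associative. A self-contained demonstration:

```python
# Same gradient terms, two summation orders: left-to-right vs pairwise.
# Floating-point addition is not associative, so the results can differ
# slightly -- enough to make training losses diverge over many steps.
import random

random.seed(0)
grads = [random.uniform(-1, 1) for _ in range(10_000)]

left_to_right = 0.0
for g in grads:
    left_to_right += g

pairwise = grads[:]
while len(pairwise) > 1:
    pairwise = [pairwise[i] + pairwise[i + 1] if i + 1 < len(pairwise)
                else pairwise[i]
                for i in range(0, len(pairwise), 2)]

print(left_to_right == pairwise[0])   # often False: same terms, different order
print(abs(left_to_right - pairwise[0]))  # tiny, but nonzero in general
```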