This topic walks through the process of creating new MXNet operators (or layers).
We've done our best to provide high-speed operators for most common use cases. However, if you find yourself in need of custom layers, like a novel loss for your research, you have two options:
- Use CustomOp to write new operators in the front-end language (i.e., Python) that run on CPUs or GPUs. Depending on your implementation, this can range from very fast (if you only use operators under mx.nd) to very slow (if you use .asnumpy() to copy out the data).
- Use C++/mshadow (CUDA). This can be difficult if you're not familiar with MXNet, mshadow, or CUDA, but it provides the best performance.
Implementing an operator in Python is simple. As an example, let's create a softmax operator. Start by subclassing mxnet.operator.CustomOp, and then override a few methods:
import os
# MXNET_CPU_WORKER_NTHREADS must be greater than 1 for custom op to work on CPU
os.environ["MXNET_CPU_WORKER_NTHREADS"] = "2"
import mxnet as mx
import numpy as np

class Softmax(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        x = in_data[0].asnumpy()
        # Subtract the row-wise max before exponentiating for numerical stability
        y = np.exp(x - x.max(axis=1).reshape((x.shape[0], 1)))
        y /= y.sum(axis=1).reshape((x.shape[0], 1))
        self.assign(out_data[0], req[0], mx.nd.array(y))
We defined the computation for the forward pass of our operator. The forward function takes a list of input NDArrays and a list of output NDArrays. For convenience, we called .asnumpy() on the input NDArray to convert it to a CPU-based NumPy array.
This can be very slow. If you want the best performance, keep data in NDArray format and use operators under mx.nd to do the computation.
At the end, we used CustomOp.assign to assign the resulting array y to out_data[0]. It handles assignment based on the value of req, which can be 'write', 'add', or 'null'.
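The effect of req can be sketched in plain Python. The helper below is a hypothetical stand-in that works on lists rather than NDArrays; it is not MXNet's implementation of CustomOp.assign and only illustrates the three request values named above.

```python
def assign(dst, req, src):
    """Toy model of the write/add/null request semantics on plain lists."""
    if req == 'null':
        return            # nothing requested; leave dst untouched
    if req == 'write':
        dst[:] = src      # overwrite the existing contents in place
    elif req == 'add':
        for i, v in enumerate(src):
            dst[i] += v   # accumulate into the existing contents
    else:
        raise ValueError("unknown req: %r" % (req,))

out = [1.0, 2.0]
assign(out, 'write', [5.0, 6.0])   # out is now [5.0, 6.0]
assign(out, 'add', [1.0, 1.0])     # out is now [6.0, 7.0]
assign(out, 'null', [9.0, 9.0])    # out is unchanged
```

Routing every output through a helper like this keeps forward and backward free of per-request branching.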
Then do the same for the backward pass:
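    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # Labels arrive as floats; flatten them and cast to integer indices
        l = in_data[1].asnumpy().ravel().astype(int)
        y = out_data[0].asnumpy()
        # Subtract 1 at each row's label index: softmax output minus one-hot label
        y[np.arange(l.shape[0]), l] -= 1.0
        self.assign(in_grad[0], req[0], mx.nd.array(y))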
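The backward pass above computes the softmax output minus a one-hot encoding of the label, which is the gradient of the cross-entropy loss with respect to the softmax input. A minimal pure-Python sketch of the same arithmetic follows; the function names are illustrative and not part of MXNet.

```python
import math

def softmax_row(row):
    """Numerically stable softmax of one row: subtract the max before exp."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_grad(rows, labels):
    """Per-row gradient: softmax(row) minus one_hot(label)."""
    grads = []
    for row, label in zip(rows, labels):
        y = softmax_row(row)
        y[label] -= 1.0   # same update as y[np.arange(n), l] -= 1.0
        grads.append(y)
    return grads

grads = softmax_grad([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]], [2, 0])
```

Each gradient row sums to zero (up to rounding), because the softmax row sums to one and exactly one is subtracted from it.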
Softmax defines the computation of our custom operator, but you still need to define its input/output format by subclassing mx.operator.CustomOpProp. First, register the new operator with the name 'softmax':
@mx.operator.register("softmax")
class SoftmaxProp(mx.operator.CustomOpProp):
Then, call the base constructor with need_top_grad=False because softmax is a loss layer and you don't need gradient input from preceding layers:
    def __init__(self):
        super(SoftmaxProp, self).__init__(need_top_grad=False)
Then declare the input and output:
    def list_arguments(self):
        return ['data', 'label']

    def list_outputs(self):
        return ['output']
Note that list_arguments declares both inputs and parameters. We recommend ordering them as follows: ['input1', 'input2', ... , 'weight1', 'weight2', ...]
Next, provide infer_shape to declare the shape of the output/weight and check the consistency of the input shapes:
    def infer_shape(self, in_shape):
        data_shape = in_shape[0]
        label_shape = (in_shape[0][0],)
        output_shape = in_shape[0]
        return [data_shape, label_shape], [output_shape], []
The first dimension of an input/output tensor is the batch size. The label is a set of integers, one for each data entry, and the output has the same shape as the input. infer_shape should always return three lists in this order: inputs, outputs, and auxiliary states (which we don't have here), even if one of them is empty.
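To see what these return values look like, the same shape rule can be run as a standalone function. The batch size of 32 and the 10 classes are arbitrary example values.

```python
def infer_shape(in_shape):
    """Standalone copy of the rule: label is (batch,), output matches data."""
    data_shape = tuple(in_shape[0])
    label_shape = (data_shape[0],)
    output_shape = data_shape
    # Always three lists: inputs, outputs, auxiliary states.
    return [data_shape, label_shape], [output_shape], []

in_shapes, out_shapes, aux_shapes = infer_shape([(32, 10)])
```

Here in_shapes holds the data and label shapes, out_shapes holds the single output shape, and aux_shapes is empty.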
Finally, define a create_operator function that will be called by the back end to create an instance of softmax:
    def create_operator(self, ctx, shapes, dtypes):
        return Softmax()
To use the custom operator, create an mx.sym.Custom symbol with op_type as the registered name:
mlp = mx.symbol.Custom(data=fc3, name='softmax', op_type='softmax')
Please see the full code for this example here.
With MXNet v0.9 (the NNVM refactor) and later, creating new operators has become easier. Operators are now registered with NNVM. The following code is an example of how to register an operator (check out src/operator/tensor for more examples):
NNVM_REGISTER_OP(abs)
.MXNET_DESCRIBE("Take absolute value of the src")
.set_num_inputs(1)
.set_num_outputs(1)
.set_attr<nnvm::FInferShape>("FInferShape", ElemwiseShape<1,1>);
The syntax is quite simple: we register the operator with a name, then set the number of inputs and outputs. You can register attributes with any key (FInferShape, for example) to any operator, without having to modify a central class interface definition.
One of the biggest improvements brought by NNVM is the operator attribute system. This is like traits for types in common languages like C++. We can register any attribute to any operator, with the syntax:
NNVM_REGISTER_OP(op-name)
.set_attr<AttributeType>("AttributeKey", CorrespondingAttributeObject);
These attributes can be retrieved later for various purposes. For example, FInferShape is used for shape inference, and FCompute<cpu> is used for carrying out the actual computation on CPU.
As long as all attributes registered with the same key have the same type, we can register any attributes to operators. The more attributes an operator provides, the more information the system can use for optimization.
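The idea of a per-key attribute store can be sketched in a few lines of Python. This is a toy model written for illustration: the OpRegistry class and its methods are hypothetical and do not reflect how NNVM stores attributes internally. It only shows the rule that every value registered under one key must have the same type.

```python
class OpRegistry:
    """Toy attribute store: attribute key -> {operator name -> value}."""

    def __init__(self):
        self._attrs = {}
        self._types = {}

    def set_attr(self, op_name, key, value):
        # All values registered under one key must share a type.
        expected = self._types.setdefault(key, type(value))
        if type(value) is not expected:
            raise TypeError("attribute %r expects %s" % (key, expected.__name__))
        self._attrs.setdefault(key, {})[op_name] = value

    def get_attr(self, op_name, key, default=None):
        # Operators that never registered the key fall back to the default.
        return self._attrs.get(key, {}).get(op_name, default)

registry = OpRegistry()
registry.set_attr("_backward_abs", "TIsBackward", True)
is_backward = registry.get_attr("abs", "TIsBackward", False)
```

Because lookups go through the key rather than a fixed class interface, a new attribute can be introduced without touching operators that do not use it.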
In this section, we will go through the basic attributes MXNet expects for all operators. You can find the definition for them in the following two files:
.describe(comment) adds a comment to the operator. Use .MXNET_DESCRIBE(comment) to add the current file name and line number to the comment.
Set the attribute parser with .set_attr_parser(PARSER), where PARSER is a function with prototype void(nnvm::NodeAttrs* attrs). This function should parse the keyword arguments in attrs->dict and store the result in attrs->parsed.
Simple arguments can be parsed like this:
NNVM_REGISTER_OP(scalar_op)
.set_attr_parser(
  [](NodeAttrs* attrs) {
    // Convert the "scalar" keyword argument from string to double once
    attrs->parsed = std::stod(attrs->dict["scalar"]);
  });
The parsed arguments can then be accessed in other attribute functions with:
double alpha = nnvm::get<double>(attrs.parsed);
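The parse-once, read-many pattern can be mimicked in Python. The NodeAttrs class below is a hypothetical stand-in with only the two fields the text mentions (dict and parsed); it is not a binding to the C++ type.

```python
class NodeAttrs:
    """Hypothetical stand-in: raw keyword strings plus the parsed result."""

    def __init__(self, **kwargs):
        self.dict = {k: str(v) for k, v in kwargs.items()}
        self.parsed = None

def scalar_parser(attrs):
    # Mirrors the C++ lambda: convert the "scalar" string once, up front.
    attrs.parsed = float(attrs.dict["scalar"])

attrs = NodeAttrs(scalar="0.5")
scalar_parser(attrs)
alpha = attrs.parsed   # later attribute functions read the parsed value
```

Parsing up front means shape inference, type inference, and compute functions all see the same typed value instead of re-parsing strings.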
More complex ops can use dmlc::Parameter and ParamParser (defined in operator_common.h) for parsing:
#include <dmlc/parameter.h>
#include <operator_common.h>

struct ActivationParam : public dmlc::Parameter<ActivationParam> {
  // use int for enumeration
  int act_type;
  DMLC_DECLARE_PARAMETER(ActivationParam) {
    DMLC_DECLARE_FIELD(act_type)
    .add_enum("relu", activation::kReLU)
    .add_enum("sigmoid", activation::kSigmoid)
    .add_enum("tanh", activation::kTanh)
    .add_enum("softrelu", activation::kSoftReLU)
    .describe("Activation function to be applied.");
  }
};

NNVM_REGISTER_OP(Activation)
.set_attr_parser(ParamParser<ActivationParam>);
// access with:
// const ActivationParam& param = nnvm::get<ActivationParam>(attrs.parsed);
The number of inputs/outputs can be set with .set_num_inputs(n_in) and .set_num_outputs(n_out), where n_in and n_out are integers.
Alternatively, if the number of inputs/outputs is variable and depends on arguments, you can set n_in/n_out to functions with prototype uint32_t(const nnvm::NodeAttrs& attrs) that return the number of inputs/outputs based on parsed arguments.
Outputs can be made invisible to other operators by registering FNumVisibleOutputs and returning an integer smaller than n_out.
Inputs/outputs can be named by registering FListInputNames and FListOutputNames with prototype std::vector<std::string>(const NodeAttrs& attrs).
Set argument descriptions with .add_argument(name, type, comment). This is necessary for operators to be properly called imperatively.
First, add NDArray arguments num_inputs times with type "NDArray" or one time with type "NDArray[]" for ops with variable length inputs.
Then add keyword arguments with the proper type (float, string, etc). Operators that parse keyword arguments with dmlc::Parameter can add argument descriptions in bulk with .add_arguments(ActivationParam::__FIELDS__()) (NDArray arguments still need to be manually added with type "NDArray").
Normally, operators need to have FInferShape with prototype bool(const nnvm::NodeAttrs& attrs, std::vector<TShape> *in_attrs, std::vector<TShape> *out_attrs). FInferShape fills unknown shapes (shape.ndim() == 0) in in_attrs/out_attrs based on known shapes in in_attrs/out_attrs. Use ElemwiseShape<n_in, n_out> for simple operators with uniform shapes.
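The fill-in behavior for uniform shapes can be sketched in Python. This is a simplified illustration under stated assumptions, not the ElemwiseShape implementation: an empty tuple stands for an unknown shape, and inconsistent shapes are reported by returning False rather than by raising an error.

```python
def elemwise_shape(in_shapes, out_shapes):
    """Toy uniform-shape inference; an empty tuple marks an unknown shape."""
    known = None
    for shape in in_shapes + out_shapes:
        if len(shape) == 0:
            continue                  # unknown, nothing to learn from it
        if known is None:
            known = shape
        elif shape != known:
            return False              # inconsistent shapes
    if known is None:
        return False                  # nothing to infer from yet
    for shapes in (in_shapes, out_shapes):
        for i in range(len(shapes)):
            shapes[i] = known         # fill every slot in place
    return True

ins, outs = [(), (4, 3)], [()]
ok = elemwise_shape(ins, outs)
```

Information flows in both directions: a known output shape can fill an unknown input just as a known input fills the output.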
Operators that are only used for a backward pass can instead register .set_attr<nnvm::TIsBackward>("TIsBackward", true), and their shapes will be copied from the corresponding forward operator.
Similar to FInferShape, FInferType fills unknown types (-1) based on known types. Use ElemwiseType<n_in, n_out> for simple operators with uniform types. Operators that registered TIsBackward don't need to register this.
FInplaceOption with prototype std::vector<std::pair<int, int> >(const NodeAttrs& attrs) specifies which input/output pairs can be computed in-place and share memory with each other. Each pair (i, j) in the returned list means that the i-th input can share memory with the j-th output.
If an operator has a gradient, it can be described with FGradient, which has the prototype:
std::vector<nnvm::NodeEntry>(const nnvm::NodePtr& n,
                             const std::vector<nnvm::NodeEntry>& ograds)
Use the utility functions ElemwiseGradUseIn{op_name}, ElemwiseGradUseOut{op_name}, and ElemwiseGradUseNone{op_name} for ops that need the corresponding forward op's input, output, or nothing to calculate the gradient.
For more complicated patterns, use MakeGradNode(op_name, n, heads, dict) to create gradient entries, where heads are the input entries to the backward op, composed from ograds and n->inputs.
Simple operators can register FCompute with .set_attr<FCompute>("FCompute<cpu>", ...) and .set_attr<FCompute>("FCompute<gpu>", ...) for both CPU and (optionally) GPU computation.
FCompute has the prototype:
void(const nnvm::NodeAttrs& attrs,
     const OpContext& ctx,
     const std::vector<TBlob>& inputs,
     const std::vector<OpReqType>& req,
     const std::vector<TBlob>& outputs)
req has the same length as outputs. Each entry of req specifies how the corresponding output should be written to. OpReqType is defined as:
enum OpReqType {
  kNullOp,
  kWriteTo,
  kWriteInplace,
  kAddTo
};
Normally, the req of all outputs should be kWriteTo, meaning that the provided outputs tensor is a raw memory block, so the operator should write results directly into it. In some cases, for example when calculating the gradient tensor, it is better to accumulate the result rather than overwrite the tensor contents, so that no extra space needs to be created each time. In such cases, the corresponding req is set to kAddTo, indicating that a += should be used.
NNVM_REGISTER_OP(abs)
.MXNET_DESCRIBE("Take absolute value of the src")
.set_num_inputs(1)
.set_num_outputs(1)
.set_attr<nnvm::FInferShape>("FInferShape", ElemwiseShape<1, 1>)
.set_attr<nnvm::FInferType>("FInferType", ElemwiseType<1, 1>)
.set_attr<nnvm::FInplaceOption>("FInplaceOption",
  [](const NodeAttrs& attrs){
    return std::vector<std::pair<int, int> >{{0, 0}};
  })
.set_attr<FCompute>("FCompute<cpu>", UnaryCompute<cpu, mshadow_op::abs>)
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{"_backward_abs"})
.add_argument("data", "NDArray", "Source input");
NNVM_REGISTER_OP(_backward_abs)
.set_num_inputs(2)
.set_num_outputs(1)
.set_attr<nnvm::FInferShape>("FInferShape", ElemwiseShape<2, 1>)
.set_attr<nnvm::FInferType>("FInferType", ElemwiseType<2, 1>)
.set_attr<nnvm::FInplaceOption>("FInplaceOption",
  [](const NodeAttrs& attrs){
    return std::vector<std::pair<int, int> >{{0, 0}, {1, 0}};
  })
.set_attr<FCompute>("FCompute<cpu>", BinaryCompute<cpu, unary_bwd<mshadow_op::sign> >);
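The arithmetic behind this pair of registrations can be checked with a small Python sketch. It assumes that the backward op receives the output gradient followed by the forward input, as the two-input registration with ElemwiseGradUseIn suggests, and that the result is the output gradient multiplied by the sign of the input. The function names are illustrative only.

```python
def abs_forward(x):
    """Forward pass: elementwise absolute value."""
    return [abs(v) for v in x]

def sign(v):
    """Returns -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def backward_abs(ograd, x):
    """Two inputs (output gradient, forward input), one output."""
    return [g * sign(v) for g, v in zip(ograd, x)]

x = [-2.0, 0.0, 3.0]
y = abs_forward(x)
dx = backward_abs([1.0, 1.0, 1.0], x)
```

With a unit output gradient, dx is simply the sign of each input, which is the derivative of the absolute value away from zero.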
For the legacy (pre 0.9) way of defining operators with C++, please see: