Layer Normalized Long Short-Term Memory layer.
This LSTM layer applies layer normalization to the input, recurrent, and output activations of a standard LSTM. The implementation is fused and GPU-accelerated. DropConnect and Zoneout regularization are built in, and the initial forget gate bias can be set to a non-zero value.
Details about the exact function this layer implements can be found at #1.
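As a rough illustration of what "layer normalization" means for a batch of activations, the following NumPy sketch normalizes each row to zero mean and unit variance and then applies a learned scale and shift. It is the generic textbook formulation, not necessarily the exact function this layer implements; the names layer_norm, gamma, beta, and eps are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize the last axis of x, then scale by gamma and shift by beta."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
# Hypothetical pre-activations with shape [batch, units].
activations = rng.normal(loc=3.0, scale=2.0, size=(4, 8))
normalized = layer_norm(activations, gamma=np.ones(8), beta=np.zeros(8))
```

With gamma set to ones and beta set to zeros, every row of normalized has approximately zero mean and unit standard deviation regardless of the scale of the input.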
__init__(
num_units,
direction='unidirectional',
**kwargs
)
Initialize the parameters of the LSTM layer.
num_units
: int, the number of units in the LSTM cell.
direction
: string, 'unidirectional' or 'bidirectional'.
**kwargs
: Dict, keyword arguments (see below).
kernel_initializer
: (optional) the initializer to use for the input matrix weights. Defaults to glorot_uniform.
recurrent_initializer
: (optional) the initializer to use for the recurrent matrix weights. Defaults to orthogonal.
bias_initializer
: (optional) the initializer to use for both input and recurrent bias vectors. Defaults to zeros unless forget_bias is non-zero (see below).
kernel_transform
: (optional) a function with signature (kernel: Tensor) -> Tensor that transforms the kernel before it is used. Defaults to the identity function.
recurrent_transform
: (optional) a function with signature (recurrent_kernel: Tensor) -> Tensor that transforms the recurrent kernel before it is used. Defaults to the identity function.
bias_transform
: (optional) a function with signature (bias: Tensor) -> Tensor that transforms the bias before it is used. Defaults to the identity function.
forget_bias
: (optional) float, sets the initial weights for the forget gates. Defaults to 1 and overrides the bias_initializer unless this argument is set to 0.
dropout
: (optional) float, sets the dropout rate for DropConnect regularization on the recurrent matrix. Defaults to 0.
zoneout
: (optional) float, sets the zoneout rate for Zoneout regularization. Defaults to 0.
dtype
: (optional) the data type for this layer. Defaults to tf.float32.
name
: (optional) string, the name for this layer.
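To make the dropout and zoneout arguments more concrete, here is a minimal NumPy sketch of the two regularizers as they are commonly described: Zoneout keeps each hidden unit at its previous value with probability equal to the rate, and DropConnect zeroes individual entries of the recurrent matrix. The function names, the inference-time averaging, and the 1 / (1 - rate) rescaling are assumptions made for illustration; the fused implementation may differ in these details.

```python
import numpy as np

def zoneout(h_prev, h_new, rate, training, rng):
    """Training: keep each unit's previous value with probability `rate`.
    Inference: blend old and new state by the same rate (expected value)."""
    if training:
        keep_prev = rng.random(h_prev.shape) < rate
        return np.where(keep_prev, h_prev, h_new)
    return rate * h_prev + (1.0 - rate) * h_new

def drop_connect(recurrent_kernel, rate, training, rng):
    """Zero individual recurrent weights with probability `rate` during training.
    Assumption: surviving weights are rescaled by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return recurrent_kernel
    keep = rng.random(recurrent_kernel.shape) >= rate
    return recurrent_kernel * keep / (1.0 - rate)

rng = np.random.default_rng(0)
h_prev = np.zeros((2, 4))   # previous hidden state
h_new = np.ones((2, 4))     # candidate hidden state for this step
h_train = zoneout(h_prev, h_new, rate=0.25, training=True, rng=rng)
h_eval = zoneout(h_prev, h_new, rate=0.25, training=False, rng=rng)
dropped = drop_connect(np.ones((3, 3)), rate=0.5, training=True, rng=rng)
```

In training mode each entry of h_train is copied from either h_prev or h_new; in inference mode the result is deterministic.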
True if this is a bidirectional RNN, False otherwise.
Returns the name of this module as passed or determined in the constructor.
NOTE: This is not the same as self.name_scope.name, which includes parent module names.
Returns a tf.name_scope instance for this class.
Sequence of all sub-modules.
Submodules are modules which are properties of this module, or found as properties of modules which are properties of this module (and so on).
a = tf.Module()
b = tf.Module()
c = tf.Module()
a.b = b
b.c = c
assert list(a.submodules) == [b, c]
assert list(b.submodules) == [c]
assert list(c.submodules) == []
A sequence of all submodules.
Sequence of variables owned by this module and its submodules.
Note: this method uses reflection to find variables on the current instance and submodules. For performance reasons you may wish to cache the result of calling this method if you don't expect the return value to change.
A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).
Sequence of variables owned by this module and its submodules.
Note: this method uses reflection to find variables on the current instance and submodules. For performance reasons you may wish to cache the result of calling this method if you don't expect the return value to change.
A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).
__call__(
inputs,
training,
sequence_length=None,
time_major=False
)
Runs the RNN layer.
inputs
: Tensor, a rank 3 input tensor with shape [N,T,C] if time_major is False, or with shape [T,N,C] if time_major is True.
training
: bool, True if running in training mode, False if running in inference mode.
sequence_length
: (optional) Tensor, a rank 1 tensor with shape [N] and dtype of tf.int32 or tf.int64. This tensor specifies the unpadded length of each example in the input minibatch.
time_major
: (optional) bool, specifies whether inputs has shape [N,T,C] (time_major=False) or shape [T,N,C] (time_major=True).
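The two input layouts and the meaning of sequence_length can be illustrated with plain NumPy arrays. This sketch only shows how [N,T,C] relates to [T,N,C] and which positions count as unpadded; how the layer itself treats padded steps is not stated here, so the mask is purely illustrative. All variable names are hypothetical.

```python
import numpy as np

N, T, C = 3, 5, 2  # batch size, time steps, channels

# Batch-major layout, as expected when time_major=False.
batch_major = np.arange(N * T * C, dtype=np.float32).reshape(N, T, C)

# Time-major layout, as expected when time_major=True: swap the first two axes.
time_major = np.transpose(batch_major, (1, 0, 2))

# Unpadded length of each example in the minibatch, shape [N].
sequence_length = np.array([5, 3, 1], dtype=np.int32)

# Boolean mask of shape [N, T]: True where a time step holds real data.
mask = np.arange(T)[None, :] < sequence_length[:, None]

# Zero out padded positions in the batch-major tensor.
masked = batch_major * mask[:, :, None]
```

Element [n, t] of the batch-major tensor is the same vector as element [t, n] of the time-major tensor, so converting between the layouts never changes the data, only the axis order.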
A pair, (output, state) for unidirectional layers, or a pair ([output_fw, output_bw], [state_fw, state_bw]) for bidirectional layers.
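A hedged usage sketch follows, showing how the constructor and __call__ arguments documented above might fit together for a bidirectional layer. The package name haste_tf and the class name LayerNormLSTM are assumptions, since this page does not state the import path; the tensor sizes are arbitrary. Because the implementation is GPU-accelerated, run_example is only defined here and is meant to be called on a machine with TensorFlow, the layer's package, and a suitable GPU.

```python
# Constructor arguments drawn from the parameter list documented above.
layer_kwargs = dict(
    num_units=128,
    direction="bidirectional",
    forget_bias=1.0,
    dropout=0.05,
    zoneout=0.1,
)

def run_example():
    # Imports live inside the function so this file can be loaded on
    # machines without TensorFlow or the layer's package installed.
    import tensorflow as tf
    import haste_tf as haste  # assumption: package name is not given on this page

    layer = haste.LayerNormLSTM(**layer_kwargs)

    # Batch-major input [N, T, C] with per-example unpadded lengths [N].
    inputs = tf.random.normal([4, 10, 32])
    lengths = tf.constant([10, 7, 3, 1], dtype=tf.int32)

    outputs, states = layer(
        inputs,
        training=False,
        sequence_length=lengths,
        time_major=False,
    )

    # Bidirectional layers return forward and backward results as pairs.
    output_fw, output_bw = outputs
    state_fw, state_bw = states
    return output_fw, output_bw, state_fw, state_bw
```

For a unidirectional layer the call would instead return a single (output, state) pair, so the last two unpacking steps would be dropped.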
build(shape)
Creates the variables of the layer.
Calling this method is optional for users of the RNN class. It is called internally with the correct shape when __call__ is invoked.
shape
: instance of TensorShape.
@classmethod
with_name_scope(
cls,
method
)
Decorator to automatically enter the module name scope.
class MyModule(tf.Module):
  @tf.Module.with_name_scope
  def __call__(self, x):
    if not hasattr(self, 'w'):
      self.w = tf.Variable(tf.random.normal([x.shape[1], 64]))
    return tf.matmul(x, self.w)
Using the above module would produce tf.Variables and tf.Tensors whose names included the module name:
mod = MyModule()
mod(tf.ones([8, 32]))
# ==> <tf.Tensor: ...>
mod.w
# ==> <tf.Variable ...'my_module/w:0'>
method
: The method to wrap.
The original method wrapped such that it enters the module's name scope.