haste_tf.LayerNormLSTM

Class LayerNormLSTM

Layer Normalized Long Short-Term Memory layer.

This LSTM layer applies layer normalization to the input, recurrent, and output activations of a standard LSTM. The implementation is fused and GPU-accelerated. DropConnect and Zoneout regularization are built-in, and this layer allows setting a non-zero initial forget gate bias.

Details about the exact function this layer implements can be found at #1.

__init__(
    num_units,
    direction='unidirectional',
    **kwargs
)

Initialize the parameters of the LSTM layer.

Arguments:

  • num_units: int, the number of units in the LSTM cell.
  • direction: string, 'unidirectional' or 'bidirectional'.
  • **kwargs: Dict, keyword arguments (see below).

Keyword Arguments:

  • kernel_initializer: (optional) the initializer to use for the input matrix weights. Defaults to glorot_uniform.
  • recurrent_initializer: (optional) the initializer to use for the recurrent matrix weights. Defaults to orthogonal.
  • bias_initializer: (optional) the initializer to use for both input and recurrent bias vectors. Defaults to zeros unless forget_bias is non-zero (see below).
  • kernel_transform: (optional) a function with signature (kernel: Tensor) -> Tensor that transforms the kernel before it is used. Defaults to the identity function.
  • recurrent_transform: (optional) a function with signature (recurrent_kernel: Tensor) -> Tensor that transforms the recurrent kernel before it is used. Defaults to the identity function.
  • bias_transform: (optional) a function with signature (bias: Tensor) -> Tensor that transforms the bias before it is used. Defaults to the identity function.
  • forget_bias: (optional) float, sets the initial bias of the forget gate. Defaults to 1 and overrides the bias_initializer unless this argument is set to 0.
  • dropout: (optional) float, sets the dropout rate for DropConnect regularization on the recurrent matrix. Defaults to 0.
  • zoneout: (optional) float, sets the zoneout rate for Zoneout regularization. Defaults to 0.
  • dtype: (optional) the data type for this layer. Defaults to tf.float32.
  • name: (optional) string, the name for this layer.
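
For example, a bidirectional layer with Zoneout and DropConnect regularization might be constructed as shown below. This is a minimal sketch: the unit count and regularization rates are illustrative, not defaults, and it assumes the package is importable as haste_tf.

import haste_tf as haste

# 128-unit bidirectional layer with light regularization.
lstm = haste.LayerNormLSTM(
    num_units=128,
    direction='bidirectional',
    zoneout=0.05,       # Zoneout on the recurrent state.
    dropout=0.05,       # DropConnect on the recurrent matrix.
    forget_bias=1.0)    # Initial forget gate bias (the default).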

Properties

bidirectional

True if this is a bidirectional RNN, False otherwise.

name

Returns the name of this module as passed to or determined in the constructor.

NOTE: This is not the same as the self.name_scope.name which includes parent module names.

name_scope

Returns a tf.name_scope instance for this class.

output_size

state_size

submodules

Sequence of all sub-modules.

Submodules are modules which are properties of this module, or found as properties of modules which are properties of this module (and so on).

a = tf.Module()
b = tf.Module()
c = tf.Module()
a.b = b
b.c = c
assert list(a.submodules) == [b, c]
assert list(b.submodules) == [c]
assert list(c.submodules) == []

Returns:

A sequence of all submodules.

trainable_variables

Sequence of variables owned by this module and its submodules.

Note: this method uses reflection to find variables on the current instance and submodules. For performance reasons you may wish to cache the result of calling this method if you don't expect the return value to change.

Returns:

A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).

variables

Sequence of variables owned by this module and its submodules.

Note: this method uses reflection to find variables on the current instance and submodules. For performance reasons you may wish to cache the result of calling this method if you don't expect the return value to change.

Returns:

A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).
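
For example, once the layer has been built (by calling it on an input, or via build), its weights can be inspected through these properties. A minimal sketch, continuing with the lstm layer constructed above:

# Variables exist only after the layer has been built.
for v in lstm.trainable_variables:
  print(v.name, v.shape)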

Methods

__call__(
    inputs,
    training,
    sequence_length=None,
    time_major=False
)

Runs the RNN layer.

Arguments:

  • inputs: Tensor, a rank 3 input tensor with shape [N,T,C] if time_major is False, or with shape [T,N,C] if time_major is True.
  • training: bool, True if running in training mode, False if running in inference mode.
  • sequence_length: (optional) Tensor, a rank 1 tensor with shape [N] and dtype of tf.int32 or tf.int64. This tensor specifies the unpadded length of each example in the input minibatch.
  • time_major: (optional) bool, specifies whether input has shape [N,T,C] (time_major=False) or shape [T,N,C] (time_major=True).

Returns:

A pair, (output, state) for unidirectional layers, or a pair ([output_fw, output_bw], [state_fw, state_bw]) for bidirectional layers.
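
A minimal usage sketch, continuing with the bidirectional lstm layer constructed above; the batch size, sequence length, and feature size are illustrative:

import tensorflow as tf

x = tf.random.normal([8, 50, 32])      # [N, T, C] with time_major=False.
lengths = tf.constant([50, 50, 47, 45, 40, 38, 30, 25], dtype=tf.int32)

# Bidirectional layers return per-direction outputs and states.
[output_fw, output_bw], [state_fw, state_bw] = lstm(
    x, training=True, sequence_length=lengths)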

build(shape)

Creates the variables of the layer.

Calling this method is optional for users of the RNN class. It is called internally with the correct shape when __call__ is invoked.

Arguments:

  • shape: instance of TensorShape.
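
Explicit building is only needed when the variables must exist before the first call, for example to create a checkpoint. A hedged sketch, assuming shape is the input shape described for __call__ and continuing the examples above:

# Build for [N, T, C] inputs with 32 features; N and T may be unknown.
lstm.build(tf.TensorShape([None, None, 32]))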
@classmethod
with_name_scope(
    cls,
    method
)

Decorator to automatically enter the module name scope.

class MyModule(tf.Module):
  @tf.Module.with_name_scope
  def __call__(self, x):
    if not hasattr(self, 'w'):
      self.w = tf.Variable(tf.random.normal([x.shape[1], 64]))
    return tf.matmul(x, self.w)

Using the above module would produce tf.Variables and tf.Tensors whose names include the module name:

mod = MyModule()
mod(tf.ones([8, 32]))
# ==> <tf.Tensor: ...>
mod.w
# ==> <tf.Variable ...'my_module/w:0'>

Args:

  • method: The method to wrap.

Returns:

The original method wrapped such that it enters the module's name scope.