Stative Attenders #569
base: master
Conversation
I had a chance to take a look. Besides the minor comments on documentation, it basically looks fine, although I'm wondering if there would be a way to make the changes more modular and local to the attender, i.e. remove the need to call update() from outside the attender. I'm not sure if that would work well with beam search etc., though?
-- Matthias
(this comment is "not a contribution")
```diff
 class Attender(object):
   """
   A template class for functions implementing attention.
   """

-  def init_sent(self, sent: expression_seqs.ExpressionSequence) -> None:
+  def init_sent(self, sent: expression_seqs.ExpressionSequence) -> AttenderState:
     """Args:
       sent: the encoder states, aka keys and values. Usually but not necessarily an :class:`expression_seqs.ExpressionSequence`
     """
```
Return value needs documentation.
xnmt/modelparts/attenders.py (outdated)
```python
                                 hidden_dim=self.coverage_dim,
                                 param_init=param_init,
                                 bias_init=bias_init))

  def init_sent(self, sent: expression_seqs.ExpressionSequence) -> None:
```
Return type outdated.
```python
    I = self.curr_sent.as_tensor()
    return I * attention

  def update(self, dec_state: decoders.DecoderState, att_state: AttenderState, attention: dy.Expression):
    return None
```
Could you document how update is intended to be used?
Thanks for the feedback, Matthias! I updated the documentation. I talked to Graham about the mechanism of compute_attention() and update(). I agree that it would be nice to factor this so the translator class doesn't have to call update(), but I'm not sure that's possible in general. The real problem is that the attention vector returned by the attender may not be the "final" attention vector used downstream. For example, if one chooses to ensemble multiple attenders, then the final attention vector will not be the same as the vector produced by any individual one. This mechanism allows us to feed the real attention vector back into the attender even in these cases. I talked to Graham a bit about this, and this was the best solution we came up with. I'm happy to discuss further if you see a better way!
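To illustrate the point above, here is a schematic sketch in plain Python (not xnmt's real classes; the names `StatefulAttender`, `calc_attention`, and the integer state are all hypothetical stand-ins) of why the caller, rather than the attender itself, has to invoke update(): when several attenders are ensembled, the final attention vector exists only at the translator level.

```python
class StatefulAttender:
    """Toy stative attender. The state here is just a step counter;
    a real attender (e.g. a coverage model) would carry richer state."""

    def init_sent(self, keys):
        # Store the encoder states and return the initial AttenderState.
        self.keys = keys
        return 0

    def calc_attention(self, query, state):
        # Toy uniform attention over the source positions (ignores query).
        n = len(self.keys)
        return [1.0 / n] * n

    def update(self, att_state, final_attention):
        # Receives the *final* (possibly ensembled) attention vector,
        # which may differ from what calc_attention returned.
        return att_state + 1


def ensemble_step(attenders, states, query):
    # Each attender proposes an attention vector...
    atts = [a.calc_attention(query, s) for a, s in zip(attenders, states)]
    # ...the translator averages them into the final attention...
    final = [sum(col) / len(atts) for col in zip(*atts)]
    # ...and only then can each attender's state be advanced with it.
    new_states = [a.update(s, final) for a, s in zip(attenders, states)]
    return final, new_states
```

Since `final` is computed outside any single attender, no attender could have updated its own state correctly on its own; the explicit update() call closes that loop.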
I see, yeah I had suspected something like that. In that case I think this can be merged (once the merge conflicts are resolved)! -- Matthias
@armatthews If you resolve the conflicts on this I think we can merge.
This PR enables stative attenders, and contains a sample implementation of "Modeling Coverage for Neural Machine Translation" (Tu et al. 2016).
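For readers unfamiliar with the referenced paper, here is a minimal sketch of the coverage idea from Tu et al. (2016) in plain Python: the attender state is a per-source-position coverage vector (cumulative attention so far), which feeds back into the scoring of each position on the next step. The simple linear penalty below is an assumption for illustration; the paper learns the coverage update (e.g. with a GRU), and the function names are hypothetical, not xnmt's API.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend_with_coverage(base_scores, coverage, penalty=1.0):
    # Penalize source positions that have already received attention.
    scores = [s - penalty * c for s, c in zip(base_scores, coverage)]
    att = softmax(scores)
    # The new AttenderState: coverage accumulates the attention weights.
    new_coverage = [c + a for c, a in zip(coverage, att)]
    return att, new_coverage
```

Run twice on the same base scores and the second step shifts attention mass away from the position attended on the first step, which is the behavior the coverage model is designed to encourage.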