Hard Monotonic Transducer #165
My only thought is that I am most excited about variant 2. I thought the A&G thing was outmoded also, but it's harmless if you want to do it later.
Yeah, this sounds good. I have a partial implementation of 3) in a fork somewhere from a year ago that I never finished because I got distracted :D. Will be great to see them in here. IIRC, 2) and 3) should be small variations on the implementation of 1), right? I.e., in the paper I think 2) is basically 1), but they just enforce the monotonicity constraint in the mask.
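Something like the following is how I picture the mask change (just a toy PyTorch sketch with made-up names, not library code; it assumes you are tracking a single previously attended position, e.g. under greedy hard attention):

```python
import torch


def monotonic_mask(scores: torch.Tensor, prev_index: torch.Tensor) -> torch.Tensor:
    """Masks out (via -inf) attention to positions left of the previous alignment.

    scores: (batch, src_len) unnormalized attention scores for the current target step.
    prev_index: (batch,) source position attended at the previous target step.
    """
    src_len = scores.size(1)
    positions = torch.arange(src_len, device=scores.device).unsqueeze(0)  # (1, src_len)
    allowed = positions >= prev_index.unsqueeze(1)                        # (batch, src_len)
    return scores.masked_fill(~allowed, float("-inf"))
```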
Agree
Not sure I follow: why is this not an issue with existing architectures? I am pretty sure I have a trick for this in eval, where I think I PAD the shorter one to be the length of the other. EDIT: I just realized the issue (also, my trick does not actually solve the loss issue). We normally do teacher forcing, so it is a non-issue...
I cannot remember. This would imply that all of the constraints are strictly for training, and that at inference a regular old soft attention distribution is used?
Agree, though if it's low effort, I am always a fan of having more baselines available. However, I feel like the trick for this model is fairly different from what our codebase typically does, so it might be more effort to implement than it seems. On the topic of baselines, I think Wu and Cotterell also compared to an RL baseline that samples alignments and optimizes with REINFORCE. We could also add that at some point :D. It is probably also available in their library. Both of those are very low priority, though.
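For reference, that REINFORCE variant is roughly the following (a placeholder sketch, not their implementation; the reward and baseline tensors are assumptions, e.g. negative edit distance and a running mean):

```python
import torch


def reinforce_alignment_loss(
    attn_logits: torch.Tensor,  # (batch, tgt_len, src_len) alignment logits
    reward: torch.Tensor,       # (batch,) e.g. negative edit distance of the decoded string
    baseline: torch.Tensor,     # (batch,) e.g. a running mean of recent rewards
) -> torch.Tensor:
    """Score-function estimator over sampled hard alignments."""
    dist = torch.distributions.Categorical(logits=attn_logits)
    sample = dist.sample()                       # (batch, tgt_len) sampled alignment
    log_prob = dist.log_prob(sample).sum(dim=1)  # (batch,)
    # The advantage is treated as a constant with respect to the policy parameters.
    return -((reward - baseline).detach() * log_prob).mean()
```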
Yeah, it's kinda annoying, right? I'm tempted to just repeat the last character up to the target length, but that's not going to be accurate.
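For concreteness, the two length-matching tricks being discussed look something like this (toy placeholder code, not what the library does):

```python
import torch

PAD_IDX = 0  # assumed padding index


def pad_to_length(pred: torch.Tensor, length: int) -> torch.Tensor:
    """Right-pads a 1-D prediction tensor with PAD up to `length`."""
    assert pred.size(0) <= length
    filler = torch.full((length - pred.size(0),), PAD_IDX, dtype=pred.dtype)
    return torch.cat([pred, filler])


def repeat_last_to_length(pred: torch.Tensor, length: int) -> torch.Tensor:
    """Repeats the final predicted symbol up to `length` (the idea above)."""
    assert pred.size(0) <= length
    filler = pred[-1].repeat(length - pred.size(0))
    return torch.cat([pred, filler])
```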
Probably need the constraint for inference too, unless the model just learns to zero out prior attention. What I mean is, there's a bit of duplicate work between the two (technically the outputs are taking an attention over all potential alignments), but I need to sit down for a moment to figure out how far that can be stretched without violating some assumptions.
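To spell out what I mean by an attention over all potential alignments: in the zeroth-order case the per-step likelihood is an explicit sum over source positions rather than a single expected context vector. A minimal sketch (PyTorch assumed, tensor names mine):

```python
import torch


def hard_attention_step_loglik(
    log_align: torch.Tensor,  # (batch, src_len): log p(a_t = j | ...)
    log_emit: torch.Tensor,   # (batch, src_len): log p(y_t | a_t = j, ...)
) -> torch.Tensor:
    """Exactly marginalizes the alignment at one target step."""
    # log sum_j exp(log p(a_t = j) + log p(y_t | a_t = j)), per batch element.
    return torch.logsumexp(log_align + log_emit, dim=1)  # (batch,)
```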
Yeah, it's not a major model anymore, but I think it's handy for showing the power of constraints in word-level tasks. A general focus of the library seems to be how monotonic and attention assumptions improve transduction tasks, so it may be worth including for posterity.
My RL is weak, but I believe the Edit Action Transducer employs a version of REINFORCE. (Or DAgger. It's Daumé-adjacent is what I'm saying.) So while low-priority, it may play into a general framework of student-teacher approaches to include in here (#77). It'll take a few weekends for me to parse out, but I really like the idea that any model can support a drop-in expert/policy advisor for training/exploration.
(Adding to the issues board for documentation; the PR will be out over the week.)
Wu and Cotterell's papers on strong alignment seem right up our alley for the library. There should be an implementation of https://aclanthology.org/P19-1148/, particularly the monotonic cases.
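For reference, the monotonic case boils down to a forward-algorithm dynamic program over alignments. A rough log-space sketch (my own simplification, with an assumed uniform initial alignment; not the paper's exact parameterization or the PR's code):

```python
import math

import torch


def monotonic_forward_loglik(
    log_trans: torch.Tensor,  # (batch, tgt_len, src_len, src_len): log p(a_t = j | a_{t-1} = i)
    log_emit: torch.Tensor,   # (batch, tgt_len, src_len): log p(y_t | a_t = j)
) -> torch.Tensor:
    """Marginalizes over all monotonic alignments in log space."""
    batch, tgt_len, src_len = log_emit.shape
    # Monotonicity: only allow transitions i -> j with i <= j.
    mono = torch.triu(
        torch.ones(src_len, src_len, dtype=torch.bool, device=log_emit.device)
    )
    # Initial step: uniform distribution over the first alignment (an assumption here).
    log_alpha = log_emit[:, 0] - math.log(src_len)
    for t in range(1, tgt_len):
        trans = log_trans[:, t].masked_fill(~mono, float("-inf"))
        # logsumexp over previous positions i, then add the emission at j.
        log_alpha = torch.logsumexp(log_alpha.unsqueeze(2) + trans, dim=1) + log_emit[:, t]
    return torch.logsumexp(log_alpha, dim=1)  # (batch,): log p(y | x)
```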
I currently have a version going that allows the following variants:
Things to add to the PR once it's up (this makes sense to me now; it will make sense with the accompanying PR).
@kylebgorman @Adamits Any additional preferences during development? I've been going back and forth on adding the Aharoni and Goldberg transducer too, just for completeness. (This and the Swiss transducer both supersede that one.)