@@ -163,16 +163,17 @@ \section{L1}
and the ``word with tag'' feature is $house\_NN$.

\section{L2}

- The ``layer two'' classifier, L2, is an extension to the L1 system, with the
+ The ``layer two'' classifier, L2, is an extension to the L1 approach, with the
addition of multilingual features. In particular, L2 makes use of the
translations of the target word into the four target languages other than the
- one we are currently trying to predict. At training time, these translations
- are extracted from Europarl Intersection data, since we have the
- translations of each of the English sentences into all five target
- languages; the appropriate translations are extracted from the parallel
- sentences as described in section \ref{extraction}. At testing time, since
- translations of the test sentences are not given, we estimate translations for
- $w$ in the four other languages using the cached L1 classifiers.
+ one we are currently trying to predict. At training time, since we have the
+ translations of each of the English sentences into the other target languages,
+ the appropriate features are extracted from the corresponding sentences in
+ those languages. This is the same as the process by which labels are given to
+ training instances, described in Section \ref{extraction}. At testing time,
+ since translations of the test sentences are not given, we estimate the
+ translations for the target word in the four other languages using the cached
+ L1 classifiers.
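The test-time step in the rewritten paragraph — estimating the target word's translations in the four other languages from cached L1 predictions and using them as L2 features — can be sketched minimally in Python. Everything here is illustrative, not from the paper: the language codes, the `CACHED_L1` lookup table (standing in for real probabilistic classifiers), and the function name are all assumptions.

```python
# Illustrative stand-in for the cached L1 classifiers: maps a
# (language, English word) pair to the most probable translation.
# A real L1 classifier would return a probability distribution.
CACHED_L1 = {
    ("fr", "house"): "maison",
    ("de", "house"): "Haus",
    ("es", "house"): "casa",
    ("it", "house"): "casa",
    ("nl", "house"): "huis",
}

LANGUAGES = ["fr", "de", "es", "it", "nl"]  # assumed language codes

def l2_features(target_word, target_lang):
    """Build the multilingual L2 features for predicting target_word in
    target_lang: the estimated translations into the four other languages."""
    feats = {}
    for lang in LANGUAGES:
        if lang == target_lang:
            continue  # this is the language being predicted, so skip it
        feats[f"trans_{lang}"] = CACHED_L1[(lang, target_word)]
    return feats

print(l2_features("house", "fr"))
```

At training time the same feature template would be filled from the gold parallel sentences instead of the `CACHED_L1` estimates.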

Lefever and Hoste \shortcite{lefever-hoste-decock:2011:ACL-HLT2011} used the
Google Translate API to translate the source English sentences into the four
@@ -223,17 +224,15 @@ \section{MRF}
translation decisions of their neighbors, but only proportionally to the
correlation between the translations that we observe in the two languages.

- We reframe the MAP inference task as a minimization problem by using
- negative-log probabilities; we want to find an assignment that minimizes the
- sum of all of our penalty functions, which we will describe next.
- First, we have a unary function from each of the five L1 classifiers, one for
- each target language, which corresponds to a node in the network. The
- function assigns a penalty to each possible label for the target word. The
- penalty assigned here is the negative log-probability of each possible output
- label; the classifier returns a probability distribution, and we map the
- probability values $[0,1]$ into negative-log space, $[0,+\infty]$.
-
- This unary potential $\phi_i$, for some fixed set of features $f$ and a
+ We frame the MAP inference task as a minimization problem; we want to find an
+ assignment that minimizes the sum of all of our penalty functions, which we
+ will describe next. First, we have a unary function from each of the five L1
+ classifiers, which correspond to nodes in the network. These functions each
+ assign a penalty to each possible label for the target word in the
+ corresponding language; that penalty is simply the negative log of the
+ probability of the label, as estimated by the classifier.
+
+ Formally, a unary potential $\phi_i$, for some fixed set of features $f$ and a
particular language $i$, is a function from a label $l$ to some positive
penalty value.
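The unary potential described here admits a one-line definition; the following is a sketch under assumed notation — $P_i(l \mid f)$, our shorthand for the probability the L1 classifier of language $i$ assigns to label $l$ given features $f$, does not appear in the text itself.

```latex
% Sketch under assumed notation: P_i(l | f) is the probability estimate
% returned by the L1 classifier for language i.
\phi_i(l) \;=\; -\log P_i(l \mid f),
\qquad \phi_i(l) \in [0, +\infty).
```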