paper/semeval2013.tex
88 additions & 73 deletions
@@ -3,6 +3,7 @@
 \usepackage{times}
 \usepackage{latexsym}
 \setlength\titlebox{6.5cm} % Expanding the titlebox
+\usepackage{url}
 \usepackage{float}
 \floatstyle{boxed}
 \restylefloat{figure}
@@ -21,59 +22,73 @@
 
 %what resource did we use,
 \begin{abstract}
-
-We present our approaches to CL-WSD(Cross-Lingual Word Sense Disambiguation) for the Semeval 2013 Task 10, which came in
-three varieties:
-"One layer" Classifiers, which are single maximum-entropy classifiers making use of monolingual context features. %local context features,
-"Two layer" Classifiers,which are based on layer-one classifiers and also use multilingual features that are translations for four other languages.
-%which are the same as the one-layer classifiers except that they use the translation of the word of interest into four other target languages as features,
-And lastly, the "MRF(Markov Random Field)" Classifiers, which also use multilingual features. Instead of translate each language separately, they build a network of five layer-one classifiers to allow them to find the translation for five languages jointly.%solve the classification task jointly.
-%=We will also discuss the results and findings.
-
 We present our entries for the SemEval-2013 cross-language word-sense
-disambiguation task \cite{task10}. We submitted three systems based
-on classifiers trained on local context features, with some elaborations.
-Our three systems, in increasing order of complexity, were: maximum entropy
-classifiers trained to predict the desired target-language phrase using only monolingual features (we called this system ``L1"); similar classifiers, but with the desired target-language
-phrase for the other four languages as features (``L2"); and lastly, networks
-of five classifiers, over which we do loopy belief propagation in an attempt to
-solve the classification task jointly (``MRF").
+disambiguation task \cite{task10}. We submitted three systems based on
+classifiers trained on local context features, with some elaborations. Our
+three systems, in increasing order of complexity, were: maximum entropy
+classifiers trained to predict the desired target-language phrase using only
+monolingual features (we called this system \emph{L1}); similar classifiers,
+but with the desired target-language phrase for the other four languages as
+features (\emph{L2}); and lastly, networks of five classifiers, over which we
+do loopy belief propagation to solve the classification tasks jointly
+(\emph{MRF}).
 \end{abstract}
 
 \section{Introduction}
 In the cross-language word-sense disambiguation (CL-WSD) task, given an
 instance of an ambiguous word used in a context, we want to predict the
 appropriate translation into some target language. This setting for WSD has an
 immediate application in machine translation, since many words have many
-possible translations.
-
-Framing lexical ambiguities in this way, as an explicit classification task,
-has been shown to be improve machine translation even in the case of
-phrase-based SMT systems (cite Carpuat and Wu), which can mitigate the
-ambiguities through the use of a language model and phrase-tables with
-multi-word phrases.
-CL-WSD has been shown useful for statistical machine translation (cite Carpuat and Wu), although in future work we are particularly interested in applying it to rule-based systems. (XXX: is this relevant?)
-
-In the Semeval-2013 CL-WSD task \cite{task10}, we are asked to build a system that can provide
-translations for twenty ambiguous English nouns in their contexts. The five target languages in the shared task are Spanish, Dutch, German, Italian and French. There were two settings for the evaluation, ``best" and ``oof". In either case, systems may present multiple possible answers for a given translation, although in the ``best" setting, the first answer is given more weight, which encourages only returning the one-best. In the ``oof" setting, systems are encouraged to return the top-five most likely translations. For a complete explanation of the settings, please see the shared task description \cite{task10}.
+possible translations. Framing the resolution of lexical ambiguities as an
+explicit classification task has been shown to improve machine translation
+even in the case of phrase-based SMT systems \cite{carpuatpsd}, which can
+mitigate lexical ambiguities through the use of a language model and
+phrase-tables with multi-word phrases.
+
+XXX: work in Brown 1991 reference too:
+\cite{Brown91word-sensedisambiguation}
+
+In the SemEval-2013 CL-WSD task \cite{task10}, entrants are asked to build a
+system that can provide translations for twenty ambiguous English nouns, given
+appropriate contexts. The five target languages in the shared task are Spanish,
+Dutch, German, Italian and French. There were two settings for the evaluation,
+``best" and ``oof". In either case, systems may present multiple possible
+answers for a given translation, although in the ``best" setting, the first
+answer is given more weight in the evaluation, and this setting encourages only
+returning the top answer. In the ``oof" setting, systems are asked to
+return the top-five most likely translations. For a complete explanation of the
+task and its evaluation, please see the shared task description \cite{task10}.
 
 %% consider: maybe move this to related work?
 Following the work of Lefever and Hoste
 \shortcite{lefever-hoste-decock:2011:ACL-HLT2011}, we wanted to develop systems
-that make use of multiple bitext corpora for the CL-WSD task.
-ParaSense, the system of Lefever and Hoste, takes into account evidence from all of the available parallel corpora. Let $S$ be the set of five target languages and $t$ be the particular target language of interest at the moment; ParaSense creates bag-of-words features from the translations of the target sentence into the languages $S - \lbrace{t \rbrace}$. Given corpora that are parallel over many languages, this is straightforward to do at training time, however at testing time it requires the use of a complete MT system into the four other languages, which is computationally prohibitive. Thus in our work, we have developed systems that make use of many parallel corpora but require neither a locally running MT system nor access to an online translation API.
+that make use of multiple bitext corpora for the CL-WSD task. ParaSense, the
+system of Lefever and Hoste, takes into account evidence from all of the
+available parallel corpora. Let $S$ be the set of five target languages and $t$
+be the particular target language of interest at the moment; ParaSense creates
+bag-of-words features from the translations of the target sentence into the
+languages $S - \lbrace t \rbrace$. Given corpora that are parallel over many
+languages, this is straightforward to do at training time; however, at testing
+time it requires a complete MT system translating into the four other
+languages, which is computationally prohibitive. Thus in our work, we have
+developed systems that make use of many parallel corpora but require neither a
+locally running MT system nor access to an online translation API.
 
 We presented three systems in this competition, which were variations on the
 theme of a maximum entropy classifier for each ambiguous noun, trained on local
 context features similar to those used in previous work and familiar from the
 WSD literature.
 
-Our systems had similar results, but at the time of the evaluation, our simplest system came in first place for the out-of-five evaluation for three languages (Spanish, German, and Italian).
-However, after the evaluation, we fixed a simple (slightly embarrassing) bug in our MRF code, which resulted in the MRF system posting even better results for the OOF evaluation.
-
-on the \emph{oof} evaluation, we had the best results for Spanish, German, and Italian.
-All of our systems beat the ``most-frequent sense" baseline in every case.
+Our systems had similar results, but at the time of the evaluation, our
+simplest system came in first place for the out-of-five evaluation for three
+languages (Spanish, German, and Italian). However, after the evaluation
+deadline, we fixed a simple (slightly embarrassing) bug in our MRF code, which
+resulted in the MRF system producing even better results for the OOF
+evaluation.
 
+... on the \emph{oof} evaluation, we had the best results for Spanish, German,
+and Italian. All of our systems beat the ``most-frequent sense" baseline in
+every case.
 
 Our three systems made use of the same training data, which we extracted from
 the Europarl Intersection corpus, meaning that the English-language source
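The L1 system the diff describes above (one maximum entropy classifier per ambiguous noun, mapping local context features to a target-language phrase) can be sketched roughly as follows. This is not the authors' code: the feature names, the toy "coach" data, and the choice of scikit-learn's `LogisticRegression` (a maxent classifier) are all assumptions for illustration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training instances for one ambiguous noun ("coach"): local-context
# features -> Spanish translation. All data here is invented for illustration.
train = [
    ({"lemma-1": "soccer", "lemma+1": "shout"}, "entrenador"),
    ({"lemma-1": "head", "lemma+1": "hire"}, "entrenador"),
    ({"lemma-1": "night", "lemma+1": "depart"}, "autocar"),
    ({"lemma-1": "crowded", "lemma+1": "station"}, "autocar"),
]

vec = DictVectorizer()
X = vec.fit_transform([feats for feats, _ in train])
y = [label for _, label in train]

# scikit-learn's LogisticRegression is a maximum-entropy classifier.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# One "best" answer for a new context, plus a ranked list of up to five
# candidates in the spirit of the "oof" evaluation setting.
test_feats = vec.transform([{"lemma-1": "head", "lemma+1": "shout"}])
pred = clf.predict(test_feats)[0]
probs = clf.predict_proba(test_feats)[0]
oof = [c for _, c in sorted(zip(probs, clf.classes_), reverse=True)[:5]]
print(pred, oof)
```

In this sketch the same pipeline serves both evaluation settings: the top-ranked class is the "best" answer, and the ranked list is the "oof" answer.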
@@ -103,13 +118,15 @@ \section{L1}
 in question to the appropriate target-language lemma), we extract features from
 the English-language sentence.
 
-Several steps of preprocessing were needed. We first POS tagged the sentences, since we are only interested in nouns.
-Then align the words in each sentence pair, and lemmatize the target sentence.
-After locating words of interest in the
-Europarl Intersection corpus, training instances were extracted, and a maxent
-classifier was trained over local context features similar to those used by Lefever and
-Hoste.
+%% rework a bit
+Several steps of preprocessing were needed. We first POS tagged the sentences,
+since we are only interested in nouns. We then aligned the words in each
+sentence pair and lemmatized the target sentences. After locating words of
+interest in the Europarl Intersection corpus, training instances were
+extracted, and a maxent classifier was trained over local context features
+similar to those used by Lefever and Hoste.
 
+%% howto do a nested list?
 \begin{figure}
 \begin{itemize}
 \item word form
@@ -121,7 +138,7 @@ \section{L1}
 \item bigrams and tagged bigrams (just in case)
 \end{itemize}
 \label{features}
-\caption{some features}
+\caption{Features used in our classifiers}
 \end{figure}
 
 Note: word tag is different from word with tag (so as for bigram and bigram
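The feature list in the figure (word form, lemma, tags, bigrams and tagged bigrams) could be extracted along these lines. This is a sketch, not the paper's implementation: the function name, feature-key scheme, window width, and example sentence are all invented. The comment inside mirrors the note above that a standalone tag feature differs from a fused word-with-tag feature.

```python
def extract_features(tokens, tags, lemmas, i, width=2):
    """Local context features for the noun at position i (illustrative names)."""
    feats = {"form": tokens[i], "lemma": lemmas[i], "tag": tags[i]}
    # Window of surrounding word forms and POS tags.
    for off in range(-width, width + 1):
        if off == 0 or not 0 <= i + off < len(tokens):
            continue
        feats[f"form[{off}]"] = tokens[i + off]
        feats[f"tag[{off}]"] = tags[i + off]
    # Bigrams and tagged bigrams. A separate tag feature ("tag") is distinct
    # from a fused word-with-tag feature ("coach/NN"), as noted in the text.
    if i + 1 < len(tokens):
        feats["bigram"] = f"{tokens[i]} {tokens[i + 1]}"
        feats["tagged-bigram"] = (
            f"{tokens[i]}/{tags[i]} {tokens[i + 1]}/{tags[i + 1]}"
        )
    return feats

toks = ["the", "head", "coach", "resigned", "today"]
tags = ["DT", "NN", "NN", "VBD", "NN"]
lems = ["the", "head", "coach", "resign", "today"]
feats = extract_features(toks, tags, lems, 2)
print(feats)
```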
@@ -166,6 +183,8 @@ \section{MRF}
 Spanish, Italian and French. Can this closeness be represented by the pairwise
 potentials?
 
+%% TODO: build a diagram of the network. THE TRANSLATION PENTAGRAM.
+
 There was some concern about pairwise potential in MRF, which is joint probability. Consider a word which occurs 500 times in the training data, it could co-occur with
 We had some concern about pairwise potential in MRF, which is joint
 probability. Consider a word which occurs 500 times in the training data, it
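The MRF system's joint decoding (loopy belief propagation over five classifier nodes, one per target language, with pairwise potentials between them) can be sketched as sum-product message passing. Everything below is toy data under assumptions: random unary scores stand in for the per-language classifier outputs, random tables stand in for the co-occurrence-based pairwise potentials, and three labels stand in for each language's candidate translations.

```python
import numpy as np

# Five nodes (one per target language), each with three candidate labels.
rng = np.random.default_rng(0)
n_nodes, n_labels = 5, 3
unary = rng.random((n_nodes, n_labels))  # per-language classifier scores
# pair[i, j, a, b]: compatibility of label a at node i with label b at node j.
pair = rng.random((n_nodes, n_nodes, n_labels, n_labels))
msgs = np.ones((n_nodes, n_nodes, n_labels))  # msgs[i, j]: message i -> j

for _ in range(20):  # fixed number of sweeps; convergence is not guaranteed
    new = np.empty_like(msgs)
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue
            # Unary belief at i times all incoming messages except j's.
            belief_i = unary[i].copy()
            for k in range(n_nodes):
                if k not in (i, j):
                    belief_i *= msgs[k, i]
            new[i, j] = pair[i, j].T @ belief_i  # marginalize over i's labels
            new[i, j] /= new[i, j].sum()         # normalize for stability
    msgs = new

# Final beliefs: unary scores times all incoming messages, normalized.
belief = unary.copy()
for i in range(n_nodes):
    for k in range(n_nodes):
        if k != i:
            belief[i] *= msgs[k, i]
belief /= belief.sum(axis=1, keepdims=True)
print(belief.argmax(axis=1))  # jointly chosen label index per language
```

Because the five-node graph is fully connected and therefore loopy, the message updates are run for a fixed number of sweeps rather than to guaranteed convergence.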
@@ -186,16 +205,19 @@ \section{Resources and tools}