A similarity measure is a numerical measure of the degree to which two objects are alike. It usually expresses similarity as a scalar in the range [0, 1] or [0, ∞). A semantic similarity measure is a specific similarity measure designed to quantify the semantic relatedness of lexical units (e.g. nouns and multiword expressions). It should yield high values for pairs of words that stand in a semantic relation (synonyms, hyponyms, associations or co-hyponyms) and zero for all other pairs.
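As a minimal illustration of a measure with range [0, 1], the sketch below computes the cosine similarity of two sparse, non-negative word vectors. This is only one possible measure, not the one RUSSE prescribes; the context-word counts are invented for illustration and the function name `cosine_similarity` is a hypothetical helper. Because all weights are non-negative, the score stays in [0, 1], and words that share no contexts get exactly zero.

```python
import math

def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse non-negative vectors; result lies in [0, 1]."""
    # Dot product over the features of u; features missing from v contribute 0.
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # By convention, an empty (all-zero) vector is similar to nothing.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical context-word counts, invented for illustration only.
vectors = {
    "кошка": {"мяукать": 4, "молоко": 3, "шерсть": 2},
    "кот": {"мяукать": 5, "молоко": 2, "шерсть": 3},
    "автомобиль": {"ехать": 6, "бензин": 4},
}

print(round(cosine_similarity(vectors["кошка"], vectors["кот"]), 3))    # → 0.964
print(cosine_similarity(vectors["кошка"], vectors["автомобиль"]))      # → 0.0
```

The related pair (кошка, кот) scores close to 1, while the unrelated pair scores 0, matching the behaviour the definition asks of a semantic similarity measure.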
Semantic similarity measures have proved useful in text processing applications, including text similarity, query expansion, question answering and word sense disambiguation. They are valuable because of the gap between the lexical surface of a text and its meaning: the same concept is often expressed by different terms. These measures can also be useful in linguistic and philological studies.
Semantic similarity measurement is an actively developing field of computational linguistics. Many methods have been proposed and tested over the last 20 years. Interest in the field has grown further with the recent advent of neural network language models, which yield state-of-the-art results on the semantic similarity task. Many authors have performed exhaustive comparisons of semantic similarity measures and developed a whole range of benchmarks and evaluation datasets.
Unfortunately, most approaches to semantic similarity have been implemented and evaluated on only a handful of European languages, mostly English. Some Russian researchers have sporadically adapted methods developed for English, but these efforts were mostly made in the context of specific applications, without proper evaluation. To the best of our knowledge, no systematic investigation of semantic similarity measures for the Russian language has ever been performed.
The goal of RUSSE is to fill this gap by conducting an evaluation campaign of the key methods currently available. The RUSSE competition will systematically compare and evaluate both baseline and recent approaches to semantic similarity for the Russian language. This will let us identify specific features of semantic similarity phenomena in Russian. The event will be organized as a competition of systems that compute similarity between words.
Further details, including the task rationale, schedule and datasets, can be found on the RUSSE website: http://russe.nlpub.ru/. Participants will be invited to submit a paper describing their system to the Dialogue-2015 conference.