Semeval 2010

Semeval 2010: Cross-Lingual Lexical Substitution

Organizers

  • Rada Mihalcea and Ravi Sinha, University of North Texas
  • Diana McCarthy, University of Sussex

Discussion Board

Google Group

Note: If you do not have a Google/Gmail account, you can subscribe to this discussion board by sending an email to ravisinha AT my DOT unt DOT edu that mentions the email address you would like to use.

Description

The goal of this task is to provide a framework for evaluating systems for cross-lingual lexical substitution. Given a paragraph and a target word, the goal is to provide several correct translations for that word in a given language, with the constraint that the translations fit the given context in the source language. This is a follow-up to the English lexical substitution task from SemEval-2007 (McCarthy and Navigli, 2007), but this time the task is cross-lingual.

While there are connections between this task and automatic machine translation, there are major differences. First, cross-lingual lexical substitution targets one word at a time, rather than an entire sentence as machine translation does. Second, in cross-lingual lexical substitution we seek as many good translations as possible for the given target word, as opposed to the single translation that machine translation typically outputs.

There are also connections between this task and a word sense disambiguation task that uses distinctions in translations as word senses (Resnik and Yarowsky, 2000). In this task, however, we do not restrict the translations to those found in a specific parallel corpus; the annotators and systems are free to choose translations from any available resource. We also do not assume a fixed grouping of translations into "senses", so any token instance of a word may have translations in common with other token instances that are not themselves directly related.

An automatic system for cross-lingual lexical substitution would be useful for a number of applications. For instance, such a system could assist human translators in their work by providing a number of correct translations to choose from. Similarly, it could assist language learners by providing interpretations of unknown words in a text written in the language they are learning. Last but not least, the output of a cross-lingual lexical substitution system could be used as input to existing systems for automatic machine translation.

The task

Given a paragraph and a target word, the task is to provide several correct translations for that word in a given language. We will use English as the source language and Spanish as the target language.
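The input and output of the task can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the names Instance, System and toy_system, the field layout, and the tiny lexicon are hypothetical and do not reflect the format of the released data.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Instance:
    """One item: an English target word together with its context (hypothetical layout)."""
    target: str   # English target word
    pos: str      # part of speech: noun, verb, adjective or adverb
    context: str  # English paragraph containing the target word


# A participating system maps an instance to a list of Spanish translations.
System = Callable[[Instance], List[str]]


def toy_system(instance: Instance) -> List[str]:
    """Placeholder system: looks the target up in a tiny hand-made lexicon.

    A real system would also use instance.context to keep only the
    translations that fit the paragraph; this placeholder ignores it.
    """
    lexicon = {("bright", "adjective"): ["brillante", "luminoso"]}
    return lexicon.get((instance.target, instance.pos), [])


example = Instance(
    target="bright",
    pos="adjective",
    context="The bright sun made it hard to see the road.",
)
translations = toy_system(example)
```

The point of the sketch is only the shape of the problem: one target word in context goes in, and a list of several candidate translations comes out.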

Data

We will provide both development and test sets, released without licensing restrictions. The development set, available by September 2009, will contain 300 English instances of 30 words with their Spanish substitutes. The words will include nouns, verbs, adjectives and adverbs. The test data will consist of approximately 1,000 instances of 100 words.

Scoring

Scoring will follow the precision and recall measures for the "best" and "out-of-ten" settings defined in the 2007 English Lexical Substitution scoring guidelines. We will refine these before September 2009, when the development data for this task is released.
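As a rough illustration, the best and out-of-ten measures can be sketched in Python along the lines of McCarthy and Navigli (2007). This is a sketch under stated assumptions, not the official scorer for this task: the function names and the data layout are hypothetical, the example counts are invented, system answers are assumed to contain no duplicates, and the refinements announced above are not reflected.

```python
from collections import Counter


def best_score(answers, gold):
    """Credit for one item under 'best': the summed annotator frequency of the
    answers, divided by the number of answers and by the total number of
    annotator responses for the item."""
    if not answers or not gold:
        return 0.0
    total = sum(gold.values())
    return sum(gold.get(a, 0) for a in answers) / len(answers) / total


def oot_score(answers, gold, limit=10):
    """Credit for one item under 'out-of-ten': up to `limit` answers are
    counted, and the credit is not divided by the number of answers."""
    if not answers or not gold:
        return 0.0
    total = sum(gold.values())
    return sum(gold.get(a, 0) for a in answers[:limit]) / total


def precision_recall(system, gold_standard, item_score):
    """Precision averages item credit over attempted items; recall averages
    the same credit over all items in the gold standard."""
    attempted = [item for item in gold_standard if system.get(item)]
    credit = sum(item_score(system[item], gold_standard[item]) for item in attempted)
    precision = credit / len(attempted) if attempted else 0.0
    recall = credit / len(gold_standard) if gold_standard else 0.0
    return precision, recall


# Hypothetical gold standard: item id -> how many annotators gave each translation.
gold_standard = {
    "item1": Counter({"fuerte": 3, "duro": 1, "severo": 1}),
    "item2": Counter({"banco": 2}),
}
# Hypothetical system output: item2 is left unanswered.
system = {"item1": ["fuerte", "duro"]}

best_p, best_r = precision_recall(system, gold_standard, best_score)
oot_p, oot_r = precision_recall(system, gold_standard, oot_score)
```

Because best divides the credit by the number of answers, a system gains nothing by padding its answer with extra guesses, whereas out-of-ten rewards covering as many of the annotators' translations as possible within ten guesses.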

Timeline

The timeline of the task will follow the general Semeval timeframe:

  • September 2009: a development corpus of approximately 300 instances annotated with their Spanish substitutes will be provided.
  • One week before the deadline: the test data set of approximately 1,000 instances will be provided.
  • Deadline: the participants submit their answers, including the Spanish lexical substitutes for each of the 1,000 test instances. As in previous Semeval evaluations, participating teams are free to choose their own submission deadline, as long as it falls within the time window set by Semeval.

References

  • Diana McCarthy and Roberto Navigli (2007) SemEval-2007 Task 10: English Lexical Substitution Task. In Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007), Prague, Czech Republic, pp. 48-53.
  • Philip Resnik and David Yarowsky (2000) Distinguishing Systems and Distinguishing Senses: New Evaluation Methods for Word Sense Disambiguation. Natural Language Engineering 5(2), pp. 113-133.