Language and Information Technologies | University of North Texas

About

Word Sense Disambiguation (WSD) is a core task in natural language processing, and is considered essential for major applications like text understanding, common sense reasoning, and machine translation. The research done so far in WSD has produced good disambiguation schemes for the relatively few words for which training data have been available. In contrast, there have been only a few attempts to create systems that disambiguate all words in unrestricted text. The goal of the SenseLearner project is to conduct exploratory research on various WSD techniques to enable the development of a tool for semantic tagging of all words in unrestricted text.

This project is sponsored by the National Science Foundation.

People

Downloads

  • Software
    • SenseLearner 2.0 [download] (June 13, 2005). Try the demo!
      • Changes in version 2.0: a client-server model that allows for significantly faster tagging; a simpler input file format (the SemCor-like format is no longer a requirement)
    • SenseLearner 1.0 - A Tool for All-Words Word Sense Disambiguation (beta) [download]

  • Data
    • OMWE 1.0: Sense tagged data for 288 nouns, created within the Open Mind Word Expert framework during one year of activity (2002) [download]
    • OMWE 2.0: Sense tagged data for nouns, verbs, and adjectives, created within the Open Mind Word Expert framework. These data sets were used during the Senseval-3 evaluations. Data for 57 ambiguous English words, annotated with WordNet/Wordsmyth senses [download] (see Open Mind Word Expert for additional annotated data).

Publications

  • Unsupervised and minimally supervised WSD
    • Rada Mihalcea and Andras Csomai, SenseLearner: Word Sense Disambiguation for All Words in Unrestricted Text, in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, companion volume (ACL 2005), Ann Arbor, MI, June 2005. [pdf] [ps]
    • Rada Mihalcea and Ehsanul Faruque, SenseLearner: Minimally Supervised Word Sense Disambiguation for All Words in Open Text, in Proceedings of ACL/SIGLEX Senseval-3, Barcelona, Spain, July 2004. [ps]
    • Rada Mihalcea, Paul Tarau and Elizabeth Figa, PageRank on Semantic Networks, with application to Word Sense Disambiguation, in Proceedings of The 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland, August 2004. [ps]
    • Rada Mihalcea, Knowledge Based Methods for Word Sense Disambiguation, book chapter in Word Sense Disambiguation: Algorithms, Applications, and Trends, Editors Phil Edmonds and Eneko Agirre, Kluwer, 2004.
    • Rada Mihalcea, Unsupervised Natural Language Disambiguation Using Non-Ambiguous Words, book chapter in Current Issues in Linguistic Theory: Recent Advances in Natural Language Processing, Editors Nicolas Nicolov and Ruslan Mitkov, John Benjamins Publishers, 2004.
    • Rada Mihalcea, Voted Co-training for Bootstrapping Sense Classifiers, in Proceedings of the European Conference on Artificial Intelligence (ECAI 2004), Valencia, Spain, August 2004.
    • Rada Mihalcea, Co-training and Self-training for Word Sense Disambiguation, in Proceedings of the Conference on Natural Language Learning (CoNLL 2004), Boston, May 2004. [ps]
  • Identification of semantic relations between words [See also related project SPOT]
    • Lei Shi and Rada Mihalcea, Semantic Parsing Using FrameNet and WordNet, in Proceedings of the Human Language Technology Conference, companion volume (HLT/NAACL 2004), Boston, May 2004. [ps]
    • Lei Shi and Rada Mihalcea, An Algorithm for Open Text Semantic Parsing, in Proceedings of the ROMAND 2004 workshop on "RObust Methods in Analysis of Natural language Data", Geneva, Switzerland, August 2004. [ps]
  • Construction of sense-annotated data
    • Rada Mihalcea and Timothy Chklovski, Teaching Computers: Building Multilingual Linguistic Resources with Volunteer Contributions over the Web, in The LISA Newsletter - Globalization Insider, September 2004. [online]
    • Rada Mihalcea and Timothy Chklovski, Building Sense Tagged Corpora with Volunteer Contributions over the Web, book chapter in Current Issues in Linguistic Theory: Recent Advances in Natural Language Processing, Editors Nicolas Nicolov and Ruslan Mitkov, John Benjamins Publishers, 2004.
    • Rada Mihalcea, Timothy Chklovski and Adam Kilgarriff, The Senseval-3 English Lexical Sample Task, in Proceedings of ACL/SIGLEX Senseval-3, Barcelona, Spain, July 2004. [ps] [data]


Last modified 11/13/2004