Finding important information in unstructured text



The vast majority of the information we deal with in everyday life consists of raw, unstructured text, in which the most important facts or concepts are not always readily visible but are often hidden among the myriad details that accompany them. To handle and digest the sheer amount of information we are exposed to in this information age, we need more sophisticated procedures that uncover the important parts of a text and allow us to process more information in less time. The goal of this project is to develop robust and accurate techniques for automatically extracting important information from unstructured text, in the form of keyphrases (keyphrase extraction), entire sentences (extractive summarization), or automatically assigned categories (category assignment).
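To make the difference between extracting keyphrases and extracting whole sentences concrete, here is a minimal Python sketch built on simple word-frequency baselines. It is only an illustration under stated assumptions, not the method developed in this project: the small stopword list, the function names `extract_keywords` and `summarize`, and the scoring scheme (raw term frequency for keywords, average term frequency for sentences) are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
             "in", "is", "it", "of", "on", "or", "that", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(text, k=3):
    """Keyword baseline: return the k most frequent non-stopword tokens."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    # Sort by descending frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def summarize(text, n=1):
    """Extractive baseline: return the n best-scoring sentences in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for w in tokenize(text) if w not in STOPWORDS)

    def score(sentence):
        words = [w for w in tokenize(sentence) if w not in STOPWORDS]
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so ties keep the earlier sentence first.
    top = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))[:n]
    return [sentences[i] for i in sorted(top)]


text = ("Keyphrase extraction finds important phrases. "
        "Summarization selects important sentences. "
        "Important text is hidden in details.")
print(extract_keywords(text, 3))  # ['important', 'details', 'extraction']
print(summarize(text, 1))         # ['Summarization selects important sentences.']
```

The keyword function returns isolated words, while the summarizer returns complete sentences copied verbatim from the input, which is what "extractive" means here. Real systems replace raw frequency with richer features or graph-based ranking.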


Funded by Google and the Texas Higher Education Coordinating Board

People

Students

Publications

  • Hakan Ceylan and Rada Mihalcea, The Decomposition of Human-Written Book Summaries, in Proceedings of the Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2009), Mexico City, Mexico, March 2009.
  • Andras Csomai and Rada Mihalcea, Linking Documents to Encyclopedic Knowledge, IEEE Intelligent Systems, special issue on "Natural Language Processing for the Web," 2008.
  • Kino Coursey, Rada Mihalcea and William Moen, Automatic Keyword Extraction for Learning Object Repositories, in Proceedings of the Conference of the American Society for Information Science and Technology (ASIST 2008), Columbus, Ohio, October 2008.
  • Andras Csomai and Rada Mihalcea, Linguistically Motivated Features for Enhanced Back-of-the-book Indexing, in Proceedings of the Association for Computational Linguistics (ACL 2008), Columbus, Ohio, June 2008.
  • Andras Csomai and Rada Mihalcea, Linking Educational Materials to Encyclopedic Knowledge, in Proceedings of the International Conference on Artificial Intelligence in Education (AIED 2007), Los Angeles, CA, July 2007.
  • Rada Mihalcea and Hakan Ceylan, Explorations in Automatic Book Summarization, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2007), Prague, June 2007.
  • Andras Csomai and Rada Mihalcea, Investigations in Unsupervised Back-of-the-Book Indexing, in Proceedings of the Florida Artificial Intelligence Research Society (FLAIRS 2007), Key West, May 2007. Best paper in track.
  • Andras Csomai and Rada Mihalcea, Creating a Testbed for the Evaluation of Automatically Generated Back-of-the-book Indexes, in Proceedings of the Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2006), LNCS, Mexico City, February 2006.