Bilkent University
Department of Computer Engineering


Automated Text Summarization and Keyphrase Extraction


Gonenc Ercan
Computer Engineering Department
Bilkent University

As the number of electronic documents increases rapidly, the need for faster techniques to assess the relevance of documents grows. A summary can be considered a concise representation of the underlying text. To form an ideal summary, a full understanding of the document is essential. For computers, full understanding is difficult, if not impossible. Thus, a common technique in automated text summarization research is to select important sentences from the original text and present them as a summary. The lexical cohesion structure of the text can be exploited to determine the importance of a sentence or phrase, and lexical chains are useful tools for analyzing this structure. This thesis discusses our research on automated text summarization and keyphrase extraction using lexical chains. We investigate the effect of lexical cohesion features on keyphrase extraction with a supervised machine learning algorithm. Our summarization algorithm constructs the lexical chains, roughly detects topics from them, segments the text with respect to these topics, and selects the most important sentences. Our experiments show that lexical cohesion based features improve keyphrase extraction. Our summarization algorithm achieves good results compared with some other lexical cohesion based algorithms.
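
To make the idea of chain-based sentence selection concrete, the following is a minimal sketch and not the algorithm developed in the thesis. It assumes a tiny hand-made word-to-concept table (the hypothetical RELATED dictionary) in place of a real lexical resource, groups related words into chains, and scores each sentence by the lengths of the chains it touches. The topic detection and segmentation steps are omitted, and all function names are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a lexical resource: maps a word to a concept label.
# Words sharing a label are treated as lexically related (assumption for this sketch).
RELATED = {
    "car": "vehicle", "truck": "vehicle", "engine": "vehicle",
    "bank": "finance", "loan": "finance", "money": "finance",
}

def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_chains(sentences):
    """Return {concept: [(sentence_index, word), ...]}, one chain per concept."""
    chains = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for word in re.findall(r"[a-z]+", sentence.lower()):
            concept = RELATED.get(word)
            if concept is not None:
                chains[concept].append((i, word))
    return dict(chains)

def score_sentences(sentences, chains):
    """Score a sentence by the summed length of every chain it contributes to."""
    scores = [0] * len(sentences)
    for members in chains.values():
        # Count each chain once per sentence, however many members the sentence holds.
        for i in {index for index, _ in members}:
            scores[i] += len(members)
    return scores

def summarize(text, n=1):
    """Return the n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences, build_chains(sentences))
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:n]
    return [sentences[i] for i in sorted(top)]
```

Calling summarize(text, n=2) on a short passage returns the two sentences that participate in the longest chains, in document order. A fuller system would replace RELATED with relations drawn from a thesaurus and would weight chains by more than their length.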


DATE: August 31, 2006, Thursday @ 09:30