Bilkent University
Department of Computer Engineering


Multi-Document News Summarization Based on Sentence Coverage and Graphs


Tolga Çekiç
MSc Student
(Supervisor: Prof. Dr. Fazlı Can)
Computer Engineering Department
Bilkent University

The large number of online news sources can cause important information about a topic to be spread across many different documents published within a certain time frame. Automated summarization of news articles on the same topic can be a solution to this problem, because it allows a reader to access important information quickly and efficiently. This research presents a number of summarization methods designed to generate a summary of multiple news documents on the same topic. These methods use extractive summarization: a summary is generated by using parts of the source documents directly. The methods described in this thesis extract sentences from documents. The intuitive assumption behind their design is that sentences significant enough to be put into a summary need to cover information from other sentences. Each sentence from the source documents is compared with every other sentence using cover-coefficient and containment similarity algorithms. These algorithms give a score based on the information shared between sentences. Our initial summarization methods use these relations between sentences to identify important sentences to form the summary. Furthermore, these relations are used to create graphs. In the graphs, sentences are designated as vertices. Links and link weights are set according to the relations between sentences obtained by the aforementioned algorithms. Link analysis algorithms, such as PageRank, are used to rank sentences according to their importance in the graphs. This ranking is used in extracting sentences for summaries. After obtaining results from these methods, a final method based on data fusion is developed to collect summaries from all other methods. Sentences that were chosen for summaries by different methods are clustered. Another intuitive assumption is that sentences that are different from most other sentences contain interesting information that is desired in a summary. Based on this assumption, outliers of the previously mentioned clusters are identified and used in ranking sentences to generate a summary. Experiments for the summarization methods are performed using the ROUGE tool by comparing summaries generated by our methods with gold standard summaries generated by human evaluators. Evaluation results indicate that our methods perform well compared to other summarization studies.
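
The graph-based step described above can be illustrated with a minimal sketch. It is not the implementation from the thesis: it assumes a simple word-set form of containment similarity (the share of one sentence's words that also appear in another), leaves out the cover-coefficient measure, and ranks sentences with a plain power-iteration version of PageRank. All function names and parameters (tokens, containment, rank_sentences, summarize, damping, iterations, k) are hypothetical and chosen only for illustration.

```python
import re


def tokens(sentence):
    """Lowercased set of word tokens (an assumed, very simple tokenizer)."""
    return set(re.findall(r"\w+", sentence.lower()))


def containment(a, b):
    """Fraction of the words in set a that also appear in set b."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank over a containment-weighted graph.

    An edge i -> j carries the containment of sentence i in sentence j,
    so a sentence that covers many others collects score from them.
    """
    n = len(sentences)
    if n == 0:
        return []
    bags = [tokens(s) for s in sentences]
    weights = [
        [containment(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_sums[i] == 0:
                # Sentence shares no words with others: spread its score evenly.
                for j in range(n):
                    new[j] += damping * scores[i] / n
            else:
                for j in range(n):
                    new[j] += damping * scores[i] * weights[i][j] / out_sums[i]
        scores = new
    return scores


def summarize(sentences, k=2):
    """Return the k highest-ranked sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    docs = [
        "The storm hit the coast on Monday.",
        "The storm hit the coast on Monday and flooded several towns.",
        "Officials opened shelters.",
    ]
    for sentence in summarize(docs, k=2):
        print(sentence)
```

In this sketch the edge direction encodes coverage: a sentence passes its score to the sentences that contain its words, which mirrors the assumption that summary-worthy sentences cover information from other sentences. The outlier-based data fusion step is not shown.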


DATE: 9 September, 2015, Wednesday @ 16:00