Bilkent University
Department of Computer Engineering


A New Approach to Search Result Clustering and Labeling


Anıl Türel
MSc Student
Computer Engineering Department
Bilkent University

Search engines present query results as a long ordered list of web documents divided into several pages. Post-processing these results so that users can reach the desired information more easily is an important research problem. One way to address it is to group search results by topic and to label each group so that the label reflects the topic of its cluster. In this thesis, we present a novel search result clustering approach that splits the long list of documents returned by search engines into coherently grouped and labeled clusters. Our method emphasizes clustering quality by using cover coefficient and sequential k-means clustering algorithms. Labeling is equally important, because meaningless or confusing labels may lead users to inspect the wrong clusters for a query and waste time; labels should also reflect the contents of the documents within the cluster. To label clusters effectively, we introduce a new cluster labeling method based on term weighting. In addition, we present a new metric that employs precision and recall to assess the success of cluster labeling. We adopt a comparative evaluation strategy to measure the performance of the proposed method relative to two prominent search result clustering methods: Suffix Tree Clustering and Lingo. The experiments use the publicly available Ambient and ODP-239 datasets. Experimental results show that the proposed method performs both the clustering and labeling tasks successfully.

Keywords: Search result clustering, cluster labeling, web information retrieval, labeling evaluation.


DATE: Wednesday, 24 August 2011 @ 10:40