Bilkent University
Department of Computer Engineering


Constructing Turkish Test Collections for Search Engine Result Diversification


Bilge Köroğlu
MSc Student
Computer Engineering Department
Bilkent University

In the last two decades, web search engines have taken on a crucial role in satisfying information needs. It is essential to rank pages relevant to the submitted query at the top of search result lists. A query can have more than one meaning; for example, "jaguar" can refer to an animal, a car brand, or a kind of cocktail. Queries with several meanings are called ambiguous or multi-intent queries. The user's actual intent cannot be determined from the submitted query alone. For this reason, search engine result diversification algorithms have been proposed; they place web pages corresponding to different meanings of an ambiguous query at high rank positions.

The performance of search result diversification algorithms can be measured using language-specific test collections. To the best of our knowledge, there is no Turkish test collection for evaluating these algorithms. In our work, six different test collections are constructed. The web page corpora for these collections are retrieved from two widely used web search engines, Google and Bing. The collections share the same queries but differ in how the submitted queries are formulated and in the pattern used to retrieve pages from the two search engines. Different diversification algorithms can be compared objectively on these collections using standard Information Retrieval evaluation metrics adapted for diversification. The collections are intended to serve as standard test collections for result diversification studies in Turkish.
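
To illustrate what a diversification-aware metric measures, the following is a minimal sketch of subtopic recall at rank k, one commonly used measure of this kind. The abstract does not name the metrics used in the study, so the choice of metric, the function name, the page identifiers, and the intent judgments below are all assumptions made for illustration only.

```python
from typing import Dict, List, Set


def subtopic_recall_at_k(
    ranking: List[str],
    page_intents: Dict[str, Set[str]],
    all_intents: Set[str],
    k: int,
) -> float:
    """Fraction of a query's known intents covered by the top-k results."""
    if not all_intents:
        return 0.0
    covered: Set[str] = set()
    for page in ranking[:k]:
        # Count only intents that belong to this query's judged intent set.
        covered |= page_intents.get(page, set()) & all_intents
    return len(covered) / len(all_intents)


# Hypothetical judgments for the ambiguous query "jaguar".
intents = {"animal", "car", "cocktail"}
judgments = {
    "p1": {"car"},
    "p2": {"car"},
    "p3": {"animal"},
    "p4": {"cocktail"},
    "p5": set(),  # judged page that matches no intent
}

undiversified = ["p1", "p2", "p3", "p4", "p5"]
diversified = ["p1", "p3", "p4", "p2", "p5"]

recall_undiversified = subtopic_recall_at_k(undiversified, judgments, intents, k=2)
recall_diversified = subtopic_recall_at_k(diversified, judgments, intents, k=2)
```

With these made-up judgments, the undiversified ranking covers one of the three intents in its top two results, while the diversified ranking covers two. A test collection supplies exactly the two inputs this sketch takes for granted: the set of intents per query and the per-page intent judgments.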


DATE: 21 November, 2011, Monday @ 16:50