Bilkent University
Department of Computer Engineering


Optimization of Search Technologies in the Education Domain


Tolga Yılmaz
MSc Student
Computer Engineering Department
Bilkent University

In educational search systems, users commonly make spelling mistakes. In the first part of our work, we analyze the actual query logs of two commercial search engines in the education domain for spelling mistakes. We use five well-known spell-correction programs that are not education-specific and lack many of the terms used in the education field. We show that extending the spell-check dictionary of one of these programs, even with a small education-oriented word list, improves the spell-checker's precision, recall, and F1 values.
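As a rough illustration of how a dictionary extension can affect these scores, the sketch below uses a toy lookup-based checker (not one of the five programs analyzed) that flags every query token missing from its dictionary, then scores those flags against hand-annotated misspellings. The word lists, the query tokens, and the helper names `flag_tokens` and `precision_recall_f1` are hypothetical, and counting each flagged token position as one detection is an assumed evaluation scheme.

```python
from typing import List, Set, Tuple

def flag_tokens(tokens: List[str], dictionary: Set[str]) -> Set[int]:
    """Hypothetical toy checker: flag the position of every token not in the dictionary."""
    return {i for i, tok in enumerate(tokens) if tok.lower() not in dictionary}

def precision_recall_f1(flagged: Set[int], misspelled: Set[int]) -> Tuple[float, float, float]:
    """Score flagged positions against annotated misspelling positions."""
    tp = len(flagged & misspelled)   # misspellings the checker caught
    fp = len(flagged - misspelled)   # correct words wrongly flagged
    fn = len(misspelled - flagged)   # misspellings the checker missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical query tokens; positions 1 and 5 are annotated as misspelled.
tokens = ["photosynthesis", "equaton", "pythagorean", "theorem", "mitosis", "defination"]
misspelled = {1, 5}

# Hypothetical general-purpose dictionary that lacks education terms.
general = {"theorem", "equation", "definition"}
# The same dictionary extended with a small education-oriented word list.
extended = general | {"photosynthesis", "pythagorean", "mitosis"}

for name, dictionary in [("general", general), ("extended", extended)]:
    p, r, f = precision_recall_f1(flag_tokens(tokens, dictionary), misspelled)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

With the general dictionary, the three education terms are flagged as false positives (P=0.40, R=1.00, F1=0.57); the extended dictionary removes them (P=1.00, R=1.00, F1=1.00). Because this toy checker only flags unknown words and never suggests corrections, adding dictionary entries can never raise its recall, so it illustrates only the precision side of the effect.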

The second part of our work concerns Social Question Answering (Q&A) websites, which are commonly used by students. To gain contextual and behavioral insights, we extracted the content of a fairly widely used Q&A website with a scraper we implemented. After analyzing user behavior, we argue that the answers to the questions users post can be found through simple web searches. In line with our previous study, which suggests that classifying the subject of these questions can improve search engine ranking, we implement a classifier for educational questions. This classifier is an ensemble that combines several conventional learning algorithms with two retrieval-based methods that draw on external resources. We show that the ensemble method is more accurate than the other methods. Further, we fetch the result snippets for a subset of educational questions from a commercial search engine and implement four new re-ranking algorithms that use the classifier to re-rank the result pages. We annotate the result pages of each question using a graded relevance scheme. Taking the search engine's default ranking as the baseline, we compare the new methods using the Normalized Discounted Cumulative Gain (NDCG) and Normalized Expected Reciprocal Rank (NERR) measures and show that our methods outperform the baseline.
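For readers unfamiliar with the two evaluation measures, the sketch below computes them for a single question from graded relevance labels. It assumes the common exponential gain 2^g - 1 with a log2 rank discount for NDCG, and the cascade-model Expected Reciprocal Rank with stopping probability (2^g - 1) / 2^g_max; normalizing ERR by the ERR of the ideal (grade-sorted) ordering to obtain NERR is likewise an assumption here. The grade lists and the 0 to 3 grading scale are hypothetical, not data from the study.

```python
import math
from typing import Sequence

def dcg(grades: Sequence[int], k: int) -> float:
    """Discounted cumulative gain at cutoff k, exponential gain, ranks start at 1."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))

def ndcg(grades: Sequence[int], k: int) -> float:
    """DCG divided by the DCG of the same judged results sorted by grade."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0

def err(grades: Sequence[int], k: int, max_grade: int) -> float:
    """Cascade-model ERR: the user stops at rank r with probability R_r."""
    total, p_continue = 0.0, 1.0
    for rank, g in enumerate(grades[:k], start=1):
        r = (2 ** g - 1) / 2 ** max_grade   # probability this result satisfies the user
        total += p_continue * r / rank
        p_continue *= 1 - r                 # user moves on to the next result
    return total

def nerr(grades: Sequence[int], k: int, max_grade: int) -> float:
    """ERR divided by the ERR of the grade-sorted ordering (assumed normalization)."""
    ideal = err(sorted(grades, reverse=True), k, max_grade)
    return err(grades, k, max_grade) / ideal if ideal > 0 else 0.0

# Hypothetical grades (0 = irrelevant, 3 = highly relevant) for one question.
baseline = [1, 0, 3, 2]    # search engine's default order
reranked = [3, 2, 1, 0]    # order produced by some re-ranking method

for name, grades in [("baseline", baseline), ("reranked", reranked)]:
    print(f"{name}: NDCG@4={ndcg(grades, 4):.3f} NERR@4={nerr(grades, 4, 3):.3f}")
```

Sorting by grade in descending order maximizes both DCG and ERR, so the re-ranked list above scores 1.0 on both measures while the baseline scores lower. In a full evaluation, these per-question values would typically be averaged over all annotated questions.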


DATE: 04 May, 2015, Monday @ 15:40