Turkish Factoid Question Answering System


Nagehan Pala Er
Computer Engineering Department
Bilkent University

When we use a web search engine, we get back a set of documents and must read them to find the information we want. In many situations, however, we want a particular piece of information rather than a document set. Question Answering is the task of returning a particular piece of information to the user in response to a question. If the information is a simple fact, especially a named entity such as a person or an organization, the task is called Factoid Question Answering. Many factoid question answering systems have been developed for English, but there is no such system for Turkish. In this talk, we introduce our Turkish Factoid Question Answering System, which is modeled on typical factoid question answering systems. Our system performs its task in three steps: question processing, passage retrieval, and answer processing. In the first step, the question is classified and its answer type is identified. We developed an answer type taxonomy for Turkish; it is limited to the types that can be identified by the Named Entity Tagger. A list of keywords is also extracted from the question. These keywords are used to query an information retrieval system in the second step. We use the web service of the Yahoo web search engine to retrieve documents. Then, a set of potential answer passages is extracted from the retrieved documents. In the final step, specific answers are extracted from the passages. The most successful approach to answer extraction is to combine several techniques. Answer-type pattern extraction is one of them: several answer patterns for each answer type are automatically extracted from the web. Generalizing these answer-type patterns to increase recall is left as future work. The result of answer-type pattern matching is used as a feature by a classifier. Answer type matching, the number of matched question keywords, and the length of the longest sequence of question terms are the other features used by the classifier. The classifier ranks the candidate answers, and the answers scoring above a threshold are returned to the user. The lack of good linguistic tools for Turkish limits the performance of the system.
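
The answer processing step described above can be illustrated with a minimal Python sketch. It computes the four features named in the abstract for each candidate answer and ranks the candidates against a threshold. Everything concrete here is an assumption made for illustration and is not taken from the system itself: the `Candidate` record, the whitespace tokenizer, the type label `LOCATION`, and above all the hand-set linear weights in `WEIGHTS`, which merely stand in for the trained classifier.

```python
from dataclasses import dataclass

PUNCT = ".,?!;:\"()"

@dataclass
class Candidate:
    answer: str
    answer_type: str       # type from a named entity tagger (assumed to be given)
    passage: str           # passage the candidate was extracted from
    pattern_matched: bool  # outcome of answer-type pattern matching (assumed given)

def tokenize(text):
    # Naive whitespace tokenizer; a real Turkish system would likely
    # need morphological analysis instead.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def longest_keyword_run(tokens, keywords):
    # Length of the longest run of consecutive passage tokens
    # that are all question keywords.
    best = run = 0
    for tok in tokens:
        run = run + 1 if tok in keywords else 0
        best = max(best, run)
    return best

def features(cand, expected_type, keywords):
    tokens = tokenize(cand.passage)
    kw = set(keywords)
    return [
        1.0 if cand.pattern_matched else 0.0,           # answer-type pattern match
        1.0 if cand.answer_type == expected_type else 0.0,  # answer type match
        float(len(kw & set(tokens))),                   # matched question keywords
        float(longest_keyword_run(tokens, kw)),         # longest keyword sequence
    ]

# Hypothetical weights standing in for a trained classifier.
WEIGHTS = [2.0, 1.5, 0.5, 0.5]

def score(cand, expected_type, keywords):
    return sum(w * f for w, f in zip(WEIGHTS, features(cand, expected_type, keywords)))

def rank_answers(candidates, expected_type, keywords, threshold):
    # Score every candidate, keep those at or above the threshold,
    # and return them best first.
    scored = [(c.answer, score(c, expected_type, keywords)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Illustrative question: "Türkiye'nin başkenti neresidir?"
keywords = ["türkiye'nin", "başkenti"]
candidates = [
    Candidate("Ankara", "LOCATION", "Türkiye'nin başkenti Ankara'dır.", True),
    Candidate("Atatürk", "PERSON", "Atatürk başkenti ziyaret etti.", False),
]
ranked = rank_answers(candidates, "LOCATION", keywords, threshold=2.0)
```

With these assumed weights and threshold, only the candidate whose type matches the expected answer type and whose passage contains both question keywords is returned. In a trained system the weights and the threshold would be learned from data rather than set by hand.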


DATE: Monday, 21 April 2008 @ 15:40