Bilkent University
Department of Computer Engineering


Noun Phrase Chunker for Turkish Using Dependency Parser


Mucahit Kutlu
MSc Student
Computer Engineering Department
Bilkent University

Noun phrase chunking is a subcategory of shallow parsing that can be used for many natural language processing tasks. In this thesis, we propose a noun phrase chunker for Turkish texts. We use a weighted constraint dependency parser to represent the relationships between sentence components and to determine noun phrases. The dependency parser uses a set of hand-crafted rules whose constraints can combine morphological and semantic information. Because the rules are flexible, they are suitable for handling complex structures. The dependency parser can easily be used for shallow parsing of all phrase types by changing the employed rule set. The lack of reliable human-tagged datasets is a significant problem for natural language studies on Turkish. Therefore, we constructed the first noun phrase dataset for Turkish. According to our evaluation, our noun phrase chunker gives promising results on this dataset.

Correct morphological disambiguation of words is required for the dependency parser to produce correct output. Therefore, in this thesis, we also propose a hybrid morphological disambiguation technique that combines statistical information, hand-crafted grammar rules, and transformation-based learning rules. We have also constructed a dataset for testing the performance of our disambiguation system. According to our tests, our disambiguation system gives very good results.


DATE: 26 July, 2010, Monday @ 13:00