Bilkent University
Department of Computer Engineering
S E M I N A R

 

Morphological Disambiguation of Turkish Words

 

Mücahid Kutlu
MSc Student
Computer Engineering Department
Bilkent University

In this study, we propose a morphological disambiguation method for Turkish, which is an agglutinative language and has flexible rules for sentence structure. We use a hybrid method that combines statistical information with hand-crafted rules and learned rules. We apply five different techniques in sequence, moving to the next step until the word is disambiguated. First, we select the most likely tag of the word. Second, we perform selection according to the suffix, taking hand-crafted rules into account. Third, we use hand-crafted rules for selection. Fourth, we use rules obtained by transformation-based learning. If the word is still ambiguous, we apply some heuristics under the control of hand-crafted rules. For training, we constructed a dataset, and for testing we applied ten-fold cross-validation on the corpus. We obtained 93.5% accuracy on average when the whole morphological parse is considered in the calculation. The accuracy increased to 94.3% when only the part-of-speech tag and the inflections after the last derivation are considered. Our accuracy is 97.6% in terms of part-of-speech tagging.
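
The step-by-step control flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the implementation presented in the talk: the function names, the list-of-strings representation of candidate parses, and the two toy steps are all assumptions made for the example, since the abstract does not say how each step is realized.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a step receives the candidate parses of one
# word and returns a (possibly smaller) list of candidates.
Step = Callable[[List[str]], List[str]]


def disambiguate(candidates: Sequence[str], steps: Sequence[Step]) -> List[str]:
    """Apply the steps in order and stop once a single candidate remains."""
    remaining = list(candidates)
    for step in steps:
        if len(remaining) <= 1:
            break  # already unambiguous, later steps are skipped
        narrowed = step(remaining)
        # Ignore a step that would discard every candidate.
        if narrowed:
            remaining = narrowed
    return remaining


# Toy stand-in steps, invented for illustration only.
def drop_adjective_reading(cands: List[str]) -> List[str]:
    return [c for c in cands if c != "Adj"]


def pick_first(cands: List[str]) -> List[str]:
    return cands[:1]


result = disambiguate(["Noun", "Verb", "Adj"], [drop_adjective_reading, pick_first])
```

In a real system each placeholder step would be replaced by one of the five techniques (most likely tag, suffix-based selection, hand-crafted rules, learned transformation rules, heuristics), and the steps would normally also see the neighboring words, which this sketch leaves out.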

 

DATE: 22 March, 2010, Monday @ 15:40
PLACE: EA 409