Bilkent University
Department of Computer Engineering


Generalization of Predicates with String Arguments


Göker Canitezer

M.S. in Computer Engineering

Supervisors: Prof. Dr. H. Altay Güvenir, Asst. Prof. Dr. Ilyas Çiçekli


String and sequence generalization is used in many areas, such as machine learning, example-based machine translation, and DNA sequence alignment. This thesis proposes a method for finding generalizations of predicates with string arguments from given positive examples. Learning from positive examples alone is a very hard problem in machine learning: without negative examples, it is nearly impossible to find the globally optimal point at which to stop generalizing, that is, the point that covers all positive examples and no negative ones. Existing work therefore relies on heuristics to find the best solution, and this work does the same. Some restrictions imposed by the SLGG (Specific Least General Generalization) algorithm, which was developed for an example-based machine translation system, are relaxed in order to find all possible alignments of two strings. In addition, a scoring mechanism resembling Euclidean distance is used to find the most specific generalizations. Some of the generated templates are then eliminated by four different selection/filtering approaches to obtain a good solution set. Finally, the result set is presented as a decision list, which allows exceptional cases to be handled.
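
To make the idea of generalizing two strings into a template concrete, the following is a minimal sketch only, not the SLGG algorithm or the method of the thesis. It assumes character-level strings, uses Python's standard difflib.SequenceMatcher as a stand-in aligner, and produces a single alignment rather than all possible ones. The function name generalize and the variable names X1, X2, ... are hypothetical and chosen for illustration.

```python
from difflib import SequenceMatcher


def generalize(a: str, b: str) -> list[str]:
    """Build one template from two strings (illustrative assumption, not SLGG).

    Substrings shared by both strings stay as literals; each region where
    the strings differ is replaced by a fresh variable X1, X2, ...
    """
    # autojunk=False disables difflib's popularity heuristic so that the
    # alignment depends only on the two input strings.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    template: list[str] = []
    var_count = 0
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            # Common part: keep it as a literal in the template.
            template.append(a[i1:i2])
        else:
            # Differing part (replace, insert, or delete): abstract it away.
            var_count += 1
            template.append(f"X{var_count}")
    return template


print(generalize("walked", "talked"))  # ['X1', 'alked']
print(generalize("abcd", "abxd"))      # ['ab', 'X1', 'd']
```

Here the two positive examples "walked" and "talked" yield the template X1 + "alked". A single greedy alignment like this is exactly the kind of restriction that has to be relaxed when all possible alignments of two strings are wanted, since other alignments may lead to different, and possibly more specific, templates.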

Keywords: generalization, rlgg, slgg, sequence alignment, ILP, machine translation


DATE: Tuesday, January 29, 2002, at 10:00