Technical Reports Published in 1993




BU-CEIS-9301: PDF

An Algorithm for Classification by Feature Partitioning

İzzet Şirin and H. Altay Güvenir

This paper presents a new methodology for learning from examples, called Classification by Feature Partitioning (CFP), which is an inductive, incremental and supervised learning method. Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint partitions of values. A partition, which is initially a point in the feature dimension, is expanded through generalization. The CFP algorithm specializes a partition by subdividing it into sub-partitions. It is shown that the CFP algorithm has low sample complexity and training complexity. CFP is also empirically evaluated in three different domains, and the results are compared with Instance-based learning, Nested Generalized Exemplars and Decision Tree techniques.

BU-CEIS-9302: PDF

A Lexical-Functional Grammar for Turkish (M. Sc. Thesis)

Zelal Güngördü

Natural language processing is a research area which is becoming increasingly popular each day for both academic and commercial reasons. Syntactic parsing underlies most of the applications in natural language processing. Although there have been comprehensive studies of Turkish syntax from a linguistic perspective, this is one of the first attempts for investigating it extensively from a computational point of view. In this thesis, a lexical-functional grammar for Turkish syntax is presented. Our current work deals with regular Turkish sentences that are structurally simple or complex.

BU-CEIS-9303: PDF

An ATN Grammar for Turkish (M. Sc. Thesis)

Coşkun Demir

Syntactic parsing is an important step in any natural language processing system. Augmented Transition Networks (ATNs) are procedural mechanisms which have been one of the earliest and most common paradigms for parsing natural language. ATNs have the generative power of a Turing machine and were first popularized by Woods in 1970. This thesis presents our efforts in developing an ATN grammar for a subset of Turkish including simple and complex sentences. There are five networks in our grammar: the sentence (S) network, which includes the sentence structures that falls in our scope, the noun phrase (NP) networkö the adverbial phrase (ADVP) network and finally the clause (CLAUSE) and gerung (GERUND) networks for handling complex sentences. We present our results from parsing a large number of Turkish sentences.

BU-CEIS-9304: PDF

Two-level Description of Turkish Morphology

Kemal Oflazer

This paper describes a full two-level morphological description of Turkish word structures. The description has been implemented using the PC-KIMMO environment and is based on a root word lexicon of about 23,000 roots words. The phonetic rules of contemporary Turkish (spoken in Turkey) have been encoded using 22 two-level rules while the morphotactics of the agglutinative word structures have been encoded as finite-state machines for verbal, nominal paradigms and other categories. Almost all the special cases of, and exceptions to phonological and morphological rules have been taken into account. In this paper, we describe the rules and the finite state machines along with examples and a discussion of how various special cases were handled. We also describe some known limitations and problems with this description.

BU-CEIS-9305: PDF

Genetic Synthesis of Unsupervised Learning Algorithms

Ali Daşdan and Kemal Oflazer

This paper presents new unsupervised learning algorithms that have been synthesized using a genetic approach. A set of such learning algorithms has been compared with the classical Kohonen's Algorithm on the Self-Organizing Map and has been found to provide a better performance measure. This study indicates that there exist many unsupervised learning algorithms that lead to an organization similar to that of Kohonen's Algorithm, and that genetic algorithms can be used to search for optimal algorithms and optimal architectures for the unsupervised learning.

BU-CEIS-9306: PDF

A Genetic Algorithm for Classification by Feature Partitioning

H. Altay Güvenir and İzzet Şirin

In this paper we describe a method for hybridizing a genetic algorithm and a feature partitioning (FP) classification algorithm. Learning in FP is accomplished by storing the objects separately in each feature dimension as partitions of the set of values. A partition is expanded through generalization, which is limited by a generalization limit. Prediction in FP is done through a voting among the class predictions of each feature. The effect of the prediction of a feature in the voting is proportional with the weight of that feature. Feature partitioning method is implemented in the CFP (Classification by Feature Partitioning) algorithm. We use the genetic algorithm and a training data set to learn real-valued weights and generalization limits associated with each feature in that domain. The experimental results indicate that the genetic algorithm can be used to learn the domain dependent parameters of a feature partitioning classifier, and the resulting hybrid system can be used to create compact representations with very good predictive accuracy.

BU-CEIS-9308: PDF

Complexity of the CFP, a Method for Classification Based on Feature Partitioning

H. Altay Güvenir and İzzet Şirin

This paper presents a new methodology for learning from examples, called Classification by Feature Partitioning (CFP). Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint partitions of values. A partition is expanded through generalization or specialized by subdividing it into sub-partitions. It is shown that the CFP algorithm has a low sample and training complexity.

BU-CEIS-9310: PDF

Empirical Evaluation of the CFP Algorithm

İzzet Şirin and H. Altay Güvenir

This paper presents a new methodology for concept learning from examples, called Classification by Feature Partitioning (CFP), which is an inductive, incremental and supervised learning method. Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint partitions of values. A partition is expanded through generalization or specialized by dividing it into subpartitions. An empirical evaluation of the CFP on real-world data sets is presented.

BU-CEIS-9311: PDF

Parallel Classification by Feature Partitioning

Hüseyin Simitci and H. Altay Güvenir

This work presents a parallel method for learning from examples using parallel feature partitioning (PFP). Feature partitioning (FP) is an inductive, incremental and supervised learning method proposed by Sirin and Güvenir. PFP assigns feature dimensions to separate nodes. Learning in PFP is accomplished by storing the objects separately in each feature dimension as disjoint partitions of values. Every node expands a partition, which is initially a point in the feature dimension through generalization. The CFP algorithm specializes a partition by subdividing it into sub-partitions. PFP is implemented in the PCFP (Parallel Classification by Feature Partitioning) algorithm. PCFP is tested in six different domains, and results are compared with CFP of Sirin and Güvenir.

BU-CEIS-9312: PDF

Learning with Feature Partitioning (M. Sc. Thesis)

İzzet Şirin

This report presents a new methodology of learning from examples, based on feature partitioning Classification by Feature Partitioning (CFP) is a particular implementation of this technique, which is an inductive, incremental, and supervised learning method. Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint partitions of values. A partition, a basic unit of representation which is initially a point in the feature dimension, is expanded through generalization. The CFP algorithm specializes a partition by subdividing it into two subpartitions. Theoretical (with respect to PAC--model) and empirical evaluation of the CFP is presented and compared with some other similar techniques. Keywords: Machine learning, inductive learning, incremental learning, supervised learning, feature partitioning.

BU-CEIS-9313: PDF

Design and Implementation of a Spelling Checker for Turkish

Ayşın Solak and Kemal Oflazer

This paper presents the design and implementation of a spelling checker for Turkish. Turkish is an agglutinative language in which words are formed by affixing a sequence of morphemes to a root word. Parsing agglutinative word structures has attracted relatively little attention except for applications areas for general purpose morphological processors. Parsing words in such languages even for spelling checking purposes requires substantial morphological and morphophonemic analysis techniques, and spelling correction (not addressed in this paper) is significantly more complicated. In this paper, we present the design and implementation of a morphological root-driven parser for Turkish word structures which has been incorporated into a spelling checking kernel for on-line Turkish text. The agglutinative nature of the language complex word formations, various phonetic harmony rules, and subtle exceptions present certain difficulties not usually encountered in the spelling checking of languages like English and make this a very challenging problem.

BU-CEIS-9314: PDF

Heterogeneous Inference in Design

Varol Akman

For those of us involved in the attempt to construct formal models and environments in which the world of design can be subjected to scientific experimentation, the raison d'etre of logic has been rather well-understood. The aim of this position paper is not really to challenge this view but rather to complement and extend it. Specifically, we discuss why heterogeneous inference - inference that proceeds from information represented in more than one form - is crucial.

BU-CEIS-9315: PDF

Computational Situation Theory

Erkan Tın and Varol Akman

Situation theory has been developed over the last decade and various versions of the theory have been applied to a number of linguistic issues. However, not much work has been done in regard to its computational aspects. In this paper, a medium (called BABY-SIT) which uses situation-theoretic constructs as its computational foundation is proposed. The features of this experimental environment are compared to those of the existing approaches towards `computational situation theory.'

BU-CEIS-9316: PDF

HYPERSOLVER: A Graph-based Tool for Modeling with Sets

Varol Akman and Müjdat Pakkan

This paper investigates an alternative set theory (due to Peter Aczel) which uses a graphical representation for sets and thereby allows the representation of non-well-founded sets. A program, called HYPERSOLVER, which can solve systems of equations defined in terms of sets in the universe of this new theory is presented.

BU-CEIS-9317: PDF

Artificial Minds: Fact or Fantasy?

David Davenport

Trying to fathom the inner workings of the human mind has been one of the main preoccupation's of philosophy and psychology. The advent of the electronic digital computer provided both a new tool and a new perspective for this quest, and spawned the research field known today as artificial intelligence (A.I.). Yet, is an artificial intelligence a real possibility, or is it simply a myth, unreachable in principle? Would an A.I. be conscious, would it literally be an artificial mind, or is there something intensely human about minds which preclude our constructing one? Despite years of research, we still have no real answer to these most basic of questions. Speculation and controversy abound, rekindled of late by the resurgence of the connectionist paradigm. This paper examines some of the key arguments in the debate and suggests a new theory based on "inscriptors" may offer plausible solutions. Keywords: Artificial Intelligence, Chinese room experiment, Turing test, symbol grounding, Inscriptors.