Technical Reports Published in 1994


Spelling Correction in Agglutinative Languages

Kemal Oflazer and Cemaleddin Güzey

This paper presents an approach to spelling correction in agglutinative languages that is based on two-level morphology and a dynamic programming based search algorithm. Spelling correction in agglutinative languages is significantly different than in languages like English, because the concept of a word in such languages is much wider that the entries found in a dictionary, owing to productive word formation by derivational and inflectional affixations. After an overview of certain issues and relevant mathematical preliminaries, we formally present the problem and the our proposed solution. We then present results from our experiments with spelling correction in Turkish, a Ural--Altaic agglutinative language.


Parsing Turkish with the Lexical Functional Grammar Formalism

Zelal Güngördü and Kemal Oflazer

This paper describes our work on parsing Turkish using the lexical-functional grammar formalism. This work represents the first effort for parsing Turkish. Our implementation is based on Tomita's parser developed at Carnegie-Mellon University Center for Machine Translation. The grammar covers a substantial subset of Turkish including simple and complex sentences, and deals with a reasonable amount of word order freeness. The complex agglutinative morphology of Turkish lexical structures is handled using a separate two-level morphological analyzer. After a discussion of key relevant issues regarding Turkish grammar, we discuss aspects of our system and present results from our implementation. Our initial results suggest that our system can parse about 82 % of the sentences directly and almost all the remaining with very minor pre-editing.


Efficiency of Randomly Assigning Tasks to Processors

Wm. Randolph Franklin, Chandrasekhar Narayanaswami, Mohan S. Kankanhalli, and Varol Akman

We have implemented the uniform grid technique - due to the first author - on several parallel machines. It is complicated to globally equalize the load across the processors to correct for the fact that some grid cells have more data than others. Therefore, in our implementations we randomly assigned grid cells to different processors. The actual performance observed was very good and we obtained a near-linear speedup for many different algorithms. In this paper, we analyze the efficiency of this random assignment method of load balancing for parallel machines. The problem of analyzing the efficiency of leaving the loads uneven may be abstracted as follows. Suppose that we have N independent tasks of unit weight and P processors to execute them. Assume that we are randomly, uniformly, and independently assigning each job to one of the processors. Since some processors will have more tasks than others, the efficiency, or the optimal time, N/P, divided by the expected time until the last processor finishes will be less than unity. This paper analyzes this and presents finite numbers and asymptotic equations. For example, N = 1000, P = 100 gives 53% efficiency. N = 1000, P = 1000 gives 18% efficiency. These efficiency figures are surprisingly high and they explain the good performance of our implementations.


Review of Action Semantics by Peter D. Mosses

Varol Akman

This is a review of Peter D. Mosses' recent book ACTION SEMANTICS (Cambridge Tracts in Theoretical Computer Science 26, Cambridge University Press, U.K., 372 + xx pages, 1992).


The Logic of Counteractions

Erkan Tın and Varol Akman

We extend causal theories and study actions in domains involving multiple agents. Causal theories, invented by Yoav Shoham, are based on a temporal nonmonotonic logic and have computationally tractable aspects. Since Shoham's formalism does not provide an adequate mechanism for representing simultaneous actions and specifying their consequences, we introduce the notion of counteractions while preserving the efficiency and model-theoretic properties of causal theories.


Issues in Commonsense Set Theory

Müjdat Pakkan and Varol Akman

The success of set theory as a foundation for mathematics inspires its use in artificial intelligence, particularly in commonsense reasoning. In this survey, we briefly review classical set theory from an AI perspective, and then consider alternative set theories. Desirable properties of a possible commonsense set theory are investigated, treating different aspects like cumulative hierarchy, self-reference, cardinality, etc. Assorted examples from the ground-breaking research on the subject are also given.


Commonsense Aspects of Buying and Selling

Murat Ersan, Ebru Ersan, and Varol Akman

We describe an experimental program that implements a commonsense micro-theory for buying and selling. Our system characterizes how intelligent agents hold items and money, how they buy and sell items, and the way money and items are transferred. The ontology of the system includes money (cash, check, credit card), agents (people or organizations), items (movable, real estate, service), barter, and the notions of transfer, loan, buying by installments, profit, and loss. Our work has been motivated by the Cyc project of Douglas Lenat.


Representing Emotions in Terms of Object Directedness

Varol Akman and Hakime G. Ünsal

A logical formalization of emotions is considered to be tricky because they appear to have no strict types, reasons, and consequences. On the other hand, such a formalization is crucial for commonsense reasoning. Here, the so-called "object directedness" of emotions is studied by using Helen Nissenbaum's influential ideas.


Nonstandard Set Theories and Information Management

Varol Akman and Müjdat Pakkan

The merits of set theory as a foundational tool in mathematics stimulate its use in various areas of artificial intelligence, in particular intelligent information systems. In this paper, a study of various nonstandard treatments of set theory from this perspective is offered. Applications of these alternative set theories to information or knowledge management are surveyed.


Sets in Context and Related Issues

Varol Akman and Mehmet Surav

We formalize the notion of "context" for sets. We employ two predefined constructs for this - an extension predicate and a context sensitive membership relation. Ext returns the members of its first argument, with the context C specified as its second argument. The new context sensitive membership relation holds when the left hand side of the relation is in the extension of the right hand side. We show that the new membership relation is useful for a "commonsense set theory" a la Perlis and Zadrozny, and discuss related issues.


Axiomatic Set Theories and Common Sense

Mehmet Surav and Varol Akman

Various axiomatic set theories (ZF, NBG, NF, and KPU) are studied with a critical eye. The basic mathematical and philosophical reasons behind their axioms are given, as well as their review from the commonsense point of view. An introduction to a "commonsense set theory" is attempted at the end.


Real-time Transaction Scheduling in Database Systems

Özgür Ulusoy and Geneva G. Belford

A database system supporting a real-time application, which can be called `a real-time database system (RTDBS)', has to provide real-time information to the executing transactions.Each RTDB transaction is associated with a timing constraint, usually in the form of a deadline. Efficient resource scheduling algorithms and concurrency control protocols are required to schedule the transactions so as to satisfy both timing constraints and data consistency requirements. In this paper we concentrate on the concurrency control problem in RTDBS's. Our work has two basic goals: real-time performance evaluation of existing concurrency control approaches in RTDBS's, and proposing new concurrency control protocols with improved performance. One of the new protocols is locking-based, and it prevents the priority inversion problem by scheduling the data lock requests based on prioritizing data items. The second new protocol extends the basic timestamp-ordering method by involving real-time priorities of transactions in the timestamp assignment procedure. Performance of the protocols is evaluated through simulations by using a detailed model of a single-site RTDBS. The relative performance of the protocols is examined as a function of transaction load, data contention (which is determined by a number of system parameters), and resource contention. The protocols are also tested under various real-time transaction processing environments. The performance of the proposed protocols appears to be good, especially under conditions of high transaction load and high data contention. Keywords: Real-time database systems, transaction scheduling, concurrency control, performance evaluation.


Processing Real-time Transactions in a Replicated Database System

Özgür Ulusoy

A database system supporting a real-time application has to provide real-time information to the executing transactions. Each real-time transaction is associated with a timing constraint, typically in the form of a deadline. It is difficult to satisfy all timing constraints due to the consistency requirements of the underlying database. In scheduling the transactions it is aimed to process as many transactions as possible within their deadlines. Replicated database systems possess desirable features for real-time applications, such as a high level of data availability, and potentially improved response time for queries. On the other hand, multiple copy updates lead to a considerable overhead due to the communication required among the data sites holding the copies. In this paper, we investigate the impact of storing multiple copies of data on satisfying the timing constraints of real-time transactions. A detailed performance model of a distributed database system is employed in evaluating the effects of various workload parameters and design alternatives on the system performance. The performance is expressed in terms of the fraction of satisfied transaction deadlines. A comparison of several real-time concurrency control protocols, which are based on different approaches in involving timing constraints of transactions in scheduling, is also provided in performance experiments. Index Terms -- Real-time database systems, data replication, transaction scheduling, concurrency control, performance evaluation.


Excursions to a Topological Picturebook

Ahmet Arslan and Varol Akman

George K. Francis' A TOPOLOGICAL PICTUREBOOK [Springer-Verlag, New York (1987)] is full of complicated topological figures, mostly drawn manually. Our goal is to automate the production of such illustrations and to obtain publication-quality hardcopy using assorted techniques of computer graphics. To that end, sweeping is discussed as a major surface modelling tool. Some interactive methods are given to produce interesting topological surfaces. The program Tb (which stands for `Topologybook') is described and various pictures generated by this software are presented. Tb is a free-form surface modeller and produces topological shapes with little effort. Central to the implementation of Tb is a paradigm of solid modelling in which computation of a shape is regarded as sweeping with some parametric variations, viz. shape = sweep + control.

CAVEAT: The final copy of this manuscript includes some color pictures which are absent from this version.


BABY-SIT: A Computational Medium Based on Situations

Erkan Tın and Varol Akman

While situation theory and situation semantics provide an appropriate framework for a realistic model-theoretic treatment of natural language, serious thinking on their `computational' aspects has just started. Existing proposals mainly offer a Prolog- or Lisp-like programming environment with varying degrees of divergence from the ontology of situation theory. In this paper, we introduce a computational medium (called BABY-SIT) based on situations. The primary motivation underlying BABY-SIT is to facilitate the development and testing of programs in domains ranging from linguistics to artificial intelligence in a unified framework built upon situation-theoretic constructs. CAVEAT: The final copy of this manuscript includes Figure 4 (a screen dump) which is absent from this version.


A Tool for Tagging Turkish Text

Kemal Oflazer and İlker Kuruöz

Automatic text tagging is an important component in higher level analysis of text corpora, and its functionality output can also be used in many natural language processing applications. This paper describes a part-of-speech (POS) tagger for Turkish text. It is based on a full scale two-level morphological specification of Turkish implemented on the PC-KIMMO environment, augmented with statistical information compilation and use, multi-word construct recognition and constraint- and heuristics-based morphological, and POS ambiguity resolution. The tagger also has additional functionality for fine tuning of the morphological analyzer, such as logging erroneous parses, commonly used roots, etc. The output of the tagger can be used in further syntactic and semantic analysis.


Situated Modeling of Epistemic Puzzles

Murat Ersan and Varol Akman

Situation theory is a mathematical theory of meaning introduced by Jon Barwise and John Perry. It has evoked great theoretical and practical interest and motivated the framework of a few `computational' systems. Unfortunately, there is a lack of `real life' applications on these systems and this paper is a preliminary attempt to remedy this situation. Here, we solve a class of epistemic puzzles, introduced by Raymond Smullyan, using the situation-theoretic programming language PROSIT.


Symbols, Signs and Cognition

David Davenport

Cognition is inextricably linked with the concepts of signs and symbols. The classical, computational, view of cognition, explicitly states this in its "physical symbol system hypothesis." Its major competitor, connnectionism, while claiming to be subsymbolic, implicitly acknowledges both signs and symbols. Unfortunately, both theories have their problems. The computational paradigm fails to explain where symbols come from and how they acquire meaning. The connectionist paradigm fails to provide sufficient representational power and anyway is based on the discredited associationalist philosophy. While the problems are different, the precise nature of the two approaches and the essential difference between them, remains something of an enigma. This paper outlines a theory of "symbols" which not only appears to provide a clear basis for distinguishing between the two paradigms, but also points the way to a new architecture which may help solve the mystery of cognition. Keywords: Artificial intelligence, symbol systems, connectionism, cognition


Situated Processing of Pronominal Anaphora

Erkan Tın and Varol Akman

We describe a novel approach to the analysis of pronominal anaphora in Turkish. A computational medium which is based on situation theory is used as our implementation tool. The task of resolving pronominal anaphora is demonstrated in this environment which employs situation-theoretic constructs for processing.


Effects of Stemming on Turkish Text Retrieval

Ayşın Solak and Fazlı Can

A stemming algorithm developed for the Turkish language is presented. The retrieval performance of the algorithm is then measured in terms of its effect on retrieval performance of a best-match retrieval system. The results show that the stemming algorithm reduces the index dictionary size significantly and also improves the retrieval effectiveness. Keywords: text retrieval


A Performance Evaluation Model for Distributed Real-Time Databases

Özgür Ulusoy and Geneva G. Belford

A real-time database system (RTDBS) is designed to provide timely response to the transactions of data-intensive applications. Each transaction processed in a RTDBS is associated with a timing constraint in the form of a deadline. In this paper, a performance evaluation model is provided to enable distributed RTDBS designers to analyze transaction scheduling algorithms. The model is developed progressing from a simple mathematical analysis to complicated simulations. The performance is expressed in terms of the fraction of satisfied transaction deadlines. The paper also provides an example simulation experiment implemented using the model presented.


A Study of Two Transaction Processing Architectures for Distributed Real-Time Database Systems

Özgür Ulusoy

A real-time database system (RTDBS) is designed to provide timely response to the transactions of data-intensive applications. Processing a transaction in a distributed RTDBS environment presents the design choice of how to provide access to remote data referenced by the transaction. Satisfaction of the timing constraints of transactions should be the primary factor to be considered in scheduling accesses to remote data. In this paper, we describe and analyze two different alternatives to this fundamental design decision. With the first alternative, transaction operations are executed at the sites where required data pages reside. The other alternative is based on transmitting data pages wherever they are needed. Although the latter approach is characterized by large message volumes carrying data pages, it is shown in our experiments to perform better than the other approach under most of the workloads and system configurations tested. The performance metric used in the evaluations is the fraction of transactions that satisfy their timing constraints.

Keywords: Real-time database systems, distributed transaction processing, performance evaluation.


Using a Corpus for Teaching Turkish Morphology

Altay Güvenir and Kemal Oflazer

This paper reports on the preliminary phase of our ongoing research towards developing an intelligent tutoring environment for Turkish grammar. One of the components of this environment is a corpus search tool which, among other aspects of the language, will be used to present the learner sample sentences along with their morphological analyses. Following a brief introduction to the Turkish language and its morphology, the paper describes the morphological analysis and ambiguity resolution used to construct the corpus used in the search tool. Finally, implementation issues and details involving the user interface of the tool are discussed.


Object-Space Parallel Polygon Rendering on Hypercube-connected Multicomputers

Tahsin M. Kurç, Cevdet Aykanat, and Bülent Özgüç

This paper presents algorithms developed for object-space parallelism for polygon rendering on hypercube-connected multicomputers. In object-space parallelism, each processor is given the responsibility to render a portion of the input polygons. Each processor shades and z-buffers the locally generated pixels. After this local rendering, the remaining pixels in each processor should be globally z-buffered. Efficient parallelization of this pixel merging operation is important. This is the only place where an overhead is incurred due to parallelization. In this paper, efficient algorithms are presented to perform local rendering and global pixel merging operations. A modified scanline based z-buffer algorithm is proposed. This algorithm reduces the number of pixels sent and received in the pixel merging operation and avoids the re-initialization of the z-buffer for each scanline. Various algorithms are proposed for pixel merging phase. These algorithms use different communication characteristics of the hypercube multicomputers. Efficient algorithms for load balancing in the pixel merging operation is also proposed and presented. Experimental results obtained on a 16-processor Intel's iPSC/2 hypercube multicomputer are also presented.


An Evaluation of the Validity of the Dexter Hypertext Reference Model: A Case Study

Bilge (Karal) Say and Fazlı Can

Various formal models have been proposed for defining hypertext systems. Among them, the Dexter Hypertext Reference Model is gaining wide acceptance as a base for the design of future hypertext systems and interoperability tools. However, this model has yet to be extended and validated against the hypertext systems it took as its basis. This paper is an attempt in the latter aspect, namely considering a widely used hypertext system and validating it against the original model. For this purpose, HyperCard, which was indeed one of the Dexter Hypertext Reference Model's "reference" systems, is chosen. For validation, we use the concept of refinement with Z, a formal specification language.


An Analysis of Frame Sliced Signature Files

Bülent Tezcan, Gökhan Tür and Fazlı Can

Signature files provide an efficient access tool for information retrieval. They reflect the contents of data objects in terms of bit patterns and act as a filter during query processing. In this study, an exact formula for the false drop probability estimation of multiterm queries for Frame Sliced Signature Files (FSSF) is given. The experimental results, which are obtained using the INSPEC database of 12,684 documents, are provided and compared with the theoretical result


Incremental Query Evaluation for Vertically Partitioned Signature Files in Very Large Databases

Seyit Koçberber and Fazlı Can

Signature files provide efficient retrieval of formatted and unformatted data with a small space overhead. The main idea of all signature based schemes is to reflect the essence of the data objects into bit patterns. Vertical partitioning of signature files provides fast filtering of unqualifying objects by accessing only those bits of the object signatures that are set in the query signature. For very large databases even one slice of database signatures may occupy several disk blocks. An incremental query evaluation method, called INCBIT, is proposed which dynamically avoids the unnecessary blocks of bit slices which need to be accessed by utilizing the conjunctive nature of the queries. Three methods to enhance the performance of INCBIT are introduced. In the first method, bit slices are divided into vertical fragments in which the density of on-bits vary. For high weight queries, after reducing the false drop probability to a negligible level, the remaining on-bits of the query are not used due to the incremental nature of INCBIT. Based on this observation, first using the query bits in the fragments with low on-bit density increases the performance. The second method originates from the fact that clustering the signatures of the records containing discriminating terms obtains a few on-blocks at the end of query processing. This prevents the deterioration of the performance for the majority of the non-zero hit user queries and improves the performance. In the third method, a bit-map for signature blocks is kept in memory for the discriminating terms used in signature clustering. The block bit-map of a term shows the positions of the signature blocks which contain at least one record in which the term exists. This data structure eliminates the block accesses produced due to high false drops at the beginning of query evaluation. The terms with lower database occurrence frequency are specified more frequently in the queries. Such discriminating terms constitute about 20% of all terms; therefore, the bit-map space overhead is small.


Parallelization of an Interior Point Algorithm for Linear Programming (M.Sc. Thesis)

Hüseyin Simitci

In this study, we present the parallelization of Mehrotra's predictor-corrector interior point algorithm, which is a Karmarkar-type optimization method for linear programming. Computation types needed by the algorithm are identified and parallel algorithms for each type are presented. The repeated solution of large symmetric sets of linear equations, which constitutes the major computational effort in Karmarkar-type algorithms, is studied in detail. Several forward and backward solution algorithms are tested, and buffered backward solution algorithm is developed. Heuristic bin packing algorithms are used to schedule sparse matrix-vector product and factorization operations. Algorithms having the best performance results are used to implement a system to solve linear programs in parallel on multicomputers. Design considerations and implementation details of the system are discussed, and some performance results are presented from a number of real problems.


Focusing for Pronoun Resolution in English Discourse: An Implementation (*)

Ebru Ersan and Varol Akman

Anaphora resolution is one of the most active research areas in natural language processing. This study examines focusing as a tool for the resolution of pronouns which are a kind of anaphora. Focusing is a discourse phenomenon like anaphora. Candy Sidner formalized focusing in her 1979 MIT PhD thesis and devised several algorithms to resolve definite anaphora including pronouns. She presented her theory in a computational framework but did not generally implement the algorithms. Her algorithms related to focusing and pronoun resolution are implemented in this thesis. This implementation provides a better comprehension of the theory both from a conceptual and a computational point of view. The resulting program is tested on different discourse segments, and evaluation and analysis of the experiments are presented together with the statistical results.

(*) Revised version of the first author's M.S. thesis.


Situated Modeling of Epistemic Puzzles (*)

Murat Ersan and Varol Akman

Situation theory is a mathematical theory of meaning introduced by Jon Barwise and John Perry. It has evoked great theoretical and practical interest and motivated the framework of a few `computational' systems. PROSIT is the pioneering work in this direction. Unfortunately, there is a lack of real-life applications on these systems and this study is a preliminary attempt to remedy this deficiency. Here, we examine how much PROSIT reflects situation-theoretic concepts and solve a group of epistemic puzzles using the constructs provided by this programming language.

(*) Revised version of the first author's M.S. thesis. This report supercedes BU-CEIS-9417.


Tagging and Morphological Disambiguation of Turkish Text (M.Sc. Thesis)

İlker Kuruöz

A part-of-speech (POS) tagger is a system that uses various sources information to assign possibly unique POS to words. Automatic text tagging is an important component in higher level analysis of text corpora. Its output can also be used in many natural language processing applications. In languages like Turkish or Finnish, with agglutinative morphology, morphological disambiguation is a very crucial process in tagging as the structures of many lexical forms are morphologically ambiguous. This thesis presents a POS tagger for Turkish text based on a full-scale two-level specification of Turkish morphology. The tagger is augmented with a multi-word and idiomatic construct recognizer, and most importantly morphological disambiguator based on local lexical neighborhood constraints, heuristics and limited amount of statistical information. The tagger also has additional functionality for statistics compilation and fine tuning of the morphological analyzer, such as logging erroneous morphological parses, commonly used roots, etc. Test results indicate that the tagger can tag about 97% to 99% of the texts accurately with very minimal user intervention. Furthermore for sentences morphologically disambiguated with the tagger, an LFG parser developed for Turkish, on the average, generates 50% less ambiguous parses and parses almost 2.5 times faster.


Research Issues in Real-Time Database Systems

Özgür Ulusoy

Today's real-time systems are characterized by managing large volumes of data. Efficient database management algorithms for accessing and manipulating data are required to satisfy timing constraints of supported applications. Real-time database systems is a new research area investigating possible ways of applying database systems technology to real-time systems. Management of real-time information through a database systems requires the integration of concepts from both real-time systems and database systems. Some new criteria need to be developed to involve timing constraints of real-time applications in many database systems design issues, such as transaction/query processing, data buffering, CPU and IO scheduling. In this paper, a basic understanding of the issues in real-time database systems is provided and the research efforts in this area are introduced. Different approaches to various problems of real-time database systems are briefly described, and possible future research directions are discussed.


Contexts and Situations (*)

Mehmet Surav and Varol Akman

The issue of context arises in assorted areas of Artificial Intelligence, Mathematical Logic, and Natural Language Semantics. Although its importance is realized by various researchers, there is not much work towards a useful formalization. In this report, we will try to identify the problem, and decide what we need for an acceptable (formal) account of the notion of context. We will present a preliminary model (based on Situation Theory) and give examples to show the use of context in various fields, and the advantages gained by the acceptance of our proposal.

(*) Revised version of the first author's M.S. thesis


An Efficient Mapping Heuristic for Mesh-Connected Parallel Architectures Based on Mean Field Annealing

İsmail Haritaoğlu, Cevdet Aykanat

A new Mean Field Annealing (MFA) formulation is proposed for the mapping problem for mesh-connected architectures. The proposed MFA heuristic exploits the conventional routing scheme used in mesh interconnection topologies to introduce an efficient encoding scheme. An efficient implementation scheme which decreases the complexity of the proposed algorithm by asymptotical factors is also developed.Experimental results also show that the proposed MFA heuristic approaches the speed performance of the fast Kernighan-Lin heuristic while approaching the solution quality of the powerful simulated annealing heuristic.


A Global Routing Heuristic for FPGAs Based on Mean Field Annealing

İsmail Haritaoğlu, Cevdet Aykanat

In this work, we propose an order-independent global routing algorithm for SRAM type FPGAs based on Mean Field Annealing. The performance of the proposed global routing algorithm is evaluated in comparison with LocusRoute global router on ACM/SIGDA Design Automation benchmarks. Experimental results indicate that the proposed MFA heuristic performs better than the LocusRoute in terms of the distribution of the channel densities.


Mapping and FPGA Global Routing Using Mean Field Annealing (M. Sc. Thesis)

İsmail Haritaoğlu

Mean Field Annealing algorithm which was proposed for solving combinatorial optimization problems combines the properties of neural networks and Simulated Annealing. In this thesis, MFA is formulated for mapping problem in parallel processing and global routing problem in physical design automation of Field Programmable Gate Array (FPGAs) A new Mean Field Annealing (MFA) formulation is proposed for the mapping problem for mesh-connected and hypercube architectures. The proposed MFA heuristic exploits the conventional routing scheme used in mesh and hypercube interconnection topologies to introduce an efficient encoding scheme. An efficient implementation scheme which decreases the complexity of the proposed algorithm by asymptotical factors is also developed. Experimental results also show that the proposed MFA heuristic approaches the speed performance of the fast Kernighan-Lin heuristic while approaching the solution quality of the powerful simulated annealing heuristic. Also, we propose an order-independent global routing algorithm for SRAM type FPGAs based on Mean Field Annealing. The performance of the proposed global routing algorithm is evaluated in comparison with LocusRoute global router on ACM/SIGDA Design Automation benchmarks. Experimental results indicate that the proposed MFA heuristic performs better than the LocusRoute.


Design and Implementation of a Verb Lexicon and a Verb-sense Disambiguator for Turkish (M. Sc. Thesis)

Okan Yılmaz

The lexicon has a crucial role in all natural language processing systems and has special importance in machine translation systems. This thesis presents the design and implementation of a verb lexicon and a verb sense disambiguator for Turkish. The lexicon contains only verbs because verbs encode events in sentences and play the most important role in natural language processing systems, especially in parsing (syntactic analyzing) and machine translation. The verb sense disambiguator uses the information stored in the verb lexicon that we developed. The main purpose of this tool is to disambiguate senses of verbs having several meanings, some of which are idiomatic. We also present a tool implemented in Lucid Common Lisp under X-Windows for adding, accessing, modifying, and removing entries of the lexicon, and a semantic concept ontology containing semantic features of commonly used Turkish nouns.


Error-tolerant Finite-state Recognition with Applications to Morphological Analysis and Spelling Correction

Kemal Oflazer

(Updated Feb 15 1995)

This paper presents a notion of error-tolerant recognition with finite state recognizers. Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite state recognizer. Such recognition has applications in error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After a description of the basic concepts and algorithms involved, we give examples from two applications: In the context of morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected, and morphologically analyzed concurrently. We present an application of this to error-tolerant analysis of agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology is fully described by two--level finite state transducers, regardless of the word formation processes. In the context of spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. Again it can be applied to any language with a word list comprising all inflected forms, or whose morphology is fully described by two--level finite state transducers. We present experimental results for spelling correction for a number of languages. These results indicate that such recognition works very efficiently for candidate generation in spelling correction for many European languages such as English, Dutch, French, German, Italian (and others) with very large word lists of root and inflected forms (some containing well over 200,000 forms), generating all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerant recognition operating with an (circular) recognizer of Turkish words (with about 29,000 states and 119,000 transitions) can generate all candidate words in less than 20 milliseconds, with edit distance 1.


On the Use of Inductive Reasoning in Program Synthesis: Prejudice and Prospects

Pierre Flener, Lubos Popelinsky

In this position paper, we give a critical analysis of the deductive and inductive approaches to program synthesis, and of the current research in these fields. From the shortcomings of these approaches and works, we identify future research directions for these fields, as well as a need for cooperation and cross-fertilization between them.

Published in: L. Fribourg and F. Turini (eds), Proc. of LOPSTR/META'94, LNCS, Springer-Verlag, 1994.


ILP and Automatic Programming: Towards Three Approaches

Pierre Flener, Lubos Popelinsky, Olga Stepankova

The prospects of inductive logic programming (ILP) with respect to automatic programming (program synthesis) are discussed. We argue that logic program synthesis from incomplete information is but a niche of ILP, and study consequences of this statement. Then, three approaches are described: schema-driven synthesis of logic programs from incomplete specifications, the role of transformation techniques in ILP, and interactive assumption-based inductive learning.

Published in: S. Wrobel (ed), Proc. of ILP'94 (GMD-Studien Nr. 237), pp. 351--364, Bad Honnef, Germany, September 1994.


An Application of Genetic Programming to the 4-Op Problem Using Map-Trees

Tevfik Aytekin, E. Erkan Korkmaz and H. Altay Güvenir

In Genetic programming (GP) applications the programs are expressed as parse trees. A node of a parse tree is an element either from the function-set or terminal-set, and an element of a terminal set can be used in a parse tree more than once. However, when we attempt to use the elements in the terminal set at most once, we encounter problems in creating the initial random population and in crossover and mutation operations. 4-Op problem is an example for such a situation. We developed a technique called map-trees to overcome these anomalies. Experimental results on 4-Op using map-trees are presented.


Learning Soil Classification Using CFP

Hakime Gülay Ünsal and H. Altay Güvenir

The paper presents an application of the Classification by Feature Partitioning (CFP) algorithm to the problem of soil classification. CFP is an exemplar based, incremental and supervised learning algorithm. Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint partitions of values. Application of the CFP algorithm to soil classification, one of the important problems of soil engineering and civil engineering, is described. The classification helps the soil engineer by giving general guidance. In many soil engineering problems there are no rational expression available for the analysis for the solution.