Technical Reports Published in 1995


Vertical Fragmentation of Superimposed Signature Files Using Partial Evaluation of Queries

Seyit Koçberber and Fazlı Can

Our research extends the bit-sliced signature file organization by introducing a partial evaluation approach for queries and by dividing the signatures into variable-sized vertical fragments. This approach, the multi-fragmented signature file (MFSF), relaxes the optimality condition used in superimposed signature generation. The analysis, which incorporates multi-term query schemes, shows that, with no space overhead, MFSF provides a query processing time improvement of more than 70% with respect to the optimal bit-sliced organization using partial query evaluation in both cases. Under the sequentiality assumption of disk blocks, MFSF provides a desirable response time of 1.5 seconds for a database of one million records. Due to partial evaluation, this desirable response time is guaranteed even for queries with several terms.
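
The underlying machinery (superimposed coding plus bit-sliced evaluation) can be illustrated with a small sketch. The signature width `F`, the number of bits set per term `M`, the MD5-based hashing, and the `stop_after` cut-off are all illustrative assumptions, not the parameters of MFSF itself; "partial evaluation" here simply means examining only a prefix of the query's on-bits and leaving the surviving candidates to be resolved against the records themselves.

```python
import hashlib

F = 64   # signature width in bits (illustrative)
M = 3    # bits set per term (illustrative)

def term_signature(term):
    sig = 0
    for i in range(M):
        h = int(hashlib.md5(f"{term}/{i}".encode()).hexdigest(), 16)
        sig |= 1 << (h % F)
    return sig

def record_signature(terms):
    # superimposed coding: OR together the term signatures
    sig = 0
    for t in terms:
        sig |= term_signature(t)
    return sig

def build_slices(signatures):
    # bit-sliced organization: slice j stores bit j of every signature
    return [[(s >> j) & 1 for s in signatures] for j in range(F)]

def query(slices, terms, stop_after=None):
    # Partial evaluation: examine only the first `stop_after` on-bits of
    # the query signature; the surviving candidates would then be
    # checked against the records themselves.
    qsig = record_signature(terms)
    bits = [j for j in range(F) if (qsig >> j) & 1]
    if stop_after is not None:
        bits = bits[:stop_after]
    candidates = set(range(len(slices[0])))
    for j in bits:
        candidates = {i for i in candidates if slices[j][i]}
    return candidates
```

Examining fewer bit slices costs less I/O but yields a superset of the fully evaluated candidate set, which is the trade-off the fragmentation scheme optimizes.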


Logic Program Schemata: Synthesis and Analysis

Pierre Flener

A program schema is a template program with a fixed control and data flow, but without specific indications about the actual parameters or the actual computations. A program schema and the constraints on its place-holders encode a programming methodology, and they are thus an interesting means of guiding many software engineering tasks. First, I propose a notation and terminology for schemata and their manipulations. Second, from a series of divide-and-conquer programs, I synthesize four increasingly powerful divide-and-conquer schemata and their constraints. Third, I outline a vision of schema-guided program synthesis and report on particular synthesis strategies that are guided by various schemata. Finally, I discuss the related work and outline directions for future work.
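
The idea of a template with fixed control flow and open place-holders can be sketched outside of logic programming as well. The sketch below is not one of the paper's four schemata (those are logic program schemata with associated constraints); it merely illustrates, in procedural form, how a divide-and-conquer schema fixes the recursion structure while leaving the actual computations as parameters.

```python
def divide_and_conquer(is_minimal, solve_minimal, decompose, compose):
    """A divide-and-conquer template: the four place-holders supply the
    actual computations, while the template fixes control and data flow."""
    def solve(x):
        if is_minimal(x):
            return solve_minimal(x)
        parts = decompose(x)
        return compose(x, [solve(p) for p in parts])
    return solve

# Instantiating the schema: summing a list by halving it.
total = divide_and_conquer(
    lambda xs: len(xs) <= 1,
    lambda xs: xs[0] if xs else 0,
    lambda xs: [xs[:len(xs) // 2], xs[len(xs) // 2:]],
    lambda xs, solved: solved[0] + solved[1],
)
```

The constraints mentioned in the abstract would, in this analogy, be conditions on the place-holders (e.g., that `decompose` produces strictly smaller parts) guaranteeing correctness and termination of any instantiation.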


A Network Access Protocol for Hard Real-Time Communication Systems

Özgür Ulusoy

Distributed hard real-time systems are characterized by communication messages associated with timing constraints, typically in the form of deadlines. A message should be received at its destination before its deadline expires. Carrier sense multiple access with collision detection (CSMA/CD) appears to be one of the most common communication network access schemes that can be used in distributed hard real-time systems. In this paper, we propose a new real-time network access protocol based on the CSMA/CD scheme. The protocol classifies messages into two classes, `critical' and `noncritical'. Messages close to their deadlines are considered critical. A critical message is given the right to access the network by preempting a noncritical message in transmission. It is shown that the protocol can provide considerable improvement over the virtual time CSMA/CD protocol proposed for hard real-time communication by Zhao et al.
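
The classification and preemption rule can be sketched as follows. The fixed criticality window, the deadline-ordered queue, and the `arbitrate` function are all illustrative assumptions; the actual protocol's classification rule and channel arbitration are specified in the paper.

```python
import heapq
from dataclasses import dataclass, field

# Assumed threshold: a message whose remaining time to deadline drops
# below this many time units is considered critical.
CRITICAL_WINDOW = 5.0

@dataclass(order=True)
class Message:
    deadline: float
    payload: str = field(compare=False)

def is_critical(msg, now):
    return msg.deadline - now <= CRITICAL_WINDOW

def arbitrate(queue, current, now):
    """Decide which message holds the channel: a critical message may
    preempt a noncritical message already in transmission."""
    if not queue:
        return current
    if current is None:
        return heapq.heappop(queue)
    if is_critical(queue[0], now) and not is_critical(current, now):
        heapq.heappush(queue, current)   # preempted message is requeued
        return heapq.heappop(queue)
    return current
```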


A Genetic Algorithm for Multicriteria Inventory Classification

H. Altay Güvenir

One of the application areas of genetic algorithms is parameter optimization. This paper addresses the problem of optimizing a set of parameters that represent the weights of criteria, where the weights sum to 1. A chromosome represents the values of the weights, possibly along with some cut-off points. A new crossover operation, called continuous uniform crossover, is proposed; it produces valid chromosomes provided that the parent chromosomes are valid. The new crossover technique is applied to the problem of multicriteria inventory classification. The results are compared with those of the classical inventory classification technique using the Analytic Hierarchy Process.
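
The validity requirement can be met, for example, by forming each child as a convex combination of its parents. The sketch below, with a single blending coefficient per crossover, is an assumption on our part (the paper's operator may differ in detail, e.g. in how cut-off points are handled); it illustrates why such a crossover preserves the unit-sum constraint.

```python
import random

def continuous_uniform_crossover(p1, p2):
    """Blend two weight vectors with a single random coefficient.

    Because the child is a convex combination of the parents, its
    weights still sum to 1 whenever the parents' weights do.
    """
    lam = random.random()
    return [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
```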


Exploring States of Sparse, Large Markov Chains

Tuğrul Dayar and William J. Stewart

Dynamic state exploration is an effective technique for exploring the states of sparse, large Markov chains having highly unbalanced stationary probabilities and for computing related performance measures. States of interest are explored starting from a particular state until a predetermined stopping criterion is met. In this paper, the dynamic state exploration idea is extended to the case where more than one starting state is considered simultaneously. This approach lends itself easily to parallel implementation and reduces the number of operations to be carried out for each starting state. Furthermore, it is shown that this approach gives an accurate picture of the nature of the states without requiring the solution of the complete Markov chain. Experiments are carried out on a Markov chain arising from an Asynchronous Transfer Mode (ATM) traffic queue.
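
The exploration step can be sketched as a breadth-first expansion from the given starting states. The `transitions` callback and the visited-state cap are illustrative assumptions standing in for the chain's transition structure and the paper's stopping criterion.

```python
from collections import deque

def explore(transitions, starts, max_states):
    """Breadth-first state exploration from several starting states.

    `transitions(s)` yields (successor, probability) pairs; exploration
    stops once `max_states` states have been visited (a stand-in for
    the stopping criterion used in practice).
    """
    seen = set(starts)
    frontier = deque(starts)
    while frontier and len(seen) < max_states:
        s = frontier.popleft()
        for t, p in transitions(s):
            if p > 0 and t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen
```

Running several starting states through one shared `seen` set is what makes the simultaneous variant cheaper per starting state than independent explorations.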


Situations and Computation: An Overview of Recent Research

Erkan Tın and Varol Akman

Situation theory has been developed over the last decade, and various versions of the theory have been applied to a number of linguistic issues. However, not much work has been done on its computational aspects. In this paper, we review the existing approaches towards `computational situation theory' with considerable emphasis on our own research.

NOTE: This report will also appear in the University of Tübingen Seminar für Sprachwissenschaft's 1995 technical report series.


Modeling Context with Situations

Mehmet Surav and Varol Akman

The issue of context arises in assorted areas of Artificial Intelligence. Although its importance has been recognized by various researchers, not much work has been done towards a useful formalization. In this paper, we present a preliminary model (based on Situation Theory) and give examples showing the use of context in various fields and the advantages gained by adopting our proposal.

NOTE: This work will be presented at the IJCAI-95 Workshop on "Modeling Context in Knowledge Representation and Reasoning," Montreal, Canada (August 20, 1995).


Classification with Overlapping Feature Intervals (M. Sc. Thesis)

Hakime Ünsal Koç

This thesis presents a new form of exemplar-based learning, based on overlapping feature intervals. Classification with Overlapping Feature Intervals (COFI) is a particular implementation of this technique. In this incremental, inductive, and supervised learning method, the basic unit of representation is an interval. The COFI algorithm learns the projections of the intervals onto each class dimension for each feature. An interval is initially a point on a class dimension; it can then be expanded through generalization. The algorithm performs no specialization of intervals on class dimensions. Classification in COFI is based on a majority vote among the local predictions made individually by each feature.
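
A toy version of the representation and voting scheme can be sketched as follows. The class and method names are ours, and real COFI handles details (overlap resolution, weighting, tie-breaking) that this sketch omits; it shows only the core ideas of per-feature intervals grown by generalization and classification by majority vote.

```python
from collections import Counter, defaultdict

class COFISketch:
    """Toy interval-based learner with per-feature majority voting.

    For every feature, one [lo, hi] interval per class is maintained;
    an interval starts as a point and is only ever generalized
    (expanded), never specialized.
    """

    def __init__(self):
        self.intervals = defaultdict(dict)  # feature -> class -> (lo, hi)

    def train(self, x, cls):
        for f, v in enumerate(x):
            lo, hi = self.intervals[f].get(cls, (v, v))
            self.intervals[f][cls] = (min(lo, v), max(hi, v))

    def predict(self, x):
        votes = Counter()
        for f, v in enumerate(x):
            for cls, (lo, hi) in self.intervals[f].items():
                if lo <= v <= hi:
                    votes[cls] += 1  # local prediction of feature f
        return votes.most_common(1)[0][0] if votes else None
```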


Predicate Invention in Inductive Program Synthesis

Pierre Flener

In Inductive Logic Programming, predicate invention is the process of introducing a hitherto unknown predicate, together with its description, into the description of the predicate currently being learned. This is necessary only when a finite axiomatization of the current predicate is otherwise impossible, in which case the description of the invented predicate is recursive. Necessary predicate invention is therefore a program synthesis task, and is thus best done by a synthesizer rather than by a general-purpose learner, because one can then reuse the vast body of knowledge about recursive algorithm design. Taking a schema-guided approach, I show how predicate invention can be performed (if not avoided) by intelligent reuse from structured background knowledge, by necessarily successful intelligent dialogue with the user, and by structural or computational generalization of the original problem.


Contexts, Oracles, and Relevance

Mehmet Surav and Varol Akman

We focus on how the relevance of information to a context should be defined for information-processing agents, such as oracles. We build our formalization of relevance upon work in pragmatics that refers to contextual information without giving any explicit representation of context. We use our own formalization of context in Situation Theory and demonstrate its power in this task. We also discuss some computational aspects of this formalization.

NOTE: A revised version of this work will be presented at the AAAI-95 Fall Symposium on "Formalizing Context," MIT, Cambridge, MA (Nov. 1995).


A Constraint-based Case Frame Lexicon Architecture

Kemal Oflazer and Okan Yılmaz

Recent advances in theoretical and implementational aspects of feature- and constraint-based formalisms for representing linguistic information have fostered research on the use of such formalisms in the design and implementation of computational lexicons. The case-frame approach has been the representation of choice, especially for languages with free constituent order, explicit case marking of noun phrases, and embedded clauses filling nominal syntactic roles. The semantics of such syntactic role fillers is usually determined by their lexical semantic and morphosyntactic properties rather than by their position in the sentence. In this paper we present our approach to building a constraint-based case-frame lexicon for use in natural language processing in Turkish. A number of observations we have made on Turkish indicate that we have to go beyond the traditional transitive/intransitive distinction and use a framework in which verb valence is treated as the obligatory co-occurrence of an arbitrary subset of the possible arguments, along with the obligatory exclusion of certain others, relative to a verb sense. Additional morphosyntactic, lexical, and semantic constraints are utilized to map a given syntactic structure to a specific verb sense.

NOTE: To appear in Proceedings of the Workshop on the Computational Lexicon, in European Summer School on Logic, Language and Information, Barcelona, Spain, August 1995.


Adaptive Source Routing And Route Generation For Multicomputers (M. Sc. Thesis)

Yücel Aydoğan

Scalable multicomputers are based upon interconnection networks that typically provide multiple communication routes between any given pair of processor nodes. In such networks, the selection of routes is an important problem because of its impact on communication performance. We propose the adaptive source routing (ASR) scheme, which combines adaptive routing and source routing into a single scheme that has the advantages of both. In ASR, the degree of adaptivity of each packet is determined at the source processor. Every packet can be routed in a fully adaptive, partially adaptive, or non-adaptive manner, all within the same network at the same time. The ASR scheme permits any network topology to be used, provided that deadlock constraints are satisfied. We evaluate and compare the performance of adaptive source routing and non-adaptive randomized routing by simulation. We also propose an algorithm to generate adaptive routes for all pairs of processors in any multistage interconnection network. The adaptive routes are stored in a route table in each processor's memory and provide high-bandwidth, reliable interprocessor communication. We evaluate the performance of the algorithm on IBM SP2 networks in terms of the bandwidth obtained, the time to fill in the route tables, and the efficiency of the parallel execution of the algorithm.


Generalized Vertical Partitioning of Signature Files

Seyit Koçberber and Fazlı Can

A new signature file method, called the Generalized Multi-Fragmented Signature File (GMFSF), is presented. The new method provides a unified framework for other vertical partitioning signature file methods. The performance of GMFSF is measured with a prototype information retrieval system using a database of 152,850 MARC records. The experimental results agree with the theoretical analysis. The response time of GMFSF decreases with an increasing number of query terms and is independent of the number of hits for the query. These features make the method competitive with inverted files; the experiments further show that GMFSF performs better than inverted files for non-zero-hit conjunctive queries with more than three terms.


Analysis of Concurrency Control Protocols for Real-Time Database Systems

Özgür Ulusoy

This paper provides an approximate analytic solution method for evaluating the performance of concurrency control protocols developed for real-time database systems (RTDBSs). Transactions processed in an RTDBS are associated with timing constraints, typically in the form of deadlines. The primary consideration in developing an RTDBS concurrency control protocol is that satisfying the timing constraints of transactions is as important as maintaining the consistency of the underlying database. The proposed method evaluates the performance of concurrency control protocols in terms of the satisfaction rate of timing constraints. As a case study, an RTDBS concurrency control protocol, called High Priority, is analyzed using the proposed method. The accuracy of the performance results obtained is ascertained via simulation. The solution method is also used to investigate the real-time performance benefits of High Priority over ordinary Two-Phase Locking.
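
The conflict resolution rule that gives High Priority its name can be sketched in a few lines. The function name and return values are ours, and priorities are plain integers here (higher wins); in an RTDBS they would typically be derived from transaction deadlines.

```python
def request_lock(lock_table, item, txn, priority):
    """High Priority conflict resolution (sketch): a requester with
    higher priority aborts the current lock holder and takes the lock;
    otherwise the requester waits."""
    holder = lock_table.get(item)
    if holder is None:
        lock_table[item] = (txn, priority)
        return "granted"
    held_txn, held_priority = holder
    if priority > held_priority:
        lock_table[item] = (txn, priority)   # holder is aborted/restarted
        return "granted-after-abort"
    return "wait"
```

Because a high-priority transaction never waits for a lower-priority one, the rule avoids priority inversion at the cost of restarting aborted transactions.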


Error-tolerant Tree Matching

Kemal Oflazer

This paper presents an efficient algorithm for retrieving, from a database of trees, all trees that match a given query tree approximately, that is, within a certain error tolerance. It has natural language processing applications in searching for matches in example-based translation systems and in retrieval from lexical databases containing entries with complex feature structures. The algorithm has been implemented on SPARCstations; for large, randomly generated synthetic tree databases (some having tens of thousands of trees) it can associatively search for trees within a small error tolerance in a matter of tenths of a second to a few seconds.
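
To make the notion of approximate tree matching concrete, here is a simplified top-down edit distance on ordered labeled trees, with a tolerance filter over a database. This is a naive baseline of our own construction, not the paper's efficient algorithm; trees are `(label, [children])` tuples, and deleting or inserting a subtree costs its size.

```python
def size(t):
    return 1 + sum(size(c) for c in t[1])

def tree_dist(t1, t2):
    # relabeling the root costs 1; then align the child forests
    cost = 0 if t1[0] == t2[0] else 1
    return cost + forest_dist(t1[1], t2[1])

def forest_dist(f1, f2):
    # Levenshtein-style alignment of two child sequences
    m, n = len(f1), len(f2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(f1[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(f2[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(f1[i - 1]),                  # delete subtree
                d[i][j - 1] + size(f2[j - 1]),                  # insert subtree
                d[i - 1][j - 1] + tree_dist(f1[i - 1], f2[j - 1]),  # match
            )
    return d[m][n]

def search(db, query, tol):
    # naive scan; the paper's contribution is doing this efficiently
    return [t for t in db if tree_dist(t, query) <= tol]
```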