Technical Reports Published in 2005




BU-CE-0501: PDF

GUAP - A Strong User Authentication Protocol for GSM (M. Sc. Thesis)

Özer Aydemir

Traditionally, the authentication protocols for cellular phone networks have been designed for device authentication rather than user authentication, which brings limitations and restrictions on the functionality of the system. In this thesis we propose a user authentication protocol for Global Standards for Mobile (GSM). Our protocol permits the use of weak secrets (e.g. passwords or PINs) for authentication and provides certain flexibilities for GSM users. The simulation results on currently established user authentication protocols and GUAP are presented. The protocol also has a capture resilience extension to disable captured cellular phones securely.

Keywords: Wireless network security, user authentication, GSM, strong password protocols, capture resilient

BU-CE-0502: PDF

Harbinger Machine Learning Toolkit Manual

Berkant Barla Cambazoğlu and Cevdet Aykanat

This manual is the primary guide to the Harbinger Machine Learning Toolkit (HMLT), which provides implementations for some well-known and frequently used machine learning classifiers. The main concerns in development of HMLT are correctness, effectiveness, transparency, modularity, and re-usability. At the moment, efficiency is not claimed to be a primary concern in any part of the toolkit. This is basically due to the fact that all supported classifier implementations use common representations and data structures, preventing further utilization and employment of some classifier-specific optimizations. However, we believe that the toolkit is quite robust and supposed to successfully execute under most circumstances.

Keywords: Machine learning, toolkit, classification, validation.

BU-CE-0503: PDF

Hypergraph-Partitioning-Based Remapping Models for Image-Space-Parallel Direct Volume Rendering of Unstructured Grids

Berkant Barla Cambazoğlu and Cevdet Aykanat

In this work, image-space-parallel direct volume rendering (DVR) of unstructured grids is investigated for distributed-memory architectures. A hypergraph-partitioning-based model is proposed for the adaptive screen partitioning problem in this context. The proposed model aims to balance the rendering loads of processors while trying to minimize the amount of data replication. In the parallel DVR framework we adopted, each data primitive is statically owned by its home processor, which is responsible from replicating its primitives on other processors. Two appropriate remapping models are proposed by enhancing the above model for use within this framework. These two remapping models aim to minimize the total volume of communication in data replication while balancing the rendering loads of processors. Based on the proposed models, a parallel DVR algorithm is developed. The experiments conducted on a PC cluster show that the proposed remapping models achieve better speedup values compared to the remapping models previously suggested for image-space-parallel DVR.

Keywords: Direct volume rendering, unstructured grids, ray casting, image space parallelization, hypergraph partitioning, screen partitioning, remapping.

BU-CE-0504: PDF

Processing Count Queries over Event Streams at Multiple Time Granularities

Aykut Ünal, Yücel Saygın, Özgür Ulusoy

Management and analysis of streaming data has become crucial with its applications in web, sensor data, network traffic data, and stock market. Data streams consist of mostly numeric data but what is more interesting is the events derived from the numerical data that need to be monitored. The events obtained from streaming data form event streams. Event streams have similar properties to data streams, i.e., they are seen only once in a fixed order as a continuous stream. Events appearing in the event stream have time stamps associated with them in a certain time granularity, such as second, minute, or hour. One type of frequently asked queries over event streams is count queries, i.e., the frequency of an event occurrence over time. Count queries can be answered over event streams easily, however, users may ask queries over different time granularities as well. For example, a broker may ask how many times a stock increased in the same time frame, where the time frames specified could be hour, day, or both. This is crucial especially in the case of event streams where only a window of an event stream is available at a certain time instead of the whole stream. In this paper, we propose a technique for predicting the frequencies of event occurrences in event streams at multiple time granularities. The proposed approximation method efficiently estimates the count of events with a high accuracy in an event stream at any time granularity by examining the distance distributions of event occurrences. The proposed method has been implemented and tested on different real data sets and the results obtained are presented to show its effectiveness.

Keywords: Count queries, data streams, event streams, time granularity, association rules, data mining

BU-CE-0505: PDF

GnuSim: A General Purpose Simulator for Gnutella and Unstructured P2P Networks

Murat Karakaya, İbrahim Körpeoğlu, Özgür Ulusoy

Peer-to-peer networks have attracted a significant amount of interest as a popular and successful alternative to traditional client-server networks for resource sharing and content distribution. Since a P2P network consists of many nodes, thousands or millions, it is hard to evaulate the performance of a P2P network and some related protocols analytically using only mathematical models. Most often we need to do simulations. Therefore it is important to have a simulation tool specifically designed for P2P network and protocol simulation. We developed such a tool, and in this report we present the design and implementation of this simulation tool, which we call GnuSim. The report includes both the simulation model and the code details. This work will be useful for researchers doing simulation study in P2P systems domain.

BU-CE-0506: PDF

A Library for Parallel Sparse Matrix-Vector Multiplies

Bora Uçar and Cevdet Aykanat

We provide parallel matrix-vector multiply routines for 1D and 2Dpartitioned sparse square and rectangular matrices. We clearly give pseudocodes that perform necessary initializations for parallel execution. We show how to maximize overlapping between communication and computation through the proper usage of compressed sparse row and compressed sparse column formats of the sparse matrices. We give pseudocodes for multiplication routines which benefit from such overlaps.

BU-CE-0507: PDF

Reducing Query Overhead Through Route Learning In Unstructured Peer-to-Peer Networks (M. Sc. Thesis)

Selim Çıracı

We provide parallel matrix-vector multiply routines for 1D and 2Dpartitioned sparse square and rectangular matrices. We clearly give pseudocodes that perform necessary initializations for parallel execution. We show how to maximize overlapping between communication and computation through the proper usage of compressed sparse row and compressed sparse column formats of the sparse matrices. We give pseudocodes for multiplication routines which benefit from such overlaps.

BU-CE-0508: PDF

An Ontology for Computer-Aided Modeling of Cellular Processes (Ph. D. Thesis)

Emek Demir

Cellular processes form the hardware layer of living organisms. Malfunctions in cellular processes are responsible for most of the currently incurable diseases. Not surprisingly, knowledge about cellular processes are growing at an enormous rate. However, today's molecular biology suffers from lack of a formal representation system for cellular processes. Most of the knowledge is locked in literature, that are not accessible to computational analysis and modeling. Given the complexity of the system we are attacking, the need for a representation system and modeling tools for cellular processes are clear.

In this dissertation, we describe an ontology for modeling processes. Our ontology possesses several unique features, including ability to represent abstractions and multiple levels of detail, cellular compartments and molecular states. Furthermore, it was designed to meet several user and system requirements, including ease of integration, querying, analysis and visualization.

Based on this ontology we also implemented a set of software tools within the PATIKA project. Primary use cases of PATIKA are integration, querying and visualization, and we have obtained satisfactory results proving the feasibility of our ontology.

Compared with existing alternative methods of representing and querying information about cellular processes, PATIKA provides several advantages, including a regular representation system, powerful querying options, an advanced visualization. Moreover PATIKA models can be analyzed by computational methods such as flux analysis or pathway activity inference. Although it has a more steep learning curve compared to existing ad hoc representation systems, we believe that tools like PATIKA will be essential for molecular biology research in the future.

Keywords: Bioinformatics, ontology, pathway.

BU-CE-0509: PDF

Parallel Sparse Matrix-Vector Multiplies and Iterative Solvers (Ph. D. Thesis)

Bora Uçar

Sparse matrix-vector multiply (SpMxV) operations are in the kernel of many scientific computing applications. Therefore, efficient parallelization of SpMxV operations is of prime importance to scientific computing community. Previous works on parallelizing SpMxV operations consider maintaining the load balance among processors and minimizing the total message volume. We show that the total message latency (start-up time) may be more important than the total message volume. We also stress that the maximum message volume and latency handled by a single processor are important communication cost metrics that should be minimized. We propose hypergraph models and hypergraph partitioning methods to minimize these four communication cost metrics in one dimensional and two dimensional partitioning of sparse matrices. Iterative methods used for solving linear systems appear to be the most common context in which SpMxV operations arise. Usually, these iterative methods apply a technique called preconditioning. Approximate inverse preconditioning - which can be applied to a large class of unsymmetric and symmetric matrices - replaces an SpMxV operation by a series of SpMxV operations. That is, a single SpMxV operation is only a piece of a larger computation in the iterative methods that use approximate inverse preconditioning. In these methods, there are interactions in the form of dependencies between the successive SpMxV operations. These interactions necessitate partitioning the matrices simultaneously in order to parallelize a full step of the subject class of iterative methods efficiently. We show that the simultaneous partitioning requirement gives rise to various matrix partitioning models depending on the iterative method used. We list the partitioning models for a number of widely used iterative methods. We propose operations to build a composite hypergraph by combining the previously proposed hypergraph models and show that partitioning the composite hypergraph models addresses the simultaneous matrix partitioning problem. We strove to demonstrate how the proposed partitioning methods - both the one that addresses multiple communication cost metrics and the other that addresses the simultaneous partitioning problem - help in practice. We implemented a library and investigated the performances of the partitioning methods. These practical investigations revealed a problem that we call message ordering problem. The problem asks how to organize the send operations to minimize the completion time of a certain class of parallel programs. We show how to solve the message ordering problem optimally under reasonable assumptions.

Keywords: Sparse matrices, parallel matrix-vector multiplication, iterative methods, preconditioning, approximate inverse preconditioner, hypergraph partitioning.

BU-CE-0510: PDF

A Comprehensive Performance Evaluation of a Novel Framework Against Free Riding in Peer-to-Peer Networks

Murat Karakaya, İbrahim Körpeoğlu and Özgür Ulusoy

Peer-to-peer computing paradigm has attracted a signcant amount of interest as a popular and successful alternative to traditional client-server paradigm for resource sharing and content distribution. However, the existence of high degrees of free riding may be an important threat against the proper operation and high-performance of P2P networks.

In this report, we propose a distributed framework to reduce the degree of free riding and its adverse affects on P2P networks. Our solution primarily focuses on locating free riders and taking counter-actions against them. As part of our solution, we propose an approach in which each peer monitors its neighbors, makes decisions if they are free riders or not, and takes appropriate actions if it decides that one of the neighbors is a free rider. To be able to identify the peers as free riders, we define sample free riding types and describe how they are related with the protocol messages of existing P2P protocols such as Gnutella. As a result, we employ sample formulas to determine if a neighboring peer exhibits any kind of free riding or not. Furthermore, we present example counter action schemes that can be applied to neighbors that are detected as free riders. We then combine the mechanisms to detect free riders and the actions that can be taken against them into a practical framework that can be used in an unstructured P2P network. Unlike other distributed mechanisms against free riding, our framework does not require any permanent identcation of peers or security infrastructures for maintaining a global reputation system. In the work, we also discuss the probable attacks to the proposed framework and their effects on it.

We also performed a sample implementation of the proposed framework as part of a simulation model, and conducted extensive simulation tests. Our simulations results show that by reducing the amount of free riding in a P2P network, we can accomplish an increase in the performance of the P2P networks for contributor peers.

Keywords: Free Riding, Peer-to-Peer Networks, Gnutella, Distributed Computing, Performance Evaluation.

BU-CE-0511: PDF

Pattern Information Extraction From Crystal Structures (M. Sc. Thesis)

Erhan Okuyan

Determining crystal structure parameters of a material is a quite important issue in crystallography. Knowing the crystal structure parameters helps to understand physical behavior of material. For complex structures, particularly for materials which also contain local symmetry as well as global symmetry, obtaining crystal parameters can be quite hard. This work provides a tool that will extract crystal parameters such as primitive vectors, basis vectors and space group from atomic coordinates of crystal structures. A visualization tool for examining crystals is also provided. Accordingly, this work presents a useful tool that help crystallographers, chemists and material scientists to analyze crystal structures efficiently.

Keywords: crystal, crystallography, chemistry, material science, pattern recognition, primitive vectors, basis vectors, space group, symmetry.

BU-CE-0512: PDF

Automatic Detection of Salient Objects for a Video Database System (M. Sc. Thesis)

Tarkan Sevilmiş

Recently, the increase in the amount of multimedia data has unleashed the development of storage techniques. Multimedia databases is one of the most popular of these techniques because of its scalability and ability to be queried by the media features. One downside of these databases is the necessity for processing of the media for feature extraction prior to storage and querying. Ever growing pile of media makes this processing harder to be completed manually. This is the case with BilVideo Video Database System, as well. Improvements on computer vision techniques for object detection and tracking have made automation of this tedious manual task possible. In this thesis, we propose a tool for the automatic detection of objects of interest and deriving spatio-temporal relations between them in video frames. The proposed framework covers the scalable architecture for video processing and the stages for cut detection, object detection and tracking. We use color histograms for cut detection. Based on detected shots, the system detects salient objects in the scene, by making use of color regions and camera focus estimation. Then, the detected objects are tracked based on their location, shape and estimated speed.

Keywords: Video Databases, Video Object Detection, Object Tracking, Camera Focus Estimation.