Technical Reports 2001

Technical Reports Published in 2001

BU-CE-0101: PDF

Stereoscopic View-Dependent Visualization of Terrain Height Fields
Uğur Güdükbay and Türker Yılmaz
Visualization of large geometric environments has always been an important problem of computer graphics. In this paper, we present a framework for the stereoscopic view-dependent visualization of large scale terrain models. We use a quadtree based multiresolution representation for the terrain data. This structure is queried to obtain the view-dependent approximations of the terrain model at different levels of detail. In order not to loose depth information, which is crucial for the stereoscopic visualization, we make use of a different simplification criterion, namely distance-based angular error threshold. We also present an algorithm for the construction of stereo pairs in order to speed up the view-dependent stereoscopic visualization. The approach we use is the simultaneous generation of the triangles for two stereo images using a single draw-list so that the view frustum culling and vertex activation is done only once for each frame. The cracking problem is solved using the dependency information stored for each vertex. We eliminate the popping artifacts that can occur while switching between different resolutions of the data using morphing. We implemented the proposed algorithms on personal computers and graphics workstations. Performance experiments show that the second eye image can be produced approximately 45 % faster than drawing the two images separately and a smooth stereoscopic visualization can be achieved at interactive frame rates using continuous multi-resolution representation of height fields.
Keywords: Stereoscopic visualization, terrain height fields, multiresolution rendering, quadtrees.
BU-CE-0102: PDF

Analysis and Presentation of Interesting Rules
Türker Yılmaz and H. Altay Güvenir
Finding interesting rules from a large rule set is one of the most important aims of data mining research. In this paper, we present an algorithm and its application to finding and presenting interesting rules from a large set of previously discovered rules. Our domain of experimentation is the differential diagnosis of erythemato-squamous diseases. The application domain contains records patients with known diagnoses.
Keywords: Data mining, rule interestingness, rule surprisingness
BU-CE-0103: PDF

Categorization in a Hierarchically Structured Text Database (M. Sc. Thesis)
Ferhat Kutlu
Over the past two decades there has been a huge increase in the amount of data being stored in databases and the on-line flow of data by the effects of improvements in Internet. This huge increase brought out the needs for intelligent tools to manage that size of data and its flow. Hierarchical approach is the best way to satisfy these needs and it is so widespread among people dealing with databases and Internet. Usenet newsgroups system is one of the on-line databases that have built-in hierarchical structures. Our point of departure is this hierarchical structure which makes categorization tasks easier and faster. In fact most of the search engines in Internet also exploit inherent hierarchy of Internet. Growing size of data makes most of the traditional categorization algorithms obsolete. Thus we developed a brand-new categorization learning algorithm which constructs an index tree out of Usenet news database and then decides the related newsgroups of a new news by categorizing it over the index tree. In learning phase it has an agglomerative and bottom-up hierarchical approach. In categorization phase it does an overlapping and supervised categorization. K-Nearest Neighbor categorization algorithm is used to compare the complexity measure and accuracy of our algorithm. This comparison does not only mean comparing two different algorithms but also means comparing hierarchical approach vs. flat approach, similarity measure vs. distance measure and importance of accuracy vs. importance of speed. Our algorithm prefers hierarchical approach and similarity measure, and greatly outperforms k-Nearest Neighbor categorization algorithm in speed with minimal loss of accuracy.
Keywords: Learning, categorization, clustering, hierarchy, Usenet, newsgroup, top-level, header-line, posting, frequency, norm-scaling, similarity measure, distance measure, agglomerative, bottom-up, stemming, stopword, index.
BU-CE-0104: PDF

Application of K-NN and FPTC Based Text Categorization Algorithms to Turkish News Reports (M. Sc. Thesis)
Ufuk İlhan
New technological developments, such as easy access to Internet, optical character readers, high-speed networks and inexpensive massive storage facilities, have resulted in a dramatic increase in the availability of on-line text-newspaper articles, incoming (electronic) mail, technical reports, etc. The enormous growth of on-line information has led to a comparable growth in the need for methods that help users organize such information. Text Categorization may be the remedy of increased need for advanced techniques. Text Categorization is the classification of units of natural language texts with respect to a set of pre-existing categories. Categorization of documents is challenging, as the number of discriminating words can be very large. This thesis presents compilation of a Turkish dataset, called Anadolu Agency Newsgroup in order to study in Text Categorization. Turkish is an agglutinative language in which words contain no direct indication where the morpheme boundaries are, furthermore, morphemes take a shape dependent on the morphological and phonological context. In Turkish, the process of adding one suffix to another can result in a relatively long word, furthermore, a single Turkish word can give rise to a very large number of variants. Due to this complex morphological structure, Turkish requires text processing techniques different than English and similar languages. Therefore, besides converting all words to lower case and removing punctuation marks, some preliminary work is required such as stemming, removal of stop words and formation of a keyword list.
Keywords: Text categorization, classification, feature projections, stemming, wild card matching, stopword.
BU-CE-0105: PDF

Quasi-Birth-and-Death Processes with Level-Geometric Distribution
Tuğrul Dayar and Franck Quessette
A special class of homogeneous continuous-time quasi-birth-and-death (QBD) Markov Chains (MCs) which possess level-geometric (LG) stationary distribution are considered. Assuming that the stationary vector is partitioned by levels into subvectors, in an LG distribution all stationary subvectors beyond a finite level number are multiples of each other. Specifically, each pair of stationary subvectors that belong to consecutive levels are related by the same scalar, hence the term level-geometric. Necessary and sufficient conditions are specified for the existence of such a distribution and the results are elaborated on three examples.
Keywords: Markov chains, QBD processes, geometric distributions
BU-CE-0106: PDF

An Eager Regression Method Based On Best Feature Projections
Tolga Aydın and H. Altay Güvenir
This paper describes a machine learning method, called Regression by Selecting Best Feature Projections (RSBFP). In the training phase, RSBFP projects the training data on each feature dimension and aims to find the predictive power of each feature attribute by constructing simple linear regression lines, one per each continuous feature and number of categories per each categorical feature. Because, although the predictive power of a continuous feature is constant, it varies for each distinct value of categorical features. Then the simple linear regression lines are sorted according to their predictive power. In the querying phase of learning, the best linear regression line and thus the best feature projection are selected to make predictions.
Keywords: Prediction, feature projection, regression
BU-CE-0107: PDF

Detection of Abnormal ECG Recordings using Feature Intervals
H. Altay Güvenir
A new classification algorithm, called CFI (for Classification on Feature Intervals), is developed and applied to problem of detecting abnormal ECG signals. The domain contains records of patients with known diagnosis. Given a training set of such records the CFI algorithm learns how to detect arrhythmia. CFI represents a concept in the form of feature intervals on each feature dimension separately. Classification in the CFI algorithm is based on a real-valued voting. A genetic algorithm is used to select the set of relevant features. The CFI algorithm can decline to make a prediction its confidence level is low. The performance of the CFI classifier is evaluated empirically in terms of classification accuracy and running time.
Keywords: ECG, arrhythmia detection, feature intervals, feature selection, classification, machine learning
BU-CE-0108: PDF

SQL-TC: A Topic-Centric Query Language for Web-Based Information Resources
İsmail Sengör Altıngövde, Selma Ayşe Özel, Özgür Ulusoy, Gültekin Özsoyoğlu, Zehra Meral Özsoyoğlu
This report deals with the problem of modeling web information resources using expert knowledge and personalized user information, and querying them in terms of topics and topic relationships. We propose a "web information space" metadata model for web information resources, and a query language SQL-TC (Topic-Centric SQL) to query the model. The web information space model is composed of web-based information resources (XML or HTML documents on the web), expert advice repositories (domain-expert-specified metadata for information resources as XTM documents), and personalized information about users (captured as user profiles, that indicate users' preferences as to which expert advice they would like to follow, and which to ignore, etc., as XML documents). Expert advice is specified using topics and relationships among topics (called metalinks), along the lines of the recently proposed topic maps. Experts attach importance values and domains to topics and metalinks that they specify. And, users declare to accept, reject or "don-t-care" about the choices that experts make. The query language SQL-TC makes use of the metadata information provided in expert advice repositories and embedded in information resources, and employs user preferences to further refine the query output. Query output objects/tuples are ranked with respect to the (expert-judged and user-preference-revised) importance values of requested topics/metalinks, and the query output is limited by either top n-ranked objects/tuples, or objects/tuples with importance values above a given threshold, or both.Therefore, the query output of SQL-TC is expected to produce highly relevant and semantically related responses to user queries within short amounts of time.
BU-CE-0109: PDF

Fast Stereoscopic View-Dependent Visualization of Terrain Height Fields (M. Sc. Thesis)
Türker Yılmaz
Visualization of large geometric environments has always been an important problem of computer graphics. In this thesis, we present a framework for the stereoscopic view-dependent visualization of large scale terrain models. We use a quadtree based multiresolution representation for the terrain data. This structure is queried to obtain the view-dependent approximations of the terrain model at different levels of detail. In order not to loose depth information, which is crucial for the stereoscopic visualization, we make use of a different simplification criterion, namely distance-based angular error threshold. We also present an algorithm for the construction of stereo pairs in order to speed up the view-dependent stereoscopic visualization. The approach we use is the simultaneous generation of the triangles for two stereo images using a single draw-list so that the view frustum culling and vertex activation is done only once for each frame. The cracking problem is solved using the dependency information stored for each vertex. We eliminate the popping artifacts that can occur while switching between different resolutions of the data using morphing. We implemented the proposed algorithms on personal computers and graphics workstations. Performance experiments show that the second eye image can be produced approximately 45% faster than drawing the two images separately and a smooth stereoscopic visualization can be achieved at interactive frame rates using continuous multi-resolution representation of height fields.
Keywords: Stereoscopic visualization, terrain height fields, multiresolution rendering, quadtrees.
BU-CE-0110: PDF

On-Line New Event Detection and Tracking in a Multi-Resource Environment (M. Sc. Thesis)
Hakan Kurt
As the amount of electronically available information resources increase, the need for information also increases. Today, it is almost impossible for a person to keep track all the information resources and find new events as soon as possible. In this thesis, we present an on-line new event detection and tracking system, which automatically detects new events from multiple news resources and immediately start tracking events as they evolve. Since we implemented the on-line version of event detection approach, the novelty decision about a news story is done before processing the next one. We also present a new threshold, called support threshold, used in detection process to decrease the number of new event alarms that are caused by informative and one-time-only news. The support threshold can be used to tune the weights of news resources. We implemented the tracking phase as an unsupervised learning process, that is, detected events are automatically tracked by training the system using the first news story of an event. Since events evolve over time, an unsupervised adaptation is used to retrain the tracking system in order to increase the tracking system performance. Adaptation is achieved by adding predicted documents to the training process. From the corpus observations, we conclude that one news story can be associated to more than one event. For this reason, the tracking system can relate a news story to more than one event. The on-line new event detection and tracking system has been tested on the Reuters news feed, available on the Internet. The Reuters news feed, that we used, comprises four independent news resources. The news stories are in Turkish.
Keywords: Event detection, event tracking, information retrieval.
BU-CE-0111: PDF

A Rule-Based Video Database System Architecture
Mehmet Emin Dönderler, Özgür Ulusoy and Uğur Güdükbay
We propose a novel architecture for a video database system incorporating both spatio-temporal and semantic (annotation, event/activity and category-based) query facilities. The originality of our approach lies in the fact that we intend to provide full support for spatio-temporal, relative object-motion and similarity-based object-trajectory queries by a rule-based system utilizing a knowledge-base while using an object-relational database to answer semantic-based queries. Our method of extracting and modeling spatio-temporal relations is also a unique one such that we segment video clips into shots using spatial relationships between objects in video frames rather than applying a traditional scene detection algorithm. The technique we use is simple, yet novel and powerful in terms of effectiveness and user query satisfaction: video clips are segmented into shots whenever the current set of relations between objects changes and the video frames, where these changes occur, are chosen as key frames. The directional, topological and third-dimension relations used for shots are those of the key frames selected to represent the shots and this information is kept, along with frame numbers of the key frames, in a knowledge-base as Prolog facts. The system has a comprehensive set of inference rules to reduce the number of facts stored in the knowledge-base because a considerable number of facts, which otherwise would have to be stored explicitly, can be derived by rules with some extra effort.
Keywords: Content-Based Retrieval, spatio-temporal relations, video databases, knowledge representation, rule-based video modeling, inference rules, information systems.
BU-CE-0112: PDF

Mining User Access Patterns and Identity Information from Web Logs for Effective Personalization (M. Sc. Thesis)
Esra Şatıroğlu
Web is a huge source of data in terms of its usage as a result of being visited by millions of people on each day. Its usage data is stored in web server logs which contain a detailed description of every single hit taken by the corresponding web server. Recently, it has been discovered that analysis of this data for understanding the user behavioral patterns may have critical implications. Understanding the behavioral patterns of visitors is especially important for e-commerce companies which try to gain customers and sell products through the web. Interactive sites that recognize their customer and customize themselves accordingly may save lots of money to the companies. Usage Based Personalization is a study on designing such personalized sites. In this thesis, we present a new usage based personalization system. The system we designed and implemented is capable of guessing the web pages that may be requested by the on-line visitors during the rest of their visits. The system shows the subset of these pages with highest scores as recommendations to the visitors as being attached to the original pages. The system has two major modules. The off-line module mines the log files off-line for determining the behavioral patterns of the previous visitors of the web site considered. The information obtained by the off-line module is utilized by the on-line module of the system for recognizing new visitors and producing on-line recommendations. The first criterion for identifying on-line visitors is the paths followed by them. A path of a particular visitor consists of pages retrieved by him throughout his visit to the web site. Another criterion considered by the system is the identity information (IP address or domain name) of the visitors. By using identity information, it is possible to learn old preferences of the visitor himself or visitors from similar domains. The similarity between domains is determined by the help of the domain name hierarchy which is represented by a hierarchical tree structure. The leaves of the tree representing domain name hierarchy contain the domain names of the machines connecting to the Internet while the inner nodes contain the general domains such as com, edu.tr, etc. In domain name hierarchy, the similarity between two domains increases as the number of common parents of them increases. In the light of these observations, for guessing the navigational behavior of a particular visitor, our system makes use of the common behavioral trends of the visitors whose machines belong to the parent domains of the domain of that visitor. We have tested the system on the web site of CS department of Bilkent University. The results of the experiments show the efficiency and applicability of the system.
Keywords: Web log mining, access pattern, personalization.
BU-CE-0113: PDF

Active and Mobile Data Management Through Event History Mining (Ph. D. Thesis)
Yücel Saygın
An event history is a collection of events that have occurred in an event-based system over a period of time. There can be various types of events, among which are the temperature changes and power demands in a power management system, client requests for data items in a broadcast system, price increase of a stock in stock market, and so on. There is a lot of interesting information that can be extracted from an event history via data mining techniques. Our purpose in this thesis is to propose methods for extracting this useful information in the form of event sequences and event associations from a single or two correlated event histories. We also aim to show how the results of the mining process can be used for active and mobile data management. The results of the mining process demonstrates the relationships among the events which are generally captured as associations or sequences. The relationships among the events are shown to be a useful tool to enhance an event-based system via event organization, predictive event detection, and proactive rule execution.
We consider the mining of both a single event history and two correlated event histories. We first propose a method for mining binary relationships from a single event history. The binary relationships among events are used to organize the events into related groups of event. This organization is important because the number of events in an event-driven system may become very high and unmanageable. The groups of related events and the relationships among the events are exploited for predictive event detection and proactive rule execution in active database systems. We also consider the mining of two correlated event histories which are disjoint and the events in one history are related to the events in the other history. We describe how we can efficiently extract associations among the events spanning different event histories, which we call cross associations.
We have chosen data broadcast in mobile computing environments as a case study for active data management. One of the important facts in mobile computing environments with wireless communication medium is that the server-to-client (downlink) communication bandwidth is much higher than the client-to-server (uplink) communication bandwidth. This asymmetry makes the dissemination of data to client machines a desirable approach. However, the dissemination of data by broadcasting may induce high access latency in case the number of broadcast data items is large. Our purpose is to show how active data management can be used to improve mobile data management through broadcast data organization and prefetching from the broadcast medium. In order to achieve this, the client requests of data items at the server are considered as events and the chronological sequence of items that have been requested by clients is considered as an event history. We call the event history in broadcast medium a broadcast history. The first step in our work is to analyze the broadcast history to discover sequential patterns describing the client access behavior. The sequential patterns are used to organize the data items in the broadcast disk in such a way that the items requested subsequently are placed close to each other. Then we utilize the predictive event detection techniques to improve the cache hit ratio to be able to decrease the access latency. Prediction of future client access behavior enables clients to prefetch the data from the broadcast disk based on the rules extracted from the broadcast history.
BU-CE-0114: PDF

Animation of Human Motion with Inverse Kinematics using Nonlinear Programming (M. Sc. Thesis)
A. Sezgin Abalı
Animation of articulated figures has always been an interesting subject of computer graphics due to a wide range of applications, like military, ergonomic design, etc. An articulated figure is usually modelled as a set of segments linked with joints. Changing the joint angles brings the articulated figure to a new posture. An animator can define the joint angles for a new posture (forward kinematics). However, it is difficult to estimate the exact joint angles needed to place the articulated figure to a predefined position. Instead of this, an animator can specify the desired position for an end-effector, and then an algorithm computes the joint angles needed (inverse kinematics). In this thesis, we present the implementation of an inverse kinematics algorithm using nonlinear optimization methods. This algorithm computes a potential function value between the end-effector and the desired posture of the end-effector called goal. Then, it tries to minimize the value of the function. If the goal cannot be reached due to constraints then an optimum solution is found and applied by the algorithm. The user may assign priority to the joint angles by scaling initial values estimated by the algorithm. In this way, the joint angles change according to the animator?s priority.
Keywords: animation, human motion, inverse kinematics, nonlinear programming, optimization
BU-CE-0115: PDF

Realistic Rendering of Human Models (M. Sc. Thesis)
Gültekin Arabacı
Realistic rendering of human models has an increasing importance in computer graphics. Simulation of muscle bulging and proper deformations of skin at joints, makes human animation more realistic. In this thesis, we describe a layered human animation system in which muscles move with respect to the bones and the skin deforms according to the muscles. Muscles are modelled as ellipsoids and their shapes are deformed with the movements of the bones represented by sticks in the skeleton. The skin is anchored to the muscles and it changes its shape as the muscles bulge. Every muscle may have different user defined bulging ratios according to the joint movements.
Keywords: Human animation, skin and muscle deformation, 3D modelling, rendering.
BU-CE-0116: PDF

Web-based User Interface for Query Specification in a Video Database System (M. Sc. Thesis)
Ediz Şaykol
The enhancements in multimedia technology have accelerated the development of new techniques for querying and retrieval in video databases. Based on these new techniques, effective user interfaces are designed especially for query specification. Since the Internet is a suitable multiple client environment, most of the video database systems are designed to work over the Internet. In this thesis, a web-based user interface for a video database system is presented. The interface is composed of a hierarchical windows-based structure where logically separable sub-queries are handled in separate windows. For example, spatial and motion queries are handled separately. The separation of the interface aims to facilitate the query specification process. Furthermore, the user is allowed to get partial results for the sub-queries. The retrieval and result presentation process displays the query results in the form of video frame sequences on a separate window with the capability of playing each result separately. As another contribution of this thesis, considering the fact that most of the systems include querying by object features, a novel approach for color and shape querying and retrieval is proposed. The proposed approach is histogram-based and it is shown through the performance experiments that it overcomes the deficiencies of the existing methods. This new approach is supported by an auxiliary object extraction tool in the query-by-feature sub-system of the video database system. The object extraction tool is semi-automatic, hence it can successfully capture salient object regions for most of the images and/or video frames.
Keywords: Web-based query specification interface, object extraction, flood fill for extraction, distance histogram, color histogram, kinematics model for polygon simplification.