Abstract: In this thesis, a novel context-sensitive segmentation and recognition method for connected letters in Ottoman script is proposed. This method first extracts a set of possible segments from a connected script and determines the candidate letters to which extracted segments are most similar. Next, a function is defined for scoring each different syntactically correct sequence of these candidate letters. To find the candidate letter sequence that maximizes the score function, a directed acyclic graph is constructed. The letters are finally recognized by computing the longest path in this graph. Experiments using a collection of printed Ottoman documents reveal that the proposed method provides very high precision and recall figures in terms of character recognition. In a further set of experiments we also demonstrate that the framework can be used as a building block for an information retrieval system for digital Ottoman archives.
Keywords:
Optical character recognition (OCR), segmentation and recognition of
connected scripts, connected scripts, information retrieval (IR).
(PDF copy)
Abstract: It is very important to inform the patients and their relatives about the risk of mortality before a cardiovascular operation. For this respect, a model called EuroSCORE (The European System for Cardiac Operative Risk Evaluation) has been developed by European cardiovascular surgeons. This system gives the risk of mortality during or 30 days after the operation, based on the values of some parameters measured before the operation. The model used by EuroSCORE has been developed by statistical data gathered from large number of operations performed in Europe.
Even though due to the surgical techniques that have been developed recently and the risk of mortality has been reduced in a large extent, predicting that risk as accurately as possible is still primary concern for the patients and their relatives in cardiovascular operations. The risk of operation also essentially tells the surgeon how a patient with similar comorbidity would be expected to fare based on a standard care. The risk of patient is also important for the health insurance companies, both public or private. In the context of this project, a model that can be used for mortality is developed.
In this research project, a database system for storing data about cardiovascular operations performed in Turkish hospitals, a web application for gathering data, and a machine learning system on this database to learn a risk model, similar to EuroSCORE, are developed. This thesis proposes a risk estimation system for predicting the risk of mortality in patients undergoing cardiovascular operations by maximizing the Area under the Receiver Operating Characteristic (ROC) Curve (AUC).
When the genetic characteristics and life styles of Turkish patients are taken into consideration, it is highly probable that the mortality risks of Turkish patients may be dierent than European patients. This thesis also intends to investigate this issue.
Keywords:
Machine learning, ROC, AUC, risk estimation, cardiovascular operation,
data mining.
(PDF copy)
Abstract: Based on the recent advancements in multimedia, communication, and storage technologies, the amount of audio-visual content stored is increased dramatically. The need to organize and access the growing multimedia content led researchers to develop multimedia database management systems. However, each system has its own way of describing the multimedia content that disables interoperability among other systems. To overcome this problem and to be able to standardize the description of audio-visual content stored in those databases, MPEG-7 standard has been developed by MPEG (Moving Picture Experts Group).
In this thesis, a query language and a query processor for an MPEG-7 compliant video database system is proposed. The query processor consists of three main modules: query parsing module, query execution module, and result fusion module. The query parsing module parses the XML based query and divides it into subqueries. Each sub-query is then executed with related query execution module and the final result is obtained by fusing the results of the sub-queries according to user defined weights. The prototype video database system BilVideo v2.0, which is formed as a result of this thesis work, supports spatio-temporal and low level feature queries that contain any weighted combination of keyword, temporal, spatial, trajectory, and low level visual feature (color, shape and texture) queries. Compatibility with MPEG-7, low-level visual query support, and weighted result fusion feature are the major factors that highly differentiate between BilVideo v2.0 and its predecessor, BilVideo.
Keywords:
MPEG-7, XML databases, video databases, multimedia
databases, query processing, content-based retrieval, video query languages.
(PDF copy)
Abstract: In this thesis, we present an automatic building generation method based on procedural modeling approach, and a crowd animation system that simulates a crowd of pedestrians inside a city. While modeling the buildings, to achieve complex and consistent geometries we use shape grammars. The derivation process incorporates randomness so the produced models have the desired variation. The end shapes of the building models could be defined in a certain extent by the derivation rules. The behavior of human crowds inside a city is affected by the simulation scenario. In this thesis, we specifically intend to simulate the virtual crowds in emergency situations caused by an incident, such as a fire, an explosion, or a terrorist attack. We prefer to use a continuum dynamics-based approach to simulate the escaping crowd, which produces more efficient simulations than the agent-based approaches. Only the close proximity of the incident region, which includes the crowd affected by the incident, is simulated. In order to speed up the animation and visualization of the resulting simulation, we employ an offline occlusion culling technique. During runtime, we animate and render a pedestrian model only if it is visible to the user. In the pre-processing stage, the navigable area of the scene is decomposed into a grid of cells and the from-region visibility of these cells is computed with the help of hardware occlusion queries.
Keywords:
Procedural modeling, emergency, crowd simulation, crowd animation,
occlusion culling, from-region visibility
(PDF copy)
Abstract: Reuse and blending of captured motions for creating realistic motions of human body is considered as one of the challenging problems in animation and computer graphics. Locomotion (walking, running and jogging) is one of the most common types of daily human motion. Based on blending of multiple motions, we propose a two-stage approach for generating locomotion according to user-specified parameters, such as linear and angular velocities. Starting from a large dataset of various motions, we construct a motion graph of similar short motion segments. This process includes the selection of motions according to a set of predefined criteria, the correction of errors on foot positioning, pre-adjustments, motion synchronization, and transition partitioning. In the second stage, we generate an animation according to the specified parameters by following a path on the graph during run-time, which can be performed in real-time. Two different blending techniques are used at this step depending on the number of the input motions: blending based on scattered data interpolation and blending based on linear interpolation. Our approach provides an expandable and efficient motion generation system, which can be used for real time applications.
Keywords:
Animation, data scattered interpolation, blending, locomotion
(PDF copy)
Abstract: Mesh saliency identifies the visually important parts of a mesh. Mesh simplification algorithms using mesh saliency as simplification criterion preserve the salient features of a static 3D model. In this thesis, we propose a saliency measure that will be used to simplify animated 3D models. This saliency measure uses the acceleration and deceleration information about a dynamic 3D mesh in addition to the saliency information for static meshes. This provides the preservation of sharp features and visually important cues during animation. Since oscillating motions are also important in determining saliency, we propose a technique to detect oscillating motions and incorporate it into the saliency based animated model simplification algorithm. The proposed technique is experimented on animated models making oscillating motions and promising visual results are obtained.
Keywords:
Simplification, animation, mesh saliency, deformation.
(PDF copy)
Abstract: Two new methods for combined filtering and key-frame reduction of motion capture data are proposed. Filtering of motion capture data is necessary to eliminate any jitter introduced by a motion capture system. Although jitter removal is needed to obtain a more realistic animation, it may result in an over-smoothed motion data if it is not done properly. Key-frame reduction, on the other hand, allows animators to easily edit motion data by representing animation curves with a significantly smaller number of key frames. One of the proposed techniques achieves key frame reduction and jitter removal simultaneously by fitting a Hermite curve to motion capture data using dynamic programming. Another method is to use curve simplification algorithms on the motion capture data until the desired reduction is reached. In this research, the results of these algorithms are evaluated and compared. Both subjective and objective results are presented.
Keywords:
Motion capture, keyframe reduction, curve fitting, curve
simplification, noise filtering.
(PDF copy)
Abstract: This thesis extends two approximative fixed-point iterative methods based on decomposition for closed queueing networks (QNs) with Coxian service distributions and arbitrary buffer sizes from the literature to include phase-type service distributions. It shows how the irreducible Markov chain associated with each subnetwork in the decomposition can be represented hierarchically using Kronecker products. The proposed methods are implemented in a software tool, which is capable of computing the steadyby a multilevel method at each fixedcompared with others, one being the multilevel method for the closed QN itself, for accuracy and efficiency on a number of examples using the tool, and their convergence properties are discussed. Numerical results indicate that there is a niche among the problems considered which is filled by the two approximative fixed point iterative methods.
Keywords:
Closed queueing networks, Phase-type service distributions, Kronecker
representations, Network decomposition, Fixed-point iteration,
Multilevel methods.
(PDF copy)
Abstract: Human hair modeling and rendering have always been a challenging topic in computer graphics. The techniques for human hair modeling consist of explicit geometric models as well as volume density models. Recently, hybrid cluster models have also been successful in this subject. In this study, we present a novel three dimensional texture model called 3D Fuzzy Textures and algorithms to generate them. Then, we use the developed model along with a cluster model to give human hair complex hairstyles such as curly and wavy styles. Our model requires little user effort to model curly and wavy hair styles. With this study, we aim at eliminating the drawbacks of the volume density model and the cluster hair model with 3D fuzzy textures. A three dimensional cylindrical texture mapping function is introduced for mapping purposes. Current generation graphics hardware is utilized in the design of rendering system enabling high performance rendering.
Keywords:
Hair modeling, hair rendering, curly hair, wavy hair, cluster hair
model, volume density model, 3D texture, 3D texture mapping
(PDF copy)
Abstract: 3D human pose reconstruction is a popular research area since it can be used in various applications. Currently most of the methods work for constrained environments, where multi camera views are available and camera calibration is known, or a single camera view is available, which requires intensive user effort. However most of the currently available data do not satisfy these constraints, thus they cannot be processed by these algorithms. In this thesis a framework is proposed to reconstruct 3D pose of a human for animation from a sequence of single view video frames. The framework for pose construction starts with background estimation. Once the image background is estimated, the body silhouette is extracted by using image subtraction for each frame. Then the body silhouettes are automatically labeled by using a model-based approach. Finally, the 3D pose is constructed from the labeled human silhouette by assuming orthographic projection. The proposed approach does not require camera calibration. The proposed framework assumes that the input video has a static background and it has no significant perspective effects and the performer is in upright position.
Keywords:
Motion capture, framework, single camera, uncalibrated camera,
vision-based, animation.
(PDF copy)
Abstract: Recently, the increase in the amount of multimedia data has unleashed the development of storage techniques. Multimedia databases is one of the most popular of these techniques because of its scalability and ability to be queried by the media features. One downside of these databases is the necessity for processing of the media for feature extraction prior to storage and querying. Ever growing pile of media makes this processing harder to be completed manually. This is the case with BilVideo Video Database System, as well. Improvements on computer vision techniques for object detection and tracking have made automation of this tedious manual task possible. In this thesis, we propose a tool for the automatic detection of objects of interest and deriving spatio-temporal relations between them in video frames. The proposed framework covers the scalable architecture for video processing and the stages for cut detection, object detection and tracking. We use color histograms for cut detection. Based on detected shots, the system detects salient objects in the scene, by making use of color regions and camera focus estimation. Then, the detected objects are tracked based on their location, shape and estimated speed.
Keywords:
Video Databases, Video Object Detection, Object Tracking, Camera Focus Estimation.
(PDF copy)
Abstract: Determining crystal structure parameters of a material is a quite important issue in crystallography. Knowing the crystal structure parameters helps to understand physical behavior of material. For complex structures, particularly for materials which also contain local symmetry as well as global symmetry, obtaining crystal parameters can be quite hard. This work provides a tool that will extract crystal parameters such as primitive vectors, basis vectors and space group from atomic coordinates of crystal structures. A visualization tool for examining crystals is also provided. Accordingly, this work presents a useful tool that help crystallographers, chemists and material scientists to analyze crystal structures efficiently.
Keywords:
crystal, crystallography, chemistry, material science, pattern
recognition, primitive vectors, basis vectors, space group, symmetry
(PDF copy)
Abstract: In unstructured peer-to-peer networks, such as Gnutella, peers propagate query messages towards the resource holders by flooding them through the network. This is, however, a costly operation since it consumes node and link resources excessively and most of the time unnecessarily. There is no reason, for example, for a peer to receive a query message if the peer does not own any matching resource or if the peer is not on the path reaching to a peer holding a matching resource.
Semantic routing is a technique that tries to forward the queries only
to those nodes where replies are likely to come from. In this thesis,
we present ``Route Learning", a semantic routing scheme, which aims to
reduce query traffic in unstructured peer-to-peer networks by
utilizing a well-known estimation technique called ``Parzen Windows".
In Route Learning, peers try to find the most likely neighbors through
which replies can be obtained for submitted queries. In this way, a
query is
forwarded only to a subset of the neighbors of a peer, or is dropped
if no neighbor, which is likely to return a reply, is found. The
scheme has also mechanisms to cope with variations in user submitted
queries, like changes in the keywords. This way the scheme can also
evaluate the route for a query for which it is not trained. The
proposed scheme consists of three phases: training, evaluation, and
recursive learning. The last phase enables the scheme to adapt itself
to changes in a dynamic peer-to-peer network. Our simulation results
show that our scheme reduces the bandwidth overhead significantly
without scarifying user satisfaction compared to a pure flooding
based querying approach.
(PDF copy)
Abstract: Traditionally, the authentication protocols for cellular phone networks have been designed for device authentication rather than user authentication, which brings limitations and restrictions on the functionality of the system. In this thesis we propose a user authentication protocol for Global Standards for Mobile (GSM). Our protocol permits the use of weak secrets (e.g. passwords or PINs) for authentication and provides certain flexibilities for GSM users. The simulation results on currently established user authentication protocols and GUAP are presented. The protocol also has a capture resilience extension to disable captured cellular phones securely.
Keywords:
Wireless network security, user authentication, GSM, strong password
protocols, capture resilient.
(PDF copy)
Abstract: Video surveillance has long been in use to monitor security sensitive areas such as banks, department stores, highways, crowded public places and borders. The advance in computing power, availability of large-capacity storage devices and high speed network infrastructure paved the way for cheaper, multi sensor video surveillance systems. Traditionally, the video outputs are processed online by human operators and are usually saved to tapes for later use only after a forensic event. The increase in the number of cameras in ordinary surveillance systems overloaded both the human operators and the storage devices with high volumes of data and made it infeasible to ensure proper monitoring of sensitive areas for long times. In order to filter out redundant information generated by an array of cameras, and increase the response time to forensic events, assisting the human operators with identification of important events in video by the use of "smart" video surveillance systems has become a critical requirement. The making of video surveillance systems "smart" requires fast, reliable and robust algorithms for moving object detection, classification, tracking and activity analysis.
In this thesis, a smart visual surveillance system with real-time moving object detection, classification and tracking capabilities is presented. The system operates on both color and gray scale video imagery from a stationary camera. It can handle object detection in indoor and outdoor environments and under changing illumination conditions. The classification algorithm makes use of the shape of the detected objects and temporal tracking results to successfully categorize objects into pre-defined classes like human, human group and vehicle. The system is also able to detect the natural phenomenon fire in various scenes reliably. The proposed tracking algorithm successfully tracks video objects even in full occlusion cases. In addition to these, some important needs of a robust smart video surveillance system such as removing shadows, detecting sudden illumination changes and distinguishing left/removed objects are met.
Keywords:
Video-Based Smart Surveillance, Moving Object Detection, Background
Subtraction, Object Tracking, Silhouette-Based Object Classification,
Fire Detection.
(PDF copy)
Abstract: In this thesis study, a 3D graphics environment for virtual garment design and simulation is presented. The proposed system enables the three dimensional construction of a garment from its two dimensional cloth panels, for which the underlying structure is a mass-spring model. Construction of the garment is performed through cutting, boundary smoothing , seaming and scaling. Afterwards, it is possible to do fitting on virtual mannequins like in the real life as if in a tailor's workshop. The behavior of cloth under different environmental conditions is implemented applying a physically-based approach. As well as the simulation of the draping of garments, efficient and realistic visualization of garments is an important issue in cloth modelling. There are various material types and reflectance properties for fabrics. We have implemented a number of material and rendering options such as knitwear, woven cloth and standard shading methods such as Gouraud shading. Performance results of the system are presented at the end.
Keywords: Garment design, Garment simulation, Fabric
rendering, Physically-based modeling.
(PDF copy)
Abstract: A legacy information system is an old system that typically has been developed several years ago, and remains in operation within an organization. Since the software requirements change, legacy systems must be evolved accordingly. Various approaches such as wrapping, migration and redevelopment have been proposed to maintain legacy information systems. Unfortunately, these approaches have not explicitly considered the concerns that are dicult to capture in single components, and tend to crosscut many components. Examples of such crosscutting concerns include distribution, synchronization, persistence, security, logging and real-time behavior. The crosscutting property of concerns seriously complicates the maintenance of legacy systems because the code of the system needs to be changed at multiple places, and conventional maintenance techniques fall short to do this effectively.
Aspect-Oriented Software Development (AOSD) provides explicit mechanisms for coping with these crosscutting concerns. However, current AOSD approaches have primarily focused on coping with crosscutting concerns in software systems that are developed from scratch. Hereby, the crosscutting concerns are implemented as aspects at the beginning, hence localized in single modules. In this way the implementation and maintenance of crosscutting concerns can be prepared to a large extent so that the maintenance of these systems will be easier later on. Unfortunately, legacy systems impose harsher requirements, because crosscutting concerns in legacy systems are neither explicitly identifyinged nor have been prepared before.
We provide a systematic process for analyzing the impact of crosscutting concerns on legacy systems. The process, which is called Aspectual Legacy Analysis Process (ALAP), consists of three sub-processes, Feasibility Analysis, Aspectual Analysis and Maintenance Analysis. All the three sub-processes consist of a set of heuristic rules and the corresponding control. Feasibility Analysis, which consists of two phases, describes rules for categorizing legacy systems, in the correspondingrst phase; and describes the rules for evaluating legacy systems with respect to the ability to implement static crosscutting and ability to implement dynamic crosscutting, in the second phase. The rules of the secondrst phase are based on the categories of legacy systems that we have describesned after a thorough study to legacy information systems, and the rules of the second phase are based on our discussion of these categories with respect to crosscutting implementation. Once the legacy system has been categorized and evaluated with respect to crosscutting implementation, the Aspectual Analysis sub-process describes rules for identifying and specifying aspects in legacy systems. Based on the results of the Feasibility Analysis and Aspectual Analysis sub-processes, the Maintenance Analysis describes the rules for the selection of the appropriate legacy maintenance approach.
ALAP has been implemented in the Aspectual Legacy Analysis Tool (ALAT), which implements the rules of the three sub-processes and as such helps to support the legacy maintainer in analyzing the legacy system and identifying the appropriate maintenance approach.
Keywords: Legacy Information Systems, Aspect-Oriented
Software Development, Heuristic Rule Modelling.
(PDF copy)
Abstract: Mobility prediction is one of the most essential issues that need to be explored for mobility management in mobile computing systems. In this thesis, we propose a new algorithm for predicting the next inter-cell movement of a mobile user in a Personal Communication Systems network. In the first phase of our three-phase algorithm, user mobility patterns are mined from the history of mobile user trajectories. In the second phase, mobility rules are extracted from these patterns, and in the last phase, mobility predictions are accomplished by using these rules. The performance of the proposed algorithm is evaluated through simulation as compared to two other prediction methods. The performance results obtained in terms of Precision and Recall indicate that our method can make more accurate predictions than the other methods.
Keywords:
Location prediction, data mining, mobile computing, mobility patterns,
mobility prediction.
(PDF copy)
Abstract: Onine automatic recognition of handwritten text has been an onoing research problem for four decades. It is used in automated postal address and ZIP code and form reading, data acquisition in bank checks, processing of archived institutional records, automatic validation of passports, etc. It has been gaining more interest lately due to the increasing popularity of handeld computers, digital notebooks and advanced cellular phones. Traditionally, human-machine communication has been based on keyboard and pointing devices. Onine handwriting recognition promises to provide a dynamic means of communication with computers through a pen like stylus, not just an ordinary keyboard. This seems to be a more natural way of entering data into computers.
In this thesis, we develop a character recognition system that combines the advantage of both on-line and off-line systems. Using an USB CCD Camera, positions of the pen-tip between frames are detected as they are written on a sheet of regular paper. Then, these positions are used for calculation of directional information. Finally, handwritten character is characterized by a sequence of writing directions between consecutive frames. The directional information of the pen movement points is used for character pre-classification and positional information is used for fine classification. After characters are recognized they are passed to LaTeX code generation subroutine. Supported LaTeX environments are array construction, citation, section, itemization, equation, verbatim and normal text environments. During experiments a recognition rate of 90% was achieved. The main recognition errors were due to the abnormal writing and ambiguity among similar shaped characters.
Keywords:
pattern recognition, character Recognition, on-line recognition
systems, LaTeX
(PDF copy)
Abstract: Articulated figure animation receives particular attention of the computer graphics society. The techniques for animation of articulated figures range from simple interpolation between keyframes methods to motion-capture techniques. One of these techniques, inverse kinematics, which is adopted from robotics, provides the animator the ability to specify a large quantity of motion parameters that results with realistic animations. This study presents an interactive hierarchical motion control system used for the animation of human figure locomotion. We aimed to develop an articulated figure animation system that creates movements using motion control techniques at different levels, like goal-directed motion and walking. Inverse Kinematics using Analytical Methods (IKAN) software, which was developed at the University of Pennsylvania, is utilized for controlling the motion of the articulated body using inverse kinematics.
Keywords:
kinematics, inverse kinematics,
articulated figure, motion control, spline, gait.
(PDF copy)
Abstract: In this thesis study, a framework is proposed and implemented for the realistic rendering of a multi-layered human body model while it is moving. The proposed human body model is composed of three layers: a skeleton layer, a muscle layer, and a skin layer. The skeleton layer, represented by a set of joints and bones, controls the animation of the human body model using inverse kinematics. Muscles are represented by action lines, which are defined by a set of control points. The action line expresses the force produced by a muscle on the bones and on the skin mesh. The skin layer is modeled in a 3D modeler and deformed during animation by binding the skin layer to both the skeleton layer and the muscle layer. The skin is deformed by a two-step algorithm according to the current state of the skeleton and muscle layers. In the first step, the skin is deformed by a variant of the skinning algorithm, which deforms the skin based on the motion of the skeleton. In the second step, the skin is deformed by the underlying muscular layer. Visual results produced by the implementation is also presented. Performance experiments show that it is possible to obtain real-time frame rates for a moderately complex human model containing approximately 33,000 triangles on the skin layer.
Keywords:
Human body modeling and animation, multi-layered modeling, articulated figure, kinematics, inverse
kinematics, action line, skinning, deformation.
(PDF copy)
Abstract: In this study, we use the concepts of the cover coefficient-based clustering methodology (C3M) for on-line new event detection and event clustering. The main idea of the study is to use the seed selection process of the C3M algorithm for the purpose of detecting new events. Since C3M works in a retrospective manner, we modify the algorithm to work in an on-line environment. Furthermore, in order to prevent producing oversized event clusters, and to give equal chance to all documents to be the seed of a new event, we employ the window size concept. Since we desire to control the number of seed documents, we introduce a threshold concept to the event clustering algorithm. We also use the threshold concept, with a little modification, in the on-line event detection. In the experiments we use TDT1 corpus, which is also used in the original topic detection and tracking study. In event clustering, we use both binary and weighted versions of TDT1 corpus. With the binary implementation, we obtain better results. When we compare our on-line event detection results to the results of UMASS approach, we obtain better performance in terms of false alarm rates.
Keywords:
Clustering, on-line event clustering, on-line event detection.
(PDF copy)
Abstract: The idea of subdivision surfaces was first introduced in 1978, and there are many methods proposed till now. A subdivision surface is defined as the limit of repeated recursive refinements. In this thesis, we studied the properties of approximating subdivision surface schemes. We started by modeling a complex surface with splines that typically requires a number of spline patches, which must be smoothly joined, making splines burdensome to use. Unlike traditional spline surfaces, subdivision surfaces are defined algorithmically. Subdivision schemes generalize splines to domains of arbitrary topology. Thus, subdivision functions can be used to model complex surfaces without the need to deal with patches. We studied four well-known schemes Catmull-Clark, Doo-Sabin, Loop and the sqrt(3)-subdivision. The first two of these schemes are quadrilateral and the other two are triangular surface subdivision schemes. Modeling sharp features, such as creases, corners or darts, using subdivision schemes requires some modifications in subdivision procedures and sometimes special tagging in the mesh. We developed the rules of sqrt(3)-subdivision to model such features and compared the results with the extended Loop scheme. We have implemented exact normals of Loop and p3-subdivision since using interpolated normals causes creases and other sharp features to appear smooth.
Keywords:
Computational geometry and object modeling, subdivision surfaces,
Loop, Catmull-Clark, Doo-Sabin, sqrt(3)-subdivision, modeling sharp features.
(PDF copy)
Abstract: A new approach to simplification of volumetric data over an unstructured tetrahedral mesh is presented. The data consist of sample values of a scalar field defined over a spatial domain, which is subdivided with a tetrahedral mesh. Simplification is performed by means of contraction of the tetrahedra and also of the edges. The simplification algorithm can provide a continuum of aproximate models of the given dataset with any desired degree of accuracy. Hence, the simplification method is suitable for multi-resolution modeling.
The novelty of the approach comes from the arbitrariness in the selection of the point to which a tetrahedron or an edge is contracted. Unlike most of the existing methods, the final vertex of the contraction need not be a vertex of the original mesh. The scalar value to be assigned to the final vertex of contraction is determined by an extremely simple method (both conceptually and computationally), which also provides an estimate of the error of simplification.
The proposed method is applied to two volumetric grids to illustrate its effectiveness in simplification of volumetric data.
Keywords:
Volumetric dataset, tetrahedral mesh, tetrahedron contraction,
edge contraction, scalar value assignment.
(PDF copy)
Abstract: Advances in compression techniques, decreasing cost of storage, and high-speed transmission have facilitated the way video is created, stored and distributed. As a consequence, video is now being used in many application areas. The increase in the amount of video data deployed and used in today's applications not only caused video to draw more attention as a multimedia data type, but also led to the requirement of efficient management of video data. Management of video data paved the way for new research areas, such as indexing and retrieval of videos with respect to their spatio-temporal, visual and semantic contents. In this thesis, semantic content of video is studied, where video metadata, activities, actions and objects of interest are considered within the context of video semantic content. A data model is proposed to model video semantic content, which is extracted from video data by a video annotation tool. A video query language is also provided to support semantic queries on video data.
The outcome of this thesis work will be used in a project, which proposes a video database system architecture with spatio-temporal, object-trajectory and object-apperance query facilities so as to build a complete video search system to query video data by its spatio-temporal, visual and semantic features.
Keywords:
Video databases, semantic video modeling, annotation of video data,
semantic querying of video data.
(PDF copy)
Abstract: A very important and challenging problem in computer graphics is modeling and animation of individualized face models. In this thesis, we describe a facial modeling and animation system attempting to address this problem. The system uses muscle-based generic face model and deforms it using deformation techniques to model individualized faces. Two orthogonal photos of the real faces are used for this purpose. Image processing techniques are employed to extract certain features on the photographs, which are then refined manually by the user through the facilities of the user interface of the system. The feature points located on the frontal and side views of a real face are used to deform the generic model. Then, the muscle vectors in the individualized face model are arranged accordingly. Individualized face models produced in this manner are animated using parametric interpolation techniques.
Keywords:
Facial animation, rendering, texture mapping, deformation, feature
extraction.
(PDF copy)
Abstract: The interest for multimedia database management systems has grown rapidly due to the need for the storage of huge volumes of multimedia data in computer systems. An important building block of a multimedia database system is the query processor, and a query optimizer embedded to the query processor is needed to answer user queries efficiently. Query optimization problem has been widely studied for conventional database systems, however it is a new research area for multimedia database systems. Due to the differences in query processing strategies, query optimization techniques used in multimedia database systems are different from those used in traditional databases. In this paper, a query optimization strategy is proposed for processing spatio-temporal queries in video database systems. The proposed strategy includes reordering algorithms to be applied on query execution tree. The performance results obtained by testing the reordering algorithms on different query sets are also presented.
Keywords:
Video databases, query optimization, query tree, querying of video data.
(PDF copy)
Abstract: In recent years, very large collections of images and videos have grown rapidly. In parallel with this growth, content-based retrieval and querying the indexed collections are required to access visual information. Two of the main components of the visual information are texture and color. In this thesis, a content-based image retrieval system is presented that computes texture and color similarity among images. The underlying technique is based on the adaptation of a statistical approach to texture analysis. An optimal set of five second-order texture statistics are extracted from the Spatial Grey Level Dependency Matrix of each image, so as to render the feature vector for each image maximally informative, and yet to obtain a low vector dimensionality for efficiency in computation. The method for color analysis is the color histograms, and the information captured within histograms is extracted after a pre-processing phase that performs color transformation, quantization, and filtering. The features thus extracted and stored within feature vectors are later compared with an intersection-based method. The system is also extended for pre-processing images to segment regions with different textural quality, rather than operating globally over the whole image. The system also includes a framework for object-based color and texture querying, which might be useful for reducing the similarity error while comparing rectangular regions as objects. It is shown through experimental results and precision-recall analysis that the content-based retrieval system is effective in terms of retrieval and scalability.
Keywords:
Texture Analysis, Color Histograms, Texture Similarity
Measurement, Content-Based Image Retrieval, Image Databases
(PDF copy)
Abstract: For a long time, classification algorithms have focused on minimizing the quantity of prediction errors by assuming that each possible error has identical consequences. However, in many real-world situations, this assumption is not convenient. For instance, in a medical diagnosis domain, misdiagnosing a sick patient as healthy is much more serious than its opposite. For this reason, there is a great need for new classification methods that can handle asymmetric cost and benefit constraints of classifications. In this thesis, we discuss cost-sensitive classification concepts and propose a new classification algorithm called Benefit Maximization with Feature Intervals (BMFI) that uses the feature projection based knowledge representation. In the framework of BMFI, we introduce five different voting methods that are shown to be effective over different domains. A number of generalization and pruning methodologies based on benefits of classification are implemented and experimented. Empirical evaluation of the methods has shown that BMFI exhibits promising performance results compared to recent wrapper cost-sensitive algorithms, despite the fact that classifier performance is highly dependent on the benefit constraints and class distributions in the domain. In order to evaluate cost-sensitive classification techniques, we describe a new metric, namely benefit accuracy which computes the relative accuracy of the total benefit obtained with respect to the maximum possible benefit achievable in the domain.
Keywords:
Machine learning, classification, cost-sensitivity,
benefit maximization, feature intervals, voting, pruning.
(PDF copy)
Abstract: Access log file is like a box of treasure waiting to be exploited containing valuable information for the web usage mining system. We can convert this information hidden in the access log files into knowledge by analyzing them. Analysis of web server access data can help understand the user behavior and provide information on how to restructure a web site for increased effectiveness, thereby improving the design of this collection of resources. We designed and developed a new system in this thesis to make dynamic recommendation according to the interest of the visitors by recognizing them through the web. The system keeps all user information and uses this information to recognize the other user visiting the web site. After the visitor is recognized, the system checks whether she/he has visited the web site before or not. If the visitor has visited the web site before, it makes recommendation according to his/her past actions. Otherwise, it makes recommendation according to the visitors coming from the parent domain. Here, "parent domain" identifies the domain in which the identity belongs to. For instance, "bilkent.edu.tr" is the parent domain of the "cs.bilkent.edu.tr". The importance of the pages that the visitors are really interested in and the identity information forms the skeleton of the system. The assumption that the amount of time a user spends on page correlates to whether the page should be classified as a navigation or content page for that user. The other criterion, the identity information, is another important point of the thesis. In case of having no recommendation according to the past experiences of the visitor, the identity information is located into appropriate parent domain or class to get other recommendation according to the interests of the visitors coming from its parent domain or class because we assume that the visitors from the same domain will have similar interests. Besides, the system is designed in such a way that it uses the resources of the system efficiently. "Memory Management", "Disk Capacity" and "Time Factor" options have been used in our system in the scope of "Efficient Use of the Resources" concept. We have tested the system on the web site of CS Department of Bilkent University. The results of the experiments have shown the efficiency and applicability of the system.
Keywords:
Access log file, personalization, identity information, recommendation
(PDF copy)
Abstract: In recent years, large quantities of data have been amassed with advances in data acquisition capabilities. Automated detection of useful information is required for vast data obtained from scientific and business domains. Data Mining is the application of efficient algorithmic solutions on a variety of immense data for such knowledge discovery. Frequency mining discovers all frequent patterns in a transaction or relational database and it comprises the core of several data mining algorithms such as association rule mining and sequence mining. Frequency pattern discovery has become a challenge for parallel programming since it is a highly complex operation on huge datasets demanding efficient and scalable algorithms. In this thesis, we propose a new family of parallel frequency mining algorithms. We introduce a novel transaction set partitioning scheme that can be used to divide the frequency mining task in a top-down fashion. The method operates on the graph of frequent patterns with length two (GF2) from which a graph partitioning by vertex separator (GPVS) is mapped to a two-way partitioning on the transaction set. The two parts obtained can be mined independently and therefore can be utilized for concurrency. In order for this property to hold, there is an amount of replication dictated by the separator in GF2 which is minimized by the GPVS algorithm. A k-way partitioning is derived from recursive application of 2-way partitioning scheme which is used in the design of a generic parallel frequency mining algorithm. First we compute GF2 in parallel, succeeding that we designate a k-way partitioning of the database for k processors with a parallel recursive procedure. The database is redistributed such that each processor is assigned one part. Subsequent mining proceeds simultaneously and independently at each processor with a given serial mining algorithm. A complete implementation in which we employ FP-Growth as the sequential algorithm has been achieved. The performance study of the algorithm on a Beowulf system demonstrates favorable performance for synthetic databases. For hard instances of the problem, we have gained approximately twice the speedup of a state-of-the-art algorithm. We also present a correction and optimization to FP-Growth algorithm.
Keywords:
Parallel Data Mining, Frequency Mining
(Postscript copy)
Abstract: The enhancements in multimedia technology have accelerated the development of new techniques for querying and retrieval in video databases. Based on these new techniques, effective user interfaces are designed especially for query specification. Since the Internet is a suitable multiple client environment, most of the video database systems are designed to work over the Internet. In this thesis, a web-based user interface for a video database system is presented. The interface is composed of a hierarchical windows-based structure where logically separable sub-queries are handled in separate windows. For example, spatial and motion queries are handled separately. The separation of the interface aims to facilitate the query specification process. Furthermore, the user is allowed to get partial results for the sub-queries. The retrieval and result presentation process displays the query results in the form of video frame sequences on a separate window with the capability of playing each result separately. As another contribution of this thesis, considering the fact that most of the systems include querying by object features, a novel approach for color and shape querying and retrieval is proposed. The proposed approach is histogram-based and it is shown through the performance experiments that it overcomes the deficiencies of the existing methods. This new approach is supported by an auxiliary object extraction tool in the query-by-feature sub-system of the video database system. The object extraction tool is semi-automatic, hence it can successfully capture salient object regions for most of the images and/or video frames.
Keywords:
Web-based query specification interface, object
extraction, flood fill for extraction, distance histogram, color
histogram, kinematics model for polygon simplification.
(PDF copy)
Abstract: Realistic rendering of human models has an increasing importance in computer graphics. Simulation of muscle bulging and proper deformations of skin at joints, makes human animation more realistic. In this thesis, we describe a layered human animation system in which muscles move with respect to the bones and the skin deforms according to the muscles. Muscles are modelled as ellipsoids and their shapes are deformed with the movements of the bones represented by sticks in the skeleton. The skin is anchored to the muscles andit changes its shape as the muscles bulge. Every muscle may have different user defined bulging ratios according to the joint movements.
Keywords:
Human animation, skin and muscle deformation, 3D modelling,
rendering.
(PDF copy)
Abstract: Animation of articulated figures has always been an interesting subject of computer graphics due to a wide range of applications, like military, ergonomic design etc. An articulated figure is usually modelled as a set of segments linked with joints. Changing the joint angles brings the articulated figure to a new posture. An animator can define the joint angles for a new posture (forward kinematics). However, it is difficult to estimate the exact joint angles needed to place the articulated figure to a predefined position. Instead of this, an animator can specify the desired position for an end-effector, and then an algorithm computes the joint angles needed (inverse kinematics). In this thesis, we present the implementation of an inverse kinematics algorithm using nonlinear optimization methods. This algorithm computes a potential function value between the end-effector and the desired posture of the end-effector called goal. Then, it tries to minimize the value of the function. If the goal cannot be reached due to constraints then an optimum solution is found and applied by the algorithm. The user may assign priority to the joint angles by scaling initial values estimated by the algorithm. In this way, the joint angles change according to the animator?s priority.
Keywords:
Animation, human motion, inverse kinematics,
nonlinear programming, optimization.
(PDF copy)
Abstract: Web is a huge source of data in terms of its usage as a result of being visited by millions of people on each day. Its usage data is stored in web server logs which contain a detailed description of every single hit taken by the corresponding web server. Recently, it has been discovered that analysis of this data for understanding the user behavioral patterns may have critical implications. Understanding the behavioral patterns of visitors is especially important for e-commerce companies which try to gain customers and sell products through the web. Interactive sites that recognize their customer and customize themselves accordingly may save lots of money to the companies. Usage Based Personalization is a study on designing such personalized sites. In this thesis, we present a new usage based personalization system. The system we designed and implemented is capable of guessing the web pages that may be requested by the on-line visitors during the rest of their visits. The system shows the subset of these pages with highest scores as recommendations to the visitors as being attached to the original pages. The system has two major modules. The off-line module mines the log files off-line for determining the behavioral patterns of the previous visitors of the web site considered. The information obtained by the off-line module is utilized by the on-line module of the system for recognizing new visitors and producing on-line recommendations. The first criterion for identifying on-line visitors is the paths followed by them. A path of a particular visitor consists of pages retrieved by him throughout his visit to the web site. Another criterion considered by the system is the identity information (IP address or domain name) of the visitors. By using identity information, it is possible to learn old preferences of the visitor himself or visitors from similar domains. The similarity between domains is determined by the help of the domain name hierarchy which is represented by a hierarchical tree structure. The leaves of the tree representing domain name hierarchy contain the domain names of the machines connecting to the Internet while the inner nodes contain the general domains such as com, edu.tr, etc. In domain name hierarchy, the similarity between two domains increases as the number of common parents of them increases. In the light of these observations, for guessing the navigational behavior of a particular visitor, our system makes use of the common behavioral trends of the visitors whose machines belong to the parent domains of the domain of that visitor. We have tested the system on the web site of CS department of Bilkent University. The results of the experiments show the efficiency and applicability of the system.
Keywords:
Web log mining, access pattern, personalization.
(Postscript copy)
Abstract: As the amount of electronically available information resources increase, the need for information also increases. Today, it is almost impossible for a person to keep track all the information resources and find new events as soon as possible. In this thesis, we present an on-line new event detection and tracking system, which automatically detects new events from multiple news resources and immediately start tracking events as they evolve. Since we implemented the on-line version of event detection approach, the novelty decision about a news story is done before processing the next one. We also present a new threshold, called support threshold, used in detection process to decrease the number of new event alarms that are caused by informative and one-time-only news. The support threshold can be used to tune the weights of news resources. We implemented the tracking phase as an unsupervised learning process, that is, detected events are automatically tracked by training the system using the first news story of an event. Since events evolve over time, an unsupervised adaptation is used to retrain the tracking system in order to increase the tracking system performance. Adaptation is achieved by adding predicted documents to the training process. From the corpus observations, we conclude that one news story can be associated to more than one event. For this reason, the tracking system can relate a news story to more than one event. The on-line new event detection and tracking system has been tested on the Reuters news feed, available on the Internet. The Reuters news feed, that we used, comprises four independent news resources. The news stories are in Turkish.
Keywords:
Event detection, event tracking, information retrieval.
(Postscript copy)
Abstract: Visualization of large geometric environments has always been an important problem of computer graphics. In this thesis, we present a framework for the stereoscopic view-dependent visualization of large scale terrain models. We use a quadtree based multiresolution representation for the terrain data. This structure is queried to obtain the view-dependent approximations of the terrain model at different levels of detail. In order not to loose depth information, which is crucial for the stereoscopic visualization, we make use of a different simplification criterion, namely distance-based angular error threshold. We also present an algorithm for the construction of stereo pairs in order to speed up the view-dependent stereoscopic visualization. The approach we use is the simultaneous generation of the triangles for two stereo images using a single draw-list so that the view frustum culling and vertex activation is done only once for each frame. The cracking problem is solved using the dependency information stored for each vertex. We eliminate the popping artifacts that can occur while switching between different resolutions of the data using morphing. We implemented the proposed algorithms on personal computers and graphics workstations. Performance experiments show that the second eye image can be produced approximately 45 % faster than drawing the two images separately and a smooth stereoscopic visualization can be achieved at interactive frame rates using continuous multi-resolution representation of height fields.
Keywords:
Stereoscopic visualization, terrain height fields, multiresolution
rendering, quadtrees.
(Postscript copy)
Abstract: New technological developments, such as easy access to Internet, optical character readers, high-speed networks and inexpensive massive storage facilities, have resulted in a dramatic increase in the availability of on-line text-newspaper articles, incoming (electronic) mail, technical reports, etc. The enormous growth of on-line information has led to a comparable growth in the need for methods that help users organize such information. Text Categorization may be the remedy of increased need for advanced techniques. Text Categorization is the classification of units of natural language texts with respect to a set of pre-existing categories. Categorization of documents is challenging, as the number of discriminating words can be very large. This thesis presents compilation of a Turkish dataset, called Anadolu Agency Newsgroup in order to study in Text Categorization. Turkish is an agglutinative language in which words contain no direct indication where the morpheme boundaries are, furthermore, morphemes take a shape dependent on the morphological and phonological context. In Turkish, the process of adding one suffix to another can result in a relatively long word, furthermore, a single Turkish word can give rise to a very large number of variants. Due to this complex morphological structure, Turkish requires text processing techniques different than English and similar languages. Therefore, besides converting all words to lower case and removing punctuation marks, some preliminary work is required such as stemming, removal of stop words and formation of a keyword list.
Keywords:
Text categorization, classification, feature projections,
stemming, wild card matching, stopword.
(Postscript copy)
Abstract: Over the past two decades there has been a huge increase in the amount of data being stored in databases and the on-line flow of data by the effects of improvements in Internet. This huge increase brought out the needs for intelligent tools to manage that size of data and its flow. Hierarchical approach is the best way to satisfy these needs and it is so widespread among people dealing with databases and Internet. Usenet newsgroups system is one of the on-line databases that have built-in hierarchical structures. Our point of departure is this hierarchical structure which makes categorization tasks easier and faster. In fact most of the search engines in Internet also exploit inherent hierarchy of Internet. Growing size of data makes most of the traditional categorization algorithms obsolete. Thus we developed a brand-new categorization learning algorithm which constructs an index tree out of Usenet news database and then decides the related newsgroups of a new news by categorizing it over the index tree. In learning phase it has an agglomerative and bottom-up hierarchical approach. In categorization phase it does an overlapping and supervised categorization. K-Nearest Neighbor categorization algorithm is used to compare the complexity measure and accuracy of our algorithm. This comparison does not only mean comparing two different algorithms but also means comparing hierarchical approach vs. flat approach, similarity measure vs. distance measure and importance of accuracy vs. importance of speed. Our algorithm prefers hierarchical approach and similarity measure, and greatly outperforms k-Nearest Neighbor categorization algorithm in speed with minimal loss of accuracy.
Keywords:
Learning, categorization, clustering, hierarchy, Usenet,
newsgroup, top-level, header-line, posting, frequency, norm-scaling,
similarity measure, distance measure, agglomerative, bottom-up,
stemming, stopword, index.
(Postscript copy)
Abstract: In very large distributed database systems, the data is declustered in order to exploit parallelism while processing a query. Declustering refers to allocating the data into multiple disks in such a way that the tuples belonging to a relation are distributed evenly across disks. There are many declustering strategies proposed in the literature, however these strategies are domain specific or have deficiencies. We propose a model that exactly fits the problem and show that iterative improvement schemes can capture detailed per-relation basis declustering objective. We provide a two phase iterative improvement based algorithm and appropriate gain functions for these algorithms. The experimental results show that the proposed algorithm provides a significant performance improvement compared to the state-of-the-art graph-partitioning based declustering strategy.
Keywords:
Distributed Databases, Declustering, Hypergraph Partitioning,
Max-cut Graph Partitioning
(Postscript copy)
Abstract: Thanks to the advances in telecommunications and computers, today mobile computing becomes a significant means in every pace of life. Many people are now carrying portable devices such as laptop computers, Personal Digital Assistants (PDAs), and cellular phones. These mobile computing devices are supported by rapidly expanding telecommunication technology. Cellular communication, wireless LAN and WAN, and satellite services are available for daily life applications, and portable devices make use of these wireless connections to contact with the information providers. Thus, a user does not need to maintain a fixed connection in the network and may enjoy almost unrestricted user mobility.
As the new and various mobile infrastructures emerge, users demand a new class of applications running in this environment. However, the narrow bandwidth of the wireless communication channels, the relatively short active life of the power supplies of mobile units, and the mobility of clients make the problem of data retrieval more difficult than that in wired networks. Therefore, mechanisms to efficiently transmit information to vast numbers of mobile users are of significant interest. Data broadcasting has been considered one of the most promising ways of data dissemination in mobile environments. There are two basic data broadcasting approaches available: push and pull. In push-based broadcasting approach, data is broadcast to mobile users according to users' profiles or subscriptions, whereas in pull-based broadcasting approach, transmission of data is initiated by the explicit request of users.
In this thesis, we have focused on the problem of scheduling data
items to broadcast in a pull-based environment. We have developed an
efficient broadcast scheduling algorithm, and comparing its
performance against several well-known algorithms, we have observed
that our algorithm appears to be one of the best algorithms proposed
so far.
(Postscript copy)
Abstract:
Two new machine learning methods, Regression by Selecting Best Feature
Projections (RSBFP) and Regression by Selecting Best Features (RSBF),
are presented for regression problems. These methods heavily make use
of least squares regression to induce eager, parametric and
context-sensitive models. Famous regression approaches of machine
learning and statistics literature such as DART, MARS, RULE and kNN
can not construct models that are both predictive and having
reasonable training and/or querying time durations. We developed RSBFP
and RSBF to fill the gap in the literature for a regression method
having higher predictive accuracy and faster training and querying
time durations. RSBFP constructs a decision list consisting of simple
linear regression lines belonging to linear features and/or
categorical feature segments. RSBF is the extended version of RSBFP
such that the decision list consists of both simple, belonging to
categorical feature segments, and/or multiple, belonging to linear
features, linear regression lines. A relevancy heuristic has been
developed to determine the features involving in the multiple
regression lines. It is shown that the proposed methods are robust to
irrelevant features, missing feature values and target feature noise,
which make them suitable prediction tools for real-world databases. In
terms of robustness, RSBFP and RSBF give better results when compared
to other famous regression methods.
(PDF copy)
Abstract:
This thesis presents an improved version of a componentwise bounding
algorithm for the steady state probability vector of nearly completely
decomposable Markov chains. The given two-level algorithm uses
aggregation and stochastic comparison with the strong stochastic (st)
order. In order to improve accuracy, it employs reordering of states
and a better componentwise probability bounding algorithm given
st upper- and lower-bounding probability vectors. A thorough analysis
of the algorithm from the point of view of irreducibility is provided.
The bounding algorithm is implemented in sparse storage and its
implementation details are given. Numerical results on an application
of wireless Asynchronous Transfer Mode network show that there are
cases in which the given algorithm proves to be useful in computing
bounds on the performance measures of the system. An improvement in
the algorithm that must be considered to obtain better bounds on
performance measures is also presented at the end.
(Postscript copy)
Abstract:
A new instance-based learning method is presented for regression
problems with high-dimensional data. As an instance-based approach,
the conventional K-Nearest Neighbor (KNN) method has been applied to
both classification and regression problems. Although KNN performs
well for classification tasks, it does not perform similarly for
regression problems. We have developed a new instance-based method,
called Regression by Partitioning Feature Projections (RPFP), to fill
the gap in the literature for a lazy method that achieves a higher
accuracy for regression problems. We also present some additional
properties and even better performance when compared to famous eager
approaches of machine learning and statistics literature such as MARS,
rule-based regression, and regression tree induction systems. The most
important property of RPFP is that it performs much better than all
other eager or lazy approaches on many domains that have missing
values. If we consider databases today, where there are generally
large number of attributes, such sparse domains are very
frequent. RPFP handles such missing values in a very natural way,
since it does not require all the attribute values to be present in
the data set.
(PDF copy)
Abstract:
With the computerization of many business and government
transactions, huge amounts of data have been stored in computers. The
existing database systems do not provide the users with the necessary
tools and functionalities to capture all stored information
easily. Therefore, automatic knowledge discovery techniques have been
developed to capture and use the voluminous information hidden in
large databases. Discovery of association rules is an important class
of data mining, which is the process of extracting interesting and
frequent patterns from the data. Association rules aim to capture
the co-occurrences of items, and have wide applicability in many
areas. Discovering association rules is based on the computation of
large itemsets (set of items that occur frequently in the database)
efficiently, and is a computationally expensive operation in large
databases. Thus, maintenance of them in large dynamic databases is an
important issue. In this thesis, we propose an efficient algorithm,
to update large itemsets by considering the set of previously discovered
itemsets. The main idea is to prune an itemset as soon as it is understood
to be small in the updated database, and to keep the set of candidate
large itemsets as small as possible. The proposed algorithm outperforms
the existing update algorithms in terms of the number of scans over the
databases, and the number of candidate large itemsets generated and
counted. Moreover, it can be applied to other data mining tasks that are
based on large itemset framework easily.
(Postscript copy)
Abstract:
The Turing Test is one of the most disputed topics in Artificial
Intelligence, Philosophy of Mind and Cognitive Science. It has been
proposed 50 years ago, as a method to determine whether machines can
think or not. It embodies important philosophical issues, as well as
computational ones. Moreover, because of its characteristics, it
requires interdisciplinary attention. The Turing Test posits that, to
be granted intelligence, a computer should imitate human conversational
behavior so well that it should be indistinguishable from a real human
being. From this, it follows that conversation is a crucial concept in
its study. Surprisingly, focusing on conversation in relation to the
Turing Test has not been a prevailing approach in previous
research. This thesis first provides a thorough and deep review of the
50 years of the Turing Test. Philosophical arguments, computational
concerns, and repercussions in other disciplines are all
discussed. Furthermore, this thesis studies the Turing Test as a
special kind of conversation. In doing so, the relationship between
existing theories of conversation and human-computer communication is
explored. In particular, Grice's cooperative principle and
conversational maxims are concentrated on. Viewing the Turing Test as
conversation and computers as language users have significant effects
on the way we look at Artificial Intelligence, and on communication in
general.
(Postscript copy)
Abstract:
Recent advances in computer hardware technology and wireless
communication networks have led to the emergence of mobile computing
systems. In a mobile computing environment, a user with a wireless
connection to the information network can access data via submitting
queries to the data server. Since the mobility is the most
distinguishing feature of the mobile computing paradigm, location
becomes an important piece of information for the so called
"location-dependent queries" where the answer to a query depends on
the current location of the user who issued the query. A
location-dependent query submitted by a mobile user can become more
difficult to process when it is submitted as a "continuous query" for
which the answer changes as the user moves. The answer to a
location-dependent continuous query is a set that consists of tuples
indicating that object S is the answer of the query
from time begin to time end. Once the tuples in the answer set are
determined, the next step is to determine when to send these tuples to
the user. The transmission time of the tuples is critical in the sense
that it can affect the communication overhead imposed on the wireless
network and the availability of tuples in case of disconnections. In
this thesis, we propose three tuple transmission approaches that
determine the transmission time of a tuple in the answer set of a
location-dependent continuous query. We also design and implement a
simulation model to compare the performance of the proposed tuple
transmission approaches under different settings of environmental
parameters.
(Postscript copy)
Abstract:
Computing distribution of light in a given environment is an important
problem in computer-aided photo-realistic image generation. Radiosity
method has been proposed to address this problem which requires an
enormous amount of calculation and memory. Hierarchical radiosity
method is a recent approach that reduces these computational
requirements by careful error analysis. It has its idea from the
solution methods of N-body problems. Although hierarchical approach
has greatly reduced the amount of calculations, satisfactory results
still cannot be obtained in terms of processing time. Exploiting
parallelism is a practical way to reduce the computation time
further. In this thesis, we have designed and implemented a parallel
hierarchical radiosity algorithm for distributed memory computers. Due
to its highly irregular computational structure, hierarchical
radiosity algorithms do not yield easily to parallelization on
distributed memory machines. Dynamically changing computational
patterns of the algorithm cause severe load imbalances. Therefore, we
have developed a dynamic load balancing technique for the parallel
hierarchical radiosity calculation.
(Postscript copy)
Abstract:
Task assignment problem deals with assigning tasks to
processors in order to minimize the sum of execution and communication
costs in a distributed system. In this work, we propose a novel task
clustering scheme which considers the differences between the
execution times of tasks to be clustered as well as the communication
costs between them. We use this clustering approach with proper
assignment schemes to implement two-phase assignment algorithms which
can be used to find suboptimal solutions to any task assignment problem.
In addition, we adapt the multilevel scheme used in graph/hypergraph
partitioning to the task assignment. Multilevel assignment algorithms
reduce the size of the original problem by collapsing tasks, find an
initial assignment on the smaller problem, and then projects it
towards the original problem by successively refining the assignment
at each level. We propose several clustering schemes for multilevel
assignment algorithms. The performance of all proposed algorithms are
evaluated through an experim ental study where the assignment
qualities are compared with two up-to-date heuristics. Experimental
results show that our algorithms substantially outperform both of the
existing heuristics.
(Postscript copy)
Abstract:
The learning of recursive logic programs (i.e. the class
of logic programs where at least one clause is recursive) from
incomplete information, such as input/output examples, is a
challenging subfield both of ILP (Inductive Logic Programming) and of
the synthesis (in general) of logic programs from formal
specifications. This is an extremely important class of logic
programs, as the recent work on constructive induction shows that
necessarily invented predicates have recursive programs, and it even
turns out that their induction is much harder than the one of
non-recursive programs. We call this inductive program synthesis. We
introduce a system called DIALOGS-II (Dialogue-based Inductive and
Abductive LOGic Program Synthesizer-II) whose ancestor is DIALOGS. It
is a schema-guided, interactive, and non-incremental synthesizer of
recursive logic programs that takes the initiative and queries a
(possibly naive) specifier for evidence in her/his conceptual
language. It can be used by any learner (including itself) that
detects, or merely conjectures, the necessity of invention of a new
predicate. Moreover, due to its powerful codification of
"recursion-theory" into program schemata and schematic constraints, it
needs very little evidence and is very fast.
(Postscript copy)
Abstract:
Learning is one of the necessary abilities of an intelligent
agent. This thesis proposes several learning algorithms for
multi-concept descriptions in the form of feature intervals,
called Voting Feature Intervals (VFI) algorithms.
These algorithms are non-incremental classification learning
algorithms, and use feature projection based knowledge representation
for the classification knowledge induced from a set of preclassified
examples. The concept description learned is a set of intervals
constructed separately for each feature. Each interval carries
classification information for all classes. The classification of an
unseen instance is based on a voting scheme, where each feature
distributes its vote among all classes. Empirical evaluation of the
VFI algorithms have shown that they are the best performing algorithms
among other previously developed feature projection based methods in
terms of classification accuracy. In order to further improve the
accuracy, genetic algorithms are developed to learn the optimum
feature weights for any given classifier. Also a new crossover
operator, called continuous uniform crossover, to be used in
this weight learning genetic algorithm is proposed and developed
during this thesis. Since the explanation ability of a learning system
is as much important as its accuracy, VFI classifiers are supplemented
with a facility to convey what they have learned in a comprehensible
way to humans.
(PDF copy)
Abstract: In traditional programming methodology, developing a correct and efficient program is divided into two phases: in the first phase, called the synthesis phase, a correct, but maybe inefficient program is constructed, and in the second phase, called the transformation phase, the constructed program is transformed into a more efficient equivalent program. If the synthesis phase is guided by a schema that embodies the algorithm design knowledge abstracting the construction of a particular family of programs, then the transformation phase can also be done in a schema-guided fashion using transformation schemas, which encode the transformation techniques from input program schemas to output program schemas by defining the conditions that have to be verified to have a more efficient equivalent program.
Seven program schemas are proposed, which capture sub-families of
divide-and-conquer programs and the
programs that are constructed using some generalization methods. The
proposed transformation schemas either
automate transformation strategies, such as accumulator introduction
and tupling generalization, which is a
special case of structural generalization, or simulate and extend a
basic theorem in functional programming (the
first duality law of the fold operators) for logic programs. A
prototype transformation system is presented that can
transform programs, using the proposed transformation schemas.
(
Postscript copy)
Abstract:
This thesis presents iterative methods based on splittings (Jacobi,
Gauss--Seidel, Successive Over Relaxation) and their block
versions for Stochastic Automata Networks (SANs).
These methods prove to be better than the power method that has been used
to solve SANs until recently.
Through the help of three examples we show that the time it takes
to solve a system modeled as a SAN is still substantial and it does not
seem to be possible to solve systems with tens of millions of states
on standard desktop workstations with the current state of technology.
However, the SAN methodology enables one to solve much larger models than
those could be solved by explicitly storing the global generator in the core
of a target architecture especially if the generator is reasonably dense.
(Postscript copy)
Abstract:
Preconditioned Krylov subspace methods are state-of-the-art
iterative solvers developed mostly in the last fifteen years that may be
used, among other things, to solve for the stationary distribution of Markov
chains. Assuming Markov chains of interest are irreducible, the problem
amounts to computing a positive solution vector to a homogeneous system
of linear algebraic equations with a singular coefficient matrix under
a normalization constraint. That is, the
(n x 1) unknown
stationary vector x in
Ax=0, ||x||1=1 (1)
is sought. Hence A=I-PT, an n x n singular M-matrix, and P is the one-step stochastic transition probability matrix.
Albeit the recent advances, practicing performance analysts still widely prefer iterative methods based on splittings when they want to compare the performance of newly devised algorithms against existing ones, or when they need candidate solvers to evaluate the performance of a system model at hand. In fact, experimental results with Krylov subspace methods on Markov chains, especially the ill-conditioned nearly completely decomposable (NCD) ones, are few. We believe there is room for research in this area specifically to help us understand the effect of the degree of coupling of NCD Markov chains and their nonzero structure on the convergence characteristics and space requirements of preconditioned Krylov subspace methods.
The work of several researchershave raised important and interesting questions that led to research in another, yet related direction. These questions are the following: "How must one go about partitioning the global coefficient matrix A in equation (1) into blocks if the system is NCD and a two-stage iterative solver (such as block successive overrelaxation---SOR) is to be employed? Are block partitionings dictated by the NCD normal form of P necessarily superior to others? Is it worth investing alternative partitionings? Better yet, for a fixed labeling and partitioning of the states, how does the performance of block SOR (or even that of point SOR) compare to the performance of the iterative aggregation-disaggregation (IAD) algorithm? Finally, is there any merit in using two-stage iterative solvers when preconditioned Krylov subspace methods are available?"
Experimental results show that in most of the test cases two-stage iterative solvers are superior to Krylov subspace methods with the chosen preconditioners, on NCD Markov chains. For two-stage iterative solvers, there are cases in which a straightforward partitioning of the coefficient matrix gives a faster solution than can be obtained using the NCD normal form.
Abstract:
In traditional databases, we deal with static attributes
which change very infrequently over time and their change is handled
with an explicit update operation. In temporal databases, the time of
change of attributes is also important and every update creates a new
version. Attributes, typically change more frequently over time. A
more agitated category of attributes are the so-called dynamic
attributes whose value changes continuously over time, thus making it
impractical to explicitly update them as they change. In this thesis,
we conduct a performance evaluation study of two indexing methods for
dynamic attributes. These are based on the key idea of using a linear
function that describes the way the attribute changes over time and
allows us to predict its value in the future based on any value of it
in the past. The problem is rooted in the context of mobile data
management and draws upon the fields of spatial and temporal indexing.
We contribute various experimental results, mathematical analyses, and
improvement and optimization algorithms. Finally, inspired by a few of
the observed shortcomings of both of the studied techniques, we
propose a novel indexing method which we call the FP-Index which
is shown analytically to have promising prospects and to beat both
methods over most performance parameters.
(Postscript copy)
Abstract:
All natural language processing systems (such as parsers, generators, taggers) need
to have access to a lexicon about the words in the language. This
thesis presents a lexicon architecture for natural language processing
in Turkish. Given a query form consisting of a surface form and other
features acting as restrictions, the lexicon produces feature
structures containing morphosyntactic, syntactic, and semantic
information for all possible interpretations of the surface form
satisfying those restrictions. The lexicon is based on contemporary
approaches like feature-based representation, inheritance, and
unification. It makes use of two information sources: a morphological
processor and a lexical database containing all the open and
closed-class words of Turkish. The system has been implemented in
SICStus Prolog as a standalone module for use in natural language
processing applications.
(
Postscript copy )
Abstract: This thesis presents several learning algorithms for multi-concept descriptions in the form of disjoint feature intervals, called Feature Interval Learning algorithms (FIL). These algorithms are batch supervised inductive learning algorithms, and use feature projections of the training instances for the representation of the classification knowledge induced. These projections can be generalized into disjoint feature intervals. Therefore, the concept description learned is a set of disjoint intervals separately for each feature. The classification of an unseen instance is based on the weighted majority voting among the local predictions of features. In order to handle noisy instances, several extensions are developed by placing weights to intervals rather than features. Empirical evaluation of the FIL algorithms is presented and compared with some other similar classification algorithms. Although the FIL algorithms achieve comparable accuracies with other algorithms, their average running times are much more less than the others.
This thesis also presents a new adaptation of the well-known
k-NN classification algorithm to the feature projections
approach, called k-NNFP for k-Nearest Neighbor on Feature
Projections, based on a majority voting on individual
classifications made by the projections of the training set on each
feature and compares with the k-NN algorithm on some real-world
and artificial datasets.
(PDF copy)
Abstract:The work presented here describes the power of 2D
animation with texture mapping controlled by line drawings. Animation
is specifically intended for facial animation and not restricted by
the human face. We initially have a sequence of facial images which
are taken from a video sequence of the same face and an image of
another face to be animated. The aim is to animate the face image with
the same expressions as those of the given video sequence. To realize
the animation, a set of frames are taken from a video sequence. Key
features of the first frame are rotoscoped and the other frames are
automatically rotoscoped using the first frame. Similarly,the
corresponding features of the image which will be animated are
rotoscoped. The key features of the first frame of the sequence and
the image to be animated are mapped and using cross-synthesis
procedure, other drawings for the given image are produced. Using
these animated line drawings and the original image, the corresponding
frame sequence is produced by image warping. The resulting sequence
has the same expressions as those of the video sequence. This work
encourages the reuse of animated motion by gathering facial motion
sequences into a database. Furthermore, by using motion sequences of a
human face, non-human characters can be animated realistically or
complex characters can be animated by the help of motion sequences of
simpler characters.
(Postscript copy)
Abstract:
Natural Language Generation (NLG) is roughly decomposed into two stages:
text planning, and text generation. In the text planning stage, the
semantic description of the text is produced from the conceptual
inputs. Then, the text generation system transforms this semantic
description into an actual text. This thesis focuses on the design and
implementation of a Turkish text generation system rather than text
planning. To develop a text generator, we need a linguistic theory
that describes the resources of the desired natural language, and also
a software tool that represents and performs these linguistic
resources in a computational environment. In this thesis, in order to
carry out the mentioned requirements, we have used a functional
linguistic theory called Systemic--Functional Grammar (SFG), and the
FUF text generation system as a software tool. The ultimate text
generation system takes the semantic description of the text sentence
by sentence, and then produces a morphological description for each
lexical constituent of the sentence. The morphological descriptions
are worded by a Turkish morphological generator. Because of our
concentration on the text generation, we have not considered the
details of the text planning. Hence, we assume that the semantic
description of the text is produced and lexicalized by an application
(currently given by hand).
(Postscript copy)
Abstract:
This thesis presents a constraint-based morphological disambiguation
approach that is applicable to languages with complex
morphology--specifically agglutinative languages with productive
inflectional and derivational morphological phenomena. For
morphologically complex languages like Turkish, automatic
morphological disambiguation involves selecting for each token
morphological parse(s), with the right set of inflectional and
derivational markers. Our system combines corpus independent
hand-crafted constraint rules, constraint rules that are learned via
unsupervised learning from a training corpus, and additional
statistical information obtained from the corpus to be morphologically
disambiguated. The hand-crafted rules are linguistically motivated
and tuned to improve precision without sacrificing recall. In certain
respects, our approach has been motivated by Brill's recent work,
but with the observation that his transformational approach is not
directly applicable to languages like Turkish. Our approach also uses
a novel approach to unknown word processing by employing a secondary
morphological processor which recovers any relevant inflectional and
derivational information from a lexical item whose root is unknown.
With this approach, well below 1% of the tokens remains as unknown in
the texts we have experimented with. Our results indicate that by
combining these hand-crafted, statistical and learned information
sources, we can attain a recall of 96 to 97% with a corresponding
precision of 93 to 94%, and ambiguity of 1.02 to 1.03 parses per
token.
(Postscript copy)
Abstract:This thesis describes a tactical generator for
Turkish, a free constituent order language, in which the order of the
constituents may change according to the information structure of the
sentences to be generated. In the absence of any information regarding
the information structure of a sentence (i.e., topic, focus,
background, etc.), the constituents of the sentence obey a default
order, but the order is almost freely changeable, depending on the
constraints of the text flow or discourse. We have used a recursively
structured finite state machine for handling the changes in
constituent order, implemented as a right-linear grammar backbone. Our
implementation environment is the GenKit system, developed at Carnegie
Mellon University-Center for Machine Translation. Morphological
realization has been implemented using an external morphological
analysis/generation component which performs concrete morpheme
selection and handles morphographemic processes.
(Postscript copy)
Abstract:
Although much work in the area of Active Database Management Systems
(ADBMSs) has been done, there have been only a few attempts to
evaluate the performance of these systems, and it is not yet clear how
the performance of an active DBMS can be evaluated systematically. In
this thesis, we describe the OBJECTIVE Benchmark for object-oriented
ADBMSs, and present experimental results from its implementation in an
active database system prototype. OBJECTIVE can be used to identify
performance bottlenecks and active functionalities of an ADBMS, and
compare the performance of multiple ADBMSs. The philosophy of
OBJECTIVE is to isolate components providing active functionalities,
and concentrate only on the performance of these components while
attempting to minimize the effects of other factors.
(Postscript copy)
Abstract:
Scalable multicomputers are based upon interconnection
networks that typically provide multiple communication routes between
any given pair of processor nodes. In such networks, the selection of
the routes is an important problem because of its impact on the
communication performance. We propose the adaptive source routing
(ASR) scheme which combines adaptive routing and source routing into
one which has the advantages of both schemes. In ASR, the degree of
adaptivity of each packet is determined at the source processor. Every
packet can be routed in a fully adaptive or partially adaptive or
non-adaptive manner, all within the same network at the same time. The
ASR scheme permits any network topology to be used provided that
deadlock constraints are satisfied. We evaluate and compare
performance of the adaptive source routing and non-adaptive
randomized routing by simulations. Also we propose an algorithm to
generate adaptive routes for all pairs of processors in any multistage
interconnection network. Adaptive routes are stored in a route table
in each processor's memory and provide high bandwidth and reliable
interprocessor communication. We evaluate the performance of the
algorithm on IBM SP2 networks in terms of obtained bandwidth, time to
fill in the route tables, and efficiency exploited by the parallel
execution of the algorithm.
(Postscript copy)
Abstract:
This thesis presents a new form of exemplar-based learning
method, based on overlapping feature intervals. Classification with
Overlapping Feature Intervals (COFI) is a particular implementation
of this technique. In this incremental, inductive and supervised
learning method, the basic unit of the representation is an interval.
The COFI algorithm learns the projections of the intervals in each
class dimension for each feature. An interval is initially a point on
a class dimension, then it can be expanded through generalization.
No specialization of intervals is done on class dimensions by this
algorithm. Classification in the COFI algorithm is based on a majority
voting among the local predictions that are made individually by each
feature.
(Postscript copy)
Abstract: The lexicon has a crucial role in all natural
language processing systems and has special importance in machine
translation systems. This thesis presents the design and
implementation of a verb lexicon and a verb sense disambiguator for
Turkish. The lexicon contains only verbs because verbs encode events
in sentences and play the most important role in natural language
processing systems, especially in parsing (syntactic analyzing) and
machine translation. The verb sense disambiguator uses the
information stored in the verb lexicon that we developed. The main
purpose of this tool is to disambiguate senses of verbs having several
meanings, some of which are idiomatic. We also present a tool
implemented in Lucid Common Lisp under X-Windows for adding,
accessing, modifying, and removing entries of the lexicon, and a
semantic concept ontology containing semantic features of commonly
used Turkish nouns.
(Postscript copy)
Abstract: Mean Field Annealing algorithm which was proposed
for solving combinatorial optimization problems combines the
properties of neural networks and Simulated Annealing. In this thesis,
MFA is formulated for mapping problem in parallel processing and
global routing problem in physical design automation of Field
Programmable Gate Array (FPGAs) A new Mean Field Annealing (MFA)
formulation is proposed for the mapping problem for mesh-connected and
hypercube architectures. The proposed MFA heuristic exploits the
conventional routing scheme used in mesh and hypercube interconnection
topologies to introduce an efficient encoding scheme. An efficient
implementation scheme which decreases the complexity of the proposed
algorithm by asymptotical factors is also developed. Experimental
results also show that the proposed MFA heuristic approaches the speed
performance of the fast Kernighan-Lin heuristic while approaching the
solution quality of the powerful simulated annealing heuristic. Also,
we propose an order-independent global routing algorithm for SRAM type
FPGAs based on Mean Field Annealing. The performance of the proposed
global routing algorithm is evaluated in comparison with LocusRoute
global router on ACM/SIGDA Design Automation benchmarks. Experimental
results indicate that the proposed MFA heuristic performs better than
the LocusRoute.
(Postscript copy)
Abstract:The issue of context arises in assorted areas of
Artificial Intelligence, Mathematical Logic, and Natural Language
Semantics. Although its importance is realized by various researchers,
there is not much work towards a useful formalization. In this
report, we will try to identify the problem, and decide what we need
for an acceptable (formal) account of the notion of context. We will
present a preliminary model (based on Situation Theory) and give
examples to show the use of context in various fields, and the
advantages gained by the acceptance of our proposal.
(Postscript copy)
Abstract: A part-of-speech (POS) tagger is a system that
uses various sources information to assign possibly unique POS to
words. Automatic text tagging is an important component in higher
level analysis of text corpora. Its output can also be used in many
natural language processing applications. In languages like Turkish
or Finnish, with agglutinative morphology, morphological
disambiguation is a very crucial process in tagging as the structures
of many lexical forms are morphologically ambiguous. This thesis
presents a POS tagger for Turkish text based on a full-scale two-level
specification of Turkish morphology. The tagger is augmented with a
multi-word and idiomatic construct recognizer, and most importantly
morphological disambiguator based on local lexical neighborhood
constraints, heuristics and limited amount of statistical
information. The tagger also has additional functionality for
statistics compilation and fine tuning of the morphological analyzer,
such as logging erroneous morphological parses, commonly used roots,
etc. Test results indicate that the tagger can tag about 97% to 99% of
the texts accurately with very minimal user intervention. Furthermore
for sentences morphologically disambiguated with the tagger, an LFG
parser developed for Turkish, on the average, generates 50% less
ambiguous parses and parses almost 2.5 times faster.
(Postscript copy)
Abstract: Situation theory is a mathematical theory of
meaning introduced by Jon Barwise and John Perry. It has evoked great
theoretical and practical interest and motivated the framework of a
few `computational' systems. PROSIT is the pioneering work in this
direction. Unfortunately, there is a lack of real-life applications
on these systems and this study is a preliminary attempt to remedy
this deficiency. Here, we examine how much PROSIT reflects
situation-theoretic concepts and solve a group of epistemic puzzles
using the constructs provided by this programming language.
(Postscript copy)
Abstract: Anaphora resolution is one of the most active
research areas in natural language processing. This study examines
focusing as a tool for the resolution of pronouns which are a kind of
anaphora. Focusing is a discourse phenomenon like anaphora. Candy
Sidner formalized focusing in her 1979 MIT PhD thesis and devised
several algorithms to resolve definite anaphora including
pronouns. She presented her theory in a computational framework but
did not generally implement the algorithms. Her algorithms related to
focusing and pronoun resolution are implemented in this thesis. This
implementation provides a better comprehension of the theory both from
a conceptual and a computational point of view. The resulting program
is tested on different discourse segments, and evaluation and analysis
of the experiments are presented together with the statistical
results.
(Postscript copy)
Abstract: In this study, we present the parallelization of
Mehrotra's predictor-corrector interior point algorithm, which is a
Karmarkar-type optimization method for linear
programming. Computation types needed by the algorithm are identified
and parallel algorithms for each type are presented. The repeated
solution of large symmetric sets of linear equations, which
constitutes the major computational effort in Karmarkar-type
algorithms, is studied in detail. Several forward and backward
solution algorithms are tested, and buffered backward solution
algorithm is developed. Heuristic bin packing algorithms are used to
schedule sparse matrix-vector product and factorization
operations. Algorithms having the best performance results are used to
implement a system to solve linear programs in parallel on
multicomputers. Design considerations and implementation details of
the system are discussed, and some performance results are presented
from a number of real problems.
(Postscript copy)
Abstract: This report presents a new methodology of learning from examples, based on feature partitioning. Classification by Feature Partitioning (CFP) is a particular implementation of this technique, which is an inductive, incremental, and supervised learning method. Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint partitions of values. A partition, a basic unit of representation which is initially a point in the feature dimension, is expanded through generalization. The CFP algorithm specializes a partition by subdividing it into two subpartitions. Theoretical (with respect to PAC-model) and empirical evaluation of the CFP is presented and compared with some other similar techniques.
Keywords:
Machine Learning, inductive learning, incremental learning, supervised
learning, feature partitioning.
(Postscript copy)
Abstract: Syntactic parsing is an important step in any
natural language processing system. Augmented Transition Networks
(ATNs) are procedural mechanisms which have been one of the earliest
and most common paradigms for parsing natural language. ATNs have
the generative power of a Turing machine and were first popularized by
Woods in 1970. This thesis presents our efforts in developing an ATN
grammar for a subset of Turkish including simple and complex
sentences. There are five networks in our grammar: the sentence (S)
network, which includes the sentence structures that falls in our
scope, the noun phrase (NP) network, the adverbial phrase (ADVP)
network and finally the clause (CLAUSE) and gerund (GERUND) networks
for handling complex sentences. We present results from parsing a
large number of Turkish sentences.
(Postscript copy)
Abstract: Natural language processing is a research area
which is becoming increasingly popular each day for both academic and
commercial reasons. Syntactic parsing underlies most of the
applications in natural language processing. Although there have been
comprehensive studies of Turkish syntax from a linguistic perspective,
this is one of the first attempts for investigating it extensively
from a computational point of view. In this thesis, a
lexical-functional grammar for Turkish syntax is presented. Our
current work deals with regular Turkish sentences that are
structurally simple or complex.
(
Postscript copy)