Department of Computer Engineering
Characteristics of Web-Based Textual Communications
Computer Engineering Department
In this thesis, we analyze different aspects of Web-based communications and argue that all such communications share some common properties. To provide practical evidence for the validity of these properties and to support our claim, we select two of them and examine them using various types of Web-based textual communication data. The two properties examined in this work are: (1) all Web-based communications contain features attributable to their authors and receivers; and (2) all Web-based communications exhibit similar heavy-tailed distributional properties.
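A common way to check a heavy-tailed property like the second one on a text collection is to fit a line to the rank-frequency plot in log-log space. Under a Zipf-like law, where the frequency of the r-th most common term is proportional to r^(-s), the fitted slope approximates -s. The sketch below assumes the data is already tokenized into a list of terms and uses an ordinary least-squares fit. The function name `zipf_exponent` is hypothetical, and this is an illustrative diagnostic rather than the analysis carried out in the thesis.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate a Zipf exponent s from a list of tokens.

    Regresses log(frequency) on log(rank) with ordinary least squares
    and returns the negated slope. Assumes tokenization is done already.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


if __name__ == "__main__":
    # Synthetic corpus: term t_i appears about 1000 / i times (exponent near 1).
    corpus = []
    for i in range(1, 201):
        corpus.extend([f"t{i}"] * (1000 // i))
    print(round(zipf_exponent(corpus), 2))
```

On real query logs or chat messages, the fit is usually restricted to the middle ranks, because the head and the tail of the distribution tend to bend away from a straight line.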
To support our claims, we address three practical, real-life research problems and exploit the proposed properties of Web-based communications to solve them. First, we provide a feature-based result caching framework for real-life search engines. To this end, we mine attributes from user queries in order to classify queries and to estimate a quality metric that drives admission and eviction decisions for the query result cache. Second, we analyze the messages of an online chat server in order to predict user and message attributes. Our results show that several user- and message-based attributes can be predicted with significant accuracy using both chat-message-based and writing-style-based features of the chat users. Third, we provide a parallel framework for the in-memory construction of term-partitioned inverted indexes. To minimize the total communication time between processors, we propose a bucketing scheme based on the distributional properties of the terms in Web page contents.
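The abstract does not spell out the bucketing scheme itself. As a rough illustration of why term distributions matter when partitioning an index by term, the sketch below assigns terms to processors with a greedy largest-first heuristic (longest processing time first, a standard load-balancing technique swapped in here, not necessarily the thesis' scheme). It uses each term's posting count as an assumed proxy for the data that must be sent to the processor owning that term. The function name `assign_terms` and its inputs are hypothetical.

```python
import heapq


def assign_terms(posting_counts, num_procs):
    """Greedily assign terms to processors, largest posting count first.

    posting_counts: dict mapping term -> number of postings (an assumed
    proxy for the volume sent to the term's owner). Returns
    (assignment, loads): assignment maps term -> processor id, and
    loads[p] is the total posting count assigned to processor p.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be positive")
    # Min-heap of (current load, processor id): always fill the lightest one.
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    loads = [0] * num_procs
    assignment = {}
    # Descending count; ties broken by term so the result is deterministic.
    for term, count in sorted(posting_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        assignment[term] = p
        loads[p] = load + count
        heapq.heappush(heap, (loads[p], p))
    return assignment, loads


if __name__ == "__main__":
    # Zipf-like posting counts for 1,000 hypothetical terms on 8 processors.
    counts = {f"t{i}": 100_000 // i for i in range(1, 1001)}
    assignment, loads = assign_terms(counts, 8)
    print(max(loads), min(loads))
```

In this toy input, the single most frequent term (100,000 postings) already exceeds the average per-processor load of about 93,500, so no whole-term assignment can balance the loads perfectly. This is one reason a heavy-tailed term distribution has to be taken into account when designing a term-based partitioning scheme.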
DATE: Wednesday, 26 December 2012, 15:40