SEMINAR

DEPARTMENT OF COMPUTER ENGINEERING

ABSTRACT

MINING USER ACCESS PATTERNS AND IDENTITY INFORMATION

FROM WEB LOGS FOR EFFECTIVE PERSONALIZATION

Esra Satıroğlu

M.S. in Computer Engineering

Supervisor:

Prof. Dr. H. Altay Güvenir

The Web is a huge source of usage data, since it is visited by millions of people every day. This usage data is stored in web server logs, which contain a detailed description of every hit received by the corresponding web server. Recently, it has been recognized that analyzing this data to understand user behavioral patterns can have important practical implications. Understanding the behavioral patterns of visitors is especially important for e-commerce companies, which try to gain customers and sell products through the web. Interactive sites that recognize their customers and customize themselves accordingly may save these companies considerable amounts of money. Usage-based personalization is the study of designing such personalized sites. In this thesis, we present a new usage-based personalization system. The system we designed and implemented is capable of predicting the web pages that on-line visitors are likely to request during the rest of their visits. It shows the highest-scoring subset of these pages to visitors as recommendations attached to the original pages. The system has two major modules. The off-line module mines the log files to determine the behavioral patterns of past visitors to the web site under consideration. The information obtained by the off-line module is used by the on-line module to recognize new visitors and produce on-line recommendations. The first criterion for identifying on-line visitors is the path they follow. The path of a particular visitor consists of the pages retrieved by him/her during the visit to the web site. Another criterion considered by the system is the identity information (IP address or domain name) of the visitors. By using identity information, it is possible to learn the past preferences of the visitor himself/herself or of visitors from similar domains. We have tested the system on the web site of the CS department of Bilkent University. The results of the experiments show the efficiency and applicability of the system.
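
The two-module design described in the abstract can be illustrated with a minimal Python sketch. It is not the system presented in the thesis: the abstract does not specify the scoring scheme, so the co-occurrence counts, the identity_weight parameter, and the helper names identity_key, mine_offline, and recommend are all assumptions made for illustration, as are the sample sessions.

```python
from collections import Counter, defaultdict


def identity_key(host):
    """Hypothetical helper: reduce a host to a coarse identity group.

    Assumption: IP addresses are grouped by their first three octets,
    domain names by their last two labels.
    """
    parts = host.split(".")
    if all(part.isdigit() for part in parts):
        return ".".join(parts[:3])
    return ".".join(parts[-2:])


def mine_offline(sessions):
    """Off-line step: summarize past visits.

    `sessions` is a list of (host, path) pairs, where `path` is the
    ordered list of pages retrieved during one past visit.
    """
    by_page = defaultdict(Counter)      # page -> pages seen later in the same visit
    by_identity = defaultdict(Counter)  # identity group -> pages its visitors viewed
    for host, path in sessions:
        for i, page in enumerate(path):
            by_page[page].update(path[i + 1:])
        by_identity[identity_key(host)].update(path)
    return by_page, by_identity


def recommend(model, host, current_path, k=2, identity_weight=0.5):
    """On-line step: score candidate pages from the path and the identity."""
    by_page, by_identity = model
    scores = Counter()
    # Criterion 1: pages that past visitors reached after the pages seen so far.
    for page in current_path:
        for candidate, count in by_page.get(page, Counter()).items():
            scores[candidate] += count
    # Criterion 2: pages preferred by past visitors with a similar identity.
    for candidate, count in by_identity.get(identity_key(host), Counter()).items():
        scores[candidate] += identity_weight * count
    # Do not recommend pages the visitor has already retrieved.
    for page in current_path:
        scores.pop(page, None)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:k]]


# Invented sample data, for illustration only.
sessions = [
    ("a.example.edu", ["/", "/courses", "/courses/cs101"]),
    ("b.example.edu", ["/", "/courses", "/people"]),
    ("c.other.com", ["/", "/research", "/people"]),
]
model = mine_offline(sessions)
print(recommend(model, "d.example.edu", ["/", "/courses"], k=2))
```

In this sketch the path contributes raw counts and the identity contributes a weighted bonus, so a visitor from an unseen domain still receives path-based recommendations.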

Keywords: Personalization, Web Usage Mining, User Access Patterns

The seminar will be held on 14 September 2001 at 10:00 in EA502.