Research Projects

  • Metadata-Based Web Querying [completed]
  • Objective: Developing a metadata-based data model of Web resources for effective and efficient querying.

    Project Group:

    Faculty:
    Prof. Özgür Ulusoy (Bilkent)
    Prof. Gultekin Ozsoyoglu (CWRU)
    Prof. Meral Ozsoyoglu (CWRU)

    Ph.D. Students:
    Selma Ayse Ozel
    İsmail Sengör Altingövde (former M.S. student)

    M.S. Students:
    Mustafa Kutluturk (graduated)

    Undergraduates:
    Isıl Gursoy, Itır Akel, Fatih Altiparmak (graduated)

    Project Summary:

    A recent approach to increasing the quality of Web search is to associate metadata with the resources on the Web. To this end, there are various standardization efforts and initiatives, such as Dublin Core, RDF, and the Semantic Web. Another such framework is topic maps: a data structure consisting of topics, the associations among them, and their occurrences in Web resources, which can be stored in various ways (for instance, in a conventional DBMS or in XML).
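    To make the three ingredients of a topic map (topics, associations, occurrences) concrete, here is a minimal in-memory sketch in Python. The class and field names (`Topic`, `Association`, `TopicMap`, `topic_type`, and so on) are illustrative assumptions, not the project's actual schema; a real repository would persist these records in a DBMS or in XML.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Topic:
    topic_id: str
    topic_type: str  # e.g. "paper", "author", "venue" (hypothetical type names)
    name: str


@dataclass(frozen=True)
class Association:
    """A typed, directed link between two topics (a "metalink" in this project)."""
    assoc_type: str
    source_id: str
    target_id: str


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)        # topic_id -> Topic
    associations: list = field(default_factory=list)  # list of Association
    occurrences: dict = field(default_factory=dict)   # topic_id -> set of resource URLs

    def add_topic(self, topic):
        self.topics[topic.topic_id] = topic

    def add_occurrence(self, topic_id, url):
        # An occurrence ties a known topic to a Web resource that mentions it.
        if topic_id not in self.topics:
            raise KeyError(topic_id)
        self.occurrences.setdefault(topic_id, set()).add(url)

    def associate(self, assoc_type, source_id, target_id):
        # Both endpoints must already be known topics.
        for tid in (source_id, target_id):
            if tid not in self.topics:
                raise KeyError(tid)
        self.associations.append(Association(assoc_type, source_id, target_id))

    def targets(self, source_id, assoc_type):
        """Topics reachable from source_id through associations of the given type."""
        return [self.topics[a.target_id] for a in self.associations
                if a.source_id == source_id and a.assoc_type == assoc_type]
```

    Keeping associations as first-class records (rather than as pointers inside topics) is what lets them be typed and queried on their own, which is the property the metalink queries below rely on.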

    In this research, we first outline a data model based on topic maps to model Web resources (which may be HTML or XML documents on the Web), and discuss some practical approaches to creating such metadata repositories semi-automatically. To achieve the latter, we propose a mapping mechanism between the entities of our metadata model (topics and their associations, called metalinks in our work) and a typical XML DTD. Then, a robot-like agent traverses the Web, extracts topics and associations from the XML files that conform to the input DTD, and stores them in a local database, which then serves as a metadata repository for the visited XML pages.
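    The DTD-to-metadata mapping can be pictured as a small table from element names to topic types and metalinks. The sketch below assumes a DBLP-like DTD (record elements `article`/`inproceedings` with `author`, `title`, `journal`, and `booktitle` children); the mapping tables, the `AppearsIn` metalink name, and the `extract` function are hypothetical, and the robot's Web traversal is omitted: `extract` processes one already-downloaded XML document.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping for a DBLP-like DTD.
RECORD_TAGS = {"article": "paper", "inproceedings": "paper"}   # record element -> topic type
FIELD_TOPICS = {"author": "author", "journal": "venue", "booktitle": "venue"}
# Field element -> (metalink name, True if the field topic is the metalink's source).
METALINK_RULES = {
    "author": ("AuthorOf", True),      # author AuthorOf paper
    "journal": ("AppearsIn", False),   # paper AppearsIn venue
    "booktitle": ("AppearsIn", False),
}


def extract(xml_text):
    """Return (topics, metalinks) found in one XML document conforming to the DTD."""
    root = ET.fromstring(xml_text)
    topics = set()     # (topic_type, name)
    metalinks = set()  # (metalink, source_name, target_name)
    for record in root.iter():
        record_type = RECORD_TAGS.get(record.tag)
        title = (record.findtext("title") or "").strip() if record_type else ""
        if not title:
            continue  # not a mapped record, or a record without a usable title
        topics.add((record_type, title))
        for child in record:
            field_type = FIELD_TOPICS.get(child.tag)
            if field_type is None or not (child.text or "").strip():
                continue
            name = child.text.strip()
            topics.add((field_type, name))
            metalink, field_is_source = METALINK_RULES[child.tag]
            if field_is_source:
                metalinks.add((metalink, name, title))
            else:
                metalinks.add((metalink, title, name))
    return topics, metalinks
```

    Because the mapping lives in data rather than code, supporting another DTD would, under this design, mean supplying new tables rather than a new extractor.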

    A prototype system has been built for the DBLP data provided in XML format at the DBLP Web site. In the prototype, topics are of types such as research paper, author, and journal/conference. The names of research-paper topics are simply the paper titles, and so on. Various metalinks are defined among topics. For instance, an instance of the AuthorOf metalink from topic X to topic Y states that topic Y (a topic of type research paper) was written by X (a topic of type author). The RelatedTo metalink holds between two related papers, and the PreRequisite metalink states that one paper is a prerequisite of another. A demo of the prototype system, BilDig (Bilkent Digital Library), is available here. BilDig allows one to search the DBLP data set by querying topic properties (e.g., paper title, author name) as well as metalinks (e.g., find all papers related to paper X, or all papers that are prerequisites of paper X).
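    Once topics and metalinks sit in a relational database, metalink queries like those above become joins. A minimal sketch with Python's built-in `sqlite3`, assuming a hypothetical two-table schema (`topic`, `metalink`) in which a row `(kind, src, dst)` reads "src kind dst" (e.g., author AuthorOf paper, or paper PreRequisite paper); this is not BilDig's actual schema or query language.

```python
import sqlite3

# Hypothetical schema: one table of topics, one of typed, directed metalinks.
SCHEMA = """
CREATE TABLE topic (id INTEGER PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE metalink (kind TEXT NOT NULL,
                       src INTEGER NOT NULL REFERENCES topic(id),
                       dst INTEGER NOT NULL REFERENCES topic(id));
"""


def papers_by_author(conn, author):
    """Papers P such that '<author> AuthorOf P' holds."""
    rows = conn.execute(
        """SELECT p.name
           FROM topic a
           JOIN metalink m ON m.src = a.id AND m.kind = 'AuthorOf'
           JOIN topic p ON p.id = m.dst
           WHERE a.type = 'author' AND a.name = ?
           ORDER BY p.name""",
        (author,),
    )
    return [name for (name,) in rows]


def papers_linked_to(conn, kind, title):
    """Papers P such that 'P <kind> <title>' holds, e.g. all papers prerequisite to X."""
    rows = conn.execute(
        """SELECT p.name
           FROM topic x
           JOIN metalink m ON m.dst = x.id AND m.kind = ?
           JOIN topic p ON p.id = m.src
           WHERE x.type = 'paper' AND x.name = ? AND p.type = 'paper'
           ORDER BY p.name""",
        (kind, title),
    )
    return [name for (name,) in rows]
```

    A symmetric metalink such as RelatedTo could, under this assumed schema, be stored in both directions so that `papers_linked_to(conn, "RelatedTo", title)` answers "find all papers related to paper X".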

            Publications:

            The refereed publications for this research include:

  • S. A. Ozel, I. S. Altingovde, O. Ulusoy, G. Ozsoyoglu, Z. M. Ozsoyoglu, Metadata-Based Modeling of Information Resources on the Web, Journal of the American Society for Information Science and Technology (JASIST), to appear, 2003.
  • I. S. Altingovde, S. A. Ozel, O. Ulusoy, G. Ozsoyoglu, Z. M. Ozsoyoglu, Topic-Centric Querying of Web Information Resources, Database and Expert Systems Applications (DEXA'01), Munich, Germany, Lecture Notes in Computer Science (Springer Verlag), vol.2113, September 2001.
    Other publications are:

    M. Kutluturk, Implementation of a Topic Map Data Model for a Web-Based Information Resource, M.S. Thesis, Bilkent University, Ankara, Turkey, August 2002.
    I. S. Altingövde, Topic-Centric Querying of Web Resources, M.S. Thesis, Bilkent University, Ankara, Turkey, September 2001.
    I. S. Altingovde, S. A. Ozel, O. Ulusoy, G. Ozsoyoglu, Z. M. Ozsoyoglu, SQL-TC: A Topic-Centric Query Language for Web-Based Information Resources, Technical Report, Bilkent University, 2001. (BU-CE-0108)

    Talks:

    Some of the senior presentations by the project team and by the undergraduate students employed in the project:

    BilDig Demo 

    Please click here.

  • Native Score Management and Text Support in Databases [on-going]
  • Objectives: (1) building native score-management mechanisms for modifying tuple scores and ranking output tuples in databases; (2) extending SQL to handle score-generating predicates, such as text-similarity comparison (so-called threshold) predicates and UDF (user-defined function) predicates; (3) modifying the relational algebra for score management (Sideway Value Algebra); and (4) defining physical implementations of the newly introduced algebraic operators.
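    To illustrate what a score-generating threshold predicate does, the sketch below evaluates a text-similarity predicate over a list of tuples, attaches the similarity as each surviving tuple's score, and ranks the output. It uses plain cosine similarity over term frequencies as a stand-in measure; the function names and the dict-as-tuple representation are assumptions, and this is not the Sideway Value Algebra itself, only an example of the behavior it is meant to manage natively inside the database.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two strings over raw term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def similar_to(rows, column, query, threshold):
    """Threshold predicate: keep rows whose similarity to `query` is >= threshold,
    attach that similarity as the row's score, and rank by descending score."""
    scored = []
    for row in rows:
        s = cosine(row[column], query)
        if s >= threshold:
            scored.append({**row, "score": s})
    return sorted(scored, key=lambda r: r["score"], reverse=True)
```

    The point of native support is that the score travels with the tuple, so later operators (joins, further predicates, ORDER BY) can combine or rank on it instead of the application recomputing it outside the database.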

    Project Group:

    Faculty:
    Prof. Özgür Ulusoy (Bilkent)
    Prof. Gultekin Ozsoyoglu (CWRU)
    Prof. Meral Ozsoyoglu (CWRU)

    Ph.D. Students:
    İsmail Sengör Altingövde (former M.S. student)
    Selma Ayse Ozel
    Abdullah Al-Hamdani (CWRU)

    M.S. Students:
    Li-li (graduated)

    Project Summary:

    Publications:

            The refereed publications for this research include:

    • G. Ozsoyoglu, I. S. Altingovde, A. Al-Hamdani, S. A. Ozel, O. Ulusoy, Z. M. Ozsoyoglu, Querying Web Metadata in a Database: Native Score Management and Text Support in Databases, to be submitted.
    • G. Ozsoyoglu, A. Al-Hamdani, I. S. Altingovde, S. A. Ozel, O. Ulusoy, Z. M. Ozsoyoglu, Sideway Value Algebra for Object-Relational Databases, International Conference on Very Large Data Bases (VLDB'02), Hong Kong, August 2002.

    Other publications are:

    • thesis Li etc.

    Talks:

  • VLDB talk
  • Focused Web Crawling [on-going]

    Objective: Proposing new approaches for focused crawling.

    Summary: 

    Web crawlers (also called robots) are programs that traverse the Web and download its pages, starting from an initial (seed) set of pages and following hyperlinks. Pages are typically visited in breadth-first or depth-first order. The downloaded pages are then used for other tasks, e.g., indexing, so that search engines running on top of these indexes can respond to keyword queries quickly and effectively.
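    The breadth-first traversal described above fits in a few lines. In this sketch `get_links` is a hypothetical stand-in for downloading a page and extracting its hyperlinks, so the example runs on an in-memory link graph instead of the live Web; politeness delays, robots.txt handling, and URL normalization are omitted.

```python
from collections import deque


def bfs_crawl(seeds, get_links, max_pages=100):
    """Breadth-first crawl from the seed URLs; returns URLs in visiting order.
    get_links(url) -> iterable of URLs (hypothetical download + link extraction)."""
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated seeds, order preserved
    seen = set(frontier)                    # URLs already queued, never re-queued
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()            # FIFO queue gives breadth-first order
        visited.append(url)                 # "download" the page
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

    Replacing `popleft()` with `pop()` turns the FIFO queue into a stack and yields a depth-first-like order instead.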

    A recent trend in Web search is building domain-specific search engines, e.g., for educational resources, branches of science, or finance. Such engines require an intelligent crawler that avoids visiting pages irrelevant to the application topic and tries to gather as many pages as possible on that topic; this is called focused crawling. In this research, we are investigating new techniques for focused crawling.
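    A common baseline for focused crawling replaces the crawler's FIFO queue with a priority queue ordered by estimated relevance. The sketch below shows that generic best-first strategy, not the new technique under investigation here; `get_page`, `relevance`, the 0.5 threshold, and the rule that a link inherits its parent page's score are all assumptions chosen for brevity.

```python
import heapq


def focused_crawl(seeds, get_page, relevance, threshold=0.5, max_pages=100):
    """Best-first focused crawl (a generic baseline).
    get_page(url) -> (text, links); relevance(text) -> float in [0, 1]."""
    frontier = [(-1.0, url) for url in dict.fromkeys(seeds)]  # seeds get top priority
    heapq.heapify(frontier)            # min-heap on negated score = best first
    seen = {url for _, url in frontier}
    collected, fetched = [], 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = get_page(url)
        fetched += 1
        score = relevance(text)
        if score < threshold:
            continue                   # off-topic page: neither kept nor expanded
        collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))  # link inherits parent score
    return collected
```

    Pruning at off-topic pages is what saves bandwidth, but it also means relevant pages reachable only through irrelevant ones are never found; more refined focused crawlers try to reduce exactly that loss.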

    Status:

  • We have implemented a general-purpose Web crawler and three focused-crawling approaches: two from the earlier literature and one based on our own new proposal. The project is ongoing, and new blood is always welcome! :)

    Resources: 

    Please click here for relevant papers and research plan.

  • Cluster-based retrieval (CBR) [on-going]
  • Objective: We introduce a simple yet novel data structure for improving CBR efficiency. In particular, we embed cluster membership information into a typical inverted index to improve query processing performance.
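    One plausible way to embed cluster membership into an inverted index is to group each posting list by cluster, so that a query restricted to the best-matching clusters reads only those clusters' segments. The sketch below assumes that layout, a precomputed document-to-cluster assignment, and raw term-frequency scoring; it illustrates the idea and is not the project's actual data structure. Cluster selection (e.g., by centroid similarity) is left to the caller.

```python
from collections import defaultdict


def build_index(docs, cluster_of):
    """docs: doc_id -> list of terms; cluster_of: doc_id -> cluster_id.
    Returns term -> {cluster_id: [(doc_id, tf), ...]} (posting lists grouped by cluster)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        tf = defaultdict(int)
        for term in docs[doc_id]:
            tf[term] += 1
        for term, count in tf.items():
            index[term][cluster_of[doc_id]].append((doc_id, count))
    return index


def search(index, query_terms, selected_clusters):
    """Score only documents in the selected clusters; segments belonging to other
    clusters are skipped. Returns (ranked results, number of postings read)."""
    scores = defaultdict(int)
    postings_read = 0
    for term in query_terms:
        segments = index.get(term, {})          # .get() avoids creating empty entries
        for cid in selected_clusters:
            for doc_id, tf in segments.get(cid, []):
                scores[doc_id] += tf
                postings_read += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked, postings_read
```

    The efficiency gain shows up in `postings_read`: restricting a query to a subset of clusters reads fewer postings than a full scan, at the cost of missing matches in unselected clusters.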

    Summary: The project is ongoing, and new blood is always welcome! :)

    Publications:

            The refereed publications for this research include:

    Resources:

    Please click here for the source code.