Metadata Based Modeling of the Web [already completed]



Objective(s)

Develop a metadata-based data model for Web resources that supports effective and efficient querying.

Project Members

Faculty
  • Prof. Özgür Ulusoy (Bilkent)
  • Prof. Gultekin Ozsoyoglu (CWRU)
  • Prof. Meral Ozsoyoglu (CWRU)
Ph.D./M.S. Students
  • İsmail Sengör Altingövde (Ph.D.)
Former Students
  • Selma Ayse Ozel (Ph.D.)
  • Abdullah Al-Hamdani (Ph.D.)
  • Mustafa Kutluturk (M.S.)
Undergraduates
  • Isıl Gursoy, Itır Akel, Fatih Altiparmak (graduated)

Project Description

A recent approach to improving the quality of Web search is to associate metadata with the resources on the Web. To this end, there are various standardization efforts and initiatives, such as Dublin Core, RDF, and the Semantic Web. Another such framework is topic maps, a data structure that represents topics, their associations, and their occurrences in Web resources, and that can be stored in various ways (in a typical DBMS or in XML, for instance).

We propose a “Web information space” model, which is composed of Web-based information resources (HTML/XML documents on the Web), expert advice repositories (domain-expert-specified metadata for information resources), and personalized information about users (captured as user profiles that indicate users’ preferences about experts as well as users’ knowledge about topics).  

 

Modeling Web Resources with Topic Maps

Expert advice, the heart of the Web information space model, is specified using topics and relationships among topics (called metalinks), along the lines of the recently proposed topic maps. Topics and metalinks constitute metadata that describe the contents of the underlying HTML/XML Web resources. 

How to obtain metadata?
 
 

The metadata specification process is semiautomated: it exploits XML DTDs so that a domain expert can guide the mapping of DTD elements to topics and metalinks.

<!ELEMENT dblp (article|inproceedings|proceedings|book|...)*>

<!ENTITY % field "author|editor|title|booktitle|year|address|journal">

<!ELEMENT article       (%field;)*>
<!ELEMENT inproceedings (%field;)*>
<!ELEMENT proceedings   (%field;)*>

<!ELEMENT author    (#PCDATA)>
<!ELEMENT editor    (#PCDATA)>
<!ELEMENT address   (#PCDATA)>
...

Example DTD

Mapping M

In particular, a domain expert specifies a mapping between the entities of our metadata model (topics and metalinks) and the XML DTDs in the corresponding domain(s). On the left, such a mapping is shown for the above DTD.
Then, an agent traverses the Web, extracts topics and metalinks from those XML files that conform to the input DTD, and stores them in a local object-relational database management system (DBMS), which then serves as an expert advice (metadata) repository for the visited Web resources. The resulting metadata repository includes the relations below.

<?xml version="1.0"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
  <article key="...">
    <title> Access Methods for Text </title>
    <author>Christos Faloutsos</author>
    <pages>...</pages>
    <crossref>...</crossref>
    <year>1985</year>
    <journal>ACM CSUR</journal>
    <url>http://www.informatik.uni-trier.de/~ley/db/journals/csur/Faloutsos85.html</url>
  </article>
</dblp>

Example XML document

 

Topics relation

 

ResearchPaperOf metalink relation
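
The extraction step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual agent. The dictionary ELEMENT_TO_TOPIC_TYPE is a hypothetical stand-in for the expert-specified mapping M, the function name extract and the tuple layouts are assumptions made for the example, and SAMPLE is an abridged copy of the example XML document.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the mapping M: DTD element name -> topic type.
# The real mapping is specified by a domain expert; these entries are assumed.
ELEMENT_TO_TOPIC_TYPE = {
    "article": "ResearchPaper",
    "inproceedings": "ResearchPaper",
    "author": "Author",
    "journal": "Journal",
}

# Abridged version of the example XML document (elided fields left out).
SAMPLE = """<dblp>
  <article>
    <title> Access Methods for Text </title>
    <author>Christos Faloutsos</author>
    <year>1985</year>
    <journal>ACM CSUR</journal>
  </article>
</dblp>"""


def extract(xml_text):
    """Return (topics, metalinks) found in one DTD-conformant document.

    topics    : set of (topic_name, topic_type)
    metalinks : set of (metalink_type, source_topic, target_topic)
    """
    root = ET.fromstring(xml_text)
    topics = set()
    metalinks = set()
    for record in root:
        # Only records mapped to the research paper topic type are handled.
        if ELEMENT_TO_TOPIC_TYPE.get(record.tag) != "ResearchPaper":
            continue
        title = (record.findtext("title") or "").strip()
        if not title:
            continue
        topics.add((title, "ResearchPaper"))
        for child in record:
            topic_type = ELEMENT_TO_TOPIC_TYPE.get(child.tag)
            name = (child.text or "").strip()
            if topic_type is None or not name:
                continue  # element is not mapped to a topic type
            topics.add((name, topic_type))
            if topic_type == "Author":
                # X -> AuthorOf Y: paper Y is written by author X.
                metalinks.add(("AuthorOf", name, title))
    return topics, metalinks


topics, metalinks = extract(SAMPLE)
```

In a fuller version, the resulting tuples would be inserted into the repository's topic and metalink relations rather than kept in memory.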

Prototype System 

To demonstrate the practicality and usability of the proposed Web information space model, we created a prototype expert advice repository of more than one million topics/metalinks for the DBLP (Database and Logic Programming) Bibliography data set.

In the prototype system, topics are of types such as research paper, author, and journal/conference. The topic name of a research paper topic is simply the title of the paper, and so on. Various metalinks are defined among topics. For instance, the metalink instance topic X -> AuthorOf topic Y implies that topic Y (a topic of type research paper) was written by X (a topic of type author). Another metalink, RelatedTo, holds between two papers, and the PreRequisite metalink states that one paper is a prerequisite to another.

A screenshot of the prototype system, BilDig (short for Bilkent Digital Library), is shown on the left. BilDig allows one to search the DBLP data set by querying topic properties (e.g., paper title, author name) as well as by querying metalinks (e.g., find all papers related to paper X, or find all papers that are prerequisites to paper X).
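
The two kinds of queries mentioned above (by topic property and by metalink) might look roughly like this over a relational repository. The sketch uses plain SQL through Python's sqlite3 module. It is not SQL-TC and not the BilDig implementation. The table and column names, the direction in which a metalink is stored, and the row named "Placeholder Paper B" are all assumptions made for illustration.

```python
import sqlite3

# In-memory database with an assumed two-table schema for topics and metalinks.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (
    tid        INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    topic_type TEXT NOT NULL
);
CREATE TABLE metalinks (
    link_type  TEXT NOT NULL,
    source_tid INTEGER NOT NULL REFERENCES topics(tid),
    target_tid INTEGER NOT NULL REFERENCES topics(tid)
);
""")

# Illustrative rows; "Placeholder Paper B" is made up for the example.
conn.executemany("INSERT INTO topics VALUES (?, ?, ?)", [
    (1, "Christos Faloutsos", "Author"),
    (2, "Access Methods for Text", "ResearchPaper"),
    (3, "Placeholder Paper B", "ResearchPaper"),
])
conn.executemany("INSERT INTO metalinks VALUES (?, ?, ?)", [
    ("AuthorOf", 1, 2),   # author 1 -> AuthorOf paper 2
    ("RelatedTo", 2, 3),  # paper 2 -> RelatedTo paper 3 (direction assumed)
])


def papers_by_author(conn, author_name):
    """Query by topic property: titles of papers written by the named author."""
    rows = conn.execute("""
        SELECT p.name
        FROM metalinks m
        JOIN topics a ON a.tid = m.source_tid
        JOIN topics p ON p.tid = m.target_tid
        WHERE m.link_type = 'AuthorOf'
          AND a.topic_type = 'Author'
          AND a.name = ?
        ORDER BY p.name
    """, (author_name,)).fetchall()
    return [name for (name,) in rows]


def papers_linked_to(conn, paper_name, link_type):
    """Query by metalink: papers reached from paper_name via link_type."""
    rows = conn.execute("""
        SELECT t.name
        FROM metalinks m
        JOIN topics s ON s.tid = m.source_tid
        JOIN topics t ON t.tid = m.target_tid
        WHERE m.link_type = ?
          AND s.name = ?
          AND t.topic_type = 'ResearchPaper'
        ORDER BY t.name
    """, (link_type, paper_name)).fetchall()
    return [name for (name,) in rows]
```

For example, papers_by_author(conn, "Christos Faloutsos") looks up papers through AuthorOf metalinks, and papers_linked_to(conn, "Access Methods for Text", "RelatedTo") follows RelatedTo metalinks from a given paper.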

Publications

The refereed publications for this research include:

  • Ozel, S. A., Altingovde, I.S., Ulusoy, O., Ozsoyoglu, G., Ozsoyoglu, Z. M.

    Metadata-Based Modeling of Information Resources on the Web. [abstract] [.pdf] [bib]

    Journal of the American Society for Information Science and Technology (JASIST), 55, 2 (January 2004), 97-110.

     

  • I. S. Altingovde, S. A. Ozel, O. Ulusoy, G. Ozsoyoglu, Z. M. Ozsoyoglu.

    Topic-Centric Querying of Web Information Resources. [abstract] [.ps] [bib]

    Proceedings of the Database and Expert Systems Applications (DEXA'01), Lecture Notes in Computer Science (Springer Verlag), vol.2113, (Munich, Germany, September 2001) 699-711. (pdf)

Other publications are:

  • S. A. Özel.

    Metadata-Based and Personalized Web Querying.

    Ph.D. Thesis, Bilkent University, Ankara, Turkey, January 2004.

     

  • M. Kutluturk.

    Implementation of a Topic Map Data Model for a Web-Based Information Resource.

    M.S. Thesis, Bilkent University, Ankara, Turkey, August 2002.

     

  • I. S. Altingovde.

    Topic-Centric Querying of Web Resources.

    M.S. Thesis, Bilkent University, Ankara, Turkey, September 2001.

     

  • I. S. Altingovde, S. A. Ozel, O. Ulusoy, G. Ozsoyoglu, Z. M. Ozsoyoglu.

    SQL-TC: A Topic-Centric Query Language for Web-Based Information Resources. (BU-CE-0108)

    Technical Report, Bilkent University, 2001.

Talks:

  • Topic-Centric Querying of Web Information Resources, DEXA'01 presentation (.ppt)

Funding

This research is supported by a joint grant from TÜBITAK (Grant No. 100U024) of Turkey and the National Science Foundation (Grant INT-9912229) of the USA.