SEDA: Search Driven Analysis of Heterogeneous XML Data
Dr. Fatma Ozcan
IBM Almaden Research Center,
San Jose, CA, USA

Date and Time: September 30, 2009, 14:40
Place: Dept of Computer Engineering, Bilkent University, Room: EA-409

Abstract:

Analytical processing on XML repositories is usually enabled by designing complex data transformations that shred the documents into a common data warehousing schema. This can be very time-consuming and costly, especially if the underlying XML data varies widely in structure and only a subset of attributes constitutes meaningful dimensions and facts. Today, there is no tool to explore an XML data set, discover interesting attributes, dimensions, and facts, and rapidly prototype an OLAP solution. In this talk, we will describe a system called SEDA that enables users to start with simple keyword-style querying and interactively refine the query based on result summaries. SEDA then maps query results onto a set of known or newly created facts and dimensions, derives a star schema, and computes the corresponding data mart, which can be analyzed further with an off-the-shelf OLAP tool.

Bio:

Fatma Ozcan has been a research staff member at IBM Almaden Research Center since September 2001. Her current research interests include data analytics, cloud computing, and XML query processing and optimization. Ozcan received her BSc degree in computer engineering from the Middle East Technical University in Ankara, Turkey, and her MSc and PhD degrees in computer science from the University of Maryland, College Park. She is a coauthor of the book "Heterogeneous Agents", published by MIT Press, and of various articles and conference papers.