Bilkent University
Department of Computer Engineering


Load Shedding Techniques for Data Stream Management Systems


Dr. Nesime Tatbul
Brown University

In recent years, we have witnessed the emergence of a new class of applications that must deal with large volumes of streaming data. Examples include financial data analysis on feeds of stock tickers, sensor-based environmental monitoring, and network traffic monitoring. Traditional database management systems (DBMSs), which are very good at managing large volumes of stored data, fall short in serving this new class of applications, which require low-latency processing of live data from push-based sources. Aurora is a data stream management system (DSMS) that has been developed to meet these needs. A DSMS such as Aurora may be subject to higher input rates than its resources can handle. When input rates exceed system capacity, the system becomes overloaded and Quality of Service (QoS) at system outputs falls below acceptable levels. Under these conditions, the system sheds load by selectively dropping tuples, degrading the answer in order to improve the observed latency of the results. In this talk, I will first define the load shedding problem in data stream management systems and provide a general solution framework that handles overload in a lightweight manner while minimizing the loss in result accuracy. Then I will present additional techniques on top of this framework to handle windowed aggregation queries in a way that preserves the subset result guarantee.
Due to the distributed nature of stream-based data sources, as well as the need for better scalability and fault tolerance, we have recently extended Aurora into Borealis, a larger-scale system that can operate in distributed environments. In such an environment, the load shedding problem involves simultaneously removing excess load from multiple overloaded server nodes in a coordinated and scalable fashion. In the final part of my talk, I will discuss this distributed load shedding problem and present several alternative approaches for extending our earlier Aurora framework to the distributed setting of the Borealis system.

Bio: Nesime Tatbul has recently completed her Ph.D. in Computer Science at Brown University. Her research interests are in database systems, with a current focus on stream and sensor data management. She received her B.S. and M.S. degrees in Computer Engineering from the Middle East Technical University in Turkey. During her graduate years at Brown, she also worked as a research intern at the IBM Almaden Research Center and as a consultant for the U.S. Army Research Institute of Environmental Medicine. In January 2007, she will join ETH Zurich as an assistant professor of Computer Science.


DATE: December 15, 2006, Friday @ 13:40