Bilkent University
Department of Computer Engineering


Locality Aware Distributed State Partitioning for Stream Processing Systems


M. Yağmur Şahin
MS Student
(Supervisor: Assoc. Prof. Dr. Buğra Gedik)
Computer Engineering Department
Bilkent University

Today, many applications deal with high-volume data streams. These distributed stream processing applications process data on the fly and provide real-time distributed computing for big data. Due to the volume of data they process, some of these applications make use of data-parallel nodes. Managing the state of distributed nodes in these applications is an important task because of use cases such as dealing with node failures, checkpointing, data enrichment, and re-partitioning. Therefore, distributed stream processing applications need a state management mechanism. In this thesis, we present a transparent, locality-aware data partitioning and state management mechanism for distributed stream processing applications. The mechanism partitions data while preserving locality and handles data transfer among nodes transparently to adapt to potential changes in the partitioning scheme. In addition, it provides operators with a high-performance state management facility that can handle checkpointing scenarios. The idea is implemented as a pluggable library for the open-source distributed stream processing engine Apache Storm.


DATE: 26 October 2016, Wednesday @ 11:00