Elastic and Transparent Scaling for Stream Processing Applications
In this project, we investigate how to automatically parallelize stream processing applications. The goal is to take a sequential stream program and produce a functionally equivalent version that runs in parallel across a distributed system. We also provide runtime mechanisms to fine-tune the resulting parallelization.
Auto Pipelining
In this project, we aim to locate pipeline parallelization opportunities in a data flow graph and to exploit them through runtime profiling and adaptation, in order to achieve higher throughput. A key challenge is finding a good configuration among a combinatorially large number of choices.
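To show how large the search space is, the sketch below handles the simplest case, which is assumed here for illustration. It considers a linear chain of operators with profiled per-tuple costs. Each pipeline stage runs on its own thread, and throughput is limited by the slowest stage. The function names and cost values are hypothetical, and neither search should be read as the project's actual algorithm. Exhaustive search examines C(n-1, k) cut sets for k cuts, while a cheap greedy heuristic can stall at a local optimum.

```python
from itertools import combinations

# Hypothetical sketch: place pipeline breaks (thread boundaries) in a linear
# chain of operators, given profiled per-tuple costs for each operator.

def bottleneck(costs, cuts):
    """Cost of the slowest stage when the chain is split at `cuts` (sorted
    indices); each stage is assumed to run on its own thread."""
    bounds = [0, *cuts, len(costs)]
    return max(sum(costs[a:b]) for a, b in zip(bounds, bounds[1:]))

def greedy_pipeline(costs, threads):
    """Add one cut at a time at the position that lowers the bottleneck most;
    stop when all threads are used or no single extra cut improves it."""
    cuts = []
    while len(cuts) + 1 < threads:
        candidates = [c for c in range(1, len(costs)) if c not in cuts]
        if not candidates:
            break
        best = min(candidates, key=lambda c: bottleneck(costs, sorted(cuts + [c])))
        new_cuts = sorted(cuts + [best])
        if bottleneck(costs, new_cuts) >= bottleneck(costs, cuts):
            break  # local optimum: no single extra cut helps
        cuts = new_cuts
    return cuts

def exhaustive_pipeline(costs, threads):
    """Try every cut set with at most threads - 1 cuts (C(n-1, k) sets for k cuts)."""
    best = []
    for k in range(threads):
        for cuts in combinations(range(1, len(costs)), k):
            if bottleneck(costs, list(cuts)) < bottleneck(costs, best):
                best = list(cuts)
    return best

costs = [4, 1, 3, 2, 5, 1]  # made-up profiled per-tuple costs
print(greedy_pipeline(costs, 3), exhaustive_pipeline(costs, 3))  # [3] [1, 4]
```

With these made-up costs, the greedy search stops at a bottleneck of 8 because no single additional cut helps. Exhaustive search reaches a bottleneck of 6 with cuts [1, 4], but its cost grows combinatorially with chain length and thread count.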
Auto Data-parallelization
In this project, we aim to locate data parallelization opportunities in a data flow graph and to exploit them through fission, which replicates operators so that the replicas process disjoint portions of the stream in parallel. An important aspect of this work is ensuring safety in the presence of selective and stateful operators, which require special runtime mechanisms.
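This approach is assumed here and not taken from the project. One common way to keep fission safe for a stateful, selective operator has three parts. The stream is partitioned by key so that all state for a key lives in one replica. Each tuple is tagged with a sequence number. Every replica emits a placeholder for each tuple it filters out, so that a downstream merger can restore the original order. The sketch below simulates this in a single process. The names `route`, `replica`, `fission`, and `sequential` are hypothetical, and so is the operator, which keeps a per-key running sum and drops negative values.

```python
import heapq
import zlib
from collections import defaultdict

def route(key, n):
    # Stable hash so that every tuple with the same key reaches the same replica.
    return zlib.crc32(str(key).encode()) % n

def replica(tuples):
    """Hypothetical stateful, selective operator: keeps a running sum per key
    and drops tuples with negative values, emitting (seq, None) placeholders."""
    totals = defaultdict(int)
    out = []
    for seq, key, value in tuples:
        if value < 0:
            out.append((seq, None))  # filtered out: placeholder only
        else:
            totals[key] += value
            out.append((seq, (key, totals[key])))
    return out

def fission(stream, n):
    # Split: attach sequence numbers and partition the tuples by key.
    channels = [[] for _ in range(n)]
    for seq, (key, value) in enumerate(stream):
        channels[route(key, n)].append((seq, key, value))
    # Each replica runs independently on its own partition.
    outputs = [replica(ch) for ch in channels]
    # Merge: each replica's output is already in sequence order, so a k-way
    # merge on seq restores the original order; placeholders are discarded.
    merged = heapq.merge(*outputs, key=lambda item: item[0])
    return [result for _, result in merged if result is not None]

def sequential(stream):
    # Reference: the same operator run on the whole stream by a single replica.
    tagged = [(seq, key, value) for seq, (key, value) in enumerate(stream)]
    return [result for _, result in replica(tagged) if result is not None]

stream = [("a", 1), ("b", 2), ("a", 3), ("c", -1), ("b", 4), ("a", -5), ("c", 6)]
print(fission(stream, 3))  # [('a', 1), ('b', 2), ('a', 4), ('b', 6), ('c', 6)]
```

Each key's state stays within one replica, and the merger orders results by sequence number. As a result, the parallel output matches the sequential output for any channel count. This batch simulation does not strictly need the placeholders. A streaming merger does need them, because they let it move past a filtered tuple instead of waiting for one that will never arrive.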
Elastic Data-parallelization
In this project, we extend our work on auto data-parallelization to enable runtime adaptation to changes in workload and resource availability. One challenge is designing an effective control algorithm. Another is managing partial state migration in the presence of stateful operators.
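For partial state migration, one standard technique is consistent hashing. It is assumed here and is not necessarily the project's mechanism. Keys map to points on a hash ring, so when a channel is added, only the keys claimed by the new channel change owners, and only their state must migrate. The sketch below compares this with plain modulo hashing when scaling from 4 to 5 channels. The key names, the `Ring` class, and the virtual-node count are hypothetical.

```python
import bisect
import zlib

def stable_hash(s):
    # Deterministic across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(s.encode())

class Ring:
    """Hypothetical consistent-hash ring: each channel owns `vnodes` points, and
    a key belongs to the channel of the first point clockwise from its hash."""

    def __init__(self, channels, vnodes=100):
        self.points = sorted((stable_hash(f"ch{c}#{v}"), c)
                             for c in range(channels) for v in range(vnodes))
        self.hashes = [h for h, _ in self.points]

    def owner(self, key):
        # Wrap around to the first point if the key hashes past the last one.
        i = bisect.bisect(self.hashes, stable_hash(key)) % len(self.points)
        return self.points[i][1]

def moved(keys, before, after):
    """Keys whose owning channel changes, i.e. whose state must migrate."""
    return [k for k in keys if before(k) != after(k)]

keys = [f"user{i}" for i in range(1000)]  # hypothetical partitioning keys
mod_moved = moved(keys, lambda k: stable_hash(k) % 4, lambda k: stable_hash(k) % 5)
ring4, ring5 = Ring(4), Ring(5)
ring_moved = moved(keys, ring4.owner, ring5.owner)
print(len(mod_moved), len(ring_moved))
```

Under modulo hashing, a key stays on its channel only if its hash leaves the same remainder modulo 4 and modulo 5, so most keys move. On the ring, every moved key lands on the new channel. The existing channels therefore hand off part of their state to it, but never exchange state among themselves.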
Multi-segment Elasticity
In this project, we address the challenging problem of managing multiple parallel segments in a distributed system so as to optimize the throughput of auto data-parallelized applications.
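As a simplified illustration, assume three things: the parallel segments form a chain, each segment's throughput scales linearly with its channel count, and a fixed budget of channels must be shared among them. End-to-end throughput is then the minimum over all segments, and a natural greedy policy gives each extra channel to the current bottleneck. The linear-scaling model and the names `allocate` and `throughput` are hypothetical. Real segments contend for shared hosts and network bandwidth, so this model is optimistic.

```python
def throughput(rates, channels):
    """End-to-end throughput of segments in series: the slowest segment wins.
    rates[i] is the assumed per-channel throughput (tuples/s) of segment i."""
    return min(r * c for r, c in zip(rates, channels))

def allocate(rates, budget):
    """Start with one channel per segment, then give each remaining channel
    to the segment that is currently the bottleneck."""
    channels = [1] * len(rates)
    for _ in range(budget - len(rates)):
        slowest = min(range(len(rates)), key=lambda i: rates[i] * channels[i])
        channels[slowest] += 1
    return channels

rates = [100, 250, 50]  # made-up per-channel rates for three segments
plan = allocate(rates, 8)
print(plan, throughput(rates, plan))  # [3, 1, 4] 200
```

In this example, the last channel goes to the first segment without raising end-to-end throughput, because the third segment is still the bottleneck.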
Publications
- Scott Schneider, Martin Hirzel, Buğra Gedik, and Kun-Lung Wu. “Auto-Parallelizing Stateful Distributed Streaming Applications”, International Conference on Parallel Architectures and Compilation Techniques (PACT), 2012.
- Yuzhe Tang and Buğra Gedik. “Auto-pipelining for Data Stream Processing”, IEEE Transactions on Parallel and Distributed Systems (TPDS), 2013.
- Under submission: Buğra Gedik, Scott Schneider, Martin Hirzel, and Kun-Lung Wu. “Elastic Auto-parallelization for Stream Processing Applications”.
- Under submission: Scott Schneider, Martin Hirzel, Buğra Gedik, and Kun-Lung Wu. “Safe Parallelism for General Streaming”.
Collaborators
- Scott Schneider, IBM T. J. Watson Research Center
- Martin Hirzel, IBM T. J. Watson Research Center
- Kun-Lung Wu, IBM T. J. Watson Research Center