Elastic and Transparent Scaling for Stream Processing Applications

In this project, we are looking into automatically parallelizing stream processing applications. The goal is to take a sequential version of a stream program and produce a functionally equivalent version that is distributed and parallel. We then provide runtime mechanisms to fine-tune the parallelization.

Auto Pipelining

In this project, we aim to locate pipeline parallelization opportunities in a data flow graph and to exploit them through runtime profiling and adaptation, achieving better throughput. An important challenge is to find a good configuration, such as which operators to run on separate threads, among a combinatorially large number of choices.
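To make the size of the search space concrete, the following Python sketch models a chain of operators with hypothetical profiled per-tuple costs and treats each thread boundary as a cut point. With n operators there are up to 2^(n-1) cut sets, and throughput is limited by the most expensive stage. The cost model (a stage costs the sum of its operators' costs, with queueing overhead ignored) and the functions `stage_costs`, `bottleneck`, `exhaustive_pipeline`, and `greedy_pipeline` are illustrative assumptions, not the project's actual algorithm.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def stage_costs(costs: Sequence[float], cuts: Sequence[int]) -> List[float]:
    """Total cost of each pipeline stage; a cut at index c starts a new stage at operator c."""
    bounds = [0] + sorted(cuts) + [len(costs)]
    return [sum(costs[a:b]) for a, b in zip(bounds, bounds[1:])]

def bottleneck(costs: Sequence[float], cuts: Sequence[int]) -> float:
    # Steady-state throughput is roughly 1 / (cost of the slowest stage).
    return max(stage_costs(costs, cuts))

def exhaustive_pipeline(costs: Sequence[float], max_threads: int) -> Tuple[List[int], float]:
    """Try every set of at most max_threads - 1 cuts (up to 2**(n-1) cut sets in total)."""
    best_cuts: List[int] = []
    best = bottleneck(costs, [])
    for k in range(1, max_threads):
        for cuts in combinations(range(1, len(costs)), k):
            cost = bottleneck(costs, cuts)
            if cost < best:
                best_cuts, best = list(cuts), cost
    return best_cuts, best

def greedy_pipeline(costs: Sequence[float], max_threads: int) -> Tuple[List[int], float]:
    """Hill climbing: add the single cut that lowers the bottleneck most; stop when none helps."""
    cuts: List[int] = []
    best = bottleneck(costs, cuts)
    while len(cuts) + 1 < max_threads:
        candidates = [c for c in range(1, len(costs)) if c not in cuts]
        if not candidates:
            break
        trial = min(candidates, key=lambda c: bottleneck(costs, cuts + [c]))
        trial_cost = bottleneck(costs, cuts + [trial])
        if trial_cost >= best:
            break  # no single additional thread boundary improves the bottleneck
        cuts.append(trial)
        best = trial_cost
    return sorted(cuts), best

costs = [4, 1, 1, 4, 2]  # hypothetical profiled per-tuple costs of a 5-operator chain
print(greedy_pipeline(costs, 3))      # ([3], 6)
print(exhaustive_pipeline(costs, 3))  # ([2, 4], 5)
```

On this input the greedy search stops at two stages because no single extra cut lowers the bottleneck, while exhaustive search finds a better three-stage split. Exhaustive search does not scale to large graphs and greedy search can get stuck, which is why finding a good setting is hard.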

Auto Data-parallelization

In this project, we aim to locate data parallelization opportunities in a data flow graph and to apply fission, replicating parts of the graph across parallel channels, to exploit them. An important aspect of this work is ensuring safety in the presence of selective operators (which may not produce an output for every input) and stateful operators (which keep state across tuples); both require special runtime mechanisms.
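A minimal sketch of fission under stated assumptions: the splitter stamps each tuple with a sequence number and hash-partitions it by a hypothetical key, so all state for a given key lives in exactly one replica. Each replica runs a selective, stateful operator (here a hypothetical per-key deduplication filter) and turns every dropped tuple into a placeholder, and an order-preserving merger releases results strictly by sequence number. The placeholder scheme and all names (`split`, `run_replica`, `merge`, `fission`) are illustrative, not the project's runtime mechanisms.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# (sequence number, payload); a payload of None marks a tuple the operator dropped
Item = Tuple[int, Optional[int]]

def key(v: int) -> int:
    return v % 10  # hypothetical partitioning key

def make_dedup() -> Callable[[int], Optional[int]]:
    """Selective and stateful: forward a tuple only the first time its key is seen."""
    seen = set()
    def op(v: int) -> Optional[int]:
        if key(v) in seen:
            return None
        seen.add(key(v))
        return v
    return op

def split(stream: List[int], n: int) -> List[List[Item]]:
    """Stamp sequence numbers and hash-partition by key, so each key's state lives in one channel."""
    channels: List[List[Item]] = [[] for _ in range(n)]
    for seq, v in enumerate(stream):
        channels[hash(key(v)) % n].append((seq, v))
    return channels

def run_replica(items: List[Item]) -> List[Item]:
    op = make_dedup()  # each replica owns private state for its own keys
    return [(seq, op(v)) for seq, v in items]  # drops become (seq, None) placeholders

def interleave(outputs: List[List[Item]], rng: random.Random) -> List[Item]:
    """Simulate replicas running at different speeds; order within a channel is kept."""
    queues = [list(o) for o in outputs if o]
    arrivals: List[Item] = []
    while queues:
        i = rng.randrange(len(queues))
        arrivals.append(queues[i].pop(0))
        if not queues[i]:
            queues.pop(i)
    return arrivals

def merge(arrivals: List[Item]) -> List[int]:
    """Order-preserving merge: release tuples strictly in sequence-number order."""
    pending: Dict[int, Optional[int]] = {}
    next_seq, out = 0, []
    for seq, v in arrivals:
        pending[seq] = v
        while next_seq in pending:  # without placeholders, a dropped seq would stall here
            value = pending.pop(next_seq)
            if value is not None:
                out.append(value)
            next_seq += 1
    return out

def fission(stream: List[int], n: int, seed: int = 0) -> List[int]:
    outputs = [run_replica(ch) for ch in split(stream, n)]
    return merge(interleave(outputs, random.Random(seed)))

stream = [3, 13, 7, 23, 17, 4, 14, 5]
print(fission(stream, 3))  # [3, 7, 4, 5], same as running make_dedup() sequentially
```

Without the placeholders, the merger could not tell a dropped tuple from a delayed one and would wait indefinitely. Without key partitioning, two replicas could each see the first occurrence of the same key and both forward it.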

Elastic Data-parallelization

In this project, we extend our work on auto data-parallelization to enable runtime adaptation to changes in workload and resource availability. One particular challenge is designing an effective control algorithm for deciding the degree of parallelism. Another is managing partial state migration for stateful operators, so that only the affected portion of the state moves when the number of parallel channels changes.
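How much state must migrate depends on how keys map to channels. The sketch below compares a plain `hash mod N` mapping with consistent hashing, one common technique for this problem (not necessarily the one the project uses), when scaling from 4 to 5 channels. The ring with 64 virtual nodes per channel and the helpers `h`, `Ring`, `modulo`, and `moved` are hypothetical choices for illustration.

```python
import bisect
import hashlib
from typing import Callable, List

def h(s: str) -> int:
    """Stable 64-bit hash (Python's built-in str hash is randomized per process)."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class Ring:
    """Consistent-hash ring: each channel owns `vnodes` points; a key goes to the next point clockwise."""
    def __init__(self, channels: int, vnodes: int = 64) -> None:
        self.points = sorted((h(f"ch{c}#{v}"), c) for c in range(channels) for v in range(vnodes))
        self.hashes = [p for p, _ in self.points]

    def channel(self, key: str) -> int:
        i = bisect.bisect(self.hashes, h(key)) % len(self.points)  # wrap around the ring
        return self.points[i][1]

def modulo(channels: int) -> Callable[[str], int]:
    return lambda key: h(key) % channels

def moved(keys: List[str], old: Callable[[str], int], new: Callable[[str], int]) -> List[str]:
    """Keys whose state must migrate because their owning channel changed."""
    return [k for k in keys if old(k) != new(k)]

keys = [f"key{i}" for i in range(10_000)]
mod_moved = moved(keys, modulo(4), modulo(5))
ring_moved = moved(keys, Ring(4).channel, Ring(5).channel)
print(len(mod_moved) / len(keys))   # expected near 4/5
print(len(ring_moved) / len(keys))  # expected near 1/5
```

Under the modulo mapping a key keeps its owner only when its hash mod 20 is below 4, so about four out of five keys change owners and nearly all state must move. On the ring, the existing channels keep their points, so only keys claimed by the new channel move, and every one of them moves to channel 4.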

Multi-segment Elasticity

In this project, we address the challenging problem of managing multiple parallel segments in a distributed system to optimize the throughput of auto data-parallelized applications.
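As a simplified illustration, assume the parallel segments run in series, so the application's throughput is bounded by its slowest segment, and that each segment's throughput grows linearly with its channel count at a known, hypothetical per-channel rate. Under these assumptions, one simple policy gives every segment one channel and then assigns each additional channel to the current bottleneck. The functions `allocate` and `throughput` and the rates are illustrative; in practice scaling is often not linear and segments share hosts, which is part of what makes the real problem hard.

```python
import heapq
from typing import List, Sequence

def throughput(rates: Sequence[float], channels: Sequence[int]) -> float:
    """Application throughput: the slowest segment limits the whole pipeline."""
    return min(c * r for c, r in zip(channels, rates))

def allocate(rates: Sequence[float], budget: int) -> List[int]:
    """Greedy: one channel per segment, then each extra channel goes to the current bottleneck."""
    if budget < len(rates):
        raise ValueError("need at least one channel per segment")
    channels = [1] * len(rates)
    heap = [(r, i) for i, r in enumerate(rates)]  # (segment throughput, segment index)
    heapq.heapify(heap)
    for _ in range(budget - len(rates)):
        _, i = heapq.heappop(heap)  # current bottleneck segment
        channels[i] += 1
        heapq.heappush(heap, (channels[i] * rates[i], i))
    return channels

rates = [100, 50, 200]  # hypothetical per-channel tuples/s for three segments
alloc = allocate(rates, budget=7)
print(alloc, throughput(rates, alloc))  # [2, 4, 1] 200
```

Here the slowest segment receives the most channels, and all three segments end up at the same throughput.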
