Elastic and Transparent Scaling for Stream Processing Applications

In this project, we investigate how to automatically parallelize stream processing applications. The goal is to take a sequential version of a stream program and produce a functionally equivalent version that is distributed and parallel. We then provide runtime mechanisms to fine-tune the parallelization.

Auto Pipelining

In this project, we aim to locate pipeline parallelization opportunities in a data flow graph and to exploit them through runtime profiling and adaptation, with the goal of improving throughput. An important challenge is finding a good configuration among a combinatorially large number of choices.
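
To illustrate why the number of choices grows so quickly, consider a simple chain of n operators: a thread boundary can be placed, or not, between each adjacent pair, which gives 2^(n-1) possible pipeline configurations. The sketch below is a toy illustration and not the project's algorithm. It assumes made-up per-tuple operator costs, models throughput as limited by the costliest stage, and searches exhaustively, which is only feasible for short chains. All function names are hypothetical.

```python
from itertools import combinations


def count_configurations(n):
    """Number of ways to place thread boundaries in a chain of n operators."""
    return 2 ** (n - 1)


def pipeline_stages(costs, cuts):
    """Split per-operator costs into stages at the given cut positions."""
    stages, start = [], 0
    for cut in list(cuts) + [len(costs)]:
        stages.append(costs[start:cut])
        start = cut
    return stages


def best_pipeline(costs, max_threads):
    """Exhaustively search cut sets using at most max_threads stages.

    Assumed cost model: each stage runs on its own thread, so the
    bottleneck is the stage with the largest total cost.
    Returns (bottleneck_cost, cut_positions).
    """
    n = len(costs)
    best = None
    for k in range(0, min(max_threads, n)):  # k cuts give k + 1 stages
        for cuts in combinations(range(1, n), k):
            bottleneck = max(sum(s) for s in pipeline_stages(costs, cuts))
            if best is None or bottleneck < best[0]:
                best = (bottleneck, cuts)
    return best
```

For example, `best_pipeline([4, 1, 1, 4, 2], max_threads=3)` places boundaries after the second and fourth operators. A runtime approach would replace the assumed costs with profiled measurements and avoid enumerating every configuration.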

Auto Data-parallelization

In this project, we aim to locate data parallelization opportunities in a data flow graph and to perform fission to exploit them. An important aspect of this work is ensuring safety, that is, preserving the behavior of the sequential program, in the presence of selective and stateful operators, which require special runtime mechanisms.
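
The safety concern can be sketched with a small simulation. This is a minimal illustration under stated assumptions, not the mechanism used in the project: tuples are routed by a hash of their key so that each key's state lives in exactly one replica, every tuple carries a sequence number, and a replica emits an empty marker when it drops a tuple. The operator itself (a per-key counter that emits every second tuple) and all names are hypothetical.

```python
import heapq
import zlib
from collections import defaultdict


class Replica:
    """One parallel copy of a hypothetical stateful, selective operator."""

    def __init__(self):
        self.counts = defaultdict(int)  # partitioned state: one counter per key

    def process(self, seq, key, value):
        self.counts[key] += 1
        # Selective: only every second tuple of a key produces output.
        if self.counts[key] % 2 == 0:
            return (seq, (key, value, self.counts[key]))
        # Marker for a dropped tuple. A streaming merger could use it to
        # learn that this sequence number will never produce a result.
        return (seq, None)


def split(tuples, n):
    """Number the tuples and route each key to a fixed channel."""
    channels = [[] for _ in range(n)]
    for seq, (key, value) in enumerate(tuples):
        channel = zlib.crc32(key.encode()) % n
        channels[channel].append((seq, key, value))
    return channels


def merge(outputs):
    """Restore sequential order by sequence number and discard markers."""
    ordered = heapq.merge(*outputs, key=lambda item: item[0])
    return [result for _, result in ordered if result is not None]


def run(tuples, n):
    """Run the operator with n replicas and return the merged output."""
    replicas = [Replica() for _ in range(n)]
    outputs = [
        [replica.process(*item) for item in channel]
        for replica, channel in zip(replicas, split(tuples, n))
    ]
    return merge(outputs)
```

With this setup, `run(stream, 3)` yields the same output as `run(stream, 1)` for any input list `stream` of `(key, value)` pairs with string keys, which is the functional equivalence that fission has to preserve. The merge here works on complete lists; an online merger would have to buffer results and release them incrementally.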

Elastic Data-parallelization

In this project, we extend our work on auto data-parallelization with the aim of enabling runtime adaptation to changes in workload and resource availability. One particular challenge is designing an effective control algorithm. Another challenge is managing partial state migration in the presence of stateful operators.
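
As a rough illustration of what a control algorithm has to decide, the sketch below adjusts the number of parallel channels once per adaptation period. It is a deliberately simplified assumption, not the project's controller: it takes a measured throughput and a congestion flag as inputs, scales up while congestion persists, backs off when an extra channel does not improve throughput by a minimum factor, and remembers that level as a ceiling. The class name, parameters, and thresholds are hypothetical, and workload-change detection and state migration are left out.

```python
class ElasticController:
    """Toy controller for the number of parallel channels (hypothetical)."""

    def __init__(self, max_channels, min_gain=1.05):
        self.n = 1                   # current number of channels
        self.ceiling = max_channels  # highest level still worth trying
        self.min_gain = min_gain     # required speedup for a scale-up to count
        self.last_action = 0
        self.last_throughput = None

    def step(self, throughput, congested):
        """Consume one period's measurements; return the next channel count."""
        scaled_up = self.last_action > 0
        if scaled_up and throughput < self.last_throughput * self.min_gain:
            # The extra channel did not pay off: go back and stop trying it.
            self.ceiling = self.n - 1
            action = -1
        elif congested and self.n < self.ceiling:
            action = +1
        elif not congested and self.n > 1:
            action = -1
        else:
            action = 0
        self.n += action
        self.last_action = action
        self.last_throughput = throughput
        return self.n
```

A practical controller would also need to reset its ceiling when the workload or the available resources change, and each change in channel count would trigger migration of the affected partitions' state.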

Combined Data and Pipeline Parallelism

In this project, we address data and pipeline parallelism within a single framework, with the aim of finding a close-to-optimal configuration without spending an excessive amount of resources on the search for that configuration.

Publications

  • Basri Kahveci and Buğra Gedik. “Joker: A Stream Processing Engine with Organic Adaptation”, in preparation.
  • Semih Şahin and Buğra Gedik. “C-Stream: A Coroutine-based Elastic Stream Processing Engine”, ACM Transactions on Parallel Computing, Revisions Pending, 2017.
  • Martin Hirzel, Scott Schneider, Buğra Gedik. “SPL: An Extensible Language for Distributed Stream Processing”. ACM Transactions on Programming Languages and Systems (TOPLAS), 39(1), 2017.
  • Buğra Gedik, Hasibe Güldamla Özsema, Özcan Öztürk. “Pipelined Fission for Stream Programs with Dynamic Selectivity and Partitioned State”. Journal of Parallel and Distributed Computing (JPDC), 10.1016/j.jpdc.2016.05.003, 2016.
  • Scott Schneider, Buğra Gedik, Martin Hirzel. “Language Runtime and Optimizations in IBM Streams”. Bulletin of the Technical Committee on Data Engineering, 38(4), 2016.
  • Scott Schneider, Martin Hirzel, Buğra Gedik, Kun-Lung Wu. “Safe data parallelism for general streaming”, Transactions on Computers, IEEE (TC), 64(2), 504-517, 2015.
  • Martin Hirzel, Robert Soulé, Scott Schneider, Buğra Gedik, Robert Grimm. “A Catalog of Streaming Optimizations”, ACM Computing Surveys, ACM (CSUR), 46(4), 2014.
  • Buğra Gedik, Scott Schneider, Martin Hirzel, Kun-Lung Wu. “Elastic Scaling for Data Stream Processing”, Transactions on Parallel and Distributed Systems, IEEE (TPDS), 25(6), 1447-1463, 2014.
  • Buğra Gedik. “Partitioning Functions for Stateful Data Parallelism in Stream Processing”, Very Large Data Bases Journal (VLDBJ), 23(4), 517-539, 2014.
  • Yuzhe Tang and Buğra Gedik. “Auto-pipelining for Data Stream Processing”, Transactions on Parallel and Distributed Systems, IEEE (TPDS), 24(12), 2344-2354, 2013.

Sponsors

  • FP7 European Commission, Marie Curie Actions

Students

  • Basri Kahveci, Ph.D., “Organic stream processing”, September 2015 - *
  • Güldamla Özsema, M.S., “Pipelined fission in stream processing systems”, September 2012 - December 2014. Completed.
  • Semih Şahin, M.S., “Scheduling data streaming applications for multi-core execution”, September 2013 – Expected June 2015.
  • Yağmur Şahin, M.S., “Locality-aware partitioning for stream processing”, February 2013 – Expected June 2015.

Collaborators

  • Özcan Öztürk, Bilkent University
  • Scott Schneider, IBM T. J. Watson Research Center
  • Martin Hirzel, IBM T. J. Watson Research Center
  • Kun-Lung Wu, IBM T. J. Watson Research Center
projects/autoparallel.txt · Last modified: 2017/06/12 03:00 by bgedik