The world is experiencing an explosion in the amount of online data produced continuously by different types of sensors, processes, and human activities. Being able to analyze live data as it is generated, and to distil insights for improved decision making, is vital to many large and complex applications in domains such as financial systems, cyber- and physical-security systems, environmental monitoring, health care, manufacturing systems, telecommunication networks, and power distribution grids.
Existing store-and-process information management technologies are ill-suited to meet the performance, scalability, and usability requirements of these applications.
Stream processing is a novel distributed computing paradigm that supports the gathering, processing, and analysis of high-volume, heterogeneous, continuous data streams to extract insights and actionable results in real time. Stream processing builds on research from several domains, ranging from distributed systems and relational databases to programming languages and design, and to signal processing and data mining algorithms.
In this course, we cover the fundamentals of the emerging stream processing paradigm. We introduce its key components, including the distributed system infrastructure, the programming model, design patterns, and streaming analytics.
Throughout the course, we describe the underlying theoretical principles, illustrative examples and implementations, and end-to-end real-world case studies. Together, these give students and practitioners a comprehensive guide to building such systems and applications and to advancing the state of the art.
Finally, this course provides hands-on exposure to large-scale stream processing through homework assignments that involve programming exercises. The class centers on student seminars and a final design-and-implementation project, which allow students to explore the state of the art in this field and potentially tackle open research challenges.
Programming exercises can be performed on stream processing middleware. Two options are:
The book will be published this year. Until then, please contact me for a draft.
Students will work in groups of two to design and implement projects that build moderate-sized stream processing applications, and that experiment with and showcase state-of-the-art algorithms in a near-realistic setting. Some project ideas:
Classes run for 15 weeks, with two lectures per week. Each lecture consists of two 50-minute sessions separated by a 10-minute break. The second hour of the Wednesday lecture is reserved as 'spare' time and will be used only rarely.