In this project, we study analytics performed over streaming big data. Our solutions have two key properties: they are distributed, so that they can handle large data volumes, and they are streaming, so that they provide low latency and adapt to changes in the workload.
Many telco analytics require maintaining call profiles based on customers' recent call patterns. Such profiles are typically organized as aggregations computed at different time scales over recent customer interactions. Because of the large number of customers, maintaining these profiles has high processing and memory requirements. To tackle this problem, we apply distributed stream processing.
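A minimal sketch of the idea, assuming calls arrive in timestamp order: each customer's profile keeps the calls that fall within the longest time scale and derives a call count and total duration per scale, while customers are hash-partitioned so that each worker owns a disjoint subset of profiles. The time scales (`1h`, `1d`), the chosen aggregates, and the names `CallProfile`, `ProfileWorker`, and `worker_for` are hypothetical illustrations, not the project's actual design.

```python
from collections import deque
from dataclasses import dataclass, field
import zlib

# Hypothetical time scales (window widths in seconds) for each profile.
TIME_SCALES = {"1h": 3600, "1d": 86400}


def worker_for(customer_id: str, num_workers: int) -> int:
    """Hash-partition customers so each worker owns a disjoint set of profiles."""
    # crc32 is stable across processes, unlike Python's randomized str hash.
    return zlib.crc32(customer_id.encode("utf-8")) % num_workers


@dataclass
class CallProfile:
    """Multi-time-scale call aggregates for one customer (hypothetical)."""
    calls: deque = field(default_factory=deque)  # (timestamp, duration) pairs

    def add_call(self, ts: float, duration: float) -> None:
        # Assumes calls arrive in timestamp order.
        self.calls.append((ts, duration))
        # Evict calls older than the longest time scale; no window needs them.
        horizon = ts - max(TIME_SCALES.values())
        while self.calls and self.calls[0][0] <= horizon:
            self.calls.popleft()

    def aggregates(self, now: float) -> dict:
        """Call count and total duration over each time scale ending at `now`."""
        result = {}
        for name, width in TIME_SCALES.items():
            recent = [d for t, d in self.calls if t > now - width]
            result[name] = {"count": len(recent), "total_duration": sum(recent)}
        return result


class ProfileWorker:
    """One partition of the distributed profile state."""

    def __init__(self) -> None:
        self.profiles: dict[str, CallProfile] = {}

    def on_call(self, customer_id: str, ts: float, duration: float) -> None:
        self.profiles.setdefault(customer_id, CallProfile()).add_call(ts, duration)


if __name__ == "__main__":
    num_workers = 4
    workers = [ProfileWorker() for _ in range(num_workers)]
    for cid, ts, dur in [("alice", 0, 60), ("alice", 7200, 30), ("bob", 100, 10)]:
        workers[worker_for(cid, num_workers)].on_call(cid, ts, dur)
    alice = workers[worker_for("alice", num_workers)].profiles["alice"]
    print(alice.aggregates(now=7200))
    # {'1h': {'count': 1, 'total_duration': 30}, '1d': {'count': 2, 'total_duration': 90}}
```

Keeping the raw calls is the simplest correct choice, but its memory cost grows with call volume; a system concerned with memory might instead maintain incremental or approximate per-window aggregates.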
In this project, we develop a keyword-based publish/subscribe (pub-sub) system backed by a distributed stream processing engine. The goal is to provide high throughput and low latency, as well as horizontal and vertical scalability. The key idea is to perform filtering and partitioning together to make the system scalable. We evaluate the system on Twitter feeds to demonstrate its load-balancing and scalability properties.
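One way to combine filtering with partitioning is sketched below under our own assumptions, not as the project's method. A subscription is a set of keywords that matches a tweet containing all of them; each subscription is stored only on the partition that owns one of its keywords (its routing keyword), and each tweet is sent only to the partitions owning its keywords, where the containment check runs locally. The routing rule (alphabetically smallest keyword), the simplistic tokenizer, and all names (`Broker`, `Partition`, `partition_of`, `tokenize`) are hypothetical.

```python
import re
import zlib
from collections import defaultdict


def partition_of(keyword: str, num_partitions: int) -> int:
    """Stable hash of a keyword to a partition index."""
    return zlib.crc32(keyword.encode("utf-8")) % num_partitions


def tokenize(text: str) -> frozenset:
    """Simplistic tokenizer: lower-cased runs of word characters."""
    return frozenset(re.findall(r"\w+", text.lower()))


class Partition:
    """Holds the subscriptions whose routing keyword hashes here."""

    def __init__(self) -> None:
        # routing keyword -> [(subscription id, full keyword set)]
        self.index = defaultdict(list)

    def add(self, sub_id: str, routing_kw: str, keywords: frozenset) -> None:
        self.index[routing_kw].append((sub_id, keywords))

    def match(self, tweet_kws: frozenset) -> set:
        """Filter locally: a subscription matches if all its keywords occur."""
        matched = set()
        for kw in tweet_kws:
            for sub_id, keywords in self.index.get(kw, ()):
                if keywords <= tweet_kws:
                    matched.add(sub_id)
        return matched


class Broker:
    """Routes subscriptions and tweets to partitions (hypothetical design)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [Partition() for _ in range(num_partitions)]

    def subscribe(self, sub_id: str, keywords) -> None:
        kws = frozenset(k.lower() for k in keywords)
        if not kws:
            raise ValueError("a subscription needs at least one keyword")
        # Hypothetical rule: route by the alphabetically smallest keyword.
        # A real system might prefer the least frequent keyword instead.
        routing_kw = min(kws)
        p = partition_of(routing_kw, len(self.partitions))
        self.partitions[p].add(sub_id, routing_kw, kws)

    def publish(self, text: str) -> set:
        tweet_kws = tokenize(text)
        # Only partitions owning one of the tweet's keywords can hold a match.
        targets = {partition_of(k, len(self.partitions)) for k in tweet_kws}
        matched = set()
        for p in targets:
            matched |= self.partitions[p].match(tweet_kws)
        return matched


if __name__ == "__main__":
    broker = Broker(num_partitions=4)
    broker.subscribe("s1", ["storm", "Istanbul"])
    broker.subscribe("s2", ["football"])
    broker.subscribe("s3", ["storm", "London"])
    print(broker.publish("Big storm hits Istanbul tonight"))  # {'s1'}
```

Because each subscription lives on exactly one partition and a tweet reaches only the partitions of its own keywords, the partitioning step already prunes much of the non-matching work. Plain hashing by keyword can, however, overload the partitions of very popular keywords, so load balance would need separate attention.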