Bilkent University
Department of Computer Engineering
MS Thesis Presentation


Scalable Streaming Profile Clustering for Telco Analytics


Mehmet Ali Abbasoğlu
MS Student
Computer Engineering Department
Bilkent University

Many telco analytics require maintaining call profiles based on recent customer call patterns. Such call profiles are typically organized as aggregations computed at different time scales over recent customer interactions. Customer call profiles are key inputs for analytics aimed at improving the operations, marketing, and sales of telco providers. Many of these analytics require clustering customer call profiles, so that customers with similar calling patterns can be modeled as a group. Example applications include tariff optimization, customer profiling, segmentation, and usage forecasting. We present an approach for clustering profiles that are incrementally maintained over a stream of updates. Due to the large number of customers, maintaining profile clusters has high processing and memory requirements. To tackle this problem, we apply distributed stream processing. However, in the presence of distributed state, it is a major challenge to partition the profiles over machines (nodes) such that memory and computation stay balanced while clustering accuracy remains high. Furthermore, to adapt to potentially changing customer calling patterns, the assignment of profiles to machines should be continuously revised, yet the migration of profiles should be minimized so as not to disturb the online processing of updates. We provide a re-partitioning technique that achieves all of these goals. We keep micro-cluster summaries at each node, collect these summaries at a centralized node, and use a greedy algorithm with novel affinity heuristics to revise the partitioning. We present a demo application that showcases our Storm- and HBase-based implementation of the proposed solution in the context of customer segmentation.
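
As a rough illustration of why per-node summaries can be collected and combined at a central node, the sketch below uses an additive cluster-feature summary (count, per-dimension sum, per-dimension sum of squares) in the style of BIRCH and CluStream. This is an assumption made for illustration only: the abstract does not specify the summary format used in the thesis, and the class and method names here are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Additive summary of a group of profile vectors (hypothetical format)."""
    n: int                    # number of profiles summarized
    linear_sum: List[float]   # per-dimension sum of the profile vectors
    square_sum: List[float]   # per-dimension sum of squared values

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "MicroCluster":
        return cls(1, list(point), [x * x for x in point])

    def merge(self, other: "MicroCluster") -> "MicroCluster":
        # Summaries are additive, so merging never needs the raw profiles.
        return MicroCluster(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            [a + b for a, b in zip(self.square_sum, other.square_sum)],
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        # Root of the summed per-dimension variance, clamped at zero
        # to guard against floating-point round-off.
        variance = sum(
            max(sq / self.n - (ls / self.n) ** 2, 0.0)
            for ls, sq in zip(self.linear_sum, self.square_sum)
        )
        return math.sqrt(variance)


# Two nodes summarize their local profiles; a central node merges the summaries.
node_a = MicroCluster.from_point([1.0, 2.0]).merge(MicroCluster.from_point([3.0, 4.0]))
node_b = MicroCluster.from_point([5.0, 6.0])
merged = node_a.merge(node_b)
```

Because the three fields are plain sums, the merged summary is identical to one built directly from all of the underlying profiles, so only a few numbers per micro-cluster need to travel to the central node. The greedy re-partitioning step and its affinity heuristics are not sketched here, since the abstract does not describe them.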


DATE: 2 August, 2013, Friday @ 15:30