I conduct research in the general area of data-intensive distributed systems, with a particular focus on fast data and big data. My work involves run-time, compiler, and language design, and addresses issues such as parallelization, fault-tolerance, profiling, optimization, and debugging in stream processing systems.
I have additional research interests in hardware acceleration for data management, big data technologies (MapReduce, distributed key-value stores), and large-scale graph processing.
My past research focused on developing architectures and techniques to address scalability problems in large-scale distributed data-intensive systems and applications, and on support for distributed information monitoring services.
In this position, I have been conducting academic research on the topics listed under the research direction section above. As part of these activities, I have been heavily involved in the System S research project, which aims at building a scalable, extensible, and high-performance continuous data analysis platform. The goal of the platform is to facilitate the development and deployment of data-in-motion analytics that process high-volume feeds from multi-modal, live sources and produce near-real-time insights with minimal latency. The System S project received R&D Magazine's 2010 R&D 100 award as well as an IBM internal science accomplishment award. The project has been named one of IBM's Icons of Progress for the centennial (http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/streamcomputing), and I have been named one of the three key contributors, alongside my manager Halim Nagui (IBM Fellow).
In this position, I have led a team of ~30 developers in the design and implementation of the IBM InfoSphere Streams 2.0 product. This release incorporated a complete overhaul of the programming language and the runtime, and is the first industrial-strength release of the platform that guarantees binary compatibility for future releases. I am the co-inventor of the SPL language that was introduced in this release. Solutions using the Streams platform have been deployed in the government, telecommunications, health-care, and finance domains.
In this position, I have served as the lead architect for the programming model and the compiler of the IBM InfoSphere Streams 1.0 product. I am the co-inventor of the SPADE language that was introduced in this release, which served as a precursor to the SPL language.
I completed my Ph.D. studies in the Systems focus area within Computer Science. My doctoral research was on scalable information monitoring architectures. I investigated the application of Continual Queries to new platforms such as peer-to-peer networks, mobile systems, and sensor networks. Major themes of my research included system architectures that promote incremental evaluation of queries, moving computation close to where data is produced, and run-time adaptation to changing conditions in resource availability.
During my Ph.D. studies, I also served as a teaching assistant for the graduate-level 'Real-time and Embedded Systems' course and gave guest lectures for courses such as 'Internet Computing' and 'Introduction to Enterprise Computing'.
I completed my undergraduate studies ranked second in my class, with a GPA of 3.91/4.0.