I conduct research in the general area of data-intensive distributed systems, with a particular focus on fast data and big data. My work addresses issues such as scalability, parallelization, profiling and optimization, and fault tolerance in data stream processing systems.
My broader research interests include hardware acceleration for data management, big data technologies (MapReduce, distributed key-value stores), and large-scale graph processing.
My past research focused on developing architectures and techniques to address scalability problems in large-scale distributed data-intensive systems and applications, and on support for distributed information monitoring services.
In this position, I have conducted academic research on data-intensive distributed systems. In particular, I have been heavily involved in the System S research project, which aims at building a scalable, extensible, and high-performance continuous data analysis platform. The goal of the platform is to facilitate the development and deployment of data-in-motion analytics that process high-volume feeds from multi-modal, live sources and produce near-real-time insights with minimal latency. The System S project received the 2010 R&D Magazine R&D 100 award as well as an IBM internal science accomplishment award. The project has been named one of IBM's Icons of Progress for the centennial, and I have been named one of the three key contributors (link), alongside my manager Halim Nagui (IBM Fellow).
In this position, I have led a large team of developers in the design and implementation of the IBM InfoSphere Streams 2.0 product. This release incorporated a complete overhaul of the programming language and the runtime, and it is the first industrial-strength release of the platform that guarantees compatibility for future releases. I am a co-inventor of the SPL language that was introduced in this release. Solutions using the Streams platform have been deployed in the government, telecommunications, health-care, and finance domains.
In this position, I have served as the lead architect for the programming model and the compiler of the IBM InfoSphere Streams 1.0 product. I am a co-inventor of the SPADE language that was introduced in this release, which served as a precursor to the SPL language.
I completed my Ph.D. studies in the Systems focus area within Computer Science. My doctoral research was on the topic of scalable information monitoring architectures. I investigated the application of Continual Queries to new platforms such as peer-to-peer networks, mobile systems, and sensor networks. Major themes of my research included system architectures that promote incremental evaluation of queries, moving computation close to where data is produced, and run-time adaptation to changing conditions in resource availability.
I have completed my undergraduate study with a GPA of 3.91/4.0.