Bilkent University
Department of Computer Engineering
CS 590 SEMINAR

 

Real-time Multi-Stream Data Classification with Bias Correction

 

Ömer Gözüaçık
MS Student
Computer Engineering Department
Bilkent University

Data streams are becoming more important every day as the amount of generated data increases. Traditionally, data is analyzed as static datasets, where a fixed amount of data is stored beforehand and does not change. However, as the volume of temporal data grows, storing and processing it becomes very hard due to its size and complexity. Furthermore, such datasets provide only a snapshot of information about a certain interval in time. Real-time stream processing addresses these drawbacks by processing data as it arrives and producing results immediately. Typically, the training and test distributions of a data stream are assumed to be the same, and class labels are assumed to be available. In the multi-stream context, there are two stream types: source and target. They come from the same domain but are asynchronous. Only the source (training) stream is labeled, and its distribution does not have to match that of the target (test) stream due to sample selection bias. Sample selection bias occurs when the joint probability distributions of the two sets differ while the conditional probability of the labels given the features is the same. The source data is weighted to match the distribution of the target using various techniques (KMM: Kernel Mean Matching, SK: Surrogate Kernels, ...), which generally have high time complexity and are intractable for real-time processing. Afterwards, the bias-corrected source data is used to build a target classifier. This classifier is not updated unless there is a drift in the target stream. In general, how multi-stream data can be classified effectively will be evaluated, with changes made to the bias-correction and classification phases.
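To make the weighting step concrete, the following is a minimal sketch of bias correction by importance weighting on one-dimensional synthetic data. It is not KMM or Surrogate Kernels: as a simpler stand-in, it estimates the ratio p_target(x) / p_source(x) from two Gaussian kernel density estimates. The function names, the bandwidth value, and the synthetic streams are all assumptions made for illustration and do not come from the seminar.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of 1-D samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def importance_weights(source, target, bandwidth=0.5, eps=1e-12):
    """Weight each source point by an estimate of p_target(x) / p_source(x).

    Hypothetical helper: a plain density-ratio estimate, used here
    instead of KMM, which would require solving a quadratic program.
    """
    p_src = gaussian_kde(source, bandwidth)
    p_tgt = gaussian_kde(target, bandwidth)
    raw = [p_tgt(x) / (p_src(x) + eps) for x in source]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]  # normalise so the weights average to 1


# Synthetic streams (assumption): the target is shifted relative to the source.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(300)]
target = [rng.gauss(1.0, 1.0) for _ in range(300)]

weights = importance_weights(source, target)

unweighted_mean = sum(source) / len(source)
weighted_mean = sum(w * x for w, x in zip(weights, source)) / sum(weights)
print("unweighted source mean:", unweighted_mean)
print("weighted source mean:  ", weighted_mean)
```

Source points that fall where the target is dense receive larger weights, so the weighted source mean moves toward the target mean. Each weight in this sketch costs time proportional to the number of stored samples, which hints at why cheaper bias-correction schemes matter for real-time use.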

 

DATE: 22 October 2018, Monday. CS590 presentations begin at 15:40
PLACE: EA-409