Seminar in Computer Engineering

Bilkent University
Department of Computer Engineering
CS 590 SEMINAR

Active Learning by Statistical Leverage Scores

Cem Orhan
MS Student
Computer Engineering Department
Bilkent University

Label scarcity is a serious problem in many machine learning tasks. The active learning framework addresses this challenge by selecting which examples to label. In the pool-based active learning framework for classification, the active learner is provided with a large set of unlabeled examples augmented with a few labeled instances. Through effective queries, the active learner aims to obtain a highly accurate classifier with fewer label requests than passive learning would need. Many querying strategies have been developed for the pool-based active learning setting in the past two decades, in which examples are selected based on their informativeness or representativeness. We present a novel querying method based on statistical leverage scores computed on the kernel matrix of the examples. The statistical leverage score of a row of a matrix is the squared norm of the corresponding row of the matrix whose columns form an orthonormal basis of the top k-dimensional eigenspace, as defined in [2], and it can be used as a measure of the influence of that row on the matrix. Leverage scores have been used for detecting highly influential points in regression diagnostics [1] and have recently been shown to be useful in randomized low-rank matrix approximation algorithms [2,3]. In our querying strategy, ALEVS, the labels of examples are requested iteratively based on their leverage scores. Our experiments on several binary classification benchmark datasets demonstrate that ALEVS is an effective querying strategy.
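
The leverage-score definition in the abstract can be illustrated with a short sketch. This is not the ALEVS implementation: the RBF kernel, the values of gamma and k, the random data, and the rule of querying the single highest-scoring example are all assumptions made for illustration, and the function names rbf_kernel and leverage_scores are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Assumed kernel choice: Gaussian (RBF) kernel on pairwise squared distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def leverage_scores(K, k):
    # K is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(K)
    U_k = eigvecs[:, -k:]  # orthonormal basis of the top-k eigenspace
    # Leverage score of row i = squared Euclidean norm of row i of U_k.
    return np.sum(U_k ** 2, axis=1)

# Hypothetical unlabeled pool: 50 examples with 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

K = rbf_kernel(X, gamma=0.5)
scores = leverage_scores(K, k=5)

# Assumed selection rule for illustration: query the highest-leverage example.
query_index = int(np.argmax(scores))
```

Because the columns of U_k are orthonormal, every score lies between 0 and 1 and the scores sum to k, which gives a quick sanity check on the computation.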
[1] S. Chatterjee and A. S. Hadi. Influential observations, high leverage points, and outliers in linear regression. Statist. Sci., 1(3):379–393, 1986.
[2] A. Gittens and M. Mahoney. Revisiting the Nyström method for improved large-scale machine learning. In Proceedings of the 30th International Conference on Machine Learning, 2013.
[3] M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 106(3):697–702, 2009.

DATE: 28 March, 2016, Monday @ 16:50
PLACE: EA-409