Column Subset Selection Problem in Matrices: Complexity, Algorithms and Applications


Ali Civril
PhD student
Computer Science Department
Rensselaer Polytechnic Institute
Troy, NY, USA

We are interested, in a broad sense, the problem of selecting a subset of columns of a given matrix A so as to capture its whole spectrum as much as possible. This notion, quantified in a few different ways have long been tackled by the numerical linear algebra and theoretical computer science community in the past two decades. All related to this general scheme, our talk has three parts: 1) We first present linear time spectral graph drawing algorithm SSDE, based on an approximate decomposition of a distance matrix associated with the graph. Our algorithm is a vast improvement over the well known method Classical Multi-Dimensional Scaling (CMDS) and prevents from computing the whole distance matrix. 2) Motivated by the practical concerns of SSDE, we define the problem MAX-VOL, in which we seek a subset of columns of a matrix, whose parallelepiped has the maximum volume over all possible choices. Together with a few other related problems, we establish its NP-hardness and show that it does not admit PTAS. We also present an approximation algorithm for MAX-VOL along with a lower bound for the algorithm. 3) We generalize the sparse approximation problem and present an algorithm for choosing a subset of columns of a matrix A, to approximate a predefined subspace. In case this subspace is the best rank-k approximation to A, we have a relative error low-rank matrix approximation algorithm. This is the first deterministic algorithm which has provably (1+epsilon) relative error for arbitrarily small epsilon. Although, the number of columns chosen by the algorithm depends on a parameter related to the structure of A, our preliminary experiments on real and random data sets show that it is superior to other deterministic methods.


DATE: 19 September, 2008, Friday@ 13:40