Bilkent University
Department of Computer Engineering


Testing Properties of Discrete Distributions


Dr. Tuğkan Batu
London School of Economics

In this talk, we will survey some algorithmic results for several fundamental statistical inference tasks on distributional properties of data. The algorithms are given access only to i.i.d. samples from the discrete input distributions and make inferences based on these samples.

The main focus of this research is the sample complexity of each task as a function of the domain size for the underlying discrete probability distributions. The inference tasks studied include (i) similarity to a fixed distribution (i.e., goodness-of-fit); (ii) similarity between two distributions (i.e., homogeneity); (iii) independence of joint distributions; and (iv) entropy estimation. For each of these tasks, an algorithm with sublinear sample complexity is presented (e.g., a goodness-of-fit test on a discrete domain of size $n$ is shown to require $O(\sqrt{n}\log(n))$ samples). Given certain extra information on the distributions (such as the distribution is monotone or unimodal with respect to a fixed total order on the domain), the sample complexity of these tasks become polylogarithmic in the domain size.

Bio: Tugkan Batu has received his B.S. degree in Computer Science from Bilkent University in 1996. He has received his Ph.D. in Computer Science from Cornell University in August 2001. He has held postdoctoral fellow positions at University of Pennsylvania, the University of Texas at Austin, and Simon Fraser University. Since September 2006, he has been a Lecturer in the Department of Mathematics at the London School of Economics.


DATE: 26 November, 2012, Monday @ 13:30