LFCS Seminar

Testing Properties of Distributions

Tugkan Batu

London School of Economics

4pm Tuesday, 19th of May, 2009
Room 4.31/4.33, Informatics Forum

Abstract

In this talk, we will describe algorithms for several fundamental statistical inference tasks. The algorithms are given access only to i.i.d. samples from the input distributions and make inferences based on these samples. The main focus of this research is the sample complexity of each task as a function of the domain size for the underlying discrete probability distributions. The inference tasks studied include (i) similarity to a fixed distribution (i.e., goodness-of-fit); (ii) similarity between two distributions (i.e., homogeneity); (iii) independence of joint distributions; and (iv) entropy estimation. For each of these tasks, an algorithm with sublinear sample complexity is presented (e.g., a goodness-of-fit test on a discrete domain of size $n$ is shown to require $O(\sqrt{n}polylog(n))$ samples). Given some extra information on the distributions (such as the distribution is monotone or unimodal with respect to a fixed total order on the domain), the sample complexity of these tasks become polylogarithmic in the domain size.