CS SEMINAR

Mean estimation in R^d

Speaker
Prof. Shahar Mendelson, Australian National University
Chaired by
Dr Jonathan Mark SCARLETT, Associate Professor, School of Computing
scarlett@comp.nus.edu.sg

24 Aug 2023 Thursday, 01:30 PM to 02:30 PM

MR20, COM3-02-59

Abstract: Consider an unknown random vector X taking values in R^d. Is it possible to "guess" its mean accurately if the only information one is given consists of N independent copies of X? More precisely, given an arbitrary norm on R^d, the goal is to find a mean estimation procedure: upon receiving a desired confidence parameter \delta and N independent copies X_1,...,X_N of an unknown random vector X that has a finite mean and covariance, the procedure returns \hat{\mu} for which the error \| \hat{\mu} - E X \| is as small as possible with probability at least 1-\delta (with respect to the product measure).

This mean estimation problem has been studied extensively over the years, and I will present some of the ideas that have led to its solution. Two rather surprising facts emerge. First, the obvious choice, setting \hat{\mu} to be the empirical mean N^{-1}\sum_{i=1}^N X_i, is actually a terrible option for small confidence parameters \delta (most notably when X is "heavy tailed"). Second, and even more surprising, one can find an optimal procedure that performs as if the (arbitrary) random vector X were Gaussian.
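
For readers who want a concrete feel for why the empirical mean can be fragile, the following is a minimal one-dimensional sketch contrasting it with the classical median-of-means estimator. This is an illustration only and is not claimed to be the procedure presented in the talk, which concerns R^d and an arbitrary norm. The function names and the toy sample are assumptions made for this example; in the classical analysis the number of blocks is taken on the order of log(1/\delta).

```python
import statistics


def empirical_mean(sample):
    """Plain average of the sample."""
    return sum(sample) / len(sample)


def median_of_means(sample, num_blocks):
    """Median-of-means: average disjoint equal-size blocks, then take the median.

    Points left over after forming num_blocks equal blocks are dropped.
    """
    if not 1 <= num_blocks <= len(sample):
        raise ValueError("num_blocks must be between 1 and len(sample)")
    block_size = len(sample) // num_blocks
    block_means = [
        sum(sample[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# Toy sample (hypothetical): eight typical values and one extreme value.
sample = [1, 1, 1, 1, 1, 1, 1, 1, 100]

print(empirical_mean(sample))      # 12.0: pulled far away by the single extreme value
print(median_of_means(sample, 3))  # 1.0: only one of the three block means is affected
```

A single extreme observation shifts the empirical mean substantially, whereas it can corrupt at most one block mean, and the median ignores that block. How the two estimators compare on genuinely random heavy-tailed data depends on the distribution, N, and \delta.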

Bio: Prof. Shahar Mendelson is a Professor of Mathematics at the Australian National University. His research is on the mathematics of Data Science, including the connections between Statistical Learning Theory, Empirical Process Theory, and Asymptotic Geometric Analysis. His work has been published extensively in top venues such as the Conference on Learning Theory, Annals of Probability, and Annals of Statistics. He has also received a number of awards, including the Medal of the Australian Mathematical Society and the Technion Taub Prize for research excellence.