Testing Noisy Linear Equations for Sparsity

Abstract: Consider the following basic problem in sparse linear regression: an algorithm gets labeled samples of the form (x, <w.x> + \eps), where w is an unknown n-dimensional vector, x is drawn from a background distribution D, and \eps is independent noise. Given the promise that w is k-sparse, the breakthrough work of Candes, Romberg and Tao (2005) shows that w can be recovered with sample complexity and running time that scale as O(k log n). This should be contrasted with general linear regression, where \Omega(n) samples are information-theoretically necessary.
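
To make the sample model concrete, here is a minimal Python sketch of how labeled samples (x, <w.x> + \eps) could be generated. The specific choices are assumptions for illustration and do not come from the abstract: D is taken to be uniform on {-1, +1}^n, the noise is Gaussian, and the names draw_samples and noise_std are hypothetical.

```python
import random


def draw_samples(w, m, noise_std=0.1, seed=0):
    """Draw m labeled samples (x, <w, x> + eps) for a fixed vector w.

    Illustrative assumptions (not from the abstract):
    - D is uniform on {-1, +1}^n, so coordinates are i.i.d. and non-Gaussian;
    - eps is Gaussian with standard deviation noise_std, independent of x.
    """
    rng = random.Random(seed)
    n = len(w)
    samples = []
    for _ in range(m):
        x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        eps = rng.gauss(0.0, noise_std)
        y = sum(wi * xi for wi, xi in zip(w, x)) + eps
        samples.append((x, y))
    return samples


# A 2-sparse vector in dimension n = 8.
w = [0.0, 3.0, 0.0, 0.0, -4.0, 0.0, 0.0, 0.0]
samples = draw_samples(w, m=5)
```

The algorithm only ever sees the pairs in samples; the vector w itself stays hidden.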

In this talk, we look at this question from the vantage point of property testing and study its decision variant: what is the complexity of deciding whether the unknown vector w is k-sparse or is at least, say, 0.01-far from k-sparse in \ell_2 distance? We show that this decision problem can be solved with a number of samples that is independent of n, as long as the background distribution D has i.i.d. components that are not Gaussian. We further show that weakening any of the conditions in this result necessarily makes the sample complexity scale as log n (thus showing our results are tight).
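
For reference, the \ell_2 distance from a vector to the nearest k-sparse vector has a simple closed form: keep the k largest-magnitude entries and take the \ell_2 norm of the rest. The Python sketch below computes it for a known w, only to illustrate the "far from k-sparse" notion; it is not the testing algorithm from the talk, which sees only noisy samples. The function name dist_to_k_sparse is hypothetical, and the abstract does not say whether the distance is normalized by the norm of w, so this sketch returns the raw distance.

```python
import math


def dist_to_k_sparse(w, k):
    """l2 distance from w to the closest vector with at most k nonzeros.

    The closest k-sparse vector keeps the k largest-magnitude entries of w
    and zeroes out the others, so the distance is the l2 norm of the
    remaining entries.
    """
    mags = sorted((abs(wi) for wi in w), reverse=True)
    return math.sqrt(sum(m * m for m in mags[k:]))


w_example = [3.0, 0.0, 4.0, 0.0]
d1 = dist_to_k_sparse(w_example, 1)  # drop the entry 3.0, keep 4.0
d2 = dist_to_k_sparse(w_example, 2)  # w_example is already 2-sparse
```

Here d1 is 3.0 and d2 is 0.0, so w_example is 2-sparse but far from 1-sparse.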

Joint work with Xue Chen (Northwestern) and Rocco Servedio (Columbia).

Bio: Anindya De is an Assistant Professor at the University of Pennsylvania. Prior to Penn, he spent three years as an Assistant Professor at Northwestern University. He received his PhD from UC Berkeley in 2013, advised by Luca Trevisan, and was a Simons Research Fellow and a postdoctoral fellow at IAS and DIMACS. Anindya is interested in complexity theory, learning theory, and harmonic analysis of Boolean functions.