Finding Correlations, Learning Juntas, and the Closest Pair Problem



Greg Valiant

Microsoft Research, New England

Monday, December 3, 2012
4:00pm 5130 Upson Hall



Consider the following basic problem:  given a set of n boolean vectors with the promise that they are chosen uniformly at random, with the exception of a single pair of vectors that is slightly correlated, how quickly can one find the correlated pair?  Can one do better than the quadratic-time brute-force algorithm?  Yes.

Approaches such as the Locality Sensitive Hashing (LSH) can find the correlated pair in time n^(2-O(c)), where c is the correlation of the planted pair, and the constant in the O(c) term has been improved in a series of works.   We present a subquadratic time algorithm for this problem in which c does not appear in the exponent, having runtime
poly(1/c) n^1.6 .

This result is leveraged to yield new algorithmic results for several other problems, including learning parity with noise, learning juntas (given a function over n variables, in which only k<<n are actually relevant, how quickly can one find the set of k relevant variables?), and the eps-approximate closest pair problem (given n points, find a pair whose euclidean distance is at most a factor of 1+eps larger than that of the closest pair).