Microsoft Research, New England
Consider the following basic problem: given a set of n boolean vectors with the promise that they are chosen uniformly at random, with the exception of a single pair of vectors that is slightly correlated, how quickly can one find the correlated pair? Can one do better than the quadratic-time brute-force algorithm? Yes.
Approaches such as the Locality Sensitive Hashing (LSH) can find the correlated pair in time n^(2-O(c)), where c is the correlation of the planted pair, and the constant in the O(c) term has been improved in a
series of works. We present a subquadratic time algorithm for this
problem in which c does not appear in the exponent, having runtime
poly(1/c) n^1.6 .
This result is leveraged to yield new algorithmic results for several other problems, including learning parity with noise, learning juntas (given a function over n variables, in which only k<<n are actually relevant, how quickly can one find the set of k relevant variables?), and the eps-approximate closest pair problem (given n points, find a pair whose euclidean distance is at most a factor of 1+eps larger than that of the closest pair).