Sample-optimal inference, the method of moments, and community detection


Sam Hopkins

Monday, April 10th, 2017
4:00pm 122 Gates Hall




We propose a simple and efficient meta-algorithm for Bayesian estimation problems (i.e. hidden variable, latent variable, or planted problems). Our algorithm uses low-degree polynomials together with new and highly robust tensor decomposition methods. We focus on the question: for a given estimation problem, precisely how many samples (up to low-order additive terms) do polynomial-time algorithms require to obtain good estimates of hidden variables? Our meta-algorithm is broadly applicable, and achieves statistical or conjectured computational sample-complexity thresholds for many well-studied problems, including many for which previous algorithms were highly problem-specific.

As a running example we employ the stochastic block model -- a widely studied family of random graph models which contain latent community structure. We recover and unify the proofs of the best-known sample complexity bounds for the partial recovery problem in this model. We also give the first provable guarantees for partial recovery of community structure in constant-degree graphs where nodes may participate in many communities simultaneously. This model is known to exhibit a sharp sample complexity threshold -- with fewer than a very specific number of samples, recovering community structure becomes impossible. While previous explanations for this phenomenon appeal to sophisticated ideas from statistical mechanics, we give a new and simple explanation based on properties of low-degree polynomials.

Joint work with David Steurer.