Information-theoretic bounds and phase transitions in community detection and high-dimensional clustering


Cris Moore

Monday, April 24th, 2017
4:00pm 122 Gates Hall




Over the past decade, a rich interaction has grown up between statistical physics and statistical inference. In particular, physics often predicts phase transitions where, if a data set becomes too sparse or too noisy, it suddenly becomes impossible to find its underlying structure, or even to distinguish it from a “null model” with no structure at all. For instance, in the community detection problem in graphs, there is a transition below which the vertices cannot be labeled better than chance. Similarly, in the clustering problem in Gaussian mixture models, it becomes impossible to find the clusters, or even to tell whether clusters exist, i.e., to distinguish the data from a single Gaussian. 

Many of the physics predictions have been now made rigorous, but there is still a lively interchange between these fields, both about these transitions and about asymptotically optimal algorithms. In particular, while efficient message-passing and spectral algorithms are known that succeed down to the so-called Kesten-Stigum bound, in a number of problems there is a conjectured “possible but hard” regime where inference is information-theoretically possible, but conjectured to be computationally hard. I will give a friendly introduction to this field, and present some recent results along these lines. 

This is joint work with Jess Banks, Joe Neeman, Praneeth Netrapalli, and Jiaming Xu.