Speaker: Marina Meila-Predoviciu
Time & Location: 4:15 PM, B17 Upson Hall
Host: Lillian Lee
Title: Efficient Learning in high dimensions with trees and mixtures
How can one learn useful representations of multidimensional data that can be used in a variety of tasks, some of them unspecified at the time of learning? This problem is called unsupervised learning, and probabilistic models have proven to be particularly successful at it. However, learning and using models of multidimensional domains raise specific problems - termed the curse of dimensionality - that are not encountered in the univariate case. This talk shows how exploiting the computational properties of a simple probability model, the tree, leads to efficient, elegant and powerful algorithms for learning in multidimensional domains. The tree is distinguished among graphical models by its outstanding computational properties. I show how to combine trees into more powerful models, called mixtures of trees, and how these can be learned efficiently by a method based on the Maximum Spanning Tree and EM algorithms. The basic tree learning algorithm is quadratic in the dimension of the data. I demonstrate that for sparse data it can be transformed into an algorithm that is subquadratic and that achieves speedup factors of up to a thousand. Experiments demonstrate the performance of trees and mixtures in classification and density estimation tasks.
No prior knowledge of graphical probability models is necessary to follow this talk.
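For readers who want a concrete picture of the tree-learning step mentioned in the abstract, here is a minimal sketch. It assumes that the "basic tree learning algorithm" is of the classical Chow-Liu kind: weight every pair of variables by its empirical mutual information, then take a maximum-weight spanning tree. The abstract does not name the procedure, so this is an illustration rather than the speaker's method. The function names `mutual_information` and `chow_liu_edges` are hypothetical, the data is assumed to be a list of equal-length tuples of discrete values, and the sketch covers neither the mixture/EM part nor the sparse-data acceleration.

```python
import math
from collections import Counter


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def chow_liu_edges(data):
    """Edges of a maximum-weight spanning tree over the variables,
    using pairwise mutual information as the edge weight (Prim's algorithm)."""
    d = len(data[0])
    # All d*(d-1)/2 pairwise weights: this is the step that is
    # quadratic in the dimension of the data.
    weight = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            weight[i][j] = weight[j][i] = mutual_information(data, i, j)

    in_tree = {0}  # grow the tree from variable 0 (arbitrary choice)
    edges = []
    while len(in_tree) < d:
        # Pick the heaviest edge leaving the current tree.
        u, v = max(
            ((u, v) for u in in_tree for v in range(d) if v not in in_tree),
            key=lambda e: weight[e[0]][e[1]],
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges


if __name__ == "__main__":
    # Toy data: columns 0 and 1 are identical, column 2 is independent of both.
    toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    print(chow_liu_edges(toy))
```

On the toy data, the edge between variables 0 and 1 carries the only nonzero weight, so it is always selected; the edge that attaches variable 2 is a tie between zero-weight candidates.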