Cornell Department of Computer Science Colloquium
4:15pm, Thursday December 13th, 2001
B17 Upson Hall

Learning Theory for Large Models

       David McAllester
        AT&T Research Labs


Occam's razor provides a foundation for learning theory --- one avoids overfitting the data by using a simple predictive model, i.e., a model whose description length is short compared to the description length of the training data.  This talk will start by reviewing a simple non-Bayesian (PAC-Bayesian) justification for Occam's razor.  But Occam's razor seems to work poorly in practice.
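As a toy illustration of the description-length view (not part of the talk itself), one can select among polynomial models by minimizing a two-part code: the bits needed to describe the model plus the bits needed to describe the data given the model. The BIC-style bit counts below are a common rough approximation, assumed here for illustration.

```python
import numpy as np

# Hypothetical sketch of Occam's-razor model selection via description length.
# Total cost = bits to encode the model + bits to encode the residuals;
# a complex model fits the data better but costs more bits to describe.
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)  # true signal plus noise

def description_length(degree):
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    model_bits = 0.5 * (degree + 1) * np.log2(n)  # cost of the parameters
    data_bits = 0.5 * n * np.log2(rss / n)        # cost of the residuals
    return model_bits + data_bits

# A moderate degree minimizes the total code length; very high degrees
# keep shrinking the residuals but pay more for the model than they save.
best = min(range(1, 12), key=description_length)
print(best)
```

The per-parameter penalty is what prevents the minimizer from always choosing the largest degree.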

In language modeling, for example, the best performing models memorize the training data.  The talk will then present a non-Bayesian theoretical framework for understanding data-memorizing models.  This theoretical framework includes a new general approach to proving concentration inequalities, with applications to the particular case of leave-one-out performance estimators.
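To see why leave-one-out estimation matters for memorizing models, consider a 1-nearest-neighbor rule (a hypothetical stand-in, not the talk's model): it memorizes the training data, so its training error is trivially zero and uninformative. The leave-one-out estimator instead classifies each point by its nearest *other* training point:

```python
import numpy as np

# Hypothetical sketch: leave-one-out error of a data-memorizing classifier.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels from a linear boundary

def leave_one_out_error(X, y):
    errors = 0
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                # hold out the point itself
        nearest = int(np.argmin(dists))  # classify by the nearest other point
        errors += int(y[nearest] != y[i])
    return errors / len(X)

# Small but nonzero, unlike the training error of the memorizing rule (0).
print(leave_one_out_error(X, y))
```

Concentration inequalities are what justify trusting such an estimate: they bound how far the leave-one-out error can stray from the true generalization error.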