Monday, November 26, 2007
4:00 PM
5130 Upson Hall
  Theory Seminar
Fall 2007
CS 789

Maria-Florina Balcan
Carnegie Mellon University


A Theory of Similarity Functions for Learning and Clustering


Kernel methods have proven to be powerful tools in machine learning. They perform well in many applications, and there is a well-developed theory of sufficient conditions for a kernel to be useful for a given learning problem. However, while a kernel can be thought of as simply a pairwise similarity function satisfying certain additional mathematical properties, this theory requires viewing kernels as implicit (and often difficult to characterize) maps into high-dimensional spaces. In this work we develop an alternative, more general theory of learning with similarity functions that makes no reference to implicit spaces and does not require the function to be positive semi-definite. Our results also generalize the standard theory: any good kernel function under the usual definition can be shown to be a good similarity function under our definition as well.
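The abstract does not give the formal definition, so the following is only an illustrative sketch under an assumed "goodness" criterion in the spirit of this line of work: call a bounded pairwise similarity K good if most points are, on average, noticeably more similar to their own class than to the other class. The particular function K, the data, and the margin value below are all hypothetical; the point is that such a K can be useful without being positive semi-definite.

```python
import random

def K(x, y):
    # A bounded pairwise similarity in [-1, 1]; closer points score higher.
    # Illustrative choice, not a function from the talk.
    return max(-1.0, 1.0 - abs(x - y))

# This K is not PSD: the 3x3 Gram matrix on the points 0, 1.5, 3 has
# negative determinant, hence a negative eigenvalue.
g = [[K(a, b) for b in (0.0, 1.5, 3.0)] for a in (0.0, 1.5, 3.0)]
det = (g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
       - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
       + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0]))
assert det < 0

random.seed(0)
pos = [random.gauss(0.0, 0.3) for _ in range(200)]  # class +1, centered at 0
neg = [random.gauss(3.0, 0.3) for _ in range(200)]  # class -1, centered at 3

def avg_sim(x, pts):
    return sum(K(x, p) for p in pts) / len(pts)

# Fraction of points whose average within-class similarity beats their
# average cross-class similarity by a margin gamma (assumed criterion).
gamma = 0.5
frac_good = (
    sum(avg_sim(x, pos) >= avg_sim(x, neg) + gamma for x in pos)
    + sum(avg_sim(x, neg) >= avg_sim(x, pos) + gamma for x in neg)
) / (len(pos) + len(neg))
assert frac_good >= 0.95
```

Here the well-separated classes make essentially every point satisfy the margin condition, even though K has no valid implicit feature-space interpretation.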

An interesting feature of the proposed framework is that it can also be applied to learning from purely unlabeled data, i.e., clustering. In particular, one can ask how much stronger the properties of a similarity function must be (in terms of its relation to the unknown desired clustering) for it to be usable for clustering well, that is, for learning well without any label information at all. We find that if we are willing to relax the objective (for example, allowing the algorithm to produce a hierarchical clustering that we call successful if some pruning of it is close to the correct answer), then this question leads to a number of interesting graph-theoretic and game-theoretic properties that are sufficient to cluster well. Our work can be viewed as an approach to defining a PAC model for clustering with non-interactive feedback.
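To illustrate the relaxed objective, the sketch below assumes a strong separation-style property not spelled out in the abstract: every point is more similar to all points of its own target cluster than to any point outside it. Under that assumption, single-linkage agglomeration (a standard algorithm, used here only as an example) builds a tree in which each target cluster appears as a node, so some pruning of the tree recovers the target clustering exactly. The points, similarity, and target clusters are hypothetical.

```python
def K(x, y):
    # Similarity as negative distance; again no PSD requirement.
    return -abs(x - y)

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1]
target = [{0, 1, 2}, {3, 4, 5}, {6, 7}]  # indices of the true clusters

# Single-linkage agglomeration: repeatedly merge the two clusters whose
# most-similar cross pair is best; record every cluster formed as a tree node.
clusters = [frozenset([i]) for i in range(len(points))]
tree_nodes = set(clusters)
while len(clusters) > 1:
    a, b = max(
        ((c, d) for i, c in enumerate(clusters) for d in clusters[i + 1:]),
        key=lambda cd: max(K(points[i], points[j]) for i in cd[0] for j in cd[1]),
    )
    merged = a | b
    clusters = [c for c in clusters if c not in (a, b)] + [merged]
    tree_nodes.add(merged)

# The success criterion from the abstract: some pruning of the hierarchy
# matches the target clustering -- here each target cluster is a tree node.
assert all(frozenset(t) in tree_nodes for t in target)
```

Because within-cluster similarities dominate all cross-cluster ones, every within-cluster merge happens before any cross-cluster merge, which is why the target clusters survive intact as nodes of the hierarchy.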

This talk is based on joint work with Avrim Blum and Santosh Vempala.