CS 789 THEORY SEMINAR [home]
Speaker: Frank
McSherry
Affiliation: Microsoft/Cornell
University
Date: Monday, September 30, 2002
Title: "Data Mining via Spectral
Analysis"
Abstract:
Much attention has recently been paid to the analysis of data by first casting the data set as a matrix, and then considering its eigenvectors. This spectral analysis of data has been applied successfully in many domains; examples include Google's PageRank algorithm, Latent Semantic Analysis, and Kernel PCA. However, the success of spectral analysis is largely empirical, with little analytic understanding of why this approach works so well.
In this talk I will present a general framework for data mining problems and show how this framework justifies the application of spectral analysis. Specifically, we will see that the problems of collaborative filtering, web search, and data clustering fall into the framework and give rigorous bounds on the performance of spectral analysis on each problem. Furthermore, we will see how we can immediately translate ideas and understanding developed through this framework into new algorithms which address problems in graph theory and numerical analysis.
This research is joint work with Yossi Azar and Amos Fiat at Tel Aviv University, Dimitris Achlioptas at Microsoft Research, and Anna Karlin and Jared Saia at the University of Washington.
http://www.cs.washington.edu/homes/mcsherry