Department of Computer Science at Cornell University
The Salton Series is supported by Amit Singhal, Cornell PhD ’97


Gerard Salton (1927-1995). A towering figure in the field of information retrieval, Gerard Salton synthesized ideas from mathematics, statistics, and natural language processing to create a scientific basis for extracting semantics from word frequency. The impact of his contributions is profound: five textbooks, over 150 research papers, and dozens of Ph.D. students. The modern computer science and information science research scene, with its terabyte databases, the Web, and related technologies, owes a great deal to Gerry's pioneering efforts.


This lecture series honors our former colleague with speakers who similarly are innovators in their fields.



About fourteen years ago (early 1995) I gave a talk at Cornell on some work I had been doing on statistical parsing: developing programs that use statistical methods to determine the syntactic structure of English sentences. Professor Salton was in attendance, and during the question period he expressed his approval that I could present numerical results. Of course, at that time the results were pretty bad.

The research program that I and many others have pursued since then has been remarkably successful. Indeed, for English, and for “standard” newspaper text, the problem can almost be considered solved, insofar as several parsers available on the Web can produce quite acceptable parses for all the articles in, say, today’s New York Times.

The bulk of the talk will describe what has led to this happy state of affairs. At the end we will look at where new work in the area is going. As you might expect, it largely concerns non-English or non-standard text.
The Lecture Series
Thursday, January 22, 2009
4:15 pm, B17 Upson Hall
Reception: 3:45 pm, 4th Floor Atrium


Professor of Computer Science and Cognitive Science 


His research has always been in the area of language understanding or technologies related to it, such as knowledge representation, reasoning under uncertainty, and learning. Over the last few years he has been interested in statistical techniques for language understanding. His research in this area has included work in the subareas of part-of-speech tagging, probabilistic context-free grammar induction, and, more recently, syntactic disambiguation through word statistics, efficient syntactic parsing, and lexical resource acquisition through statistical means.

Statistical Parsing: Fourteen Years Later