Department of Computer Science at Cornell University

What would it take to develop machine learners that run forever, each day improving their performance? This talk will describe our attempt to build a never-ending language learner, NELL, that runs 24 hours per day, forever, learning to read the web. Each day NELL extracts (reads) more facts from the web and integrates these into its growing knowledge base of beliefs. Each day NELL also learns to read better than yesterday, enabling it to go back to the text it read yesterday and extract more facts, more accurately.

NELL has been running nonstop for over two years. The result so far is a collection of 15 million interconnected beliefs (e.g., servedWith(coffee, applePie), isA(applePie, bakedGood)) that NELL is considering at different levels of confidence.

The approach implemented by NELL is based on three key ideas: (1) coupling the semi-supervised training of thousands of different functions that extract different types of information from different web sources, (2) automatically discovering new constraints that more tightly couple the training of these functions over time, and (3) a curriculum, or sequence of increasingly difficult learning tasks.
The
GERARD SALTON
Lecture Series
Tuesday, February 7, 2012
4:15 pm
B17 Upson Hall
Reception: 4th Floor Atrium at 3:45 pm

Tom Mitchell

Carnegie Mellon University

http://www.cs.cmu.edu/~tom

Tom M. Mitchell is the E. Fredkin University Professor and founding head of the Machine Learning Department at Carnegie Mellon University. His research interests lie in machine learning, artificial intelligence, and cognitive neuroscience.  Mitchell is a member of the U.S. National Academy of Engineering, a Fellow of the American Association for the Advancement of Science (AAAS), and a Fellow and Past President of the Association for the Advancement of Artificial Intelligence (AAAI).  Mitchell believes the field of machine learning will be the fastest growing branch of computer science during the 21st century.

Learning to Read the Web

Gerard Salton (1927-1995): A towering figure in the field of information retrieval, Gerard Salton synthesized ideas from mathematics, statistics, and natural language processing to create a scientific basis for extracting semantics from word frequency. The impact of his contributions is profound: five textbooks, over 150 research papers, and dozens of Ph.D. students. The modern computer science and information science research scene, with its terabyte databases, Web, and related technologies, owes a great deal to Gerry's pioneering efforts.

This lecture series honors our former colleague with speakers who, like him, are innovators in their fields.

The Salton Series is supported by Amit Singhal, Cornell PhD ’97