Abstract:
It is common to hear that certain natural language processing (NLP) tasks have been "solved". These claims are often misconstrued as being about general human capabilities (e.g., to answer questions, to reason with language), but they are always actually about how systems performed on narrowly defined evaluations. Recently, adversarial testing methods have begun to expose just how narrow many of these successes are. This is extremely productive, but we should insist that these evaluations be *fair*. Has the model been shown data sufficient to support the kind of generalization we are asking of it? Unless we can say "yes" with complete certainty, we can't be sure whether a failed evaluation traces to a model limitation or a data limitation that no model could overcome. In this talk, I will present a formally precise, widely applicable notion of fairness in this sense. I will then apply these ideas to natural language inference by constructing challenging but provably fair artificial datasets and showing that standard neural models fail to generalize in the required ways; only task-specific models are able to achieve high performance, and even these models do not solve the task perfectly. I'll close with a discussion of what properties I suspect general-purpose architectures will need to have to truly solve deep semantic tasks.
(joint work with Atticus Geiger, Stanford Linguistics)
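The central question in the abstract (has the model been shown data sufficient to support the generalization being tested?) can be made concrete with a small sketch. The Python example below is purely illustrative and is not from the talk or the accompanying work: the toy lexicon, the `make_item`, `split`, and `is_fair` helpers, and the crude vocabulary-based check are hypothetical stand-ins for the formally precise notion of fairness the talk develops.

```python
# Hypothetical toy sketch (not from the talk): a generalization test is "fair"
# only if the training data contains enough evidence to support what the test
# items ask for. Here the check is a crude vocabulary-coverage proxy.

# A tiny artificial lexicon of relations between nouns.
RELATIONS = {
    ("puppy", "dog"): "entails",
    ("dog", "animal"): "entails",
    ("puppy", "animal"): "entails",
    ("dog", "cat"): "contradicts",
}

def make_item(premise_noun, hypothesis_noun, label):
    """Build one premise/hypothesis pair from a noun pair and its relation."""
    return {
        "premise": f"every {premise_noun} moved",
        "hypothesis": f"every {hypothesis_noun} moved",
        "label": label,
    }

def split(held_out_pairs):
    """Hold out whole premise/hypothesis *combinations* for testing."""
    train, test = [], []
    for pair, label in RELATIONS.items():
        item = make_item(*pair, label)
        (test if pair in held_out_pairs else train).append(item)
    return train, test

def is_fair(train, test):
    """Crude fairness check: every word in a test item must also appear in
    training, so failures cannot be blamed on vocabulary the model has never
    seen. (The talk's notion is stronger and formally precise; this only
    illustrates the flavor of the question.)"""
    words = lambda ex: ex["premise"].split() + ex["hypothesis"].split()
    train_vocab = {w for ex in train for w in words(ex)}
    return all(w in train_vocab for ex in test for w in words(ex))

train, test = split({("puppy", "animal")})    # hold out only the composed relation
print(is_fair(train, test))                   # True: its parts appear in training

train, test = split({("dog", "cat"), ("puppy", "animal")})
print(is_fair(train, test))                   # False: "cat" never appears in training
```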
Bio:
Christopher Potts is Professor of Linguistics and, by courtesy, of Computer Science at Stanford, and Director of Stanford's Center for the Study of Language and Information (CSLI). In his research, he develops computational models of linguistic reasoning, emotional expression, and dialogue. He is the author of the 2005 book The Logic of Conventional Implicatures as well as numerous scholarly papers in linguistics and natural language processing.