- About
- Events
- Calendar
- Graduation Information
- Cornell Learning Machines Seminar
- Student Colloquium
- BOOM
- Fall 2024 Colloquium
- Conway-Walker Lecture Series
- Salton 2024 Lecture Series
- Seminars / Lectures
- Big Red Hacks
- Cornell University - High School Programming Contests 2024
- Game Design Initiative
- CSMore: The Rising Sophomore Summer Program in Computer Science
- Explore CS Research
- ACSU Research Night
- Cornell Junior Theorists' Workshop 2024
- People
- Courses
- Research
- Undergraduate
- M Eng
- MS
- PhD
- Admissions
- Current Students
- Computer Science Graduate Office Hours
- Advising Guide for Research Students
- Business Card Policy
- Cornell Tech
- Curricular Practical Training
- A & B Exam Scheduling Guidelines
- Fellowship Opportunities
- Field of Computer Science Ph.D. Student Handbook
- Graduate TA Handbook
- Field A Exam Summary Form
- Graduate School Forms
- Instructor / TA Application
- Ph.D. Requirements
- Ph.D. Student Financial Support
- Special Committee Selection
- Travel Funding Opportunities
- Travel Reimbursement Guide
- The Outside Minor Requirement
- Diversity and Inclusion
- Graduation Information
- CS Graduate Minor
- Outreach Opportunities
- Parental Accommodation Policy
- Special Masters
- Student Spotlights
- Contact PhD Office
Title: Incentive problems in data science competitions, and how to fix them
Abstract: Machine learning and data science competitions, wherein contestants submit predictions about held-out data points, are an increasingly common way to gather information and identify experts. One of the most prominent platforms is Kaggle, which has run competitions with prizes up to 3 million USD. The traditional mechanism for selecting the winner is simple: score each prediction on each held-out data point, and the contestant with the highest total score wins. Perhaps surprisingly, this reasonable and popular mechanism can incentivize contestants to submit wildly inaccurate predictions. The talk will begin with a series of experiments inspired by Aldous (2019) to build intuition for the incentive issues and what sort of strategic behavior one would expect---and when. One takeaway is that, despite conventional wisdom, large held-out data sets do not always alleviate these incentive issues, and small ones do not necessarily suffer from them, as we confirm with formal results. We will then discuss a new mechanism which is approximately truthful, in the sense that rational contestants will submit predictions which are close to their best guess. If time we will see how the same mechanism solves an open question for online learning from strategic experts.
Bio: Rafael (Raf) Frongillo is an Assistant Professor of Computer Science at the University of Colorado Boulder. His research lies at the interface between theoretical machine learning and economics, primarily focusing on information elicitation mechanisms, which incentivize humans or algorithms to predict accurately. Before Boulder, Raf was a postdoc at the Center for Research on Computation and Society at Harvard University and at Microsoft Research New York. He received his PhD in Computer Science at UC Berkeley, advised by Christos Papadimitriou and supported by the NDSEG Fellowship.