SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs (via Zoom)
Abstract: Multi-epoch, small-batch Stochastic Gradient Descent (SGD) has been the method of choice for training large overparameterized deep learning models. A popular theory for explaining why SGD solutions generalize well is that the SGD algorithm has an implicit regularization that biases its output toward good solutions. Indeed, for certain simple models, prior works have worked out the exact implicit regularizer that corresponds to running SGD. However, we prove in this paper that in general no such implicit regularization can explain the generalization of SGD. In fact, by constructing specific instances of both stochastic convex optimization problems and restricted deep learning networks, we demonstrate that there are learning problems where SGD learns but no regularized empirical risk minimizer can match its performance. We also discuss the role of small batch size and multiple epochs in explaining the empirical success of SGD for deep learning.
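For readers unfamiliar with the framing, the contrast in the abstract can be written schematically as follows. The notation (samples z_i, loss \ell, minibatches B_t, step sizes \eta_t, regularizer R) is assumed here for illustration and is not taken from the paper itself; it is a sketch of the standard setup, not the authors' specific construction.

```latex
% Multi-epoch, small-batch SGD: repeated passes over the data,
% updating on a small minibatch B_t at each step.
\[
  w_{t+1} \;=\; w_t \;-\; \frac{\eta_t}{|B_t|} \sum_{z \in B_t} \nabla \ell(w_t; z)
  \qquad \text{(SGD update)}
\]
% An "implicit regularization" explanation posits some regularizer R (and
% weight \lambda) such that the SGD output behaves like a regularized
% empirical risk minimizer:
\[
  \hat{w}_{\mathrm{reg}} \;\in\; \arg\min_{w}\;
  \frac{1}{n}\sum_{i=1}^{n} \ell(w; z_i) \;+\; \lambda\, R(w)
  \qquad \text{(regularized ERM)}
\]
```

The abstract's claim, in this notation, is that there exist learning problems where the SGD iterates generalize well yet no choice of R and \lambda yields a regularized ERM solution that matches that performance.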
Bio: Ayush Sekhari is a fourth-year PhD student in the Computer Science department at Cornell University, advised by Professor Karthik Sridharan and Professor Robert D. Kleinberg. His research interests span online learning, reinforcement learning and control, optimization, and the interplay between them. Before coming to Cornell, he spent a year at Google as part of the Brain Residency program. Before Google, he completed his undergraduate studies in computer science at IIT Kanpur in India.