- About
- Events
- Calendar
- Graduation Information
- Cornell Learning Machines Seminar
- Student Colloquium
- BOOM
- Fall 2024 Colloquium
- Conway-Walker Lecture Series
- Salton 2024 Lecture Series
- Seminars / Lectures
- Big Red Hacks
- Cornell University - High School Programming Contests 2024
- Game Design Initiative
- CSMore: The Rising Sophomore Summer Program in Computer Science
- Explore CS Research
- ACSU Research Night
- Cornell Junior Theorists' Workshop 2024
- People
- Courses
- Research
- Undergraduate
- M Eng
- MS
- PhD
- Admissions
- Current Students
- Computer Science Graduate Office Hours
- Advising Guide for Research Students
- Business Card Policy
- Cornell Tech
- Curricular Practical Training
- A & B Exam Scheduling Guidelines
- Fellowship Opportunities
- Field of Computer Science Ph.D. Student Handbook
- Graduate TA Handbook
- Field A Exam Summary Form
- Graduate School Forms
- Instructor / TA Application
- Ph.D. Requirements
- Ph.D. Student Financial Support
- Special Committee Selection
- Travel Funding Opportunities
- Travel Reimbursement Guide
- The Outside Minor Requirement
- Diversity and Inclusion
- Graduation Information
- CS Graduate Minor
- Outreach Opportunities
- Parental Accommodation Policy
- Special Masters
- Student Spotlights
- Contact PhD Office
Testing Noisy Linear Equations for Sparsity
Abstract: Consider the following basic problem in sparse linear regression -- an algorithm gets labeled samples of the form (x, <w.x> + \eps) where w is an unknown n-imensional vector, x is drawn from a background distribution D and \eps is some independent noise. Given the promise that w is k-sparse, the breakthrough work of Candes, Rhomberg and Tao (2005) shows that w can be recovered with samples and time which scales as O(k log n). This should be contrasted with general linear regression where O(n) samples are information theoretically necessary.
In this talk, we look at this question from the vantage point of property testing and study the decision variant of the following question -- namely, what is the complexity of deciding if the unknown vector w is k-sparse (or at least say 0.01 far from k-sparse in \ell_2distance). We show that the decision version of the problem can be solved with samples which are independent of n as long as the background distribution D is i.i.d. and the components are not Gaussian. We further show that weakening any of the conditions in this result necessarily makes the complexity scale as log n (thus showing our results are tight).
Joint work with Xue Chen (Northwestern) and Rocco Servedio (Columbia).
Bio: Anindya De is an Assistant Professor at the University of Pennsylvania. Prior to Penn, he spent three years as an Assistant Professor at Northwestern University. He finished his PhD from UC Berkeley in 2013 advised by Luca Trevisan and was a Simons Research fellow and a postdoctoral fellow at IAS and DIMACS. Anindya is interested in complexity theory, learning theory and harmonic analysis of Boolean functions.