Machine Learning Theory (CS 6783)
News :
- Homework 1 is out, due Sep 16th!
- Course added to Piazza, please join.
- Welcome to the first day of class!
Location and Time :
Location : Hollister Hall, 306
Time : Tue-Thu 1:25 PM to 2:40 PM
Office Hours : Fridays, 2-3 PM
Description :
We will discuss both classical results and recent advances in statistical (iid batch) and online learning theory. We will also touch upon results in computational learning theory. The course aims to provide students with tools and techniques to understand the inherent mathematical complexities of learning problems, to analyze and prove performance guarantees for machine learning methods, and to develop theoretically sound learning algorithms.
Pre-requisite :
Students require a basic level of mathematical maturity and ease/familiarity with theorem-and-proof style material. Familiarity with probability theory, basics of algorithms, and an introductory course on machine learning (CS 4780 or equivalent) are required. M.Eng. and undergraduate students require permission of the instructor.
Tentative topics :
1. Introduction
Overview of the learning problem : statistical and online learning frameworks.
PAC and agnostic PAC models, online learning protocol (adaptive/oblivious adversaries, various notions of regret)
connections to optimization and statistical estimation
2. Minimax formulation for learning, distribution free and adversarial learning settings, uniform guarantees and no free lunch theorems
3. Statistical Learning Framework
. Empirical risk minimization and Regularized empirical risk minimization
. Uniform convergence (iid data)
. Finite classes, PAC-Bayes theorem, compression bounds
. VC dimension and growth function
. Rademacher complexity, covering numbers, Dudley integral bounds, fat-shattering dimension
. Supervised learnability : necessary and sufficient conditions via uniform convergence (iid)
. Local Rademacher analysis and fast rates
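To give a concrete feel for one of the complexity measures above: the empirical Rademacher complexity of a class F on a sample is E_sigma[ sup_{f in F} (1/n) sum_i sigma_i f(x_i) ], where the sigma_i are iid random signs. A minimal Monte Carlo sketch (assuming a finite class represented by each function's values on the fixed sample; the function name is illustrative, not from the course materials):

```python
import random

def empirical_rademacher(values, n_draws=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity
    E_sigma[ sup_f (1/n) sum_i sigma_i f(x_i) ] for a finite class.
    `values` is a list of vectors: each function's values on the sample."""
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw a vector of iid Rademacher (+1/-1) signs.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Take the supremum of the signed empirical average over the class.
        total += max(sum(s * f[i] for i, s in enumerate(sigma)) / n
                     for f in values)
    return total / n_draws
```

For the class {f, -f} with f identically 1 on a sample of size n, the supremum is |sum_i sigma_i|/n, so the estimate concentrates around roughly sqrt(2/(pi n)) * sqrt(n)/n, illustrating the familiar 1/sqrt(n) decay.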
4. Online Learning Framework (sequential prediction/decision making)
. Learning with expert advice, perceptron, winnow
. Sequential minimax analysis for online learning
. Uniform convergence over martingales
. Sequential Rademacher complexity, sequential covering numbers, Sequential fat-shattering dimension
. Supervised online learnability : necessary and sufficient conditions via martingale uniform convergence
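As a small illustration of the "learning with expert advice" setting above, here is a sketch of the exponential weights (Hedge) update: keep one weight per expert, play the weight-proportional mixture, and multiply each expert's weight by exp(-eta * loss) after every round. (The learning rate eta = 0.5 and the function name are illustrative choices, not prescribed by the course.)

```python
import math

def exponential_weights(expert_losses, eta=0.5):
    """Exponential weights (Hedge) for prediction with expert advice.
    `expert_losses` is a list of rounds; each round is a list giving
    every expert's loss in [0, 1]. Returns the learner's expected loss
    per round when playing the weight-proportional mixture of experts."""
    n_experts = len(expert_losses[0])
    weights = [1.0] * n_experts
    learner_losses = []
    for round_losses in expert_losses:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of the randomized learner this round.
        learner_losses.append(sum(p * l for p, l in zip(probs, round_losses)))
        # Multiplicative update: down-weight experts that did badly.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
    return learner_losses
```

With two experts where one always suffers loss 0 and the other loss 1, the learner's per-round loss starts at 1/2 and decays geometrically, matching the O(sqrt(T log N)) regret intuition developed in this part of the course.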
5. Deriving algorithms through relaxation and minimax analysis for online learning
6. Additional topics
. Connections between learning and approximation algorithms
. Connections between learning and optimization
. Algorithmic stability tools
. Statistical estimation vs. statistical learning vs. stochastic optimization
Assignments :
There will be a total of 5 assignments, together worth 55% of your grade.
Term project :
There will be a small term project due by the end of the course. The project is worth 40% of your grade. The project could be one of :
1. A research problem of your choice, approved by me, for which you will submit a report by the end of the term, or
2. Completing problems worth up to 10 stars from a list of problems which shall be provided soon. (The number of stars for a given problem will depend on its difficulty.)
Project proposal/approval of project due on Oct 15th.
Lectures :
- Lecture 1 : Introduction, course details, what is learning theory, learning frameworks [slides]
Reference : [1] (ch 1 and 3)
- Lecture 2 : Minimax Rates, comparing the different frameworks [lec2]
- Lecture 3 : No Free Lunch, Statistical Learning [lec3]
- Lecture 4 : Statistical Learning: ERM, Finite classes, MDL [lec4]
- Lecture 5 : Statistical Learning: MDL continued, infinite classes [lec5]
- Lecture 6 : Statistical Learning: Symmetrization, Rademacher Complexity, Growth function [lec6]
- Lecture 7 : Statistical Learning: Growth function, VC Dimension, Massart's finite lemma [lec7]
- Lecture 8 : Statistical Learning: VC dimension, Sauer-Shelah Lemma continued [lec7]
- Lecture 9 : Statistical Learning: Properties of Rademacher complexity, examples [lec9]
- Lecture 10 : Statistical Learning: Examples continued, Covering numbers, Pollard bound [lec10]
- Lecture 11 : Statistical Learning: Covering numbers, Pollard bound, Dudley Bound [lec11]
- Lecture 12 : Statistical Learning: Covering numbers, fat-shattering dimension, learnability [lec12]
- Lecture 13 : Online Learning: Halving, Exponential weights, minimax rate for bit prediction [lec13]
- Lecture 14 : Online Convex Learning: Online Gradient Descent [lec14]
- Lecture 15 : Online Mirror Descent [lec15]
- Lecture 16 : Online Mirror Descent [lec16]
- Lecture 17 : Online Mirror Descent, continued. [lec17]
- Lecture 18 : Minimax Rates for Online Learning [lec18]
- Lecture 19 : Minimax Rates for Online Learning, Sequential Rademacher Complexity [lec19]
- Lecture 20 : Sequential Rademacher Complexity [lec20]
- Lecture 21 : Sequential Complexity Measures [lec21]
- Lecture 22 : Relaxations for Online Learning [lec22]
- Lecture 23 : Relaxations for Online Learning [lec23]
- Lecture 24 : Relaxations for Randomized Algorithms [lec24]
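Several of the lectures above (Lectures 14-17) revolve around online (sub)gradient methods. A minimal sketch of unconstrained online gradient descent, the starting point for that material (the interface below, feeding the algorithm one gradient oracle per round, is an illustrative simplification):

```python
def online_gradient_descent(gradient_fns, dim, eta=0.1):
    """Unconstrained online gradient descent over R^dim: at each round t,
    play w_t, observe the gradient of the convex loss at w_t, and update
    w_{t+1} = w_t - eta * g_t. `gradient_fns` is one gradient oracle per
    round. Returns the list of iterates w_0, w_1, ..., w_T."""
    w = [0.0] * dim
    iterates = [list(w)]
    for grad in gradient_fns:
        g = grad(w)  # gradient of this round's loss at the current point
        w = [wi - eta * gi for wi, gi in zip(w, g)]
        iterates.append(list(w))
    return iterates
```

For example, if every round's loss is (w - 1)^2, each step contracts the distance to the offline optimum w* = 1 by a factor 1 - 2*eta, so the iterates converge to 1.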
Reference Material : (more references and specific links will be added as we go)
- Statistical Learning and Sequential Prediction, A. Rakhlin and K. Sridharan [pdf]
- Introduction to Statistical Learning Theory, O. Bousquet, S. Boucheron, and G. Lugosi [pdf]
- Prediction, Learning, and Games, N. Cesa-Bianchi and G. Lugosi [link]
- Understanding Machine Learning: From Theory to Algorithms, S. Shalev-Shwartz and S. Ben-David [link]
- Introduction to Online Convex Optimization, Elad Hazan [link]
- Concentration inequalities, S. Boucheron, O. Bousquet, and G. Lugosi [pdf]
- A Gentle Introduction to Concentration Inequalities, K. Sridharan [pdf]
- On the Vapnik-Chervonenkis-Sauer Lemma by Leon Bottou [link]