Machine Learning Theory (CS 6783)
News :
- Lecture 25 pdf posted.
- Lecture 24 pdf posted.
- Lecture 23 pdf posted.
- Lecture 22 pdf posted.
- Lecture 21 pdf posted.
- Lecture 20 pdf posted.
- Lecture 19 pdf posted.
- Lecture 18 pdf posted.
- Lecture 17 pdf posted.
- Lecture 16 pdf posted.
- Lecture 15 pdf posted.
- Lecture 14 pdf posted.
- Lecture 13 pdf posted.
- Lecture 12 pdf posted.
- Lecture 11 pdf posted.
- HW 2 is out, due on oct 7th..
- Lecture 10 pdf posted.
- Lecture 9 pdf posted.
- Lecture 8 pdf posted.
- Lecture 7 pdf posted.
- Lecture 6 pdf posted.
- Lecture 5 pdf posted.
- Office Hours Fridays, 2-3pm
- Lecture 4 pdf posted.
- Homework 1 is out. Due Sep 11th. Available through CMS. If you are having trouble finding it, contact me. Sumit assignment in latex format via cms or via email.
- Course added on Piazza please join
- Lecture 3 pdf posted.
- Homework 0 : correction on problem 1 posted.
- Homework 0 posted : not for grade no need to submit
- Lecture 1 slides available
Location and Time :
Location : Upson, 5130
Time : Tue-Thu 1:25 PM to 2:40 PM
Office Hours : Fridays, 3-4pm
Description :
We will discuss both classical results and recent advances in both statistical (iid batch) and online learning theory. We will also touch upon results in computational learning theory. The course aims at providing students with tools and techniques to understand inherent mathematical complexities of learning problems, to analyze and prove performance gaurantees for machine learning methods and to develop theoretically sound learning algorithms.
Pre-requisite :
Student requires a basic level of mathematical maturity and ease/familiarity with theorems and proofs style material. Familiarity with probability theory, basics of algorithms and an introductory course on Machine Learning (CS 4780 or equivalent) are required. M.Eng. and undergraduate students require permission of instructor.
Tentative topics :
1. Introduction
Overview of the learning problem : statistical and online learning frameworks.
PAC and agnostic PAC models, online learning protocol, (adaptive/ oblivious adversaries, various notions of regret)
connections to optimization and statistical estimation
2. Minimax formulation for learning, distribution free and adversarial learning settings, uniform guarantees and no free lunch
3. Statistical Learning Framework
. Empirical risk minimization and Regularized empirical risk minimization
. Uniform convergence (iid data)
. finite classes, PAC Bayes theorem, compression bounds
. VC dimension and growth function
. Rademacher complexity, covering numbers, Dudley integral bounds, fat-shattering dimension
. Supervised learnability : necessary and sufficient conditions via uniform convergence (iid)
. Local Rademacher analysis and fast rates
4. Online Learning Framework (sequential prediction/decision making)
. Learning with expert advice, perceptron, winnow
. Sequential minimax analysis for online learning
. Uniform convergence over martingales
. Sequential Rademacher complexity, sequential covering numbers, Sequential fat-shattering dimension
. Supervised online learnability : necessary and sufficient conditions via martingale uniform convergence
5. Deriving algorithms through relaxation and minimax analysis for online learning
6. Computational Learning Theory (Lower bounds)
. Computational hardness for proper learning
. Cryptographic hardness for learning and average case hardness
. Lower bounds through oracle models of optimization
7. Additional topics
. Algorithmic stability tools
. Statistical estimation Vs statistical learning Vs Stochastic optimization
Assignments :
There will be a total of 6 assignments, 5 that count towards your grade (Homework 0 is doesn'c count towards your grade and need not be submitted).
- Homework 0 : [pdf] (don't submit, not for grade)
- Homework 1 : [via cms] (submit solutions via cms or email)
Term project :
There will be a small term project due by the end of the course. The project could be one of :
1. Your choice of research problem approved by me for which you will submit a report by end of term or,
2. Reading two papers related to the course and writing detailed reports about them or,
3. Completing problems worth up to 10 stars from a list of problems which shall be provided soon. (number of stars for a given problem will depend on its difficulty)
Lectures :
- Lecture 1 : Introduction, course details, what is learning theory, learning frameworks [slides]
Reference : [1] (ch 1 and 3)
- Lecture 2 : Recap, no free lunch theorem, minimax formulations for PAC (realizable) setting, non-parametric regression (well specified case), statistical learning and online learning. Relationship between the corresponding minimax rates.
Reference : [1] (ch 5, only sections : 5.2 (no universal data compression or online convex optimization was covered), Section 5.3.1)
- Lecture 3 : Statistical learning, Empirical risk minimization, Minimax rates and uniform convergence, Finite class case, Mdl bound, Uniform Vs Universal rates
[lec 3]
- Lecture 4 : Symmetrization, Rademacher complexity, examples, Binary classification and growth function.
[lec 4]
- Lecture 5 : binary classifciation in statistical lerarning setting, growth function, VC dimension, VC/Sauer/Shelah Lemma, Charecterization of learnability for binary classification problems in statistical learning framework.
[lec 5]
- Lecture 6 : VC dimension review + continued.
[lec 6]
- Lecture 7 : Properties of Rademacher complexity.
[lec 7]
- Lecture 8 : Covering numbers and Pollard bound
[lec 8]
- Lecture 9 : Dudley chaining
[lec 9]
- Lecture 10 : Fat shattering dimension
[lec 10]
- Lecture 11 : Supervised learnability, Intro to Online Learning
[lec 11]
- Lecture 12 : Online Learning, experts algorithm, bit prediction
[lec 12]
- Lecture 13 : Online Learning, minimax rate, sequential Rademacher complexity
[lec 13]
- Lecture 14 : Sequential Rademacher complexity
[lec 14]
- Lecture 15 : Sequential Rademacher complexity, Properties
[lec 15]
- Lecture 16 : Sequential Covers and Fat-shattering Dimension
[lec 16]
- Lecture 17 : Online Convex Optimization
[lec 17]
- Lecture 18 : Online Mirror Descent
[lec 18]
- Lecture 19 : Online Mirror Descent Contd.
[lec 19]
- Lecture 20 : Wrapping up MD, General Learning Algorithms
[lec 20]
- Lecture 21 : Relaxations
[lec 21]
- Lecture 22 : Relaxations, deriving algorithms
[lec 22]
- Lecture 23 : Relaxations, deriving algorithms
[lec 23]
- Lecture 23 : Relaxations, deriving randomized algorithms
[lec 24]
- Lecture 23 : Relaxations, deriving randomized
[lec 25]
Reference Material : (more references and specific links will be added as we go)
- Statistical Learning and Sequential Prediction, A. Rakhlin and K. Sridharan [pdf]
- Introduction to Statistical Learning Theory, O. Bousquet, S. Boucheron, and G. Lugosi [pdf]
- Prediction Learning and Games, N. Cesa-Bianchi and G. Lugosi [link]
- Understanding Machine Learning From Theory to Algorithms, S. Ben David and S. Shalev-Shwartz [link]
- Introduction to Online Convex Optimization, Elad Hazan [link]
- Concentration inequalities, S. Boucheron, O. Bousquet, and G. Lugosi [pdf]
- A Gentle Introduction to Concentration Inequalities, K. Sridharan [pdf]
- On the Vapnik-Chervonenkis-Sauer Lemma by Leon Bottou [link]