Advanced Topics in Machine Learning
CS678 - Spring 2003, Cornell University, Department of Computer Science
Time and Place
First lecture: January 21st, 2003
Last lecture: May 1st, 2003
- Tuesday, 1:25pm - 2:40pm in Hollister Hall 314
- Thursday, 1:25pm - 2:40pm in Hollister Hall 314
Exam: April 1st, 2003 (in class)
Instructor
Thorsten Joachims, tj@cs.cornell.edu, 4153 Upson Hall, office hour Wednesday 3:30-4:15
Syllabus
This 4-credit course extends and complements CS478 and CS578. The goal of the course is two-fold. The first part of the course is an in-depth introduction to advanced learning algorithms in the area of Kernel Machines, in particular Support Vector Machines and other margin-based learning methods like Boosting. It also includes an introduction to the relevant aspects of machine learning theory, enabling you to understand the current work in the field. This will provide the basis for the second part of the course, which will discuss current research topics in machine learning, providing starting points for novel research. In particular, the course will cover the following main topics:
Part 1:
- Support Vector Machines and Related Methods: Perceptron, optimal hyperplane and maximum-margin separation, soft-margin, SVMs for regression, Gaussian Processes, Boosting, regularized regression methods (a small illustrative sketch follows this list)
- Learning with Kernels: properties, real-valued feature vectors, sequences and other structured data, Fisher kernels
- Statistical Learning Theory: no free lunch, VC theory, PAC-Bayesian, bias/variance, error bounds, leave-one-out bounds
- Error Estimation and Model Selection: leave-one-out and cross-validation, holdout testing, bootstrap estimation
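To make the first topic above concrete, here is a minimal illustrative sketch of a kernel perceptron: the dual form of the perceptron, in which the decision function is expressed entirely through kernel evaluations, so swapping in a nonlinear kernel (an RBF kernel below) gives a nonlinear separator. The toy data, parameter values, and function names are assumptions made for this sketch, not material from the course handouts.

import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def train_kernel_perceptron(X, y, kernel, epochs=20):
    # Dual-form perceptron: alpha[i] counts the mistakes made on example i.
    # The decision function is f(x) = sum_i alpha[i] * y[i] * k(x_i, x).
    n = len(X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            f = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n))
            if y[i] * f <= 0:      # misclassified (or on the boundary): update
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:          # no errors left: data separated in feature space
            break
    return alpha

def predict(X_train, y_train, alpha, kernel, x):
    f = sum(alpha[j] * y_train[j] * kernel(X_train[j], x) for j in range(len(X_train)))
    return 1 if f > 0 else -1

# XOR-like toy data: not linearly separable in input space,
# but separable with the RBF kernel.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])
alpha = train_kernel_perceptron(X, y, rbf_kernel)
print([predict(X, y, alpha, rbf_kernel, x) for x in X])   # expected: [1, 1, -1, -1]

With a linear kernel the same code reduces to the ordinary perceptron; the SVM material in Part 1 replaces these mistake-driven updates with an optimization that explicitly maximizes the margin of the separating hyperplane.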
Part 2:
- Transductive Learning: How can one use unlabeled data to improve performance in supervised learning? What is the information contained in unlabeled data? What assumptions do we need to make? How can we design efficient algorithms?
- Learning Complex Structures: What if the target function is more complex than in classification or regression? For example, the goal could be not a binary classification function, but an ordering (i.e. retrieval) function for information retrieval. Or, what if the input to the learning algorithm is not a classification, but merely pair-wise preferences like "A is preferred over B"? (see the sketch after this list)
- Learning Kernels: The kernel defines the inductive bias of the learning algorithm and is key to achieving good performance. This makes selecting a kernel one of the most crucial design decisions. How can we automate the selection process? In particular, how can one construct a good kernel from data? What are the situations where this might work? What are the assumptions?
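As a concrete illustration of the pair-wise preference setting mentioned under "Learning Complex Structures", the sketch below turns preferences of the form "A is preferred over B" into positive examples on difference vectors and fits a linear scoring function with simple perceptron-style updates. The toy items, learning rate, and function names are assumptions made for this sketch; the course readings cover more principled versions of the same reduction (e.g., maximum-margin formulations such as the ranking SVM).

import numpy as np

def learn_from_preferences(preferences, dim, epochs=100, lr=0.1):
    # Learn a linear scoring function w from pair-wise preferences.
    # Each preference (better, worse) asks for w . better > w . worse,
    # i.e. w . (better - worse) > 0, so every preference becomes a
    # positive example on the difference vector.
    w = np.zeros(dim)
    for _ in range(epochs):
        violations = 0
        for better, worse in preferences:
            diff = better - worse
            if np.dot(w, diff) <= 0:   # preference violated: move w toward diff
                w += lr * diff
                violations += 1
        if violations == 0:            # all preferences satisfied
            break
    return w

# Toy example: three items described by two features each, with the
# preferences item0 > item1, item1 > item2, and item0 > item2.
items = [np.array([3.0, 1.0]), np.array([2.0, 2.0]), np.array([1.0, 0.0])]
prefs = [(items[0], items[1]), (items[1], items[2]), (items[0], items[2])]

w = learn_from_preferences(prefs, dim=2)
ranking = sorted(range(len(items)), key=lambda i: -np.dot(w, items[i]))
print("weights:", w, "ranking (best first):", ranking)

Sorting items by the learned score w . x yields a complete ordering, which is one way a pair-wise preference formulation can be turned into a retrieval-style ranking function.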
Methods and theory will be illustrated with practical examples, in particular from the areas of information retrieval and language technology.
Lecture Notes, Slides, and Handouts

Lecture notes and slides are also handed out in class.
Homework Assignments

Homework 1: Perceptron and Optimal Hyperplanes
Homework 2: Training SVMs and Leave-One-Out
Homework 3: Kernels and Statistical Learning Theory
Reference Material

We will provide reading material and hand it out in class. It will cover all material presented in this course. For further reading, we recommend the following books that each cover part of the syllabus:
- Mitchell, "Machine Learning"
- Devroye, Györfi, Lugosi, "A Probabilistic Theory of Pattern Recognition"
- Shawe-Taylor, Cristianini, "Introduction to Support Vector Machines"
- Schölkopf, Smola, "Learning with Kernels"
- Herbrich, "Learning Kernel Classifiers"
- Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning"
- Duda, Hart, Stork, "Pattern Classification"
- Vapnik, "Statistical Learning Theory"
Readings

We will read some of the following papers in the second half of the course:
Learning Rankings:
- William W. Cohen, Robert E. Schapire, Yoram Singer, Learning to order things, Journal of Artificial Intelligence Research, 10, 1999. (Steven, 4/15)
- Y. Freund, R. Iyer, R. Schapire, and Y. Singer, An efficient boosting algorithm for combining preferences, ICML, 1998. (Scott, 4/17)
- T. Joachims, Optimizing Search Engines using Clickthrough Data, KDD Conference, 2002. (Thorsten, 4/10)
- R. Herbrich, T. Graepel, and K. Obermayer, Large Margin Rank Boundaries for Ordinal Regression, Advances in Large Margin Classifiers, pages 115-132, 2000. (Thorsten, 4/8)
- R. Caruana, S. Baluja, and T. Mitchell, Using the Future to 'Sort Out' the Present: Rankprop and Multitask Learning for Medical Risk Evaluation, NIPS, 1995. (Rich, 4/10)
Transductive Learning / Learning from Labeled and Unlabeled Data:
- K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, Text Classification from Labeled and Unlabeled Documents using EM, Machine Learning, 39(2/3), pp. 103-134, 2000. (Mark, 4/22)
- T. Joachims, Transductive Inference for Text Classification using Support Vector Machines, ICML, 1999. (Thorsten, 4/17)
- A. Blum and T. Mitchell, Combining Labeled and Unlabeled Data with Co-Training, COLT, 1998. (Andy, 4/24)
- O. Chapelle, J. Weston, and B. Schölkopf, Cluster Kernels for Semi-Supervised Learning, NIPS, 2003. (Phil, 4/22)
- M. Szummer and T. Jaakkola, Partially labeled classification with Markov random walks, NIPS, 2001. (Filip, 4/22)
- A. Blum and S. Chawla, Learning from Labeled and Unlabeled Data using Graph Mincuts, ICML, 2001. (Alan, 4/24)
Learning to Learn / Learning Kernels:
- R. Caruana, Multitask Learning, Machine Learning 28(1): 41-75, 1997. (Rich, 4/29)
- Sebastian Thrun and Joseph O'Sullivan, Discovering Structure in Multiple Learning Tasks: The TC Algorithm, ICML, 1996. (Stefan, 4/29)
- N. Cristianini, J. Kandola, A. Elisseeff, and J. Shawe-Taylor, On Kernel Target Alignment, JMLR. (Thorsten, 5/1)
- T. Jaakkola and D. Haussler, Exploiting generative models in discriminative classifiers, NIPS, 1998. (Joshua, 5/1)
Other topics:
- B. Schölkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, Estimating the support of a high-dimensional distribution, Technical Report 99-87, Microsoft Research, 1999 (to appear in Neural Computation, 2001); and Ben-Hur et al., Support Vector Clustering, JMLR, 2, 2001.
- A. J. Smola and B. Schölkopf, A tutorial on support vector regression, NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK, 1998 (to appear in Statistics and Computing, 2001; pages 1-14 only). (Jingbo, 4/24)
- B. Schölkopf, A. Smola, and K. Müller, Kernel Principal Component Analysis, in: B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods --- Support Vector Learning, MIT Press, Cambridge, MA, 1999, pages 327-352. Short version or chapter in Support Vector Learning for background. (Liviu, 4/17)
- John Platt, Large-Margin DAGs for Multi-Class Classification, NIPS 2000. (Alex, 4/15)
Prerequisites

Any of the following:
- CS478
- CS578
- equivalent of any of the above
- permission from the instructor
Grading

Grades will be determined based on a written exam after Part 1, a final research project, homework assignments, and student presentations of selected papers.
- 25%: Exam after Part 1
- 30%: Final Project
- 25%: Homework (3 homeworks max, some programming, some non-programming)
- 10%: Student Paper Presentation
- 10%: Class Participation
Roughly: A=90-100; B=80-90; C=70-80; D=60-70; F=below 60