Advanced Topics in Machine Learning

CS678 - Spring 2002
Cornell University
Department of Computer Science

 
Time and Place
First lecture: January 21st, 2002
Last lecture: May 3rd, 2002
  • Monday, 1:25pm - 2:15pm in Hollister Hall 110
  • Wednesday, 1:25pm - 2:15pm in Hollister Hall 110
  • Friday, 1:25pm - 2:15pm in Hollister Hall 110
Instructors
Lecture Notes, Slides, and Handouts

Lecture notes and slides are handed out in class.

Papers for Student Presentations:

  1. SVM Clustering:
    - B. Schölkopf, J. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Technical Report 99-87, Microsoft Research, 1999. To appear in Neural Computation, 2001.
    - Ben-Hur et al., Support Vector Clustering. JMLR, 2, 2001.
    (1-2 students, 20 minutes, April 22/24)
  2. SVM Regression:
    - A. J. Smola and B. Schölkopf. A tutorial on support vector regression. NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK, 1998. To appear in Statistics and Computing, 2001. (pages 1-14 only)
    (1 student, 20 minutes, Feb. 27 or March 1)
  3. Kernel Principal Component Analysis:
    - Bernhard Schölkopf, Alexander Smola, Klaus-Robert Müller, Kernel Principal Component Analysis, in: B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, MIT Press, Cambridge, MA, 1999, pages 327-352. A short version, or the corresponding chapter in Support Vector Learning, can serve as background.
    (1 student, 20 minutes, April 12/15/17)
  4. Multi-Class SVMs:
    - John Platt, Large-Margin DAGs for Multi-Class Classification, NIPS 2000.
    (1 student, 20 minutes, Feb. 27 or March 1)
  5. Learning Rankings:
    - William W. Cohen, Robert E. Schapire, Yoram Singer, Learning to order things, Journal of Artificial Intelligence Research, 10, 1999.
    (1 student, 20 minutes, April 1/3).
  6. Boosting/Bagging:
    - Leo Breiman, Arcing Classifiers, Machine Learning, 1998.
    - Eric Bauer, Roni Kohavi, An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants, Machine Learning, 1999.
    (2 students, 30 minutes, Feb. 22/25/27)
  7. Learning to Learn:
    - Sebastian Thrun and Joseph O'Sullivan, Discovering Structure in Multiple Learning Tasks: The TC Algorithm, ICML-96.
    (1 student, 20 minutes, March 6/8)
  8. Clustering:
    - P.S. Bradley, Usama Fayyad, and Cory Reina, Scaling Clustering Algorithms to Large Databases, AAAI-98.
    - P.S. Bradley and Usama Fayyad, Refining Initial Points for K-Means Clustering, ICML-98.
    (1 student, 30 minutes, April 29 or May 1)
  9. Graphical Models:
    -
    (1 student, 20 minutes, March 15/25/27)
  10. ROC and Related Methods:
    - Foster Provost and Tom Fawcett, Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions, KDD-97 (ROC Convex Hull Method)
    - Chris Drummond and Robert C. Holte, Explicitly Representing Expected Cost: An Alternative to ROC Representation, KDD-2000.
    (1 student, 25 minutes, April 19/22/24)
Syllabus
This 4-credit course extends and complements CS478 and CS578, giving in-depth coverage of new and advanced methods in machine learning. In particular, we will connect to open research questions in machine learning, providing starting points for future work. The content of the course reflects an equal balance between learning theory and practical machine learning, with an emphasis on approaches of practical relevance. The course will cover the following main topics:
  • Support Vector Machines and Kernel-based Methods: VC theory, optimal hyperplane and maximum-margin separation, soft margins, SVMs for regression, Mercer kernels, error bounds, leave-one-out bounds, quadratic programming, connections to related methods (8 lectures; see the kernel sketch after this list)
  • Unsupervised Learning and Clustering: agglomerative clustering, distributional clustering, k-means, Bayesian clustering, principal component analysis, scaling issues for large datasets (6 lectures)
  • Bayes Nets: inference, maximum likelihood estimation, latent variables, expectation/maximization, hidden Markov models, learning structure, causality (5 lectures)
  • Boosting and Bagging: AdaBoost, bias/variance, margins (5 lectures)
  • Error Estimation and Model Selection: no free lunch, bias/variance, Bayesian learning, minimum description length, leave-one-out and cross-validation, holdout testing, bootstrap estimation (3 lectures; see the cross-validation sketch below)
  • Learning to Order Data: learning retrieval functions in information retrieval, learning for ROC analysis (3 lectures)
  • Inductive Transfer: learning multiple related tasks (3 lectures)
  • Reinforcement Learning: Markov decision processes, finite state models, Q-learning, dynamic programming (1-2 lectures)
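
To make the kernel machinery in the SVM topic concrete, here is a minimal sketch (in Python with NumPy) of computing the Gram matrix for a Gaussian RBF kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2)); the function name and the bandwidth sigma are illustrative choices, not part of the course materials.

  import numpy as np

  def rbf_gram_matrix(X, sigma=1.0):
      # Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).
      # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
      sq_norms = np.sum(X ** 2, axis=1)
      sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
      sq_dists = np.maximum(sq_dists, 0.0)  # guard against small negative round-off
      return np.exp(-sq_dists / (2.0 * sigma ** 2))

  # A valid Mercer kernel yields a symmetric, positive semidefinite Gram matrix.
  X = np.random.randn(5, 3)
  K = rbf_gram_matrix(X, sigma=0.5)
  print(np.allclose(K, K.T), np.all(np.linalg.eigvalsh(K) > -1e-10))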

We will illustrate methods and theory with practical examples in the areas of information retrieval, language technology, and medical decision making.
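
As a similar illustration of the error-estimation topic, the following sketch estimates test error by k-fold cross-validation (setting k to the number of examples gives leave-one-out); the nearest-centroid learner and the names train/predict are stand-ins for whatever algorithm is being evaluated, introduced here only for illustration.

  import numpy as np

  def cross_validation_error(X, y, train, predict, k=5, seed=0):
      # Average held-out error over k folds.
      idx = np.random.default_rng(seed).permutation(len(y))
      folds = np.array_split(idx, k)
      errors = []
      for i in range(k):
          test = folds[i]
          tr = np.concatenate([folds[j] for j in range(k) if j != i])
          model = train(X[tr], y[tr])
          errors.append(np.mean(predict(model, X[test]) != y[test]))
      return float(np.mean(errors))

  # Illustrative learner: nearest class centroid (a stand-in for any classifier).
  def train(X, y):
      return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

  def predict(model, X):
      classes = list(model)
      dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
      return np.array(classes)[np.argmin(dists, axis=0)]

  X = np.vstack([np.random.randn(30, 2) - 2.0, np.random.randn(30, 2) + 2.0])
  y = np.array([0] * 30 + [1] * 30)
  print(cross_validation_error(X, y, train, predict, k=5))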

Reference Material
We will provide reading material and hand it out in class; it will cover all material presented in this course. For further reading, we recommend the following books, each of which covers part of the syllabus:
  • Duda, Hart, Stork, "Pattern Classification"
  • Devroye, Györfi, Lugosi, "A Probabilistic Theory of Pattern Recognition"
  • Shawe-Taylor, Cristianini, "Introduction to Support Vector Machines"
  • Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning"
  • Vapnik, "Statistical Learning Theory"
  • Sutton, Barto, "Reinforcement Learning"
  • Mitchell, "Machine Learning"
Prerequisites
Any of the following:
  • CS478
  • CS578
  • equivalent of any of the above
  • permission from the instructors
Grading
Grades will be determined based on a take-home midterm exam, a final exam, homework assignments, a research project, and student presentations of selected papers.
  • 20%: Homework (at most 4 assignments; some programming, some non-programming)
  • 20%: Midterm Exam (take-home)
  • 20%: Final Exam (in class)
  • 20%: Student Paper Presentations
  • 20%: Final Projects

Roughly: A = 90-100; B = 80-90; C = 70-80; D = 60-70; F = below 60
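
For concreteness, the sketch below computes the weighted course score from the five equally weighted components and maps it to the rough cutoffs above; the individual component scores are hypothetical and only for illustration.

  # Weights from the grading breakdown above (each component counts 20%).
  weights = {"homework": 0.20, "midterm": 0.20, "final": 0.20,
             "presentation": 0.20, "project": 0.20}

  # Hypothetical component scores on a 0-100 scale, for illustration only.
  scores = {"homework": 85, "midterm": 78, "final": 92,
            "presentation": 88, "project": 81}

  total = sum(weights[c] * scores[c] for c in weights)

  # Rough letter-grade cutoffs as stated above.
  cutoffs = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
  letter = next((g for lo, g in cutoffs if total >= lo), "F")

  print(f"{total:.1f} -> {letter}")  # 84.8 -> B for the scores above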