Announcements
05/09/2003
Please schedule your final project presentation using this form.
05/05/2003
The final project is due May 12 at 4pm.
The final exam is on May 14 (12:00-2:30pm, Phillips 219).
Presentations of projects will be scheduled for May 15 and 16.
04/25/2003
The solution to assignment 4 is out. Due ..
01/21/2003
Please note that most questions (regarding assignments, projects, or the course in general)
should be posted to the newsgroup, not sent by email.
If the link above does not work for you, here is how to connect to the newsgroup using Microsoft Outlook Express:
 Go to Tools > Accounts.
 A dialog box will appear. Click on the Add button and select "News".
 Fill in your nickname and email address, and use "newsstand.cit.cornell.edu" for the news server.
A folder named "newsstand.cit.cornell.edu" will be created.
 Right-click on the folder and select Properties. In the "Server" tab, check "This server requires me to log on".
Use your NetID for the account name, and your Bear Access network password for the password field.
 Click on Tools > Newsgroups to download the list of newsgroups on the server. Add "cornell.class.cs478" to the list of subscribed newsgroups.
Additional information on how to access Cornell's news server using Bear Access and other news applications can
be found here.
Time and Place
 Tuesday, Thursday: 11:40-12:55, Thurston 203
Personnel
Course Syllabus (.ps, .pdf)
Course Information (.doc) (last modified 1/22)
Academic integrity policy
Checklist (what we have covered so far):
 Introduction, What is Machine Learning?
 Nonmetric methods:
 Concept Learning (candidate elimination, inductive bias)
 Decision trees (ID3, C4.5, pruning methods)
 Bayesian Learning:
 Bayesian decision theory
 Sequential inference
 ML and Bayesian parameter estimation
 Hypotheses evaluation using Bayes Theorem
 Bayes optimal classifier
 Gibbs algorithm
 Graphical models
 Bayesian belief networks
 Hidden Markov Models - the evaluation and decoding problems
 Hidden Markov Models - the learning problem
 The EM algorithm
 Nonparametric Techniques:
 Density Estimation
 The nearest neighbor algorithm
 Linear discriminant functions:
 LD functions and decision surfaces
 The perceptron criterion function
 The sum-of-squared-error criterion function
 Gradient descent procedures
 Relaxation (error-correcting) procedures
 Least-mean-squared (LMS) procedures (also known as minimum-squared-error, MSE, procedures)
 Artificial Neural Networks
 Feedforward operation
 Backpropagation algorithm
 Learning curves
 Feature mapping
 Improving performance (practical tips)
 Stochastic methods
 Genetic algorithms
 Genetic programming
 Unsupervised learning
 Mixture densities
 The maximum likelihood estimates
 The iterative EM clustering algorithm
 The k-means clustering algorithm
 Hierarchical/pairwise clustering
 Principal component analysis
 Multidimensional scaling
 Hypothesis evaluation
 Sample error vs. true error
 Confidence intervals
 Comparing hypotheses
 Comparing learning algorithms (for a specific target function)
 The minimum description length principle
 Algorithm-independent Machine Learning (general principles of ML)
 The no free lunch theorem
 Bias vs. Variance
 Sampling and validation techniques (jackknife, bootstrapping)
 Bagging and Boosting
Lecture Notes
Introduction [mostly Mitchell Ch1] (.ps, .pdf)
Concept Learning [Mitchell Ch2] (.ps, .pdf)
Decision Trees - part 1 [Mitchell Ch3, Duda/Hart/Stork Ch8] (.ps, .pdf)
Decision Trees - part 2 [Mitchell Ch3, Duda/Hart/Stork Ch8] (.ps, .pdf)
Bayesian decision theory - part 1 [Duda/Hart/Stork Ch2] (.ps, .pdf)
Bayesian decision theory - part 2 [Duda/Hart/Stork Ch2] (.ps, .pdf)
Added 03/01/2003
Bayesian decision theory (sequential inference) - part 3 (.ps, .pdf)
Bayesian learning theory - part 4 [partly from Duda/Hart/Stork Ch3] (.ps, .pdf)
Bayesian learning theory - part 5 [mostly Mitchell Ch6] (.ps, .pdf)
Bayesian learning theory - part 6 [mostly Mitchell Ch6] (.ps, .pdf)
Bayesian networks [mostly Duda/Hart/Stork Ch2, Mitchell Ch6] (.ps, .pdf)
Hidden Markov Models - part 1 [partly Duda/Hart/Stork Ch3] (.ps, .pdf)
Hidden Markov Models - part 2 [partly Duda/Hart/Stork Ch3] (.ps, .pdf)
The EM algorithm (.ps, .pdf)
Nonparametric Techniques [Duda/Hart/Stork Ch4, Mitchell Ch8] (.ps, .pdf)
Linear Discriminant Functions [Duda/Hart/Stork Ch5] (.ps, .pdf)
Artificial Neural Networks I [Duda/Hart/Stork Ch6, Mitchell Ch4] (.ps, .pdf)
Artificial Neural Networks II [Duda/Hart/Stork Ch6, Mitchell Ch4] (.ps, .pdf)
Stochastic methods (genetic algorithms) [Mitchell Ch9, Duda/Hart/Stork Ch7] (.ps, .pdf)
Unsupervised learning I - clustering algorithms [Duda/Hart/Stork Ch10] (.ps, .pdf)
Unsupervised learning II - dimensionality reduction algorithms [Duda/Hart/Stork Ch10] (.ps, .pdf)
Hypothesis evaluation [mostly Mitchell Ch5] (.ps, .pdf)
Algorithm-independent Machine Learning I [Duda/Hart/Stork Ch9] (.ps, .pdf)
Algorithm-independent Machine Learning II [Duda/Hart/Stork Ch9] (.ps, .pdf)
Machine Learning - Overview (.ps, .pdf)
Assignments
Note: you may work on an assignment with one other student, but you
must submit your assignment separately, using your own words.
Acknowledge the student with whom you worked on the assignment.

 Assignment #1 (ps, pdf)
Solution of assignment 1: part A (problems 1, 4) word, and part B (problems 2, 3, 5, 6, 7) ps, pdf

 Assignment #2 (ps, pdf)
Solution of assignment 2: ps, pdf

 Assignment #3, due March 14 at 4pm

 Assignment #4 (ps, pdf), due April 1st.
Sample input file. The pattern is of length 10. The output format should be:
sequence 1 position 30 THEPATTERN
sequence 2 position 12 THEPATTERN
...
sequence n position 23 THEPATTERN
likelihood ratio: 57.2
New tests for the Gibbs sampling algorithm: report your results on these two files: test1 (L=10), test2 (L=18)
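For reference, the Gibbs sampling motif search the assignment describes can be sketched as below. This is a minimal illustration, not the required solution: the function name, pseudocount scheme, and fixed iteration count are all choices of this sketch, and it reports only the sampled position per sequence (computing the likelihood ratio against a background model is left to the assignment).

```python
import random

def gibbs_motif_search(seqs, L, alphabet="ACGT", n_iters=2000, seed=0):
    """Sample one length-L motif start position per sequence by Gibbs sampling."""
    rng = random.Random(seed)
    n = len(seqs)
    # Start with a random motif position in each sequence.
    pos = [rng.randrange(len(s) - L + 1) for s in seqs]

    def profile(excluded):
        # Position-specific probability profile built from the current motif
        # in every sequence except the excluded one (add-one pseudocounts).
        counts = [{a: 1.0 for a in alphabet} for _ in range(L)]
        for i, s in enumerate(seqs):
            if i == excluded:
                continue
            for j in range(L):
                counts[j][s[pos[i] + j]] += 1
        total = (n - 1) + len(alphabet)
        return [{a: c[a] / total for a in alphabet} for c in counts]

    for _ in range(n_iters):
        i = rng.randrange(n)            # pick a sequence to resample
        prof = profile(i)
        s = seqs[i]
        # Weight every candidate window by its probability under the profile.
        weights = []
        for start in range(len(s) - L + 1):
            w = 1.0
            for j in range(L):
                w *= prof[j][s[start + j]]
            weights.append(w)
        # Resample this sequence's motif position proportionally to the weights.
        pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos
```

The returned positions can then be printed in the required "sequence i position p PATTERN" format, with the likelihood ratio computed from the final profile versus the background nucleotide frequencies.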
04/25/2003
Solution of assignment 4: part 1 (ps, pdf) and part 2 (code)

 Assignment #5, due April 15 at 11am
 Assignment #6 (ps, pdf), due April 23. Sample data, output format.
The MDL principle (ps, pdf)
Final Project
You are encouraged to work on the project in pairs. Please register
as soon as you know who your partner is.
 Project ideas: Some project ideas are listed here. Also check
projects from previous years (2001, 2002). Original ideas for projects
are most welcome. Graduate students are welcome to suggest a
project related to their research topic.
All projects are practical ("experimental") and involve designing and
implementing a learning system.
 Project proposal: one or two paragraphs
specifying the problem you are focusing on, the learning system(s)
you are going to apply,
any modifications/improvements you are considering implementing,
and the means by which you are going to evaluate your learner (using a
benchmark, a validation technique, etc.).
The goal of the proposal is to
make sure that you have chosen a feasible project and that you address the
important issues. Project proposals are due March 28.
 Final project: due early May (date TBA).
Information on what should be included in the final report is available
here.