Announcements
Aug 31 The syllabus might change in order to cover more data mining.
Sept 11 Homework assignment 1 is available.
Sept 14 The IND package can be installed under Windows. See instructions here.
Sept 17 IND works on Sun. See instructions here.
Oct 5 Homework assignment 2 is available.
Oct 9 The data set for homework 2 is available here.
Oct 29 The midterm text is here.
Sorry for the delay.
Dec 19 The class photo is available. Click here.
Time and Place
- Tuesday, Thursday: 14:55-16:10, Thurston 205
- Project due: Friday, December 7
- Midterm Exam: Due back Thursday, November 1
- Final Exam: 12:00 - 14:00 Thursday, December 13.
Personnel
| | | Office Hours | Office |
|---|---|---|---|
| Instructor | Richard Caruana | Tuesday 16:30 - 17:30, Wednesday 13:30 - 14:30 | Upson 4157 |
| Teaching Assistant | Alexandru Niculescu-Mizil | Monday 11:15 - 13:15 | Upson 4162 |
Course Description:
This implementation-oriented course
presents a broad introduction to current algorithms and approaches in
machine learning, knowledge discovery, and data mining and their
application to real-world learning and decision-making tasks. The
course will also cover empirical methods for comparing learning
algorithms, for understanding and explaining their differences, and
for exploring the conditions under which each is most appropriate.
Tentative Course Syllabus
Textbooks:
Machine Learning by Tom Mitchell
Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber
Optional references:
Pattern Classification, 2nd edition, by Richard Duda, Peter Hart, and David Stork
Grading policies:
- 20% Midterm (take home)
- 20% Final (open book in class)
- 30% Assignments (individual)
- 30% Final Project (group project comparing various learning methods)
- Bonus points for class participation
Academic integrity policy
- Decision trees.
- k-nearest neighbor.
- Missing values and feature selection.
- Clustering.
- PCA, MDS and Canopies.
- Fractal dimensions.
Assignments
ASSIGNMENT 1: Due Thursday, October 4, 2001
The assignment description
Download IND package
Instructions for installing the IND package under Windows
Instructions for installing the IND package on Sun
ASSIGNMENT 2: Due Tuesday, October 23, 2001
The assignment description
The data set
Midterm: Due Thursday, November 1, 2001
The questions
ASSIGNMENT 3: Due Thursday, November 15, 2001
The assignment description
The data set
The ROC code
The simplified ROC code
ASSIGNMENT 4 (Optional): Due Wednesday, December 19, 2001
The assignment description
Protein data in upper diagonal format
Final Project
The project description
The datasets.
NOTE: for problem 1 the training set has 2000 cases and the test set has
10000 cases. For problem 2 the training set has 2000 cases and the test set
has 15000 cases.
You may work on the project alone or in groups of up to three.
Final report: due Friday, Dec 7.
The credit for the project is 30% of the final grade.
Exceptional projects may get extra credit.