CS 478 Machine Learning
Project Suggestions
Decision Trees
- Implement and compare various methods for dealing with noisy data
in ID3, such as the chi-square statistic, reduced-error pruning, and
minimum description length.
- Implement an incremental version of ID3 (called ID5). Experiment
on different data sets or make (and evaluate) some extensions.
- Apply a genetic algorithm to the problem of building reliable
decision trees
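The chi-square statistic from the first project can serve as a stopping test while ID3 grows the tree. The following is a minimal sketch that assumes a candidate split is summarized as a table of class counts per branch. The function names, the 0.05 significance level, and the small critical-value table are illustrative assumptions, not part of the assignment. A split whose statistic does not exceed the critical value is treated as fitting noise, so the node becomes a leaf.

```python
# Illustrative sketch; function names and the 0.05 level are assumptions.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}  # df -> critical value, alpha = 0.05


def chi_square_split(branch_counts):
    """Chi-square statistic for a candidate ID3 split.

    branch_counts[i][j] is the number of training examples that follow
    branch i (one branch per attribute value) and belong to class j.
    Returns (statistic, degrees_of_freedom).
    """
    n_classes = len(branch_counts[0])
    class_totals = [sum(row[j] for row in branch_counts) for j in range(n_classes)]
    total = sum(class_totals)
    stat = 0.0
    for row in branch_counts:
        branch_total = sum(row)
        for j in range(n_classes):
            # Count expected in this cell if the attribute were irrelevant to the class.
            expected = branch_total * class_totals[j] / total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat, (len(branch_counts) - 1) * (n_classes - 1)


def split_is_significant(branch_counts):
    """True if the split's class distribution differs from chance at alpha = 0.05.

    The hard-coded table only covers 1 to 4 degrees of freedom.
    """
    stat, df = chi_square_split(branch_counts)
    return stat > CHI2_CRITICAL_05[df]


print(chi_square_split([[10, 0], [0, 10]]))  # (20.0, 1)
```

A perfectly separating split gives a large statistic, while a split that leaves every branch with the same class mix as its parent scores 0 and would be rejected.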
Bayes Learning Theory (Bayes classifier, Bayesian Networks, HMMs)
- Build a classifier that can distinguish between two general
sources (authors, languages, protein families). Part of the project is
to approximate the class-conditional probabilities cleverly from the
training data.
- Implement protein/DNA sequence alignment by Bayesian inference
- Given data with conditional dependencies (with the variables listed
in a partial order), learn a Bayesian network that describes the data.
- Apply the Gibbs sampling algorithm to align groups of related protein
sequences
- Implement a Hidden Markov Model for protein family recognition
- Implement a speech recognition system using Hidden Markov Models
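For the two-source classifier, the simplest way to estimate class-conditional probabilities is a naive Bayes model over characters with add-one (Laplace) smoothing. This is a sketch under that assumption only; the helper names `train` and `classify` are hypothetical, and a "clever" approximation (character n-grams, better smoothing) is exactly what the project asks you to improve on.

```python
import math
from collections import Counter


def train(texts_by_class, alphabet):
    """Estimate log P(char | class) for every char in `alphabet`, using
    add-one (Laplace) smoothing so unseen characters keep nonzero probability."""
    model = {}
    for label, texts in texts_by_class.items():
        counts = Counter(ch for text in texts for ch in text if ch in alphabet)
        total = sum(counts.values())
        model[label] = {
            ch: math.log((counts[ch] + 1) / (total + len(alphabet))) for ch in alphabet
        }
    return model


def classify(model, text, priors=None):
    """Return the class maximizing log P(class) + sum of log P(char | class).

    Characters are treated as independent given the class (the naive Bayes
    assumption); chars outside the alphabet are ignored; priors default to uniform.
    """
    best_label, best_score = None, -math.inf
    for label, log_probs in model.items():
        score = math.log(priors[label]) if priors else 0.0
        score += sum(log_probs[ch] for ch in text if ch in log_probs)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train({"A": ["aaab", "aaba"], "B": ["bbba", "babb"]}, alphabet="ab")
print(classify(model, "aaaa"), classify(model, "bbb"))  # A B
```

Working in log space avoids underflow when many small probabilities are multiplied over a long text.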
Nonparametric Techniques
- Combine geometric hashing with the nearest neighbor algorithm
- Implement a face recognition system using Principal Component
Analysis (PCA) and a nearest-neighbor classifier
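The nearest-neighbor half of the face-recognition project can be sketched as a k-nearest-neighbor vote. This sketch assumes each face has already been projected onto its principal components, so it receives short feature vectors; the PCA step is not shown, and `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=1):
    """Label `query` by majority vote among its k nearest training points.

    Distance is Euclidean (math.dist). A tie in the vote goes to the tied
    label whose first occurrence is nearest the query.
    """
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors standing in for PCA-projected face images.
X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
y = ["alice", "alice", "bob", "bob"]
print(knn_classify(X, y, (0.2, 0.4)))       # alice
print(knn_classify(X, y, (5.5, 4.0), k=3))  # bob
```

Sorting all distances is O(n log n) per query, which is fine for a course-sized face database. Geometric hashing or a spatial index would be the place to speed this up.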
Neural Networks
- Enhance and experiment with some neural-network learning
algorithms.
- Apply a genetic algorithm to the learning phase of a neural network
- Implement a face recognition system using neural networks
- Implement a handwritten symbol recognition system using neural
networks
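A natural baseline before enhancing any neural-network learning algorithm is the single-unit perceptron training rule. The sketch below assumes targets in {-1, +1} and a threshold unit. The helper names are hypothetical, and the boolean AND function is only a toy, linearly separable test case.

```python
def train_perceptron(examples, eta=0.1, epochs=20):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.

    `examples` is a list of (inputs, target) pairs with target in {-1, +1}.
    A constant input x_0 = 1 is prepended so that w[0] acts as the threshold.
    """
    w = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        for x, t in examples:
            o = predict(w, x)
            # Weights change only when the output o differs from the target t.
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, [1.0, *x])]
    return w


def predict(w, x):
    """Thresholded output: +1 if w . (1, x) > 0, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, [1.0, *x])) > 0 else -1


AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w = train_perceptron(AND, eta=0.5)
print([predict(w, x) for x, _ in AND])  # [-1, -1, -1, 1]
```

The rule converges only when the classes are linearly separable (AND is; XOR is not), which is why multilayer networks trained by backpropagation are the usual next step.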
Stochastic Methods
- Apply a genetic algorithm to the problem of building reliable
decision trees
- Apply a genetic algorithm to the learning phase of a neural network
- Implement protein/DNA sequence alignment using a genetic algorithm
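All three genetic-algorithm projects share the same evolutionary loop; what changes is the encoding (a tree, a weight vector, an alignment) and the fitness function. The following is a minimal sketch of a generational GA over bit strings with roulette-wheel selection, single-point crossover, and bit-flip mutation. All parameter values and the toy OneMax fitness (count of 1 bits) are illustrative assumptions.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50,
                      p_crossover=0.8, p_mutation=0.01, seed=0):
    """Generational GA over fixed-length bit strings.

    Uses fitness-proportionate (roulette-wheel) selection, single-point
    crossover and bit-flip mutation; `fitness` must be non-negative and
    n_bits must be at least 2. Returns the best individual seen in any generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Tiny offset keeps the selection weights valid if every score is 0.
        weights = [fitness(ind) + 1e-9 for ind in pop]
        children = []
        while len(children) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            if rng.random() < p_crossover:
                cut = rng.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                # Flip each bit independently with probability p_mutation.
                children.append([bit ^ (rng.random() < p_mutation) for bit in child])
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)
    return best


# OneMax toy problem: fitness is the number of 1 bits.
best = genetic_algorithm(sum, n_bits=20)
print(sum(best), best)
```

For the decision-tree or neural-network projects, the bit string would be replaced by a tree or weight encoding, and `sum` by something like validation accuracy.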
Unsupervised Learning
- Experiment with several clustering techniques
- Implement the deterministic annealing approach for hierarchical
clustering.
- Experiment with validation and model selection techniques to
devise a reliable clustering algorithm.
- Implement a face recognition system using Kohonen self-organizing
maps
- Implement a multi-dimensional scaling technique
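A common baseline for the clustering experiments is Lloyd's k-means algorithm. This sketch assumes the caller supplies the initial centroids (for example, k points chosen at random), so runs are reproducible. `kmeans` is a hypothetical helper, not a specific library's API.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: alternately assign each point to its nearest centroid
    and move each centroid to the mean of its assigned points, until the
    centroids stop changing. An empty cluster keeps its previous centroid."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


centroids, clusters = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)],
                             initial_centroids=[(0, 0), (10, 10)])
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

k-means needs k fixed in advance and depends on the initialization, which is where the validation and model-selection project comes in.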
Evaluation and Validation
- Compare the performance of several different learning
(classification) systems (e.g., decision trees, neural networks,
Bayesian methods) on multiple data sets. Analyze the differences in
performance. Can you suggest improvements based on the results?
- Explore the cross-validation technique on several learning systems.
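k-fold cross-validation can be written once and reused with every learning system being compared. This minimal sketch assumes each learner is wrapped as a pair of functions (`train_fn`, `error_fn`). These names, the contiguous fold layout, and the majority-class baseline are illustrative assumptions.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(train_fn, error_fn, X, y, k=10):
    """Mean held-out error over k folds (requires k <= len(X)).

    train_fn(X_train, y_train) -> model
    error_fn(model, X_test, y_test) -> error rate on the held-out fold
    Folds are contiguous, so shuffle (X, y) together beforehand.
    """
    errors = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors.append(error_fn(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(errors) / k


# Baseline learner: always predict the majority class of the training fold.
def train_majority(X, y):
    return max(set(y), key=y.count)


def error_rate(model, X, y):
    return sum(label != model for label in y) / len(y)


X = list(range(10))
y = ["a"] * 7 + ["b"] * 3
print(cross_validate(train_majority, error_rate, X, y, k=5))  # 0.3
```

Reporting the majority-class baseline alongside each learner gives a floor against which the other systems' cross-validated errors can be judged.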
Meta-Learners
- Implement and test recent "co-training" methods. Co-training uses
unlabeled data together with labeled data to perform supervised
classification with a minimal amount of supervision. It does this by
exploiting the information contained in the unlabeled data.
- Create a hybrid learner by combining several different classifiers,
and compare its performance with that of the individual constituent
learners.
- Create a meta-learner that makes a decision based both on an
instance and on what other classifiers predict for that instance.
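The two combination schemes above differ in how the base classifiers' outputs are used. The following sketch assumes each base classifier is simply a function from an instance to a label. A hybrid learner can take a majority vote. A meta-learner, in the spirit of stacked generalization, is instead trained on the instance's features extended with the base predictions. Both helper names are hypothetical.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Hybrid learner: return the most common label among the base classifiers'
    predictions (a tie goes to the tied label predicted first)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def meta_features(classifiers, x):
    """Input vector for a meta-learner: the instance's own features followed by
    every base classifier's prediction for that instance."""
    return list(x) + [clf(x) for clf in classifiers]


# Three toy base classifiers on 2-D instances.
clfs = [lambda x: "pos" if x[0] > 0 else "neg",
        lambda x: "pos" if x[1] > 0 else "neg",
        lambda x: "pos"]
print(majority_vote(clfs, (1, -1)))   # pos
print(meta_features(clfs, (-1, 2)))   # [-1, 2, 'neg', 'pos', 'pos']
```

Any learner can then be trained on pairs of `meta_features(clfs, x)` and the true label. To keep the meta-learner from trusting overfitted base predictions, those predictions are usually generated on held-out folds rather than on the base classifiers' own training data.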
Computational Learning Theory
- Experiment with various multiple-model (voting) methods, such as
bagging or boosting, applied to different learning methods.
- Apply the PAC model to an interesting concept language.
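A typical first step in applying the PAC model to a concept language is the standard sample-complexity bound for a consistent learner over a finite hypothesis space H. The sketch below simply evaluates that bound. The function name is hypothetical, and the example hypothesis space (conjunctions of boolean literals) is only one possible choice of concept language.

```python
import math


def pac_sample_bound(ln_h, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite hypothesis
    space H to be probably approximately correct:

        m >= (1 / epsilon) * (ln|H| + ln(1 / delta))

    i.e. with probability at least 1 - delta, every hypothesis consistent with
    m examples has true error at most epsilon. `ln_h` is ln|H|, passed as a
    logarithm so that very large hypothesis spaces do not overflow.
    """
    return math.ceil((ln_h + math.log(1 / delta)) / epsilon)


# Conjunctions of literals over n = 10 boolean variables: each variable appears
# positive, negated, or not at all, so |H| = 3**10.
print(pac_sample_bound(10 * math.log(3), epsilon=0.1, delta=0.05))  # 140
```

The bound grows only logarithmically in |H| and in 1/delta but linearly in 1/epsilon. For infinite concept languages, ln|H| has to be replaced by a VC-dimension argument.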