CS 478 Machine Learning
Project Suggestions


    Decision Trees

  1. Implement and compare various methods for dealing with noisy data in ID3, such as the chi-square statistic, reduced-error pruning, and minimum description length.
  2. Implement an incremental version of ID3 (called ID5). Experiment on different data sets or make (and evaluate) some extensions.
  3. Apply a genetic algorithm to the problem of building reliable decision trees.

    Bayes Learning Theory (Bayes classifier, Bayesian Networks, HMMs)

  4. Build a classifier that can distinguish between two general sources (e.g. authors, languages, or protein families). Part of the project is to approximate the class-conditional probabilities cleverly from training data.
  5. Implement protein/DNA sequence alignment by Bayesian inference.
  6. Given a data set with conditional dependencies (listed in a partial order), learn a Bayesian network that describes the data.
  7. Apply the Gibbs algorithm to align groups of related protein sequences.
  8. Implement a Hidden Markov Model for protein family recognition.
  9. Implement a speech recognition system using Hidden Markov Models.

    Nonparametric Techniques

  10. Combine geometric hashing with the nearest-neighbor algorithm.
  11. Implement a face recognition system using Principal Component Analysis (PCA) and a nearest-neighbor classifier.

    Neural Networks

  12. Enhance and experiment with some neural-network learning algorithms.
  13. Apply a genetic algorithm to the learning phase of a neural network.
  14. Implement a face recognition system using neural networks.
  15. Implement a handwritten symbol recognition system using neural networks.

    Stochastic Methods

  16. Apply a genetic algorithm to the problem of building reliable decision trees.
  17. Apply a genetic algorithm to the learning phase of a neural network.
  18. Implement protein/DNA sequence alignment using a genetic algorithm.

    Unsupervised Learning

  19. Experiment with several clustering techniques.
  20. Implement the deterministic annealing approach for hierarchical clustering.
  21. Experiment with validation techniques and model selection techniques to devise a reliable clustering algorithm.
  22. Implement a face recognition system using Kohonen self-organizing maps.
  23. Implement a multidimensional scaling technique.

    Evaluation, Validation

  24. Compare the performance of several different learning (classification) systems (e.g. decision trees, neural networks, Bayesian methods) on multiple data sets. Analyze the differences in performance. Can you suggest improvements based on the results?
  25. Explore the cross-validation technique on several learning systems.

    Meta Learners

  26. Implement and test recent "cotraining" methods. Cotraining combines unlabeled data with labeled data in order to perform supervised classification with a minimal amount of supervision. It does this by exploiting information in the unlabeled data.
  27. Create a hybrid learner by combining several different classifiers, and test its performance against that of the individual constituent learners.
  28. Create a meta-learner that makes a decision based on an instance as well as on what other classifiers have to say about the instance.
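
  As a starting point for suggestion 26, here is a minimal sketch of one possible cotraining loop. It is an illustration under stated assumptions, not a prescribed design: each example is assumed to have two numeric "views", each view gets a simple nearest-centroid classifier, and the labeled seed set is assumed to contain at least two classes. The names fit_centroids, predict and cotrain are hypothetical helpers invented for this example.

```python
from statistics import mean

def fit_centroids(xs, ys):
    """Nearest-centroid model for one 1-D view: {label: mean feature value}."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}

def predict(centroids, x):
    """Return (label, confidence). Confidence is the distance margin between
    the two closest centroids (assumes at least two classes)."""
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    best, second = ranked[0], ranked[1]
    return best, abs(x - centroids[second]) - abs(x - centroids[best])

def cotrain(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((view0, view1), label); unlabeled: list of (view0, view1).
    In each round, each view's classifier labels the unlabeled examples it is
    most confident about and moves them into the shared labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            model = fit_centroids([x[view] for x, _ in labeled],
                                  [y for _, y in labeled])
            # Most confident unlabeled examples first.
            ranked = sorted(pool, key=lambda x: predict(model, x[view])[1],
                            reverse=True)
            for x in ranked[:per_round]:
                labeled.append((x, predict(model, x[view])[0]))
                pool.remove(x)
    # Final per-view models trained on the enlarged labeled set.
    models = [fit_centroids([x[v] for x, _ in labeled], [y for _, y in labeled])
              for v in (0, 1)]
    return models, labeled

# Toy data (made up for illustration): two labeled seeds, four unlabeled points.
seed = [((0.0, 0.1), "a"), ((1.0, 0.9), "b")]
pool = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.3), (0.8, 0.7)]
models, labeled = cotrain(seed, pool)
```

  A real project would replace the nearest-centroid models with stronger classifiers and compare the result against training on the labeled seeds alone.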

    Computational Learning Theory

  29. Experiment with various multiple model (voting) methods such as bagging or boosting applied to different learning methods.
  30. Apply the PAC model to an interesting concept language.