CS 789 THEORY SEMINAR

Speakers: Brona Brejova and Tomas Vinar
Affiliation: School of Computer Science, University of Waterloo
Date: Monday, October 21, 2002
Title: Several Topics in Gene Finding

Abstract:

Large-scale genome sequencing projects provide huge amounts of DNA sequence data, but meaningful use of this data in biology requires annotation of the sequences, i.e., discovery of sequence segments with certain biological functions. Prediction of genes is an important step in the annotation process. While there are many gene finding tools available, recent experimental studies show that the best of them predicts only about 50% of the genes entirely correctly.

Our goal is to develop gene finding software that can use multiple sources of available information (e.g., protein databases, DNA sequences of related organisms, complex signal predictors) in addition to the DNA sequence itself. While working on this project, we encountered several interesting problems that have not previously been solved satisfactorily.

In this talk, we discuss several of these problems, including length distribution modeling in hidden Markov models, modeling of dependencies between positions in splicing signals, and design of optimal seeds for finding homologous coding regions.

This is joint work with Dan Brown and Ming Li.