Computer Science 628: Biological Sequence Analysis
Fall, 2004



Alignment of biological sequences (DNA, RNA, and amino acid sequences) features prominently in modern biology and computational biology research. In this course we will study in detail the statistical and algorithmic challenges one faces in designing alignment tools. For example, how do we find an optimal local alignment between a query sequence and a genomic database, and how do we know whether such an alignment is statistically significant? Following the textbook by Durbin et al., we will start by presenting pairwise and multiple sequence alignment algorithms in the context of probabilistic models (extensions of HMMs, covariance models). We will also go over the work of Karlin, Altschul, and others on the statistical analysis of alignments. Building on these "classical" results, we will address more current topics such as seed design for the seeded alignment paradigm and alignment questions that arose in recent whole-genome comparisons such as the rat-human-mouse one.

Instructor: Uri Keich
Lectures: Tuesdays & Thursdays 2:55-4:10
Location: Theory Center 484

Prerequisites: Nothing is set in stone, but some familiarity with algorithms, statistics, and probability will make the course easier to digest.

Grade: Your grade will be based on your submitted homework assignments (which will include some programming) as well as on the final exam.


UPDATE 9/3/04: You can now download the relevant slides and papers. Note that these links are accessible only from a Cornell address.

UPDATE 9/8/04: HW assignment #1 is now posted.

UPDATE 9/29/04: HW assignment #2 is now posted.

UPDATE 11/14/04: HW assignment #3 was updated. The file you need is a 0-1 MATLAB vector.

UPDATE 11/25/04: HW assignment #4 is now posted, and the slides have been updated.


Download pages: slides, papers