Problems and Perspectives in Computational Molecular
Biology
Cornell University
Fall 2001
The next presentation
Monday December 10
Mike Stillman and Harry Tsai (Math)
(1) Using graphical models and genomic expression data to statistically
validate models of genetic regulatory networks. abstract
(2) Supervised harvesting of expression trees. abstract
Future presentations
Previous presentations (+ presentation files)
Submit your Critiques online
or download a Paper Critiques form
Time and Place
Mondays 1:25 pm to 2:15 pm
Comstock Hall B108
1 credit, S/U only.
Prerequisites: Permission of instructor.
The seminar is required of students in the Computational Molecular
Biology Program.
Instructors
Golan Yona (CS),
Susan McCouch (PB),
Marty Wells (BSCB)
This course is cross-listed as CS 726 (Computer Science),
PB 726 (Plant Breeding) and BSCB 726 (Biometrics)
Links
Introduction
This is a weekly seminar series discussing timely topics in computational
molecular biology. The course addresses methodological approaches to
sequence annotation, protein structure and function relationships, and
evolutionary relationships across species. Statistical and deterministic
computational approaches will be covered, and specific, detailed
biological examples will be discussed.
Topics of interest will be discussed in relation to papers prepared by
teams of students and/or faculty. We will pair students from
biology backgrounds with students from math, computer science and
statistics for paper preparation. Students will summarize the salient
questions addressed by the paper, the research methods used and the
results obtained. At the end of the presentation, questions should be
listed on an overhead slide to initiate discussion in the group.
Topics covered during the Fall 2001 semester:
- Advanced sequence analysis and gene identification.
- Threading and matching of sequences to structures. Methods of fold
recognition.
- Comparison of protein shapes.
- Microarray analysis.
- Co-evolution and protein-protein interactions.
Suggested Papers
Sequence analysis
Pairwise comparison/database search
- Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J.,
Zhang, Z., Miller, W. & Lipman, D.J. (1997). Gapped BLAST and
PSI-BLAST: a new generation of protein database search programs.
Nucl. Acids Res. 25, 3389-3402. abstract
- Kann M, Qian B, Goldstein RA. (2000). Optimization of a new score
function for the detection of remote homologs. Proteins 41, 498-503.
abstract
- Andreas Prlic, Francisco S. Domingues, and Manfred
J. Sippl. (2000). Structure-derived substitution matrices for
alignment of distantly related sequences. Protein Eng. 13,
545-550. abstract
- Rost B. (1999). Twilight zone of protein sequence alignments.
Protein Eng. 12, 85-94. abstract
- Hein J, Wiuf C, Knudsen B, Moller MB, Wibling G. (2000).
Statistical alignment: computational properties, homology
testing and goodness-of-fit. J Mol Biol. 302, 265-79.
abstract
- Yu L, White JV, Smith TF. (1998). A homology identification method
that combines protein sequence and structure information.
Protein Sci 7, 2499-510. abstract
- Inge Jonassen, Ingvar Eidhammer, Svenn H. Grindhaug, William
R. Taylor. (2000). Searching the Protein Structure Databank with Weak
Sequence Patterns and Structural Constraints.
J. Mol. Biol. 599-619. abstract
- Gouzy, J., Corpet, F. & Kahn, D. (1999). Whole genome protein
domain analysis using a new method for domain
clustering. Comput Chem 23, 333-340. abstract
Multiple alignments, Profiles
- Lee, C., Grasso, C. & Sharlow, M. (2001). Multiple Sequence
Alignment Using Partial Order Graphs. Bioinformatics.
abstract, paper, figures
- Notredame C, Higgins DG, Heringa J. (2000). T-Coffee: A Novel
Method for Fast and Accurate Multiple Sequence Alignment.
J Mol Biol. 302, 205-17. abstract
- Notredame, C., Holm, L. & Higgins, D. G. (1998). COFFEE: an
objective function for multiple sequence alignments.
Bioinformatics 14, 407-422. abstract
- Taylor WR. (1998). Dynamic sequence databank searching with
templates and multiple alignment. J Mol Biol. 280, 375-406.
abstract
Hidden Markov models
- Krogh, A., Brown, M., Mian, I. S., Sjolander, K. & Haussler, D.
(1994). Hidden Markov models in computational biology: Application
to protein modeling. J. Mol. Biol. 235, 1501-1531.
abstract
- Karplus, K., Barrett, C. & Hughey, R. (1998).
Hidden Markov models for detecting remote protein homologies.
Bioinformatics 14:10, 846-856. abstract
Alternative representations, sequence-function relationships
- Casari, G., Sander, C. & Valencia, A. (1995). A method to
predict functional residues in proteins. Nat. Struct. Biol. 2,
171-178. abstract
- Solis AD, Rackovsky S. (2000). Optimized representations and maximal information in proteins.
Proteins 38, 149-164. abstract
- Hanke, J., Beckmann, G., Bork, P. & Reich,
J. G. (1996). Self-organizing hierarchic networks for pattern
recognition in protein sequence. Protein Sci. 5, 72-82.
abstract
- Agrafiotis, D. K. (1997). A new method for analyzing protein
sequence relationships based on Sammon maps. Protein Sci. 6,
287-293. abstract
- Hannenhalli, S. S. & Russell, R. B. (2000).
Analysis and prediction of functional sub-types from protein
sequence alignments. J Mol Biol. 303, 61-76. abstract
Structure analysis
Structure comparison (Dali, CE, Structal, Geometric hashing)
- Holm, L. & Sander, C. (1997). Dali/FSSP classification
of three-dimensional protein folds. Nucl. Acids Res. 25, 231-234.
abstract
- Shindyalov, I. N. & Bourne, P. E. (1998). Protein structure
alignment by incremental combinatorial extension (CE) of the
optimal path. Protein Eng. 11, 739-747. abstract
- Levitt, M & Gerstein, M. (1998). A Unified Statistical Framework
for Sequence Comparison and Structure Comparison.
Proc. Natl. Acad. Sci. USA 95, 5913-5920. abstract
- Nussinov R, Wolfson HJ. (1991). Efficient detection of
three-dimensional structural motifs in biological
macromolecules by computer vision techniques.
Proc Natl Acad Sci USA. 88, 10495-10499. abstract
- Godzik A, Skolnick J. (1994). Flexible algorithm for direct
multiple alignment of protein structures and sequences.
Comput Appl Biosci. 10, 587-596. abstract
- Hadley, C. & Jones, D. T. (1999). A systematic comparison of
protein structure classifications: SCOP, CATH and FSSP.
Structure Fold Des. 7, 1099-1112. abstract
- Jongsun Jung and Byungkook Lee. (2000). Protein structure
alignment using environmental profiles. Protein Eng. 13,
535-543. abstract
Automatic detection of domains
- Jones, S., Stewart, M., Michie, A., Swindells, M. B., Orengo, C.
& Thornton, J. M. (1998). Domain assignment for protein structures
using a consensus approach: characterization and analysis.
Protein Sci. 7, 233-242. abstract
- Taylor, W. R. (1999). Protein structural domain identification.
Protein Eng. 12, 203-216. abstract
- Xu, Y., Xu, D. & Gabow, H. N. (2000). Protein domain decomposition
using a graph-theoretic approach. Bioinformatics 16, 1091-1104.
abstract
- Holm, L. & Sander, C. (1994). Parser for protein folding units.
Proteins 19, 256-268. abstract
- Kael F. Fischer, Susan Marqusee. (2000). A Rapid Test for
Identification of Autonomous Folding Units in Proteins. J Mol
Biol. 302, 701-12. abstract
- Jonassen I, Eidhammer I, Taylor WR. (1999).
Discovery of local packing motifs in protein structures.
Proteins 34, 206-19. abstract
- Sowdhamini R, Blundell TL. (1995). An automatic method involving
cluster analysis of secondary structures for the identification of
domains in proteins. Protein Sci 4, 506-520. abstract
- Turcotte M, Muggleton SH, Sternberg MJ. (2001). Automated discovery
of structural signatures of protein fold and function.
J Mol Biol 306, 591-605. abstract
Fold recognition, Threading, Structure prediction
- Lemer, C. M., Rooman, M. J. & Wodak, S. J. (1995). Protein
structure prediction by threading methods: evaluation of current
techniques. Proteins 23, 337-355. abstract
- Mirny, L. A. & Shakhnovich, E. I. (1998). Protein structure
prediction by threading. Why it works and why it does not.
J. Mol. Biol. 283, 507-526. abstract
- Rost, B., Schneider, R. & Sander, C. (1997). Protein fold
recognition by prediction-based threading. J. Mol. Biol.
270, 471-480. abstract
- Bryant, S. H. (1996). Evaluation of threading specificity
and accuracy. Proteins 26, 172-185. abstract
- Jones, D. T. (1999). GenTHREADER: an efficient and reliable
protein fold recognition method for genomic sequences.
J. Mol. Biol. 287, 797-815. abstract
- Karplus, K., Barrett, C., Cline, M., Diekhans, M., Grate, L.
& Hughey, R. (1999). Predicting protein structure using only
sequence information. Proteins 37, 121-125. abstract
- Jaroszewski, L., Rychlewski, L., Zhang, B. & Godzik,
A. (1998). Fold prediction by a hierarchy of sequence,
threading, and modeling methods. Protein Sci 7,
1431-1440. abstract
- Olmea, O., Rost, B. & Valencia, A. (1999). Effective use of
sequence correlation and conservation in fold recognition. J
Mol Biol. 293, 1221-1239. abstract
- Schmidler, S. C., Liu, J. S. & Brutlag, D. L. (2000). Bayesian
segmentation of protein secondary structure. J Comput
Biol. 7, 233-248. abstract
- Bienkowska JR, Yu L, Zarakhovich S, Rogers RG Jr, Smith TF. (2000).
Protein fold recognition by total alignment probability.
Proteins 40, 451-62. abstract
- Rykunov DS, Lobanov MY, Finkelstein AV. (2000). Search for the most
stable folds of protein chains: III. Improvement in fold recognition
by averaging over homologous sequences and 3D structures.
Proteins. 40, 494-501. abstract
- Kunin V, Chan B, Sitbon E, Lithwick G, Pietrokovski S. (2001).
Consistency analysis of similarity between multiple alignments:
prediction of protein function and fold structure from analysis of
local sequence motifs. J Mol Biol 307, 939-949. abstract
Structural/evolutionary profiles
- Bystroff, C. & Baker, D. (1998). Prediction of local
structure in proteins using a library of sequence-structure
motifs. J. Mol. Biol. 281, 565-577. abstract
- Kasuya, A. & Thornton, J. M. (1999). Three-dimensional
structure analysis of PROSITE patterns. J. Mol. Biol. 286,
1673-1691. abstract
- Kelley, L. A., MacCallum, R. M. & Sternberg, M. J. (2000).
Enhanced genome annotation using structural profiles in the
program 3D-PSSM. J. Mol. Biol. 299, 499-520. abstract
- Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D,
Yeates TO. (1999). Assigning protein functions by comparative
genome analysis: protein phylogenetic profiles. Proc Natl
Acad Sci USA. 96, 4285-8. abstract
- Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D.
(1999). Detecting protein function and protein-protein
interactions from genome sequences. Science 285, 751-3.
abstract
- Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D.
(1999). A combined algorithm for genome-wide prediction of protein
function. Nature 402, 83-6. abstract
- Marcotte EM, Xenarios I, van Der Bliek AM, Eisenberg D. (2000).
Localizing proteins in the cell from their phylogenetic profiles.
Proc Natl Acad Sci USA 97, 12115-20. abstract
Gene expression
Introduction to Microarray Technology
- A Concise Guide to cDNA Microarray Analysis.
Hegde P, Qi R, Abernathy R, Gay C, Dharap S, Gaspard R, Earle-Hughes J,
Snesrud E, Lee NH, and Quackenbush J
Biotechniques (2000), 29(3):548-562. abstract
- Expression profiling using cDNA microarrays.
Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM
Nat Genet 1999 Jan;21(1 Suppl):10-14. abstract, paper
Normalization
- Development of a prostate cDNA microarray and statistical gene
expression analysis package.
Carlisle AJ, Prabhu VV, Elkahloun A, Hudson J, Trent JM, Linehan WM,
Williams ED, Emmert-Buck MR, Liotta LA, Munson PJ, Krizman DB
Mol Carcinog 2000 May;28(1):12-22. abstract, paper
- Normalization for cDNA Microarray Data.
Y. H. Yang, S. Dudoit, P. Luu and T. P. Speed.
UC Berkeley Tech Report December 2000. paper
Imaging and normalization
- Ratio-based decisions and the quantitative analysis of cDNA
micro-array images. Chen Y, Dougherty E, Bittner M.
Journal of Biomedical Optics 2:364 (1997). paper
- Comparison of methods for image analysis on cDNA microarray data.
Y.H. Yang, M. J. Buckley, S. Dudoit and T.P.Speed
UC Berkeley Tech Report November 2000. paper
Replication
- Importance of replication in microarray gene expression studies:
Statistical methods and evidence from repetitive cDNA hybridizations.
Lee ML, Kuo FC, Whitmore GA, Sklar J.
Proc Natl Acad Sci USA 2000 Aug 29;97(18):9834-9839. abstract, paper
- Development of a prostate cDNA microarray and statistical gene
expression analysis package.
Carlisle AJ, Prabhu VV, Elkahloun A, Hudson J, Trent JM, Linehan WM,
Williams ED, Emmert-Buck MR, Liotta LA, Munson PJ, Krizman DB
Mol Carcinog 2000 May;28(1):12-22. abstract, paper
- Design and Analysis of Gene Expression Microarray Experiments.
Kerr K, Churchill, G. paper1, paper2
Clustering
- Cluster analysis and display of genome-wide expression patterns.
M.B. Eisen, P.T. Spellman, P.O. Brown, David Botstein
PNAS Vol. 95, Issue 25, 14863-14868, December 8, 1998
abstract, paper
- The transcriptional program in the response of human fibroblasts to serum.
Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM, Staudt
LM, Hudson J Jr, Boguski MS, Lashkari D, Shalon D, Botstein D, Brown PO
Science 1999 Jan 1;283(5398):83-7. abstract, paper
- 'Gene shaving' as a method for identifying distinct sets of genes with
similar expression patterns.
T Hastie, R Tibshirani, M B Eisen, A Alizadeh, R Levy, L Staudt, W C Chan,
D Botstein, P Brown
Genome Biology 1(2):research0003.1-0003.21. abstract, paper
Singular value decomposition, PCA, classification
- Singular value decomposition for genome-wide expression data processing and
modeling.
Alter, O., P. Brown, and D. Botstein
PNAS 97 (18), 10101-10106, 2000. abstract, paper
- Flexible discriminant analysis by optimal scoring.
Hastie, T., Tibshirani, R. & Buja, A.
Journal of the American Statistical Association (1994) 89: 1255-1270.
- Distinct types of diffuse large B-cell lymphoma identified by gene
expression profiling.
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC,
Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J Jr,
Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger
DD, Armitage JO, Warnke R, Staudt LM, et al.
Nature 2000 Feb 3;403(6769):503-11. abstract, paper
Large-scale analysis
- Derisi, J. L., Iyer, V. R. & Brown, P. O. (1997). Exploring
the metabolic and genetic control of gene expression on a
genomic scale. Science 278, 680-686. abstract
- Hastie T, Tibshirani R, Botstein D, Brown P. (2001).
Supervised harvesting of expression trees.
Genome Biol. 2. abstract
- Hartemink AJ, Gifford DK, Jaakkola TS, Young RA. (2001).
Using graphical models and genomic expression data to statistically
validate models of genetic regulatory networks.
Pac Symp Biocomput. 422-33. abstract
Co-evolution, Protein-protein interaction
- Thorne, J. L., N. Goldman and D. T. Jones. (1998). Assessing
the impact of secondary structure and solvent accessibility on
protein evolution. Genetics 149, 445-458. abstract
- David D. Pollock, William R. Taylor, Nick Goldman. (1999).
Coevolving Protein Residues: Maximum Likelihood Identification
and Relationship to Structure. Journal of Molecular Biology
287, 187-198. abstract
- Pazos, F., Helmer-Citterich, M., Ausiello, G. & Valencia,
A. (1997). Correlated mutations contain information about
protein-protein interaction. J. Mol. Biol. 271, 511-523.
abstract
- Goh CS, Bogan AA, Joachimiak M, Walther D, Cohen FE. (2000).
Co-evolution of Proteins with their Interaction Partners.
J Mol Biol. 299, 283-93. abstract
- Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA. (1999).
Protein interaction maps for complete genomes based on gene fusion
events. Nature 402, 86-90. abstract
- Gallet X, Charloteaux B, Thomas A, Brasseur R. (2000).
A fast method to predict protein interaction sites from sequences.
J Mol Biol. 302, 917-26. abstract
- Schwikowski B, Uetz P, Fields S. (2000). A network of protein-protein interactions
in yeast. Nat Biotechnol. 18, 1257-61. abstract
- Park J, Lappe M, Teichmann SA. (2001). Mapping protein family
interactions: intramolecular and intermolecular protein family
interaction repertoires in the PDB and yeast. J Mol Biol.
307, 929-38. abstract
Books
- Waterman, M. S. (1995). Introduction to computational biology.
Chapman & Hall, London.
- Setubal, J. C. & Meidanis, J. (1996).
Introduction to computational molecular biology.
PWS Publishing Co., Boston.
- Methods in Enzymology, vol 266 (1996). Edited by R. F. Doolittle.
- Durbin, Eddy, Krogh, Mitchison (1998). Biological sequence analysis.
- Baldi, P. & Brunak, S. (1998). Bioinformatics: the machine
learning approach.
- Bioinformatics: Sequence, structure, and databanks.
Edited by D. Higgins and W. Taylor. Oxford University Press.
Journals
Science
Nature
Nature Structural Biology
Cell
Proceedings of the National Academy of Sciences
JMB
Protein Science
Proteins: Structure, Function, and Genetics
Protein Engineering
Nucleic Acids Research
Bioinformatics
Journal of Computational Biology
Trends in Biochemical Sciences
Molecular Microbiology
Web journals
Science's Next Wave
BioMedNet 'webzine'
GenomeBiology
Paper Search and Misc.
Biochemistry and Molecular Biology Journals
IDEAL homepage
PubMed (Medline)
NEC archive
e-Print archive
citation reports (impact factor of scientific journals)
Background reading
For a survey of the classic algorithms for sequence comparison
and the statistics of sequence alignment you can download one
of the following documents
Recommended books and book chapters on
- Sequence alignment.
Books: Waterman (1995), Setubal & Meidanis (1996),
Durbin, Eddy, Krogh, Mitchison (1998).
Book chapters: Pearson (Methods Enzymol 1996),
Yona & Brenner (Bioinformatics 2000).
- Multiple sequence alignment and profiles.
Books: Waterman (1995), Setubal & Meidanis (1996),
Durbin, Eddy, Krogh, Mitchison (1998).
Book chapters: Gribskov (Methods Enzymol 1996),
Taylor (Methods Enzymol 1996), Duret & Abdeddaim (Bioinformatics 2000).
- Hidden Markov Models
Books: Durbin, Eddy, Krogh, Mitchison (1998), Baldi & Brunak (1998).
Book chapters: Birney (Bioinformatics 2000)