HMMER: a new generation of homology search software


Database homology searching may be the most important application in computational molecular biology, and since the 1990s BLAST has been our main workhorse. Since BLAST's introduction, theoretical advances have been made in applying full probabilistic inference to homology searches using hidden Markov model (HMM) approaches. General adoption of probabilistic methods has been limited by some key problems, including the fact that the popular HMM implementations (including my HMMER software) are computationally demanding. I will describe HMMER3, a new generation of HMMER that aims to deploy probabilistic inference technology more fully on homology searches while attaining BLAST's speed. I will describe HMMER3's statistical inference framework; its probabilistic model of local sequence alignment; new statistical theory for log-likelihood ratio scores summed over all alignments, which extends Karlin/Altschul theory for optimal alignment scores; and an implementation that makes HMMER3 100-fold faster than HMMER2.

Sean Eddy is a group leader at the Howard Hughes Medical Institute's Janelia Farm Research Campus near Washington, DC. His research interests are in the development of computational algorithms for genome sequence analysis. He is the author of widely used software tools for biological sequence analysis, including a software package called HMMER; a coauthor of the Pfam database of protein domains; and a coauthor of the book Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, 1998). He received a bachelor's degree from the California Institute of Technology and a Ph.D. from the University of Colorado at Boulder, and was a postdoctoral fellow at NeXagen Pharmaceuticals and at the MRC Laboratory of Molecular Biology. He was a faculty member in the Department of Genetics at the Washington University School of Medicine for eleven years before moving to Janelia Farm.


Thursday, October 8, 2009

